<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>jhunterj.com &#187; sql</title>
	<atom:link href="http://jhunterj.com/tag/sql/feed/" rel="self" type="application/rss+xml" />
	<link>http://jhunterj.com</link>
	<description>J. Hunter Johnson—I&#039;m just this geek you (should) know.</description>
	<lastBuildDate>Sat, 14 Mar 2015 12:10:39 +0000</lastBuildDate>
	<language>en-US</language>
		<sy:updatePeriod>hourly</sy:updatePeriod>
		<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=3.9.40</generator>
	<item>
		<title>Transforming mostly static data to date ranges</title>
		<link>http://jhunterj.com/2013/04/06/transforming-mostly-static-data-to-date-ranges/</link>
		<comments>http://jhunterj.com/2013/04/06/transforming-mostly-static-data-to-date-ranges/#comments</comments>
		<pubDate>Sat, 06 Apr 2013 13:59:27 +0000</pubDate>
		<dc:creator><![CDATA[Hunter]]></dc:creator>
				<category><![CDATA[Database]]></category>
		<category><![CDATA[sql]]></category>
		<category><![CDATA[transact-sql]]></category>

		<guid isPermaLink="false">http://jhunterj.com/?p=288</guid>
		<description><![CDATA[I recently had to transform a set of data from typical date &#38; measurement to a more compact date range &#38; measurement format. The data in this case was very static: as the date incremented, the measurement was much more <a class="more-link" href="http://jhunterj.com/2013/04/06/transforming-mostly-static-data-to-date-ranges/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<div id="attachment_293" style="width: 310px" class="wp-caption alignright"><a href="http://www.flickr.com/photos/husseinabdallah/4035530069/" target="_blank"><img class="size-medium wp-image-293" alt="A cityscape showing flat peaks and troughs" src="http://jhunterj.com/wp-content/uploads/2013/04/FlatPeaksAndTroughs-300x228.jpg" width="300" height="228" /></a><p class="wp-caption-text">For data that has flat plateaus and canyons, like this cityscape. Derived from a <a href="http://creativecommons.org/licenses/by/2.0/" target="_blank">CC-BY-2.0</a> image by abdallah.</p></div>
<p>I recently had to transform a set of data from typical date &amp; measurement to a more compact date range &amp; measurement format. The data in this case was very static: as the date incremented, the measurement was much more likely to remain the same than it was to change. So storing the starting date and ending date for each measurement takes less space than storing each date&#8217;s measurement separately. Sure, it makes some subsequent queries more convoluted, but let&#8217;s say that you found this post because you also need a similar transformation.</p>
<p>I stumbled at first by subconsciously assuming that a measurement would not repeat once its range was ended. This assumption works if your measurements never decrease (or if they never increase), say for the total number of copies of a book printed. They&#8217;re printed in batches, and most days no new copies are printed. If your data does meet that criterion, the query is simple (and should be portable from Microsoft Transact-SQL, where I wrote it):</p>
<pre>SELECT d.[group],
       MIN(d.[date]) AS [start_date],
       MAX(d.[date]) AS [end_date],
       d.[measurement]
   FROM mydata d
   GROUP BY d.[group], d.[measurement]
   ORDER BY d.[group], MIN(d.[date])</pre>
<p>So for data like this:</p>
<table border="1">
<tbody>
<tr>
<td>group</td>
<td>date</td>
<td>measurement</td>
</tr>
<tr>
<td>Beta</td>
<td>2013-03-01</td>
<td>12</td>
</tr>
<tr>
<td>Beta</td>
<td>2013-03-02</td>
<td>12</td>
</tr>
<tr>
<td style="text-align: center;" colspan="3">…</td>
</tr>
<tr>
<td>Beta</td>
<td>2013-03-12</td>
<td>12</td>
</tr>
<tr>
<td>Beta</td>
<td>2013-03-13</td>
<td>18</td>
</tr>
<tr>
<td>Beta</td>
<td>2013-03-14</td>
<td>18</td>
</tr>
<tr>
<td style="text-align: center;" colspan="3">…</td>
</tr>
<tr>
<td>Beta</td>
<td>2013-03-31</td>
<td>18</td>
</tr>
</tbody>
</table>
<p>this generates the desired output:</p>
<table border="1">
<tbody>
<tr>
<td>group</td>
<td>start_date</td>
<td>end_date</td>
<td>measurement</td>
</tr>
<tr>
<td>Beta</td>
<td>2013-03-01</td>
<td>2013-03-12</td>
<td>12</td>
</tr>
<tr>
<td>Beta</td>
<td>2013-03-13</td>
<td>2013-03-31</td>
<td>18</td>
</tr>
</tbody>
</table>
<p>Unfortunately, my data did not meet this criterion, and my results from that query had overlapping ranges, which was quite incorrect:</p>
<table border="1">
<tbody>
<tr>
<td>group</td>
<td>date</td>
<td>measurement</td>
</tr>
<tr>
<td>Alpha</td>
<td>2013-03-01</td>
<td>12</td>
</tr>
<tr>
<td style="text-align: center;" colspan="3">…</td>
</tr>
<tr>
<td>Alpha</td>
<td>2013-03-07</td>
<td>12</td>
</tr>
<tr>
<td>Alpha</td>
<td>2013-03-08</td>
<td>15</td>
</tr>
<tr>
<td style="text-align: center;" colspan="3">…</td>
</tr>
<tr>
<td>Alpha</td>
<td>2013-03-11</td>
<td>15</td>
</tr>
<tr>
<td>Alpha</td>
<td>2013-03-12</td>
<td>12</td>
</tr>
<tr>
<td>Alpha</td>
<td>2013-03-13</td>
<td>18</td>
</tr>
<tr>
<td style="text-align: center;" colspan="3">…</td>
</tr>
<tr>
<td>Alpha</td>
<td>2013-03-22</td>
<td>18</td>
</tr>
<tr>
<td>Alpha</td>
<td>2013-03-23</td>
<td>21</td>
</tr>
<tr>
<td style="text-align: center;" colspan="3">…</td>
</tr>
<tr>
<td>Alpha</td>
<td>2013-03-27</td>
<td>21</td>
</tr>
<tr>
<td>Alpha</td>
<td>2013-03-28</td>
<td>18</td>
</tr>
<tr>
<td style="text-align: center;" colspan="3">…</td>
</tr>
<tr>
<td>Alpha</td>
<td>2013-03-31</td>
<td>18</td>
</tr>
</tbody>
</table>
<p>yields:</p>
<table border="1">
<tbody>
<tr>
<td>group</td>
<td>start_date</td>
<td>end_date</td>
<td>measurement</td>
</tr>
<tr>
<td>Alpha</td>
<td>2013-03-01</td>
<td>2013-03-12</td>
<td>12</td>
</tr>
<tr>
<td>Alpha</td>
<td>2013-03-08</td>
<td>2013-03-11</td>
<td>15</td>
</tr>
<tr>
<td>Alpha</td>
<td>2013-03-13</td>
<td>2013-03-31</td>
<td>18</td>
</tr>
<tr>
<td>Alpha</td>
<td>2013-03-23</td>
<td>2013-03-27</td>
<td>21</td>
</tr>
</tbody>
</table>
<p>I solved it by counting off my start dates and end dates, and then for each group and measurement placing the first start date with the first end date, the second with the second, and so on. This still seems like overkill; if you know an optimization I missed, please add your comment below!</p>
<p>In Microsoft&#8217;s Transact-SQL, counting off like that involves the <a title="OVER Clause (Transact-SQL)" href="http://msdn.microsoft.com/en-us/library/ms189461(v=sql.90).aspx" target="_blank">OVER clause</a> and <a title="CASE (Transact-SQL)" href="http://msdn.microsoft.com/en-us/library/ms181765(v=sql.90).aspx" target="_blank">CASE expressions</a> against some <a title="Using Outer Joins" href="http://msdn.microsoft.com/en-us/library/ms187518(v=sql.90).aspx" target="_blank">LEFT JOINs</a>. I LEFT JOIN the table against itself twice, once on the previous date and once on the subsequent date. Finding the NULLs on those joins (in the OVER + CASE constructs) lets me count the starts and ends of each block of measurements. I also need a future date to push all of the &#8220;middle&#8221; dates out of order to the end; they get thrown out later by the <code>WHERE COALESCE([start_date], [end_date]) IS NOT NULL</code> clause.</p>
<pre>/* any date well after all of the dates in the database will do */
DECLARE @futuredate DATE = '2100-01-01';

SELECT t.[group],
       MIN(t.[start_date]) AS [start_date],
       MIN(t.[end_date]) AS [end_date],
       t.[measurement]
   FROM (SELECT d.[group],
                CASE 
                   WHEN d2.[date] IS NULL THEN 
                      ROW_NUMBER()
                         OVER (PARTITION BY d.[group]
                               ORDER BY CASE
                                           WHEN d2.[date] IS NULL THEN d.[date]
                                           ELSE @futuredate
                                        END)
                   ELSE NULL
                END AS start_seq,
                CASE 
                   WHEN d3.[date] IS NULL THEN 
                      ROW_NUMBER()
                         OVER (PARTITION BY d.[group]
                               ORDER BY CASE
                                           WHEN d3.[date] IS NULL THEN d.[date]
                                           ELSE @futuredate
                                        END)
                   ELSE NULL
                END AS end_seq,
                CASE
                   WHEN d2.[date] IS NULL THEN d.[date]
                   ELSE NULL
                END AS [start_date],
                CASE
                   WHEN d3.[date] IS NULL THEN d.[date]
                   ELSE NULL
                END AS [end_date],
                d.[measurement]

            FROM mydata d
               LEFT JOIN mydata d2 ON d2.[group] = d.[group]
                                      AND d2.[date] = DATEADD(DD, -1, d.[date])
                                      AND d2.[measurement] = d.[measurement]
               LEFT JOIN mydata d3 ON d3.[group] = d.[group]
                                      AND d3.[date] = DATEADD(DD, 1, d.[date])
                                      AND d3.[measurement] = d.[measurement]

        ) t
   WHERE COALESCE([start_date], [end_date]) IS NOT NULL
   GROUP BY [group], COALESCE([start_seq], [end_seq]), [measurement]
   ORDER BY [group], [start_date]</pre>
<p>which gives me the correct results:</p>
<table border="1">
<tbody>
<tr>
<td>group</td>
<td>start_date</td>
<td>end_date</td>
<td>measurement</td>
</tr>
<tr>
<td>Alpha</td>
<td>2013-03-01</td>
<td>2013-03-07</td>
<td>12</td>
</tr>
<tr>
<td>Alpha</td>
<td>2013-03-08</td>
<td>2013-03-11</td>
<td>15</td>
</tr>
<tr>
<td>Alpha</td>
<td>2013-03-12</td>
<td>2013-03-12</td>
<td>12</td>
</tr>
<tr>
<td>Alpha</td>
<td>2013-03-13</td>
<td>2013-03-22</td>
<td>18</td>
</tr>
<tr>
<td>Alpha</td>
<td>2013-03-23</td>
<td>2013-03-27</td>
<td>21</td>
</tr>
<tr>
<td>Alpha</td>
<td>2013-03-28</td>
<td>2013-03-31</td>
<td>18</td>
</tr>
<tr>
<td>Beta</td>
<td>2013-03-01</td>
<td>2013-03-12</td>
<td>12</td>
</tr>
<tr>
<td>Beta</td>
<td>2013-03-13</td>
<td>2013-03-31</td>
<td>18</td>
</tr>
</tbody>
</table>
<p>I&#8217;m still making one assumption: that a measurement was stored in the data for every date. If the measurements are not taken that rigorously, you will need to accommodate the gaps with a generated table of the appropriate dates to join against. If you&#8217;re up against that, let me know and I&#8217;ll do a follow-up post.</p>
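<p>As a rough sketch of that generated-table idea (the date bounds, and joining it back to <code>mydata</code>, are illustrative assumptions here, not something I&#8217;ve run against real data), a recursive common table expression can produce one row per calendar date:</p>
<pre>/* Sketch: generate every date in the range, then LEFT JOIN the
   measurements so missing dates surface as NULLs to be filled in.
   MAXRECURSION 0 lifts the default 100-level recursion cap for
   ranges longer than 100 days. */
DECLARE @first DATE = '2013-03-01';
DECLARE @last DATE = '2013-03-31';

WITH all_dates AS (
   SELECT @first AS [date]
   UNION ALL
   SELECT DATEADD(DD, 1, [date])
      FROM all_dates
      WHERE [date] &lt; @last
)
SELECT a.[date], d.[group], d.[measurement]
   FROM all_dates a
      LEFT JOIN mydata d ON d.[date] = a.[date]
   OPTION (MAXRECURSION 0);</pre>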
<p style="text-align: right;">—jhunterj</p>
]]></content:encoded>
			<wfw:commentRss>http://jhunterj.com/2013/04/06/transforming-mostly-static-data-to-date-ranges/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Decoding SQL&#8217;s Decode</title>
		<link>http://jhunterj.com/2013/02/05/decoding-sqls-decode/</link>
		<comments>http://jhunterj.com/2013/02/05/decoding-sqls-decode/#comments</comments>
		<pubDate>Tue, 05 Feb 2013 12:32:29 +0000</pubDate>
		<dc:creator><![CDATA[Hunter]]></dc:creator>
				<category><![CDATA[Database]]></category>
		<category><![CDATA[decode]]></category>
		<category><![CDATA[oracle]]></category>
		<category><![CDATA[sql]]></category>

		<guid isPermaLink="false">http://jhunterj.com/?p=205</guid>
<description><![CDATA[I came across a SQL function I was not familiar with: decode. I looked it up and immediately replaced it with a CASE expression (in addition to other code cleanup). I&#8217;m afraid I don&#8217;t understand the existence of Oracle SQL&#8217;s <a class="more-link" href="http://jhunterj.com/2013/02/05/decoding-sqls-decode/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
<content:encoded><![CDATA[<p>I came across a SQL function I was not familiar with: decode. I looked it up and immediately replaced it with a CASE expression (in addition to other code cleanup). I&#8217;m afraid I don&#8217;t understand the existence of Oracle SQL&#8217;s decode() function:</p>
<pre>decode(expression,
       search, result
       [, search, result]...
       [, default]
      )</pre>
<p>seems functionally equivalent to</p>
<pre>CASE expression 
   WHEN search THEN result
   [WHEN search THEN result]...
   [ELSE default]
END</pre>
<p><div style="width: 260px" class="wp-caption alignright"><a href="http://commons.wikimedia.org/wiki/File:Gorilla_Scratching_Head.jpg"><img alt="Gorilla Scratching Head" src="http://upload.wikimedia.org/wikipedia/commons/0/08/Gorilla_Scratching_Head.jpg" width="250" /></a><p class="wp-caption-text">By Steven Straiton (originally posted to Flickr as Gorilla) [<a href="http://creativecommons.org/licenses/by/2.0">CC-BY-2.0</a>], via Wikimedia Commons</p></div>except the CASE version</p>
<ul>
<li>has no limitation of &#8220;only&#8221; 255 total expression + search + result + default parameters</li>
<li>is easier to read</li>
<li>works in PL/SQL context</li>
<li>is ANSI-compliant</li>
</ul>
<p>The only &#8220;benefit&#8221; to decode I could find is that the decode function will attempt to convert all of the results to the type of the first result, while CASE just errors if you mix types. To me, that benefit would simply encourage sloppy coding and keep you from noticing buggy code as quickly.</p>
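<p>For a concrete illustration of the equivalence (the <code>status</code> column, its values, and the <code>accounts</code> table are all made up for this example):</p>
<pre>/* Oracle's function spelling */
SELECT decode(status, 1, 'active',
                      2, 'retired',
                      'unknown')
   FROM accounts;

/* the ANSI CASE spelling */
SELECT CASE status
          WHEN 1 THEN 'active'
          WHEN 2 THEN 'retired'
          ELSE 'unknown'
       END
   FROM accounts;</pre>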
<p>So, what&#8217;s the point? Are there coders out there who &#8220;get&#8221; the function syntax more readily than the CASE syntax, so it helps their coding efficiency? Or is there a benefit I&#8217;m missing?</p>
<p style="text-align: right;">—jhunterj</p>
]]></content:encoded>
			<wfw:commentRss>http://jhunterj.com/2013/02/05/decoding-sqls-decode/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>
