A frequent challenge for transforming time-series data (e.g. weather, meter data) is changing columns representing multiple times of the day to a single column or in OLAP terms what might generically be described as an “Hour of the Day” or “Interval” dimension.
Example input schema:
Date, 00:00, 00:30, 01:00, 01:30, ..., 23:00, 23:30 01-01-2015,0,0,0,1,...,1,0 02-01-2015,0,0,1,2,...,2,1
Example output schema:
Date, Interval, Value 01-01-2015,00:00,0 01-01-2015,00:30,0 ... 02-01-2015,23:00,2 02-01-2015,23:30,1
Luckily, this is quite straightforward with a tool designed for dealing with messy tabular (e.g. CSV) data such as the free, web-based tool OpenRefine.
- Open the input CSV file in OpenRefine:
- Right-click one of the interval / time-based columns and select Transpose > Transpose cells across columns into rows:
- Select all the interval columns and select a new “one column” called Interval to transpose these into. Also select the option to prepend “:” to each new cell (this is useful in a subsequent step):
Measure columns have now become a single column called “Interval” and the number of rows has multiplied greatly:
- Now, use Edit column > Split into several columns to ensure the Interval name and Value figures are separate:
…and Voilà — after renaming the new column, the data is now in a pivoted form with only a single “Value” column containing the measure for the day and interval: