Re-pivoting data using OpenRefine’s Columns to Rows feature

Problem

A frequent challenge for transforming time-series data (e.g. weather, meter data) is changing columns representing multiple times of the day to a single column or in OLAP terms what might generically be described as an “Hour of the Day” or “Interval” dimension.

Example input schema:

Date, 00:00, 00:30, 01:00, 01:30, ..., 23:00, 23:30
01-01-2015,0,0,0,1,...,1,0
02-01-2015,0,0,1,2,...,2,1

Example output schema:

Date, Interval, Value
01-01-2015,00:00,0
01-01-2015,00:30,0
...
02-01-2015,23:00,2
02-01-2015,23:30,1

Luckily, this is quite straightforward with a tool designed for dealing with messy tabular (e.g. CSV) data such as the free, web-based tool OpenRefine.

Solution

  1. Open the input CSV file in OpenRefine:

    OpenRefine Demo - Columns to Rows - Before

  2. Right-click one of the interval / time-based columns and select Transpose > Transpose cells across columns into rows:

    OpenRefine Demo - Columns to Rows - Transpose

  3. Select all the interval columns and select a new “one column” called Interval to transpose these into.  Also select the option to prepend “:” to each new cell (this is useful in a subsequent step):

    OpenRefine Demo - Columns to Rows - Columns to Rows

    Measure columns have now become a single column called “Interval” and the number of rows has multiplied greatly:

    OpenRefine Demo - Columns to Rows - Semicolon character

  4. Now, use Edit column > Split into several columns to ensure the Interval name and Value figures are separate:

    OpenRefine Demo - Columns to Rows - Split into columns

    Select “:” as the separator and click OK:
    OpenRefine Demo - Columns to Rows - Split into columns 2

    …and Voilà — after renaming the new column, the data is now in a pivoted form with only a single “Value” column containing the measure for the day and interval:

    OpenRefine Demo - Columns to Rows - After

Advertisement

Visualising energy consumption profile (by hour of day) using D3.js

With the benefit of smart electricity meters it’s possible to obtain hourly data showing household consumption in KWh. I downloaded this dataset for my own house in CSV format from United Energy’s EnergyEasy portal.

With some massaging, the data can be formatted to a structure which which makes aggregation easier.  The excellent tool OpenRefine made this task easier, effectively unpivoting half-hourly measures which were in many columns into a single column, so that the data looks like this:

Day,Interval,Quantity
2012-01-01,0000,0.05
2012-01-01,0030,0.05
2012-01-01,0100,0.044
2012-01-01,0130,0.05
2012-01-01,0200,0.044
[...]
2013-12-31,2130,0.025
2013-12-31,2200,0.019
2013-12-31,2230,0.025
2013-12-31,2300,0.025
2013-12-31,2330,0.025

Using the D3 Javascript visualisation library, it’s possible to create an interactive visualisation which can interrogate this data in arbitrary ways to find patterns and answer basic questions about household energy consumption.  For example:

  1. During which hours of the day is the highest average energy consumption? Is this different in summer vs winter?  Has this changed from 2012 to 2013?
  2. Has the minimum energy consumption overnight changed?  Is the new (and slightly annoying) energy saving power board purchased in mid 2013 doing its job to reduce standby power use?
  3. During which hours of the day is power usage the most variable?

Features

  • Selectable date range – e.g. to compare a rolling 12 month period. This uses a “context” graphics section in D3.js with brush functionality to trigger realtime recalculation of data in the “focus” section when a user selects a range using their mouse.  The live update of the hourly consumption profile means it’s easy to see trends over time in the “focus” area of the screen (shown in the following point):

    D3 - Visualising hourly energy consumption profile - Brush
    D3 selectable time range using “brush” technique
  • Plotting of max / min / mean / standard deviation of KWh consumption per hour of the day:

    D3 - Visualising hourly energy consumption profile - Mean Min Max StdDev
    D3 Mean Max Min and Standard Deviation calculations for each half-hourly time interval of the day
  • “Snapshotting” of date range – e.g. to compare two consecutive years in an interactive way:

    D3 snapshot time comparison
    D3 snapshot time comparison

Demo

Check out a live example here:
http://jsfiddle.net/ugxo00bu/7/