Re-pivoting data using OpenRefine’s Columns to Rows feature

Problem

A frequent challenge for transforming time-series data (e.g. weather, meter data) is changing columns representing multiple times of the day to a single column or in OLAP terms what might generically be described as an “Hour of the Day” or “Interval” dimension.

Example input schema:

Date, 00:00, 00:30, 01:00, 01:30, ..., 23:00, 23:30
01-01-2015,0,0,0,1,...,1,0
02-01-2015,0,0,1,2,...,2,1

Example output schema:

Date, Interval, Value
01-01-2015,00:00,0
01-01-2015,00:30,0
...
02-01-2015,23:00,2
02-01-2015,23:30,1

Luckily, this is quite straightforward with a tool designed for dealing with messy tabular (e.g. CSV) data such as the free, web-based tool OpenRefine.

Solution

  1. Open the input CSV file in OpenRefine:

    OpenRefine Demo - Columns to Rows - Before

  2. Right-click one of the interval / time-based columns and select Transpose > Transpose cells across columns into rows:

    OpenRefine Demo - Columns to Rows - Transpose

  3. Select all the interval columns and select a new “one column” called Interval to transpose these into.¬† Also select the option to prepend “:” to each new cell (this is useful in a subsequent step):

    OpenRefine Demo - Columns to Rows - Columns to Rows

    Measure columns have now become a single column called “Interval” and the number of rows has multiplied greatly:

    OpenRefine Demo - Columns to Rows - Semicolon character

  4. Now, use Edit column > Split into several columns to ensure the Interval name and Value figures are separate:

    OpenRefine Demo - Columns to Rows - Split into columns

    Select “:” as the separator and click OK:
    OpenRefine Demo - Columns to Rows - Split into columns 2

    …and Voil√† — after renaming the new column, the data is now in a pivoted form with only a single “Value” column containing the measure for the day and interval:

    OpenRefine Demo - Columns to Rows - After