Visualising Solar Generation Data in a Custom Histogram using D3.js

Using the “brush” feature of the D3 JavaScript library again proves handy for creating an interactive, animated histogram. This type of visualisation helps to analyse and explore the distribution of time-series data.

For this demo, home solar PV generation data has been obtained from United Energy’s Energy Easy portal in CSV format.  For convenience in dealing with the raw data, which usually comes in half-hourly intervals, this data has been loaded into a Pentaho data warehouse instance (more details in a later post, perhaps!) and converted to day-by-day figures.

D3 - Interactive Histogram - Preparing Solar PV generation data with Saiku
Preparing Solar PV generation data with Saiku
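As an aside, the day-by-day roll-up performed here with Pentaho/Saiku can be sketched in a few lines of plain JavaScript. This is a hypothetical illustration only (the post uses a data warehouse for this step); it assumes rows with `Date` and `Quantity` fields, matching the CSV fields used later in the demo:

```javascript
// Hypothetical sketch: rolling half-hourly readings up to daily totals.
// Assumes each row looks like { Date: "2014-01-01", Quantity: "1.5" }.
function aggregateDaily(rows) {
  var totals = {};
  rows.forEach(function (r) {
    // Accumulate the numeric Quantity per calendar date.
    totals[r.Date] = (totals[r.Date] || 0) + (+r.Quantity);
  });
  // Emit one row per day, in the day-by-day shape the demo expects.
  return Object.keys(totals).map(function (d) {
    return { Date: d, Quantity: totals[d] };
  });
}
```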

Analysis can help explore whether solar panels are getting less efficient over time, or even determine what a “good” day of production is like in summer vs winter (by looking at the relative frequency of each in the histogram).

Features of the interactive histogram:
  • Drag and scroll date region which affects histogram above:
    D3 - Interactive Histogram - Selectable Date Range
  • Changeable histogram buckets:
    D3 - Interactive Histogram - Adjustable Buckets
  • Snap-shotting of one selected date range for visual comparison with another (e.g. summer vs winter comparison):
    D3 - Interactive Histogram - Summer vs Winter Comparison

Key techniques – how it’s achieved:

  1. Data is embedded into the HTML page:

    <pre id="csvdata">


    It’s hidden visually using the CSS display:none directive…

    #csvdata {
        display: none;
    }

    …and then read into a Javascript variable for display and computation via D3.

    var raw ="#csvdata").text();
    var dataset_copy = d3.csv.parse(raw);

  2. Maximum value is found in the selected region of the dataset:

    max = d3.max(dataset, function (d) { return +d.Quantity; });

    …histogram is created based on maximum value:

    // Generate a histogram using uniformly-spaced bins.
    datasetHist = d3.layout.histogram()
        .bins(buckets)
        .range([0, max])
        ( (d) {
            return +d.Quantity;
        }));

  3. An HTML form input slider is used for selection of number of buckets:

        <label style="width: 240px; text-align: right; font-size: 10px; font-family: sans-serif;">Number of buckets: <span id="nBuckets-value">10</span></label>
        <input type="range" id="nBuckets" value="10">


    Value of the slider is passed into JavaScript (to the variable “buckets”) via this code:

    "#nBuckets").on("input", function () {
        buckets = +this.value;"#nBuckets-value").text(+this.value);
        // ...then re-render the histogram with the new bucket count
    });
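For readers curious about what d3.layout.histogram() is doing under the hood, the uniform binning from step 2 can be sketched in plain JavaScript. This is an illustrative re-implementation, not the demo’s actual code: `values` are the daily generation quantities, `buckets` is the slider value from step 3, and `max` comes from step 2.

```javascript
// Minimal sketch of uniform binning over [0, max], mirroring what
// d3.layout.histogram().bins(buckets).range([0, max]) produces:
// each bin has a lower bound x, width dx, and count y.
function histogram(values, buckets, max) {
  var width = max / buckets;
  var bins = [];
  for (var i = 0; i < buckets; i++) {
    bins.push({ x: i * width, dx: width, y: 0 });
  }
  values.forEach(function (v) {
    // Clamp so the top value (v === max) falls into the last bin,
    // as D3 does; ignore values below the range.
    var i = Math.min(buckets - 1, Math.floor(v / width));
    if (i >= 0) bins[i].y += 1;
  });
  return bins;
}
```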

Demo / code:

Try out the demo and check out the code here:

Re-pivoting data using OpenRefine’s Columns to Rows feature


A frequent challenge when transforming time-series data (e.g. weather or meter data) is converting columns that represent multiple times of the day into a single column: in OLAP terms, what might generically be described as an “Hour of the Day” or “Interval” dimension.

Example input schema:

Date, 00:00, 00:30, 01:00, 01:30, ..., 23:00, 23:30

Example output schema:

Date, Interval, Value

Luckily, this is quite straightforward with a tool designed for dealing with messy tabular (e.g. CSV) data such as the free, web-based tool OpenRefine.


  1. Open the input CSV file in OpenRefine:

    OpenRefine Demo - Columns to Rows - Before

  2. Right-click one of the interval / time-based columns and select Transpose > Transpose cells across columns into rows:

    OpenRefine Demo - Columns to Rows - Transpose

  3. Select all the interval columns and enter a name, Interval, for the new single column to transpose them into.  Also select the option to prepend “:” to each new cell (this is used as a separator in a subsequent step):

    OpenRefine Demo - Columns to Rows - Columns to Rows

    Measure columns have now become a single column called “Interval” and the number of rows has multiplied greatly:

    OpenRefine Demo - Columns to Rows - Semicolon character

  4. Now, use Edit column > Split into several columns to ensure the Interval name and Value figures are separate:

    OpenRefine Demo - Columns to Rows - Split into columns

    Select “:” as the separator and click OK:
    OpenRefine Demo - Columns to Rows - Split into columns 2

    …and voilà: after renaming the new column, the data is now in a re-pivoted form with only a single “Value” column containing the measure for each day and interval:

    OpenRefine Demo - Columns to Rows - After
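For those who prefer to script it, the same columns-to-rows transform can be sketched in plain JavaScript. This is a hypothetical example, not part of the OpenRefine workflow above; it assumes each parsed CSV row is an object keyed by column name, with a Date column plus one column per interval:

```javascript
// Unpivot interval columns into (Date, Interval, Value) rows, matching
// the output schema: Date, Interval, Value.
function columnsToRows(rows) {
  var out = [];
  rows.forEach(function (row) {
    Object.keys(row).forEach(function (col) {
      if (col === "Date") return; // Date stays as the row key
      out.push({ Date: row.Date, Interval: col, Value: +row[col] });
    });
  });
  return out;
}
```

One row per day thus becomes 48 rows (one per half-hour), which is exactly the multiplication of rows seen in step 3 above.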

Visualising energy consumption profile (by hour of day) using D3.js

With the benefit of smart electricity meters it’s possible to obtain half-hourly data showing household consumption in kWh. I downloaded this dataset for my own house in CSV format from United Energy’s Energy Easy portal.

With some massaging, the data can be formatted into a structure which makes aggregation easier.  The excellent tool OpenRefine handled this task, effectively unpivoting half-hourly measures spread across many columns into a single column, so that the data looks like this:


Using the D3 Javascript visualisation library, it’s possible to create an interactive visualisation which can interrogate this data in arbitrary ways to find patterns and answer basic questions about household energy consumption.  For example:

  1. During which hours of the day is average energy consumption highest? Is this different in summer vs winter?  Has this changed from 2012 to 2013?
  2. Has the minimum energy consumption overnight changed?  Is the new (and slightly annoying) energy saving power board purchased in mid 2013 doing its job to reduce standby power use?
  3. During which hours of the day is power usage the most variable?

Features of the visualisation:
  • Selectable date range – e.g. to compare a rolling 12 month period. This uses a “context” graphics section in D3.js with brush functionality to trigger real-time recalculation of data in the “focus” section when a user selects a range with the mouse.  The live update of the hourly consumption profile makes it easy to see trends over time in the “focus” area of the screen (shown in the following point):

    D3 - Visualising hourly energy consumption profile - Brush
    D3 selectable time range using “brush” technique
  • Plotting of max / min / mean / standard deviation of kWh consumption per hour of the day:

    D3 - Visualising hourly energy consumption profile - Mean Min Max StdDev
    D3 Mean Max Min and Standard Deviation calculations for each half-hourly time interval of the day
  • “Snapshotting” of date range – e.g. to compare two consecutive years in an interactive way:

    D3 snapshot time comparison
    D3 snapshot time comparison
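The per-interval statistics plotted above can be sketched in plain JavaScript as follows. This is an illustrative version, not the demo’s exact code (D3 itself provides d3.mean, d3.min and d3.max); rows are assumed to be shaped like { Interval: "00:30", Value: 0.25 }:

```javascript
// Group readings by half-hourly interval, then compute mean, min, max
// and standard deviation of kWh for each interval of the day.
function statsByInterval(rows) {
  var groups = {};
  rows.forEach(function (r) {
    (groups[r.Interval] = groups[r.Interval] || []).push(+r.Value);
  });
  return Object.keys(groups).map(function (k) {
    var v = groups[k];
    var mean = v.reduce(function (a, b) { return a + b; }, 0) / v.length;
    // Population variance; a sample (n - 1) denominator is also common.
    var variance = v.reduce(function (a, b) {
      return a + (b - mean) * (b - mean);
    }, 0) / v.length;
    return {
      Interval: k,
      mean: mean,
      min: Math.min.apply(null, v),
      max: Math.max.apply(null, v),
      stddev: Math.sqrt(variance)
    };
  });
}
```

In the demo these statistics are recomputed on every brush event, so the focus chart updates live as the selected date range changes.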


Check out a live example here:

Problem starting HBase master on Hadoop with Cloudera

After formatting the Hadoop HDFS NameNode and trying to restart the Hadoop cluster in Cloudera, I encountered this fatal error on the HBase master, preventing HBase from starting at all:

Unhandled exception. Starting shutdown.
org.apache.hadoop.hbase.TableExistsException: hbase:namespace
at org.apache.hadoop.hbase.master.handler.CreateTableHandler.prepare(
at org.apache.hadoop.hbase.master.TableNamespaceManager.createNamespaceTable(
at org.apache.hadoop.hbase.master.TableNamespaceManager.start(
at org.apache.hadoop.hbase.master.HMaster.initNamespace(
at org.apache.hadoop.hbase.master.HMaster.finishInitialization(

After unsuccessfully trying to fix this error by removing the /hbase directory on HDFS, I stumbled across the solution: clearing the /hbase znode via the ZooKeeper client:

azureuser@master:/$ zookeeper-client
Connecting to localhost:2181
2015-01-24 02:17:31,535 [myid:] - INFO  [main:Environment@100] - Client environment:zookeeper.version=3.4.5-cdh5.3.0--1, built on 12/17/2014 02:46 GMT
2015-01-24 02:17:31,540 [myid:] - INFO  [main:Environment@100] - Client
2015-01-24 02:17:31,737 [myid:] - INFO  [main-SendThread(localhost:2181):ClientCnxn$SendThread@1235] - Session establishment complete on server localhost/, sessionid = 0x14b19b8aa4f000b, negotiated timeout = 30000
[zk: localhost:2181(CONNECTED) 0]

WatchedEvent state:SyncConnected type:None path:null

[zk: localhost:2181(CONNECTED) 1] rmr /hbase
[zk: localhost:2181(CONNECTED) 2] quit
2015-01-24 02:17:56,579 [myid:] - INFO  [main:ZooKeeper@684] - Session: 0x14b19b8aa4f000b closed
2015-01-24 02:17:56,579 [myid:] - INFO  [main-EventThread:ClientCnxn$EventThread@512] - EventThread shut down

This fixed the issue: restarting HBase via Cloudera now succeeds without the above error.

It seems ZooKeeper has a filesystem-like concept called znodes, which is not to be confused with HDFS in a Hadoop cluster.

Using Mondrian’s CurrentDateMember to show current day’s data in MDX

Let’s say we have the following MDX query to show data for a particular date (in this case the quantity measure of the cube Electricity):

WITH
SET [~ROWS] AS {[Time].[Day].[2014-01-01]}
SELECT
NON EMPTY {[Measures].[Quantity]} ON COLUMNS,
[~ROWS] ON ROWS
FROM [Electricity]

Works OK:

Saiku MDX - Current Date Member 1

But what if we want the date to be dynamic, reflecting today’s date?

We can change the MDX to use Mondrian’s CurrentDateMember function:

WITH
SET [~ROWS] AS {CurrentDateMember([Time], """[Time].[Day]""\.[yyyy-mm-dd]")}
SELECT
NON EMPTY {[Measures].[Quantity]} ON COLUMNS,
[~ROWS] ON ROWS
FROM [Electricity]

Now the date selected is today’s date (Note: date members in the time hierarchy are in the format of yyyy-mm-dd):

Saiku MDX - Current Date Member 2

Extra tip – rolling date range up to current day
We can also change the MDX query to select a rolling 365-day date range (ending with the current day):

SET [~ROWS] AS {CurrentDateMember([Time], """[Time].[Day]""\.[yyyy-mm-dd]").Lag(364):CurrentDateMember([Time], """[Time].[Day]""\.[yyyy-mm-dd]")}

Now a rolling 365-day date range is shown:

Saiku MDX - Current Date Member 3