For this demo, home solar PV generation data has been obtained from United Energy’s Energy Easy portal in CSV format. The raw data usually comes in half-hourly intervals, so for convenience it has been loaded into a Pentaho data warehouse instance (more details in a later post, perhaps!) and converted to day-by-day figures.
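The half-hourly to daily conversion itself is simple enough to sketch outside a data warehouse. This is only an illustration in plain Python, not the Pentaho transformation used for the demo; the function name `daily_totals` and the sample readings are made up.

```python
from collections import defaultdict
from datetime import datetime


def daily_totals(readings):
    """Sum (timestamp, kWh) interval readings into one total per calendar day."""
    totals = defaultdict(float)
    for timestamp, kwh in readings:
        totals[timestamp.date()] += kwh
    # Return the days in chronological order.
    return dict(sorted(totals.items()))


# Made-up half-hourly readings, for illustration only.
sample = [
    (datetime(2014, 1, 1, 12, 0), 1.5),
    (datetime(2014, 1, 1, 12, 30), 1.0),
    (datetime(2014, 1, 2, 12, 0), 0.75),
]

for day, kwh in daily_totals(sample).items():
    print(day, kwh)
```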
Analysis can help explore whether solar panels are getting less efficient over time, or even determine what a “good” day of production looks like in summer vs winter (by looking at the relative frequency of each in the histogram).
A draggable, scrollable date region that drives the histogram above:
Changeable histogram buckets:
Snapshotting of one selected date range for visual comparison with another (e.g. summer vs winter):
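The bucketing behind a histogram like this can be sketched in a few lines. The version below is an assumption about how one might do it in plain Python, not the code behind the demo. It returns relative frequencies rather than raw counts, so that two date ranges with different numbers of days can still be compared. The function name and the daily kWh figures are hypothetical.

```python
import math
from collections import Counter


def relative_histogram(daily_kwh, bucket_width):
    """Map each bucket's lower edge to the fraction of days falling in it."""
    if not daily_kwh:
        return {}
    counts = Counter(
        math.floor(value / bucket_width) * bucket_width for value in daily_kwh
    )
    total = len(daily_kwh)
    return {edge: count / total for edge, count in sorted(counts.items())}


# Made-up daily generation figures (kWh) for two "snapshot" date ranges.
summer = [18.2, 21.7, 19.5, 22.1]
winter = [4.1, 7.9, 6.3, 9.8]

# Changing bucket_width re-buckets the same data, like the histogram control.
print(relative_histogram(summer, bucket_width=5))
print(relative_histogram(winter, bucket_width=5))
```

With a width of 5 kWh, the made-up summer days all land in the 15 and 20 buckets while the winter days land in 0 and 5, which is the kind of shift the snapshot comparison is meant to make visible.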
A frequent challenge when transforming time-series data (e.g. weather, meter data) is turning many columns, each representing a different time of day, into a single column. In OLAP terms, that single column might generically be described as an “Hour of the Day” or “Interval” dimension.
With the benefit of smart electricity meters it’s possible to obtain hourly data showing household consumption in kWh. I downloaded this dataset for my own house in CSV format from United Energy’s EnergyEasy portal.
With some massaging, the data can be reshaped into a structure which makes aggregation easier. The excellent tool OpenRefine made this task easier, effectively unpivoting the half-hourly measures, which were spread across many columns, into a single column, so that the data looks like this:
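OpenRefine did the unpivot here, but the same reshaping can be sketched in plain Python. This is a minimal illustration only: the column names (`date`, `00:00`, `00:30`), the output field names and the values are assumptions, not the actual layout of the portal export.

```python
import csv
import io


def unpivot(rows, id_column, interval_name="interval", value_name="kwh"):
    """Turn one wide row per day into one long row per (day, interval)."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue  # the id column is repeated on every output row
            long_rows.append(
                {
                    id_column: row[id_column],
                    interval_name: column,
                    value_name: float(value),
                }
            )
    return long_rows


# Hypothetical wide-format export: one column per half-hour interval.
WIDE_CSV = """date,00:00,00:30
2013-07-01,0.21,0.19
2013-07-02,0.25,0.18
"""

wide_rows = list(csv.DictReader(io.StringIO(WIDE_CSV)))
for long_row in unpivot(wide_rows, id_column="date"):
    print(long_row)
```

Each input row becomes one output row per interval column, so the interval can then be treated as a dimension and aggregated like any other.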
During which hours of the day is the highest average energy consumption? Is this different in summer vs winter? Has this changed from 2012 to 2013?
Has the minimum energy consumption overnight changed? Is the new (and slightly annoying) energy saving power board purchased in mid 2013 doing its job to reduce standby power use?
During which hours of the day is power usage the most variable?
Selectable date range – e.g. to compare a rolling 12-month period. This uses a “context” graphics section in D3.js with brush functionality to trigger real-time recalculation of data in the “focus” section when a user selects a range using their mouse. The live update of the hourly consumption profile makes it easy to see trends over time in the “focus” area of the screen (shown in the following point):
Plotting of max / min / mean / standard deviation of kWh consumption per hour of the day:
“Snapshotting” of date range – e.g. to compare two consecutive years in an interactive way:
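The recalculation that the brush triggers boils down to filtering readings by date and summarising them per hour of the day. The demo does this in the browser with D3.js; the sketch below shows the same idea in plain Python, with hypothetical names and made-up readings. It uses the population standard deviation, which is an assumption, since the post does not say which variant is plotted.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, pstdev


def hourly_profile(readings, start, end):
    """Summarise (day, hour, kWh) readings per hour for start <= day <= end."""
    by_hour = defaultdict(list)
    for day, hour, kwh in readings:
        if start <= day <= end:
            by_hour[hour].append(kwh)
    return {
        hour: {
            "min": min(values),
            "max": max(values),
            "mean": mean(values),
            "std": pstdev(values),  # population standard deviation (assumed)
        }
        for hour, values in sorted(by_hour.items())
    }


# Made-up readings: (day, hour of day, kWh).
readings = [
    (date(2012, 1, 1), 18, 2.0),
    (date(2013, 1, 1), 3, 0.5),
    (date(2013, 1, 2), 3, 0.5),
    (date(2013, 1, 1), 18, 1.0),
    (date(2013, 1, 2), 18, 3.0),
]

# Two "snapshots" of consecutive years, ready to compare hour by hour.
profile_2012 = hourly_profile(readings, date(2012, 1, 1), date(2012, 12, 31))
profile_2013 = hourly_profile(readings, date(2013, 1, 1), date(2013, 12, 31))
print(profile_2012[18]["mean"], profile_2013[18]["mean"])
```

Keeping the date range as a plain argument mirrors the brush: every new selection simply calls the same function again with different bounds.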
After formatting the Hadoop HDFS NameNode and trying to restart the Hadoop cluster in Cloudera, I encountered this fatal error on the HBase master, preventing HBase from starting at all:
Unhandled exception. Starting shutdown.
After unsuccessfully trying to fix this error by removing the /hbase directory on HDFS, I stumbled across the solution: clearing the /hbase directory via the ZooKeeper service client:
Connecting to localhost:2181
2015-01-24 02:17:31,535 [myid:] - INFO [main:Environment@100] - Client environment:zookeeper.version=3.4.5-cdh5.3.0--1, built on 12/17/2014 02:46 GMT
2015-01-24 02:17:31,540 [myid:] - INFO [main:Environment@100] - Client environment:host.name=master.hadoopnet
2015-01-24 02:17:31,737 [myid:] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@1235] - Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x14b19b8aa4f000b, negotiated timeout = 30000
[zk: localhost:2181(CONNECTED) 0]
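The same cleanup could also be scripted rather than typed at the ZooKeeper prompt. The sketch below is not what was run in the session above. It assumes the third-party kazoo client library (imported only inside `main`, so the helper itself needs nothing installed), a ZooKeeper server on localhost:2181 as in the log, and that HBase is stopped while it runs. The helper name `clear_znode` is hypothetical.

```python
def clear_znode(zk, path="/hbase"):
    """Recursively delete `path` if it exists; return True if it was deleted."""
    if zk.exists(path) is None:
        return False
    zk.delete(path, recursive=True)
    return True


def main():
    # Assumption: kazoo is installed (pip install kazoo); it is not part of
    # the original post. Host and port are taken from the log output above.
    from kazoo.client import KazooClient

    zk = KazooClient(hosts="localhost:2181")
    zk.start()
    try:
        if clear_znode(zk):
            print("deleted /hbase znode")
        else:
            print("no /hbase znode found")
    finally:
        zk.stop()
```

Calling `main()` connects and removes the znode. This is destructive, so it is worth checking the path first and only running it against a cluster you are prepared to re-initialise.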