Creating a virtual solar PV plug for EV charging – Part 2

In Part 1 we explored the idea of using a Smart Plug as well as home solar monitoring to save money when charging a plug in hybrid car.

This post details a technical approach so the plug only turns on when excess solar is available.

The code

See here for the code on GitHub:

https://github.com/niftimus/SmartPlugAutomate

Notes:

  • The code is experimental and proof of concept only – it has not been fully tested
  • The code runs as a Linux service
  • It features a web UI
  • It checks home energy consumption and decides whether to turn the plug on or off based on a threshold

The logic

For each check interval the code checks the current state of the plug and decides whether to:

  • Do nothing
  • Leave on
  • Leave off
  • Turn on
  • Turn off

Here’s a flowchart showing the decision-making process:

The Web UI

The features

  • Ability to disable / enable automatic control
    • This is useful where the plug needs to be manually controlled via its physical button
  • Configurable Min power threshold
    • This is useful where it’s acceptable to use some grid power as well as solar (e.g. partly cloudy weekends with cheaper electricity rates)
  • Minimum on / off buffer periods to reduce switching (e.g. for devices which do not benefit from being powered on and off continually)
  • Monitoring messages to see how many times the switch has been controlled and its last state
  • Overall net ( W )
    • Useful for seeing current net household energy consumption
  • Automatic recovery if the plug, solar monitoring API or Wifi network goes offline temporarily

The result

So far this solution works great.

On a partially cloudy day, the plug automatically turns on or off once excess solar drops below the min power threshold. Similarly, the plug will turn off when household consumption is high – for example, during the heating cycle of a washing machine / dishwasher or when an electric kettle is used.

We got an interesting email from our electricity retailer after setting up this solution:

Solar health status email from electricity retailer. This shows the solution is working in increasing self-consumption.
Email from our electricity retailer

The message indicates we have successfully boosted our self-consumption – i.e. more solar energy is being self-consumed rather than being exported to the grid, giving the appearance to the retailer that the solar PV system is underperforming. Success!

Conclusion

This is not quite as good as having a home battery or a dedicated (and much more refined) device like the Zappi, however it comes close. It is a great way to boost self-consumption of excess solar PV energy using software and a low-cost smart plug. With around a year of weekly charging, this solution can pay for the cost of the smart plug by reducing the effective cost of electricity.

Useful queries for the Hive metastore

Hive metastore tables

The Hive metastore stores metadata about objects within Hive.  Usually this metastore sits within a relational database such as MySQL.

Sometimes it’s useful to query the Hive metastore directly to find out what databases, tables and views exist in Hive and how they’re defined. For example, say we want to expose a report to users about how many Hive tables are currently in a Hadoop cluster.  Or perhaps we want to run a script which performs some bulk operation on all tables in a particular Hive database.

Luckily, it’s easy to query the metastore using a tool such as MySQL Workbench using appropriate connectors – e.g. MySQL JDBC drivers.

Here’s a rough database diagram showing how the Hive metastore hangs together:

Hive metastore database diagram (ERD)
Hive metastore database diagram (from HDP 2.3, click here for full screen)

Handy metastore SQL queries

Show all Hive databases

SELECT * FROM hive.DBS;

Output:

DB_ID DESC DB_LOCATION_URI NAME OWNER_NAME OWNER_TYPE
1 Default Hive database hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse default public ROLE
6 NULL hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/xademo.db xademo hive USER

 List tables in a given database

SELECT t.* FROM hive.TBLS t
 JOIN hive.DBS d
 ON t.DB_ID = d.DB_ID
 WHERE d.NAME = 'default';

Output:

TBL_ID CREATE_TIME DB_ID LAST_ACCESS_TIME OWNER RETENTION SD_ID TBL_NAME TBL_TYPE VIEW_EXPANDED_TEXT VIEW_ORIGINAL_TEXT LINK_TARGET_ID
1 1439988377 1 0 hue 0 1 sample_07 MANAGED_TABLE NULL NULL NULL
2 1439988387 1 0 hue 0 2 sample_08 MANAGED_TABLE NULL NULL NULL

Show the storage location of a given table

SELECT s.* FROM hive.TBLS t
JOIN hive.DBS d
ON t.DB_ID = d.DB_ID
JOIN hive.SDS s
ON t.SD_ID = s.SD_ID
WHERE TBL_NAME = 'sample_07'
AND d.NAME='default';

Output:

SD_ID CD_ID INPUT_FORMAT IS_COMPRESSED IS_STOREDASSUBDIRECTORIES LOCATION NUM_BUCKETS OUTPUT_FORMAT SERDE_ID
1 1 org.apache.hadoop.mapred.TextInputFormat 0 0 hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/sample_07 -1 org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat 1

Find out how a given view has been defined

SELECT t.* FROM hive.TBLS t
JOIN hive.DBS d
ON t.DB_ID = d.DB_ID
WHERE TBL_NAME = 'vw_sample_07'
AND d.NAME='default';

Output:

TBL_ID CREATE_TIME DB_ID LAST_ACCESS_TIME OWNER RETENTION SD_ID TBL_NAME TBL_TYPE VIEW_EXPANDED_TEXT VIEW_ORIGINAL_TEXT LINK_TARGET_ID
31 1471788438 1 0 hue 0 31 vw_sample_07 VIRTUAL_VIEW select count(*) from `default`.`sample_07` select count(*) from default.sample_07 NULL

Get column names, types and comments of a given table

SELECT c.* FROM hive.TBLS t
 JOIN hive.DBS d
 ON t.DB_ID = d.DB_ID
 JOIN hive.SDS s
 ON t.SD_ID = s.SD_ID
 JOIN hive.COLUMNS_V2 c
 ON s.CD_ID = c.CD_ID
 WHERE TBL_NAME = 'sample_07'
 AND d.NAME='default'
 ORDER by INTEGER_IDX;

Output:

CD_ID COMMENT COLUMN_NAME TYPE_NAME INTEGER_IDX
1 NULL code string 0
1 NULL description string 1
1 NULL total_emp int 2
1 NULL salary int 3

Conclusion

It’s possible to query metadata from the Hive metastore which can be handy for understanding what data is available in a Hive instance.  It’s also possible to edit this information too, although this would usually be inadvisable as the schema of the metastore may be subject to change between different Hive versions, and the results of modifying Hive internals could be unexpected at best, and catastrophic at worst.

Creating a custom Hive input format and record reader to read fixed format flat files

Apache Hive is great for enabling SQL-like queryability over flat files.  This is trivial in the case of tabular formatted files such as CSV files where we can set custom row and field delimiters out-of-the-box (e.g. Hive’s inbuilt CSV serde).  Even more complex files can be read and converted to a desired row and column format using Hive’s regexp_extract() function, so long as we can operate on a single row at a time.

The problem

What if the rows we want in Hive aren’t rows in the input files?  That is, we need to read the file as a whole and decode it to produce the output we want to see in Hive.

An example is the Australian Bureau of Meteorology’s ASCII Grid format.  These files are fixed file formats with a header section which effectively describes how to read the file.  In the data section, each data row corresponds to a row of latitude on a map (with starting coordinates identified in the header) and similarly each column defines a line of longitude.  Read as a whole, the file contains a grid of readings of particular weather observations – e.g. rainfall for a given time period:

BOM-grid-data
Example decoding of ASCII grid format file

To read this data in Hive it might be possible to define a table which hard-codes column values to their corresponding longitude, but this leaves the problem of reading simlarly formatted files with a different geographical granularity or different starting position on the globe.  Similarly, we may struggle at the Hive query language layer to determine the appropriate latitude of a given data row in the file.  This is because the header contains the required metadata as to which row in the file corresponds to a certain latitude.

To make the grid data easier to consume in Hive we may wish to transform files into a format such as this:

BOM-grid-data-desired_tabular_format
Geospatial data – desired tabular format

This means we can query a file by filtering on particular lat / long combinations.  One way to transform the file into this format is via creating custom Hive InputFormat and Record Reader Java classes that we can use at query time.

InputFormat / RecordReader vs SerDe

A key distinction when creating custom classes to use with Hive is the following:

  • InputFormat and RecordReader – takes files as input – generates rows
  • SerDe – takes rows as input – generates columns

Here, ASCII grid formatted files cannot be de-serialised row-by-row because there is important information in the header about what each row contains (i.e. the latitude of a given row is dependent on its position in the file and also information in the header), so a SerDe is likely not the best option.  Instead, an InputFormat Java class can be written to convert the input ASCII grid formatted files into the desired tabular format above, making it possibe to query via Hive via arbitrary lat / long coordinates.

Creating a custom InputFormat

An InputFormat compatible with Hive can be created by creating classes which implement and extend standard mapred library classes:

  1. CustomTextInputFormat.java –  extends FixedLengthInputFormat.  Returns a CustomTextRecordReader which plugs in to Hive at runtime behind the scenes.
  2. CustomTextRecordReader.java – implements mapred RecordReader<LongWritable, BytesWritable>.  Reads and decompresses (if required) files off the Hadoop filesystem. Calls ReadASCIIGridFile to do the actual transformation.
  3. ReadASCIIGridFile.java – contains a static class which does the transformation from input (a byte array – ASCII grid formatted) to output (a byte array – Hive row format)

Notes:

  • Code uses the mapred rather than mapreduce API of Hadoop, as Hive only supports mapred style InputFormat objects
  • CustomTextInputFormat.java sets all files to be non-splittable.  This is done because a file must be read in full with its header to properly convert to the target format.
  • The CustomTextRecordReader copes with compressed input files by utilising the org.apache.hadoop.io.compress.CompressionCodec class to decompress any input files which are compressed.  This is advantageous for ASCII grid formatted files which lend themselves well to compression (e.g. via GZIP) before being uploaded to HDFS.

Compiling the custom InputFormat

Copy text out of the above files and save to a folder on the filesystem.  Build these and then and compile into a JAR file (note – a Java JDK must be installed to run the compilation, preferably the same version as the target Hadoop system is running):

cd </path/where/java/files/are/stored>
mkdir build
javac -d ./build/ -cp "/usr/hdp/2.3.0.0-2557/hadoop/lib/*:/usr/hdp/2.3.0.0-2557/hadoop-mapreduce/*" *.java
cd build
jar cvf CustomTextInputFormat.jar *
cp CustomTextInputFormat.jar /tmp/

Note – the classpath in the javac command assumes that necessary Hadoop library jar files are installed  in certain locations.  The locations mentioned are for the Hortonworks HDP 2.3.0 Sandbox VM, but can be changed to suit other versions / distributions.

Using the custom InputFormat with Hive

Run hive at the command line.

In the Hive session, add the newly created InputFormat JAR:

hive> add jar /tmp/CustomTextInputFormat.jar;
Added [/tmp/CustomTextInputFormat.jar] to class path
Added resources: [/tmp/CustomTextInputFormat.jar]

Create an external table on an HDFS directory containing ASCII grid formatted files:

hive> create external table default.test_ascii
(
lat1 float, long1 float, lat2 float, long2 float, measurement float
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS INPUTFORMAT 'com.analyticsanvil.custominputformat.CustomTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
LOCATION '/tmp/'; 
OK
Time taken: 13.192 seconds

Run a test query on the external table:

hive> select * from default.test_ascii limit 10;
OK
-43.975002      112.025 -43.925007      112.075 -999.0
-43.975002      112.075 -43.925007      112.12499       -999.0
-43.975002      112.12499       -43.925007      112.17499       -999.0
-43.975002      112.17499       -43.925007      112.22498       -999.0
-43.975002      112.22498       -43.925007      112.27498       -999.0
-43.975002      112.27498       -43.925007      112.324974      -999.0
-43.975002      112.324974      -43.925007      112.37497       -999.0
-43.975002      112.37497       -43.925007      112.424965      -999.0
-43.975002      112.424965      -43.925007      112.47496       -999.0
-43.975002      112.47496       -43.925007      112.524956      -999.0
Time taken: 3.94 seconds, Fetched: 10 row(s)

The query returns data in the desired tabular format.

Conclusion

By creating an InputFormat Java class which reads and transforms fixed format files at the time of Hive querying, we can effectively convert data into forms better which are better suited to analytical purposes.

Similarly, writing a custom input format allows almost any data to be read by invoking a custom Java class on each mapper, translating the input into tabular format for use in Hive.  In the above example ReadASCIIGridFile.java converts an ASCII grid formatted file to a long list of lat / long combinations and readings, but equally, a new Java class could be coded to read more exotic forms of input data – e.g. MP3 audio files, JPEGs or other types of binary file.  So long as a developer knows how to code the translation in Java, input formats can be converted into Hive queryable tabular data on the fly and in parallel (e.g. MP3 files with timestamps and spectral / frequency analysis).

It’s worth noting, further improvements can be made to the Java code above.  For example – more efficient parsing of input files (currently using regular expressions), better error checking, memory utilisation and a mechanism to combine input files for a given input split to improve performance.