Useful date formulas for Hive

Hive comes with some handy functions for transforming dates. These can be helpful when working with date dimension tables and when performing time-based comparisons and aggregations. For example:

Convert a native Hive date to a formatted date string: date_format(myDate, 'dd-MM-yyyy')

Return the week number (within the year) of a particular date, i.e. the first week of the year is 1,…
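A quick way to sanity-check formulas like these outside Hive is to reproduce them in Python. The sketch below is an illustration, not the post's own formulas. It assumes the Java-style pattern letters dd, MM and yyyy map to strftime's %d, %m and %Y. It also assumes ISO-8601 week numbering (weeks start on Monday and week 1 contains the year's first Thursday), which we believe matches Hive's built-in weekofyear(). The post's own week formula may define week 1 differently. The helper names are hypothetical.

```python
from datetime import date


def hive_style_date_format(d: date) -> str:
    # Hypothetical equivalent of Hive's date_format(myDate, 'dd-MM-yyyy'):
    # Java pattern letters dd, MM, yyyy correspond to %d, %m, %Y.
    return d.strftime("%d-%m-%Y")


def week_of_year(d: date) -> int:
    # Assumption: ISO-8601 week numbering (Monday-start weeks,
    # week 1 contains the first Thursday of the year).
    return d.isocalendar()[1]


print(hive_style_date_format(date(2015, 3, 7)))
print(week_of_year(date(1970, 11, 1)))
```

Under ISO rules, 1 January 2015 (a Thursday) falls in week 1, while 1 November 1970 falls in week 44.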

Managing Yarn memory with multiple Hive users

Out of the box (e.g. a standard Hortonworks HDP 2.2 install), Hive is not configured optimally for multiple users running queries simultaneously. As a result, a single Hive query can use up all available Yarn memory, preventing other users from running queries at the same time. This high memory consumption can be…
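To see why one query can starve everyone else, here is a rough arithmetic sketch using hypothetical numbers, not the post's configuration. Yarn hands out memory as containers, and a large query that asks for more containers than are free will take every one that becomes available. A Capacity Scheduler queue's maximum-capacity setting (yarn.scheduler.capacity.<queue-path>.maximum-capacity) caps the share of the cluster a queue may use. Whether the post itself relies on queues is an assumption. The approximation also ignores allocation rounding and assumes all containers are the same size.

```python
# Hypothetical sizing sketch: node count, NodeManager memory and container
# size are illustrative values, not taken from an HDP 2.2 install.


def max_containers(nodes: int, nm_memory_mb: int, container_mb: int) -> int:
    # Each NodeManager offers nm_memory_mb (yarn.nodemanager.resource.memory-mb)
    # and can only host whole containers of container_mb each.
    return nodes * (nm_memory_mb // container_mb)


def queue_container_cap(total: int, max_capacity_pct: float) -> int:
    # Rough upper bound on containers one queue can hold when its
    # maximum-capacity is max_capacity_pct percent of the cluster.
    return int(total * max_capacity_pct / 100)


total = max_containers(nodes=4, nm_memory_mb=24576, container_mb=2048)
print(total)
print(queue_container_cap(total, 50))
```

With these numbers the cluster fits 48 containers. A single uncapped query could claim all 48, whereas a queue limited to 50% could hold at most 24, leaving the rest for other users.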

Visualising energy consumption profile (by hour of day) using D3.js

With the benefit of smart electricity meters, it's possible to obtain hourly data showing household consumption in kWh. I downloaded this dataset for my own house in CSV format from United Energy's EnergyEasy portal. With some massaging, the data can be formatted into a structure that makes aggregation easier. The excellent tool OpenRefine made…
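As a minimal sketch of the hour-of-day aggregation (in Python rather than the post's own tooling), assume a hypothetical long format with one row per date and hour. The real EnergyEasy export layout is not shown here and would need reshaping first. The function averages consumption for each hour across all days and emits rows that a D3.js chart could bind to, one bar per hour. The column names, sample values and helper names are all assumptions.

```python
import csv
import io
import json
from collections import defaultdict
from typing import Dict, List

# Hypothetical input: columns date, hour (0-23) and kwh, one row per reading.
SAMPLE = """date,hour,kwh
2014-01-01,0,0.4
2014-01-01,1,0.3
2014-01-02,0,0.6
2014-01-02,1,0.5
"""


def average_by_hour(csv_text: str) -> Dict[int, float]:
    # Sum kWh and count readings per hour of day, then divide.
    totals: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        hour = int(row["hour"])
        totals[hour] += float(row["kwh"])
        counts[hour] += 1
    return {h: totals[h] / counts[h] for h in sorted(totals)}


def to_d3_rows(profile: Dict[int, float]) -> List[dict]:
    # One object per hour, e.g. for d3.json() or an inline data array.
    return [{"hour": h, "kwh": round(v, 3)} for h, v in profile.items()]


profile = average_by_hour(SAMPLE)
print(json.dumps(to_d3_rows(profile)))
```

Averaging (rather than summing) keeps the profile comparable when some days are missing readings.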

Using Mondrian’s CurrentDateMember to show current day’s data in MDX

Let's say we have the following MDX query to show data for a particular date (in this case the Quantity measure of the Electricity cube):

    WITH SET [~ROWS] AS {[Time].[Day].[2014-01-01]}
    SELECT NON EMPTY {[Measures].[Quantity]} ON COLUMNS,
    NON EMPTY [~ROWS] ON ROWS
    FROM [Electricity]

This works OK. But what if we want the date to be dynamic,…
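To make clear what a dynamic date has to resolve to, here is a client-side sketch that builds the same query for any given day. This is a swapped-in technique, not Mondrian's CurrentDateMember, which resolves the member on the server. The sketch assumes Day-level member names use the yyyy-MM-dd form seen in [2014-01-01], and the helper names are hypothetical.

```python
from datetime import date
from typing import Optional

# Doubled braces are literal MDX braces; {member} is filled in by format().
QUERY_TEMPLATE = (
    "WITH SET [~ROWS] AS {{{member}}}\n"
    "SELECT NON EMPTY {{[Measures].[Quantity]}} ON COLUMNS,\n"
    "NON EMPTY [~ROWS] ON ROWS\n"
    "FROM [Electricity]"
)


def day_member(d: date) -> str:
    # isoformat() yields yyyy-mm-dd, matching the [2014-01-01] member name.
    return "[Time].[Day].[{}]".format(d.isoformat())


def daily_query(d: Optional[date] = None) -> str:
    # Defaults to today's date when no day is given.
    day = d if d is not None else date.today()
    return QUERY_TEMPLATE.format(member=day_member(day))


print(daily_query(date(2014, 1, 1)))
```

Calling daily_query() with no argument targets the current day. The downside compared with a server-side function is that every client must rebuild the query text itself.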