Avoiding “add jar” to load custom SerDe when using Excel or Beeswax on Hortonworks Hadoop

Intro – analysing tweets with Hive

Following various tutorial examples online (e.g. Hortonworks – How To Refine and Visualize Sentiment Data and Microsoft – Analyze Twitter data using Hive in HDInsight) it is possible to expose semi structured Twitter feed data in tabular format via Hadoop and Hive.  Once the data is available in Hive it is possible to visualise this via Excel and the Power BI toolset.

Even after getting this data from Twitter’s stream API or a small sample from Twitter’s API testing development console, a custom SerDe is required for Hive to be able to read an external table based on JSON files.  This SerDe takes the form of a Jar file which must be made available whenever the hive query is executed.  Usually via this command:

add jar json-serde-1.3-jar-with-dependencies.jar;

select * from vw_tweets […]

PS – The jar in question in Roberto Congiu’s JSON SerDe (binaries available here, and CDH5 version appears to be compatible with Hortonworks).

The Problem?

In the case of client tools such as Hive and Beeswax, it’s necessary to execute the “add jar” command before querying any tables based on the custom SerDe, otherwise, errors such as this occur:

hive> select * from vw_tweets_hashtagsPerDay limit 2;

FAILED: RuntimeException MetaException(message:java.lang.ClassNotFoundException Class org.openx.data.jsonserde.JsonSerDe not found)

Similarly, errors such as this occur when using the Hortonworks Hive ODBC driver in Excel 2013 with Powerquery, and it is not otherwise possible to manually load the JAR in the Excel ODBC connection query:

DataSource.Error: ODBC: ERROR [HY000] [Hortonworks][HiveODBC] (35) Error from Hive: error code: ‘40000’ error message: ‘Error while compiling statement: FAILED: RuntimeException MetaException(message:java.lang.ClassNotFoundException Class org.openx.data.jsonserde.JsonSerDe not found)’.
Details:
Message=ERROR [HY000] [Hortonworks][HiveODBC] (35) Error from Hive: error code: ‘40000’ error message: ‘Error while compiling statement: FAILED: RuntimeException MetaException(message:java.lang.ClassNotFoundException Class org.openx.data.jsonserde.JsonSerDe not found)’.
ErrorCode=-2146232009

The solution

It is possible to fix the above issues by loading the SerDe JAR file to a location on the local filesystem of the Hadoop node running the Hiveserver2, e.g.:

/home/hive/json-serde-1.3-jar-with-dependencies.jar

Two config changes must then be made to the cluster:

  1. Add this line to the end of hive-env.sh:
    export HIVE_AUX_JARS_PATH=${HIVE_AUX_JARS_PATH}:/home/hive/json-serde-1.3-jar-with-dependencies.jar

    hive-env

  2. Add this config to the custom hive-site.xml file:

    hive.aux.jars.path = file:///home/hive/json-serde-1.3-jar-with-dependencies.jar
    hive-aux-jars-path

  3. Restart Hive components.

Finally – confirm ODBC is working by querying your Hive table / view in Excel or Beeswax!  It should work without first running the “add jar” command, because the custom SerDe responsible for reading the JSON files is now loaded automatically when a session is opened.

Advertisements

2 thoughts on “Avoiding “add jar” to load custom SerDe when using Excel or Beeswax on Hortonworks Hadoop

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s