Creating a custom Hive input format and record reader to read fixed format flat files

Apache Hive is great for enabling SQL-like querying over flat files. This is trivial for tabular files such as CSV, where we can set custom row and field delimiters out of the box (e.g. with Hive's built-in CSV SerDe). Even more complex files can be read and converted to a desired row and column…
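As a rough illustration of the per-line work such a record reader has to do, here is a minimal plain-Java sketch. It assumes a file whose columns have fixed character widths, slices each line into trimmed fields and joins them with a delimiter so an ordinary delimited SerDe could split them. It deliberately leaves out the Hadoop `InputFormat`/`RecordReader` classes. The class name `FixedWidthLineParser`, its methods and the example column widths are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not a Hive or Hadoop API): slices a fixed-width line
 * into fields by column widths and re-emits it as a delimited row.
 */
class FixedWidthLineParser {

    private final int[] widths;

    FixedWidthLineParser(int[] widths) {
        for (int w : widths) {
            if (w <= 0) {
                throw new IllegalArgumentException("column widths must be positive");
            }
        }
        this.widths = widths.clone();
    }

    /** Splits one line into trimmed fields; a short line yields empty trailing fields. */
    List<String> split(String line) {
        List<String> fields = new ArrayList<>(widths.length);
        int start = 0;
        for (int w : widths) {
            // Clamp both ends so a truncated line never throws.
            int begin = Math.min(start, line.length());
            int end = Math.min(start + w, line.length());
            fields.add(line.substring(begin, end).trim());
            start += w;
        }
        return fields;
    }

    /** Joins the fields with the given delimiter, e.g. a comma or Hive's default Ctrl-A. */
    String toDelimited(String line, char delimiter) {
        return String.join(String.valueOf(delimiter), split(line));
    }

    public static void main(String[] args) {
        // Assumed layout: 5-char id, 10-char name, 3-char age.
        FixedWidthLineParser parser = new FixedWidthLineParser(new int[] {5, 10, 3});
        System.out.println(parser.toDelimited("00042John Smith 37", ','));
    }
}
```

Running `main` prints `00042,John Smith,37`. In a real Hive input format this conversion would presumably live inside the custom record reader, for instance in its `next` method under the older `org.apache.hadoop.mapred` API that Hive input formats implement, so each raw line reaches the SerDe already delimited. That wiring is not shown here.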

Avoiding “add jar” to load custom SerDe when using Excel or Beeswax on Hortonworks Hadoop

Intro – analysing tweets with Hive

Following various online tutorials (e.g. Hortonworks – How To Refine and Visualize Sentiment Data, and Microsoft – Analyze Twitter data using Hive in HDInsight), it is possible to expose semi-structured Twitter feed data in tabular format via Hadoop and Hive. Once the data is available in Hive…