Permission denied and org.apache.hadoop.util.DiskChecker$DiskErrorException errors after Kerberising Hadoop cluster

Background

Kerberising a Hadoop cluster enables a properly authorised user to access the cluster without entering a username and password. For example (after running a kinit command and starting the beeline JDBC client):

    beeline> !connect jdbc:hive2://hdplinux1.company.internal:10000/default;principal=hive/hdplinux1.company.internal@COMPANY.INTERNAL;
    Connecting to jdbc:hive2://hdplinux1.company.internal:10000/default;principal=hive/hdplinux1.company.internal@COMPANY.INTERNAL;
    Enter username for jdbc:hive2://hdplinux1.company.internal:10000/default;principal=hive/hdplinux1.company.internal@COMPANY.INTERNAL;: myusername
    Enter password for jdbc:hive2://hdplinux1.company.internal:10000/default;principal=hive/hdplinux1.company.internal@COMPANY.INTERNAL;: ************
    Connected to: Apache Hive (version…

Using Azure Blob storage with Hadoop

Cloud providers such as Amazon (AWS) and Microsoft (Azure) provide fault-tolerant distributed storage services which can literally “take the load” off a Hadoop installation, providing some compelling advantages. In the case of Microsoft Azure’s blob storage, however, this is not without its pitfalls. With the release of Hadoop version 2.7.0 (and vendor-packaged versions such…

Sparkling-water – keeping the web UI alive

Spark is a great way to make use of the available RAM on a Hadoop cluster to run fast in-memory analysis and queries, and H2O is a great project for running distributed machine learning algorithms on data stored in Hadoop. Together they form “Sparkling Water” (Spark + H2O, obviously!). Easy-to-follow instructions for setting…

Avoiding “add jar” to load custom SerDe when using Excel or Beeswax on Hortonworks Hadoop

Intro – analysing tweets with Hive

Following various tutorial examples online (e.g. Hortonworks – How To Refine and Visualize Sentiment Data and Microsoft – Analyze Twitter data using Hive in HDInsight), it is possible to expose semi-structured Twitter feed data in tabular format via Hadoop and Hive. Once the data is available in Hive…

Problem starting HBASE master on Hadoop with Cloudera

After formatting the Hadoop HDFS Namenode and trying to restart the Hadoop cluster in Cloudera, I encountered this fatal error on the HBASE master, preventing HBASE from starting at all:

    Unhandled exception. Starting shutdown.
    org.apache.hadoop.hbase.TableExistsException: hbase:namespace
        at org.apache.hadoop.hbase.master.handler.CreateTableHandler.prepare(CreateTableHandler.java:133)
        at org.apache.hadoop.hbase.master.TableNamespaceManager.createNamespaceTable(TableNamespaceManager.java:232)
        at org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:86)
        at org.apache.hadoop.hbase.master.HMaster.initNamespace(HMaster.java:1069)
        at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:942)
        at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:613)
        at java.lang.Thread.run(Thread.java:745)

After unsuccessfully trying to fix this…