Managing Yarn memory with multiple Hive users

Out of the box (e.g. a standard Hortonworks HDP 2.2 install), Hive does not come configured optimally to manage multiple users running queries simultaneously.  This means it is possible for a single Hive query to use up all available Yarn memory, preventing other users from running a query simultaneously. This high memory consumption can be… Continue reading Managing Yarn memory with multiple Hive users

Sparkling-water – keeping the web UI alive

Spark is a great way to make use of the available RAM on a Hadoop cluster to run fast in-memory analysis and queries, and H2O is a great project for running distributed machine learning algorithms on data stored in Hadoop.  Together they form “Sparkling Water” (Spark + H2O, obviously!). Easy to follow instructions for setting… Continue reading Sparkling-water – keeping the web UI alive

Problem starting HBASE master on Hadoop with Cloudera

After formatting the Hadoop HDFS Namenode and trying to restart the Hadoop cluster in Cloudera I encountered thisfatal error on the HBASE master, preventing HBASE from starting at all: Unhandled exception. Starting shutdown. org.apache.hadoop.hbase.TableExistsException: hbase:namespace at org.apache.hadoop.hbase.master.handler.CreateTableHandler.prepare(CreateTableHandler.java:133) at org.apache.hadoop.hbase.master.TableNamespaceManager.createNamespaceTable(TableNamespaceManager.java:232) at org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:86) at org.apache.hadoop.hbase.master.HMaster.initNamespace(HMaster.java:1069) at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:942) at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:613) at java.lang.Thread.run(Thread.java:745) After unsuccessfully trying to fix this… Continue reading Problem starting HBASE master on Hadoop with Cloudera