Useful queries for the Hive metastore

The Hive metastore stores metadata about objects within Hive.  Usually this metastore sits within a relational database such as MySQL.

Sometimes it’s useful to query the Hive metastore directly to find out what databases, tables and views exist in Hive and how they’re defined. For example, say we want to expose a report to users about how many Hive tables are currently in a Hadoop cluster.  Or perhaps we want to run a script which performs some bulk operation on all tables in a particular Hive database.

Luckily, it’s easy to query the metastore using a tool such as MySQL Workbench using appropriate connectors – e.g. MySQL JDBC drivers.

Here’s a rough database diagram showing how the Hive metastore hangs together:

Hive metastore database diagram (ERD)
Hive metastore database diagram (from HDP 2.3, click here for full screen)

Handy metastore SQL queries

Show all Hive databases

SELECT * FROM hive.DBS;

Output:

DB_ID DESC DB_LOCATION_URI NAME OWNER_NAME OWNER_TYPE
1 Default Hive database hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse default public ROLE
6 NULL hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/xademo.db xademo hive USER

 List tables in a given database

SELECT t.* FROM hive.TBLS t
 JOIN hive.DBS d
 ON t.DB_ID = d.DB_ID
 WHERE d.NAME = 'default';

Output:

TBL_ID CREATE_TIME DB_ID LAST_ACCESS_TIME OWNER RETENTION SD_ID TBL_NAME TBL_TYPE VIEW_EXPANDED_TEXT VIEW_ORIGINAL_TEXT LINK_TARGET_ID
1 1439988377 1 0 hue 0 1 sample_07 MANAGED_TABLE NULL NULL NULL
2 1439988387 1 0 hue 0 2 sample_08 MANAGED_TABLE NULL NULL NULL

Show the storage location of a given table

SELECT s.* FROM hive.TBLS t
JOIN hive.DBS d
ON t.DB_ID = d.DB_ID
JOIN hive.SDS s
ON t.SD_ID = s.SD_ID
WHERE TBL_NAME = 'sample_07'
AND d.NAME='default';

Output:

SD_ID CD_ID INPUT_FORMAT IS_COMPRESSED IS_STOREDASSUBDIRECTORIES LOCATION NUM_BUCKETS OUTPUT_FORMAT SERDE_ID
1 1 org.apache.hadoop.mapred.TextInputFormat 0 0 hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/sample_07 -1 org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat 1

Find out how a given view has been defined

SELECT t.* FROM hive.TBLS t
JOIN hive.DBS d
ON t.DB_ID = d.DB_ID
WHERE TBL_NAME = 'vw_sample_07'
AND d.NAME='default';

Output:

TBL_ID CREATE_TIME DB_ID LAST_ACCESS_TIME OWNER RETENTION SD_ID TBL_NAME TBL_TYPE VIEW_EXPANDED_TEXT VIEW_ORIGINAL_TEXT LINK_TARGET_ID
31 1471788438 1 0 hue 0 31 vw_sample_07 VIRTUAL_VIEW select count(*) from `default`.`sample_07` select count(*) from default.sample_07 NULL

Get column names, types and comments of a given table

SELECT c.* FROM hive.TBLS t
 JOIN hive.DBS d
 ON t.DB_ID = d.DB_ID
 JOIN hive.SDS s
 ON t.SD_ID = s.SD_ID
 JOIN hive.COLUMNS_V2 c
 ON s.CD_ID = c.CD_ID
 WHERE TBL_NAME = 'sample_07'
 AND d.NAME='default'
 ORDER by INTEGER_IDX;

Output:

CD_ID COMMENT COLUMN_NAME TYPE_NAME INTEGER_IDX
1 NULL code string 0
1 NULL description string 1
1 NULL total_emp int 2
1 NULL salary int 3

Conclusion

It’s possible to query metadata from the Hive metastore which can be handy for understanding what data is available in a Hive instance.  It’s also possible to edit this information too, although this would usually be inadvisable as the schema of the metastore may be subject to change between different Hive versions, and the results of modifying Hive internals could be unexpected at best, and catastrophic at worst.

Advertisements

2 thoughts on “Useful queries for the Hive metastore

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s