Useful queries for the Hive metastore

The Hive metastore stores metadata about objects within Hive. Usually this metastore sits within a relational database such as MySQL.

Sometimes it’s useful to query the Hive metastore directly to find out what databases, tables and views exist in Hive and how they’re defined. For example, say we want to expose a report to users about how many Hive tables are currently in a Hadoop cluster. Or perhaps we want to run a script which performs some bulk operation on all tables in a particular Hive database.

Luckily, it’s easy to query the metastore using a tool such as MySQL Workbench using appropriate connectors – e.g. MySQL JDBC drivers.

Here’s a rough database diagram showing how the Hive metastore hangs together:

: Hive metastore database diagram (from HDP 2.3, click here for full screen)

Handy metastore SQL queries

Show all Hive databases

SELECT * FROM hive.DBS;

Output:

DB_ID	DESC	DB_LOCATION_URI	NAME	OWNER_NAME	OWNER_TYPE
1	Default Hive database	hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse	default	public	ROLE
6	NULL	hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/xademo.db	xademo	hive	USER

List tables in a given database

SELECT t.* FROM hive.TBLS t
 JOIN hive.DBS d
 ON t.DB_ID = d.DB_ID
 WHERE d.NAME = 'default';

Output:

TBL_ID	CREATE_TIME	DB_ID	LAST_ACCESS_TIME	OWNER	RETENTION	SD_ID	TBL_NAME	TBL_TYPE	VIEW_EXPANDED_TEXT	VIEW_ORIGINAL_TEXT	LINK_TARGET_ID
1	1439988377	1	0	hue	0	1	sample_07	MANAGED_TABLE	NULL	NULL	NULL
2	1439988387	1	0	hue	0	2	sample_08	MANAGED_TABLE	NULL	NULL	NULL

Show the storage location of a given table

SELECT s.* FROM hive.TBLS t
JOIN hive.DBS d
ON t.DB_ID = d.DB_ID
JOIN hive.SDS s
ON t.SD_ID = s.SD_ID
WHERE TBL_NAME = 'sample_07'
AND d.NAME='default';

Output:

SD_ID	CD_ID	INPUT_FORMAT	IS_COMPRESSED	IS_STOREDASSUBDIRECTORIES	LOCATION	NUM_BUCKETS	OUTPUT_FORMAT	SERDE_ID
1	1	org.apache.hadoop.mapred.TextInputFormat	0	0	hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/sample_07	-1	org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat	1

Find out how a given view has been defined

SELECT t.* FROM hive.TBLS t
JOIN hive.DBS d
ON t.DB_ID = d.DB_ID
WHERE TBL_NAME = 'vw_sample_07'
AND d.NAME='default';

Output:

TBL_ID	CREATE_TIME	DB_ID	LAST_ACCESS_TIME	OWNER	RETENTION	SD_ID	TBL_NAME	TBL_TYPE	VIEW_EXPANDED_TEXT	VIEW_ORIGINAL_TEXT	LINK_TARGET_ID
31	1471788438	1	0	hue	0	31	vw_sample_07	VIRTUAL_VIEW	select count(*) from `default`.`sample_07`	select count(*) from default.sample_07	NULL

Get column names, types and comments of a given table

SELECT c.* FROM hive.TBLS t
 JOIN hive.DBS d
 ON t.DB_ID = d.DB_ID
 JOIN hive.SDS s
 ON t.SD_ID = s.SD_ID
 JOIN hive.COLUMNS_V2 c
 ON s.CD_ID = c.CD_ID
 WHERE TBL_NAME = 'sample_07'
 AND d.NAME='default'
 ORDER by INTEGER_IDX;

Output:

CD_ID	COMMENT	COLUMN_NAME	TYPE_NAME	INTEGER_IDX
1	NULL	code	string	0
1	NULL	description	string	1
1	NULL	total_emp	int	2
1	NULL	salary	int	3

Conclusion

It’s possible to query metadata from the Hive metastore which can be handy for understanding what data is available in a Hive instance. It’s also possible to edit this information too, although this would usually be inadvisable as the schema of the metastore may be subject to change between different Hive versions, and the results of modifying Hive internals could be unexpected at best, and catastrophic at worst.

2 thoughts on “Useful queries for the Hive metastore”

Sampat says:

Feb 24, 2017 at 7:00 pm

Could you send the link of the Database diagram @ sampatkumarbsk@gmail.com, AS the picture is not reaDABLE

LikeLiked by 1 person

1. niftimusmaximus says:
  
  Feb 27, 2017 at 9:40 pm
  
  Hi Sampat – thanks for the tip. Have updated article with a link in the caption. Also see here for full res image: https://analyticsanvil.files.wordpress.com/2016/08/hive_metastore_database_diagram.png
  
  LikeLike

	niftimusmaximus on Using Azurite blob storage emu…
	Ganeshbabu on Using Azurite blob storage emu…
	Stephen Froeber (@SR… on Automatically tagging, caption…
	niftimusmaximus on Creating a virtual solar PV pl…
	ashley on Creating a virtual solar PV pl…

Useful queries for the Hive metastore

Handy metastore SQL queries

Show all Hive databases

List tables in a given database

Show the storage location of a given table

Output:

Find out how a given view has been defined

Get column names, types and comments of a given table

Conclusion

Published by niftimusmaximus

2 thoughts on “Useful queries for the Hive metastore”

Leave a comment Cancel reply

Handy metastore SQL queries

Show all Hive databases

List tables in a given database

Show the storage location of a given table

Output:

Find out how a given view has been defined

Get column names, types and comments of a given table

Conclusion

Share this:

Published by niftimusmaximus

2 thoughts on “Useful queries for the Hive metastore”

Leave a comment Cancel reply