The Hive metastore stores metadata about objects within Hive. Usually this metastore sits within a relational database such as MySQL.
Sometimes it’s useful to query the Hive metastore directly to find out what databases, tables and views exist in Hive and how they’re defined. For example, say we want to expose a report to users about how many Hive tables are currently in a Hadoop cluster. Or perhaps we want to run a script which performs some bulk operation on all tables in a particular Hive database.
Luckily, it’s easy to query the metastore using a tool such as MySQL Workbench using appropriate connectors – e.g. MySQL JDBC drivers.
Here’s a rough database diagram showing how the Hive metastore hangs together:
- Hive metastore database diagram (from HDP 2.3, click here for full screen)
Handy metastore SQL queries
Show all Hive databases
SELECT * FROM hive.DBS;
Output:
DB_ID | DESC | DB_LOCATION_URI | NAME | OWNER_NAME | OWNER_TYPE |
1 | Default Hive database | hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse | default | public | ROLE |
6 | NULL | hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/xademo.db | xademo | hive | USER |
List tables in a given database
SELECT t.* FROM hive.TBLS t JOIN hive.DBS d ON t.DB_ID = d.DB_ID WHERE d.NAME = 'default';
Output:
TBL_ID | CREATE_TIME | DB_ID | LAST_ACCESS_TIME | OWNER | RETENTION | SD_ID | TBL_NAME | TBL_TYPE | VIEW_EXPANDED_TEXT | VIEW_ORIGINAL_TEXT | LINK_TARGET_ID |
1 | 1439988377 | 1 | 0 | hue | 0 | 1 | sample_07 | MANAGED_TABLE | NULL | NULL | NULL |
2 | 1439988387 | 1 | 0 | hue | 0 | 2 | sample_08 | MANAGED_TABLE | NULL | NULL | NULL |
Show the storage location of a given table
SELECT s.* FROM hive.TBLS t JOIN hive.DBS d ON t.DB_ID = d.DB_ID JOIN hive.SDS s ON t.SD_ID = s.SD_ID WHERE TBL_NAME = 'sample_07' AND d.NAME='default';
Output:
SD_ID | CD_ID | INPUT_FORMAT | IS_COMPRESSED | IS_STOREDASSUBDIRECTORIES | LOCATION | NUM_BUCKETS | OUTPUT_FORMAT | SERDE_ID |
1 | 1 | org.apache.hadoop.mapred.TextInputFormat | 0 | 0 | hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/sample_07 | -1 | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | 1 |
Find out how a given view has been defined
SELECT t.* FROM hive.TBLS t JOIN hive.DBS d ON t.DB_ID = d.DB_ID WHERE TBL_NAME = 'vw_sample_07' AND d.NAME='default';
Output:
TBL_ID | CREATE_TIME | DB_ID | LAST_ACCESS_TIME | OWNER | RETENTION | SD_ID | TBL_NAME | TBL_TYPE | VIEW_EXPANDED_TEXT | VIEW_ORIGINAL_TEXT | LINK_TARGET_ID |
31 | 1471788438 | 1 | 0 | hue | 0 | 31 | vw_sample_07 | VIRTUAL_VIEW | select count(*) from `default`.`sample_07` | select count(*) from default.sample_07 | NULL |
Get column names, types and comments of a given table
SELECT c.* FROM hive.TBLS t JOIN hive.DBS d ON t.DB_ID = d.DB_ID JOIN hive.SDS s ON t.SD_ID = s.SD_ID JOIN hive.COLUMNS_V2 c ON s.CD_ID = c.CD_ID WHERE TBL_NAME = 'sample_07' AND d.NAME='default' ORDER by INTEGER_IDX;
Output:
CD_ID | COMMENT | COLUMN_NAME | TYPE_NAME | INTEGER_IDX |
1 | NULL | code | string | 0 |
1 | NULL | description | string | 1 |
1 | NULL | total_emp | int | 2 |
1 | NULL | salary | int | 3 |
Conclusion
It’s possible to query metadata from the Hive metastore which can be handy for understanding what data is available in a Hive instance. It’s also possible to edit this information too, although this would usually be inadvisable as the schema of the metastore may be subject to change between different Hive versions, and the results of modifying Hive internals could be unexpected at best, and catastrophic at worst.