Hive Metastore Introduction
Hive Metastore is a central repository for Hive metadata. It has two components:
- A service that the Hive driver connects to and queries for database schema information.
- A backing database that stores the metadata. Hive currently supports five backend databases: Derby, MySQL, MS SQL Server, Oracle and Postgres.
Hive Metastore Modes
Hive Metastore can be configured in one of three modes:
Embedded Metastore: By default, the Metastore service runs in the same JVM as the Hive service and uses an embedded Derby database stored on the local file system. The limitation of this mode is that only one session can be open at a time, because only one embedded Derby instance can access the database files on disk. To let multiple Hive services connect to the Metastore, Derby can be configured as a network server; the Hive wiki describes how to configure Derby in server mode.
Local Metastore: Hive is a data-warehousing framework, so restricting it to a single session is rarely acceptable. Local Metastore mode removes this limitation of the embedded mode: the backing database runs as a separate process, on the same machine or on a remote one, while the Metastore service still runs in the same JVM as the Hive service. Before starting a Hive client, add the JDBC driver library for the chosen database to the Hive lib folder.
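As a rough illustration of what a Local Metastore setup might look like, the sketch below renders the standard javax.jdo.option.* connection properties as hive-site.xml text. The helper render_hive_site, the MySQL host, the database name, the user and the password are all hypothetical placeholders, and the driver class name depends on the JDBC driver version in use.

```python
from xml.sax.saxutils import escape


def render_hive_site(properties):
    """Render a dict of Hive properties as hive-site.xml text (illustrative helper)."""
    lines = ["<configuration>"]
    for name, value in properties.items():
        lines.append("  <property>")
        lines.append(f"    <name>{escape(name)}</name>")
        lines.append(f"    <value>{escape(value)}</value>")
        lines.append("  </property>")
    lines.append("</configuration>")
    return "\n".join(lines)


# Assumed values: host, database name, user and password are placeholders.
local_metastore = {
    "javax.jdo.option.ConnectionURL": "jdbc:mysql://db.example.com:3306/metastore",
    # Connector/J 8 class name; older Connector/J releases use com.mysql.jdbc.Driver.
    "javax.jdo.option.ConnectionDriverName": "com.mysql.cj.jdbc.Driver",
    "javax.jdo.option.ConnectionUserName": "hiveuser",
    "javax.jdo.option.ConnectionPassword": "change-me",
}

hive_site_xml = render_hive_site(local_metastore)
print(hive_site_xml)
```

The printed text would go into hive-site.xml in the Hive configuration directory; no hive.metastore.uris entry is needed in this mode because the Metastore service runs inside the Hive JVM.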
Remote Metastore: In this mode, one or more Metastore servers run as separate processes. Multiple Hive clients can then connect to a remote Metastore service instead of each starting a Metastore service in its own JVM.
A Hive service is configured to use a remote Metastore by setting the hive.metastore.uris property to the Metastore server URIs. This property holds a comma-separated list of Metastore services. By default, the Hive service connects to the first URI in the list. If the connection fails, it randomly picks one of the listed Metastore services and tries to reconnect.
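The selection behavior described above can be sketched in a few lines. This is a simplified model under stated assumptions, not Hive's actual client code: parse_metastore_uris and connect_with_failover are hypothetical helpers, try_connect stands in for a real Thrift connection attempt, the host names are placeholders, and trying the remaining URIs in shuffled order is one possible reading of "randomly choose".

```python
import random


def parse_metastore_uris(value):
    """Split a hive.metastore.uris value into a list of URIs."""
    return [uri.strip() for uri in value.split(",") if uri.strip()]


def connect_with_failover(uris, try_connect, rng=random):
    """Try the first URI, then the remaining ones in random order.

    try_connect is a hypothetical callable returning True on success.
    Returns the URI that accepted the connection, or None if all failed.
    """
    if not uris:
        return None
    first, rest = uris[0], list(uris[1:])
    if try_connect(first):
        return first
    rng.shuffle(rest)  # random order among the other Metastore services
    for uri in rest:
        if try_connect(uri):
            return uri
    return None


# Placeholder hosts; 9083 is the default Metastore port.
uris = parse_metastore_uris(
    "thrift://ms1.example.com:9083, thrift://ms2.example.com:9083,"
    "thrift://ms3.example.com:9083"
)
down = {"thrift://ms1.example.com:9083"}  # pretend the first server is unreachable
chosen = connect_with_failover(uris, lambda uri: uri not in down)
print(chosen)  # one of the ms2 / ms3 URIs
```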
The Metastore service and the Hive server communicate using the Apache Thrift library. Before starting a Hive client or the Metastore service, add the driver libraries for the Metastore's backing database to the Hive lib folder. To run a Metastore service on a system, execute the following command:
hive --service metastore -p <port_num>
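Once the service has been started, a quick way to see whether anything is listening on the chosen port is a plain TCP connection attempt. The helper below is a minimal sketch with a hypothetical name (metastore_port_open). It only checks that the port accepts connections; it does not speak Thrift or verify that the listener is really a Metastore.

```python
import socket


def metastore_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable hosts.
        return False
```

For example, metastore_port_open("localhost", 9083) would check the default Metastore port on the local machine; substitute whatever value was passed as <port_num>.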
The next post will cover Hive Metastore configuration.