Hive Metastore Introduction

Hive Metastore is a central repository for Hive metadata. It has two components:

  1. A service that the Hive driver connects to and queries for the database schema.
  2. A backing database that stores the metadata. Hive currently supports five backend databases: Derby, MySQL, MS SQL Server, Oracle, and Postgres.

Hive Metastore Modes

There are three modes to configure Hive Metastore:

Embedded Metastore: By default, the Metastore service runs in the same JVM as the Hive service. In this mode it uses an embedded Derby database stored on the local file system. The limitation is that only one session can be open at a time, because only one embedded Derby instance can access the database files on disk. To allow multiple Hive services to connect to the Metastore, Derby can be configured as a network server. The Hive wiki describes how to configure Derby in server mode.
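As a rough illustration, the embedded mode corresponds to the connection properties below, which match Hive's stock defaults. The sketch writes them to a scratch file instead of a real configuration directory. The function name write_embedded_site and the output location are hypothetical, chosen only for this example.

```shell
#!/bin/sh
# Sketch: write the connection properties of an embedded Derby metastore.
# write_embedded_site and the scratch directory are illustrative names,
# not part of Hive itself.
write_embedded_site() {
    out_file="$1"
    cat > "$out_file" <<'EOF'
<configuration>
  <!-- Embedded Derby: the database files are created in ./metastore_db -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=metastore_db;create=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.apache.derby.jdbc.EmbeddedDriver</value>
  </property>
</configuration>
EOF
}

embedded_dir=$(mktemp -d)
write_embedded_site "$embedded_dir/hive-site.xml"
```

Because the Derby URL is relative, the metastore_db directory is created wherever Hive is launched from, which is one reason this mode only suits single-user experiments.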

Figure: Hive Metastore Embedded Mode

Local Metastore: Because Hive is a data-warehousing framework, being limited to a single session is impractical. Local Metastore mode was developed to remove this limitation of the embedded mode. The metadata database runs as a separate process, on the same machine or a remote one, while the Metastore service still runs in the same JVM as the Hive service. Before starting a Hive client, add the JDBC driver library for the chosen database to the Hive lib folder.
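A minimal sketch of what a local-mode configuration might look like, assuming MySQL as the backing database. The host name, database name, user, and password are placeholders invented for this example, and write_local_site is a hypothetical helper, not a Hive tool.

```shell
#!/bin/sh
# Sketch: connection properties for a local metastore backed by MySQL.
# All host names and credentials below are placeholders.
write_local_site() {
    out_file="$1"
    db_host="$2"
    db_user="$3"
    db_pass="$4"
    cat > "$out_file" <<EOF
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://${db_host}:3306/metastore?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <!-- Connector/J 8.x and later use com.mysql.cj.jdbc.Driver instead -->
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>${db_user}</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>${db_pass}</value>
  </property>
</configuration>
EOF
}

local_dir=$(mktemp -d)
write_local_site "$local_dir/hive-site.xml" "db.example.com" "hiveuser" "changeme"
```

The matching JDBC driver jar still has to be copied into the Hive lib folder by hand, as described above. This sketch only covers the properties.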

Figure: Hive Metastore Local Mode

Remote Metastore: In this configuration, one or more Metastore servers run as separate processes. Multiple Hive clients connect to a remote service instead of each starting a Metastore service in its own JVM.

A Hive service is configured to use a remote Metastore by setting the hive.metastore.uris property to the Metastore server URIs. This property holds a comma-separated list of URIs. By default, the Hive service connects to the first URI in the list. If the connection fails, it picks one of the listed Metastore servers at random and tries to reconnect.
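The property described above could be set as in the following sketch. The two host names are made up for illustration, 9083 is the default Metastore port, and both helper functions are hypothetical names used only here. The second helper simply splits the list so each candidate server can be inspected. It does not reproduce Hive's own reconnect logic.

```shell
#!/bin/sh
# Sketch: client-side fragment pointing Hive at remote metastore servers.
# Host names are placeholders; 9083 is the default metastore port.
METASTORE_URIS="thrift://metastore1.example.com:9083,thrift://metastore2.example.com:9083"

write_remote_site() {
    out_file="$1"
    uris="$2"
    cat > "$out_file" <<EOF
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>${uris}</value>
  </property>
</configuration>
EOF
}

# Print each URI of the comma-separated list on its own line.
list_metastore_uris() {
    printf '%s\n' "$1" | tr ',' '\n'
}

remote_dir=$(mktemp -d)
write_remote_site "$remote_dir/hive-site.xml" "$METASTORE_URIS"
list_metastore_uris "$METASTORE_URIS"
```

Listing more than one URI is what gives the client somewhere to go when the first server is unreachable.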

The Metastore service and the Hive server communicate using the Apache Thrift library. Before starting a Hive client or the Metastore service, add the driver libraries for the backing database to the Hive lib folder. To run a Metastore service on a system, execute the following command:

    hive --service metastore -p <port_num>

Figure: Hive Metastore Remote Mode

The next post will cover how to configure Hive Metastore.
