Hive is a Hadoop-based data warehouse framework. It is used to extract, transform, and load (ETL) data and to manage metadata in big data business scenarios.

Background information

For information about the compatibility of EMR versions with Hadoop and Hive versions, see Overview.

Hive components

Component: Description
HiveServer2: A HiveQL-based query server. It receives SQL requests from JDBC clients over the Thrift protocol or HTTP. HiveServer2 supports concurrent connections from multiple clients and supports authentication.
Hive MetaStore: The metadata management component. It stores the metadata of databases and tables and makes it available to other engines. For example, both Spark and Presto use this component to manage metadata.
Hive Client: The Hive client. It submits SQL jobs and converts them to MapReduce, Tez, or Spark jobs, depending on the execution engine specified on the client. This component is installed on all nodes of an EMR cluster.
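
The two client-facing points in the table, how a JDBC client reaches HiveServer2 and how the client chooses an execution engine, can be illustrated with a minimal Python sketch. It only builds strings and does not connect to a cluster. The helper names `hs2_jdbc_url` and `set_engine_statement` and the host name `emr-header-1` are hypothetical. The ports 10000 (Thrift binary) and 10001 (HTTP) and the HTTP path `cliservice` are the Apache Hive defaults and are assumed here; an actual EMR cluster may be configured differently.

```python
# Sketch only: builds a HiveServer2 JDBC URL and an engine-selection statement.
# Ports and the HTTP path are Apache Hive defaults (assumption); adjust them
# to match the configuration of your cluster.

VALID_ENGINES = {"mr", "tez", "spark"}  # values accepted by hive.execution.engine


def hs2_jdbc_url(host, port=None, database="default",
                 transport="binary", http_path="cliservice"):
    """Return a JDBC URL for HiveServer2 over Thrift binary or HTTP transport."""
    if transport not in ("binary", "http"):
        raise ValueError(f"unsupported transport: {transport!r}")
    if port is None:
        # Default ports: 10000 for binary Thrift, 10001 for HTTP.
        port = 10000 if transport == "binary" else 10001
    url = f"jdbc:hive2://{host}:{port}/{database}"
    if transport == "http":
        # HTTP mode is selected through session variables appended to the URL.
        url += f";transportMode=http;httpPath={http_path}"
    return url


def set_engine_statement(engine):
    """Return the HiveQL statement that selects the execution engine."""
    engine = engine.lower()
    if engine not in VALID_ENGINES:
        raise ValueError(f"unknown execution engine: {engine!r}")
    return f"SET hive.execution.engine={engine};"


if __name__ == "__main__":
    # "emr-header-1" is a placeholder host name, not a value from this document.
    print(hs2_jdbc_url("emr-header-1"))
    print(hs2_jdbc_url("emr-header-1", transport="http"))
    print(set_engine_statement("tez"))
```

The generated URL can be passed to a JDBC client such as Beeline, and the `SET` statement can be run at the start of a Hive session before the query whose engine it should control.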

Hive syntax

EMR retains the syntax and usage of open source components wherever possible. EMR Hive is fully compatible with the syntax of Apache Hive.

For more information about Apache Hive, visit the Apache Hive official website.

Quick start

  • For more information about how to use Hive, see Configure a Hive job.
  • For more information about how to use Hive in Zeppelin, see Zeppelin.