Hive is a Hadoop-based data warehouse framework. It is used for extract, transform, and load (ETL) operations and for metadata management in big data business scenarios.

Background information

For more information about the compatibility of E-MapReduce (EMR) versions with Hadoop and Hive versions, see Overview.

Hive components

Component: Description
HiveServer2: A HiveQL-based query server. It receives SQL requests from Java Database Connectivity (JDBC) clients over the Thrift or HTTP protocol. HiveServer2 supports concurrent connections from multiple clients and supports authentication.
Hive MetaStore: The metadata management component. It stores metadata for objects such as databases and tables and makes that metadata available to other engines. For example, both Spark and Presto use this component to manage metadata.
Hive Client: The Hive client. It submits SQL jobs and converts them to MapReduce, Tez, or Spark jobs, depending on the execution engine specified on the client. This component is installed on all nodes of an EMR cluster.
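
As a rough illustration of how a client addresses HiveServer2 and selects an execution engine, the following Python sketch builds a JDBC connection URL for each transport and the SET statement that picks the engine. It only constructs strings and does not connect to a cluster. The helper names hive_jdbc_url and set_engine_statement and the host name emr-header-1 are hypothetical. The ports 10000 (binary Thrift) and 10001 (HTTP) and the HTTP path cliservice are Apache Hive defaults, and a given EMR cluster may be configured differently.

```python
from typing import Optional

# Apache Hive defaults; an EMR cluster may use other values (assumption).
DEFAULT_PORTS = {"binary": 10000, "http": 10001}
# Values accepted by the hive.execution.engine property.
ENGINES = {"mr", "tez", "spark"}


def hive_jdbc_url(host: str, transport: str = "binary",
                  port: Optional[int] = None, database: str = "default",
                  http_path: str = "cliservice") -> str:
    """Build a HiveServer2 JDBC URL for binary Thrift or HTTP transport."""
    if transport not in DEFAULT_PORTS:
        raise ValueError(f"unsupported transport: {transport}")
    if port is None:
        port = DEFAULT_PORTS[transport]
    url = f"jdbc:hive2://{host}:{port}/{database}"
    if transport == "http":
        # HTTP mode needs the transport mode and endpoint path in the URL.
        url += f";transportMode=http;httpPath={http_path}"
    return url


def set_engine_statement(engine: str) -> str:
    """Return the HiveQL statement that selects the execution engine."""
    if engine not in ENGINES:
        raise ValueError(f"unsupported engine: {engine}")
    return f"SET hive.execution.engine={engine};"


if __name__ == "__main__":
    # "emr-header-1" is a placeholder host name, not a guaranteed EMR value.
    print(hive_jdbc_url("emr-header-1"))
    print(hive_jdbc_url("emr-header-1", transport="http"))
    print(set_engine_statement("tez"))
```

The resulting URL can be passed to any JDBC client, such as Beeline, and the SET statement can be run in a session before the query so that the job is converted for the chosen engine.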

Hive syntax

To keep the user experience consistent, EMR retains the syntax of open source components wherever possible. EMR Hive is fully compatible with Apache Hive syntax.

For more information about Apache Hive, visit the Apache Hive official website.