This topic describes the mappings between E-MapReduce (EMR) versions and Hive versions, and the enhanced features of Hive in each EMR version.

The following table lists the Hive version and the enhanced features of Hive for each EMR version.
EMR version Hive version Enhanced feature
EMR V5.2.1 Hive 3.1.2
  • The issue that the SHOW CREATE TABLE command returns inaccurate output when Data Lake Formation (DLF) metadata is used is fixed.
  • The default parameters of Hive are optimized to improve the performance of Hive jobs.
  • In the EMR console, the parameter names on the hive-env tab of the Configure tab for the Hive service are changed to uppercase. This facilitates the use of the parameters.
  • The issue that user-defined functions (UDFs) cause memory leaks in HiveServer2 is fixed.
  • The error that occurs when you write data to a Hive table because the file system is incompatible with the Hive metastore is fixed.
EMR V4.9.0 Hive 3.1.2 Access control of Hive metastores can be configured in Ranger.
EMR V4.8.0 Hive 3.1.2
  • Some default configurations are optimized.
  • Performance is optimized. The cost-based optimization (CBO) feature is enhanced.
  • LDAP authentication can be enabled or disabled with a click.

    For more information, see Manage LDAP authentication.

EMR V4.6.0 Hive 3.1.2
  • Metadata from Alibaba Cloud Data Lake Formation (DLF) is supported for HCatalog tables.
  • Hive metadata and job running information can be sent to DataWorks.
EMR V4.5.0 Hive 3.1.2
  • Metadata of Alibaba Cloud DLF is supported.
  • Ownership-related permissions of Ranger are supported.
EMR V4.4.1 Hive 3.1.2 Default parameter settings are optimized.
EMR V4.4.0 Hive 3.1.2
  • Hive is updated to 3.1.2.
  • JindoFS is optimized.
  • Metastore consistency check (MSCK) is optimized.
  • The Jindo Job Committer is supported for HCatalog tables.
  • HAS dependencies are updated.
EMR V4.3.0 Hive 3.1.1 Custom deployment is supported.
EMR V3.36.1 Hive 2.3.8
  • Hive is updated to 2.3.8.
  • The issue that the SHOW CREATE TABLE command returns inaccurate output when Data Lake Formation (DLF) metadata is used is fixed.
  • The default parameters of Hive are optimized to improve the performance of Hive jobs.
  • In the EMR console, the parameter names on the hive-env tab of the Configure tab for the Hive service are changed to uppercase. This facilitates the use of the parameters.
  • The error that occurs when you write data to a Hive table because the file system is incompatible with the Hive metastore is fixed.
EMR V3.35.0 Hive 2.3.7 Community issues related to fetch tasks are fixed.
EMR V3.34.0 Hive 2.3.7
  • Some default configurations are optimized.
  • Performance is optimized. The cost-based optimization (CBO) feature is enhanced.
  • LDAP authentication can be enabled or disabled with a click.

    For more information, see Manage LDAP authentication.

  • Calcite is updated to 1.12.0.
  • The hive.security.authorization.sqlstd.confwhitelist.append parameter is added.
EMR V3.33.0 Hive 2.3.7
  • Hive is updated to 2.3.7.
  • Metadata from Alibaba Cloud Data Lake Formation (DLF) is supported for HCatalog tables.
  • Hive metadata and job running information can be sent to DataWorks.
EMR V3.32.0 Hive 2.3.5
  • The connection leak issue of the HiveServer connection pool is fixed.
  • The data collection feature of JindoTable can be enabled or disabled.
  • The performance of ADD COLUMN is optimized.
  • The issue that causes data read from Hudi tables to be invalid is fixed.
  • The default configurations can be adjusted based on the sizes of cluster nodes.
EMR V3.30.0 Hive 2.3.5
  • Metadata from Alibaba Cloud DLF is supported.
  • The issue caused when you read an empty Delta table directory and write data into a dummy file is fixed.
  • HAS dependencies are updated to 2.0.1.
EMR V3.29.0 Hive 2.3.5
  • Hive is updated to 2.3.5.6.0.
  • A third-party metastore is supported.
  • The datalake metastore-client parameter is added.
EMR V3.28.0 Hive 2.3.5 Delta 0.6.0 is supported.
EMR V3.27.2 Hive 2.3.5
  • The magic committer is supported for HCatalog tables.
  • Some outdated default configurations are removed.
EMR V3.26.3 Hive 2.3.5 The direct committer is supported for HCatalog tables.
EMR V3.25.0 Hive 2.3.5 The issue that causes execution failures of MapReduce tasks in automatic local mode is fixed.
EMR V3.24.0 Hive 2.3.5
  • SQL statement compatibility can be checked.
  • Hive 2.3.5 and Hadoop 2.8.5 are released as a combination.
  • When Hive is restarted, the content in hiveserver2-site.xml is not synchronized to hive-site.xml in the spark-conf folder.
  • The MSCK command can be used to add incremental directories.
  • The bug triggered by the reuse of a Tez container in Hive is fixed.
  • The MSCK command can be used to optimize column directories.
EMR V3.23.0 Hive 2.3.5
  • Hive hooks that were configured in earlier versions of Hive are removed.
  • Multiple COUNT(DISTINCT) aggregations are supported when the hive.groupby.skewindata optimization is enabled.
  • The issue that data is lost when tables with different bucket versions are joined is fixed.
EMR versions earlier than EMR V3.23.0 Hive 2.x Data from external databases is stored on Hive metastores. The clusters that use the same Hive metastore share the data in the Hive metastore.
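
The version mapping in the table above can also be written as a small lookup, which may help when a script needs to know which Hive version ships with a given EMR version. The following Python sketch is illustrative only: the names EMR_TO_HIVE, parse_emr_version, and hive_version_for are hypothetical and are not part of any EMR or Hive API. The data is transcribed from the table, and versions that the table does not list return None.

```python
import re

# EMR version -> Hive version, transcribed from the table in this topic.
EMR_TO_HIVE = {
    "V5.2.1": "3.1.2",
    "V4.9.0": "3.1.2",
    "V4.8.0": "3.1.2",
    "V4.6.0": "3.1.2",
    "V4.5.0": "3.1.2",
    "V4.4.1": "3.1.2",
    "V4.4.0": "3.1.2",
    "V4.3.0": "3.1.1",
    "V3.36.1": "2.3.8",
    "V3.35.0": "2.3.7",
    "V3.34.0": "2.3.7",
    "V3.33.0": "2.3.7",
    "V3.32.0": "2.3.5",
    "V3.30.0": "2.3.5",
    "V3.29.0": "2.3.5",
    "V3.28.0": "2.3.5",
    "V3.27.2": "2.3.5",
    "V3.26.3": "2.3.5",
    "V3.25.0": "2.3.5",
    "V3.24.0": "2.3.5",
    "V3.23.0": "2.3.5",
}


def parse_emr_version(text):
    """Turn a string such as 'EMR V3.36.1' or 'V3.36.1' into (3, 36, 1)."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"not an EMR version: {text!r}")
    return tuple(int(part) for part in match.groups())


def hive_version_for(emr_version):
    """Return the Hive version listed for an EMR version, or None if unlisted."""
    version = parse_emr_version(emr_version)
    key = "V{}.{}.{}".format(*version)
    if key in EMR_TO_HIVE:
        return EMR_TO_HIVE[key]
    # The table groups every release earlier than V3.23.0 under Hive 2.x.
    if version < (3, 23, 0):
        return "2.x"
    return None  # The table has no row for this version.


if __name__ == "__main__":
    for name in ("EMR V5.2.1", "EMR V3.36.1", "EMR V3.22.0"):
        print(name, "->", hive_version_for(name))
```

Versions are compared as integer tuples rather than as strings so that, for example, V3.9.0 sorts before V3.23.0.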