E-MapReduce: Release notes for EMR 5.x series

Last Updated: Oct 28, 2025

This topic describes the release dates and updates for the EMR 5.x series. For more information about the components supported in each version, see Distributions.

EMR-5.21.x

Release date

Version

Date

EMR-5.21.0

October 27, 2025

Updates

Service

Change

Hive

  • Adds a profile mechanism. This mechanism automatically detects the file format of data lake storage, such as ORC, and applies the optimized buffer and pre-read parameters of JindoSDK.

  • Introduces an ORC stripe prefetch mechanism. This mechanism enables parallel computing and I/O operations when you process medium to large ORC files. It asynchronously prefetches subsequent stripes while processing the current stripe to improve throughput.

  • Supports ORC vectorized read. When you read index data from ORC files or perform predicate pushdown, many scattered and non-consecutive file ranges are generated. Vectorized read sends batch requests to significantly improve throughput.

  • Integrates the JindoSDK batch metadata API. This API processes metadata requests, such as getFileStatus, in batches to improve metadata request throughput.
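
The stripe prefetch idea described above can be illustrated with a minimal sketch. It is not the EMR or JindoSDK implementation: `read_with_prefetch`, the `fetch` callback, and the `depth` setting are hypothetical names chosen for illustration, and stripes are stand-in values rather than real ORC stripes. The sketch only shows the general pattern of loading later stripes in the background while the caller processes the current one.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, Sequence, TypeVar

S = TypeVar("S")  # stripe descriptor (hypothetical stand-in)
D = TypeVar("D")  # loaded stripe data (hypothetical stand-in)


def read_with_prefetch(
    stripes: Sequence[S],
    fetch: Callable[[S], D],
    depth: int = 2,
) -> Iterator[D]:
    """Yield stripe data in order while up to `depth` later stripes load in the background."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    with ThreadPoolExecutor(max_workers=depth) as pool:
        pending = []  # futures for stripes already being fetched, in stripe order
        next_index = 0
        # Prime the pipeline with the first `depth` stripes.
        while next_index < len(stripes) and len(pending) < depth:
            pending.append(pool.submit(fetch, stripes[next_index]))
            next_index += 1
        while pending:
            # Blocks only if the background I/O for this stripe has not finished yet.
            current = pending.pop(0).result()
            # Refill the pipeline before handing the stripe to the caller,
            # so the next fetch overlaps with the caller's processing.
            if next_index < len(stripes):
                pending.append(pool.submit(fetch, stripes[next_index]))
                next_index += 1
            yield current
```

A caller would iterate over `read_with_prefetch(stripes, fetch)` and process each item. The benefit depends on `fetch` being I/O-bound, which is why the sketch uses threads.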

Spark

  • Adds a profile mechanism. This mechanism automatically detects the file format of data lake storage, such as ORC, and applies the optimized buffer and pre-read parameters of JindoSDK.

  • Introduces an ORC stripe prefetch mechanism. This mechanism enables parallel computing and I/O operations when you process medium to large ORC files. It asynchronously prefetches subsequent stripes while processing the current stripe to improve throughput.

  • Supports parallel pre-open for small files. This feature automatically detects small file query scenarios and pre-opens a batch of files in parallel. This greatly reduces I/O latency caused by frequent open operations.

  • Supports ORC vectorized read. When you read index data from ORC files or perform predicate pushdown, many scattered and non-consecutive file ranges are generated. Vectorized read sends batch requests to significantly improve throughput.
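
To make the "scattered and non-consecutive file ranges" point concrete, the following sketch shows one common way to turn many small ranges into fewer batched requests: sort them and merge neighbors that are close together. This is an illustration under stated assumptions, not a description of how JindoSDK batches requests. The `coalesce_ranges` function and the `max_gap` threshold are hypothetical.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # (offset, length) in bytes


def coalesce_ranges(ranges: List[Range], max_gap: int = 0) -> List[Range]:
    """Merge ranges whose gap is at most `max_gap` bytes into single requests."""
    merged: List[Range] = []
    for offset, length in sorted(ranges):
        if length <= 0:
            continue  # ignore empty ranges
        if merged:
            last_offset, last_length = merged[-1]
            last_end = last_offset + last_length
            if offset - last_end <= max_gap:
                # Close enough (or overlapping): extend the previous request.
                new_end = max(last_end, offset + length)
                merged[-1] = (last_offset, new_end - last_offset)
                continue
        merged.append((offset, length))
    return merged


requests = coalesce_ranges([(0, 100), (100, 50), (400, 10), (420, 30)], max_gap=16)
print(requests)  # [(0, 150), (400, 50)]
```

A larger `max_gap` reduces the number of requests but reads some bytes that are not needed, so the threshold is a trade-off.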

Tez

Supports parallel pre-open for small files. This feature automatically detects small file query scenarios and pre-opens a batch of files in parallel. This greatly reduces I/O latency caused by frequent open operations.
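
The following sketch illustrates the general idea behind parallel pre-open: issue the open calls for a batch of files concurrently so that per-file open latency overlaps. It is an illustration only. `pre_open_all`, the `opener` callback, and the `parallelism` default are hypothetical and do not correspond to any EMR, Tez, or JindoSDK API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, TypeVar

H = TypeVar("H")  # handle type returned by the opener (hypothetical)


def pre_open_all(
    paths: Iterable[str],
    opener: Callable[[str], H],
    parallelism: int = 8,
) -> Dict[str, H]:
    """Open every path concurrently and return the handles keyed by path."""
    path_list = list(paths)
    if not path_list:
        return {}
    workers = max(1, min(parallelism, len(path_list)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so handles line up with path_list.
        handles = list(pool.map(opener, path_list))
    return dict(zip(path_list, handles))
```

With local files, `opener` could be `lambda p: open(p, "rb")`. The caller is then responsible for closing the returned handles.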

Ranger

  • Jindoauth Server supports custom RAM roles for client users to access OSS.

  • Fixes a missing dependency issue in the Ranger-yarn-plugin.

Paimon

Upgraded to version 1-ali-16.3.

JindoCache

Upgraded to version 6.10.1.

Deltalake

Added the component. The version is 3.2.1.

Release version information

DataLake cluster

Service

Version

Hadoop-Common

3.2.1

HDFS

3.2.1

OSS-HDFS

1.0.0

Hive

3.1.3

Spark2

2.4.8

Spark3

3.5.3

Tez

0.10.2

Trino

422

Deltalake

3.2.1

Hudi

0.15.0

Iceberg

1.5.0

Flume

1.11.0

Kyuubi

1.9.2

YARN

3.2.1

OpenLDAP

2.4.46

Ranger

2.3.0

Ranger-plugin

1.0.0

DLF-Auth

2.0.2

Presto

0.283

Zookeeper

3.8.4

Sqoop

1.4.7

Knox

1.5.0

Celeborn

0.5.2

JindoCache

6.10.1

Paimon

1-ali-16.3

OLAP clusters

Service

Version

StarRocks2

2.5.22

StarRocks3

3.2.11

Doris

2.1.4

ClickHouse

23.3.13.6

Zookeeper

3.8.4

DataFlow cluster

Service

Version

Hadoop-Common

3.2.1

HDFS

3.2.1

OSS-HDFS

1.0.0

YARN

3.2.1

OpenLDAP

2.4.46

Zookeeper

3.8.4

Knox

1.5.0

Flink

1.17.2

Paimon

1-ali-6.2

DataServing cluster

Service

Version

Hadoop-Common

3.2.1

HDFS

3.2.1

OSS-HDFS

1.0.0

OpenLDAP

2.4.46

Ranger

2.3.0

Ranger-plugin

1.0.0

Zookeeper

3.8.4

Knox

1.5.0

HBase

2.6.3

JindoCache

6.8.2

Phoenix

5.2.1

Custom cluster

Service

Version

Hadoop-Common

3.2.1

HDFS

3.2.1

OSS-HDFS

1.0.0

Hive

3.1.3

Spark2

2.4.8

Spark3

3.5.3

Tez

0.10.2

Trino

422

Deltalake

3.2.1

Hudi

0.15.0

Iceberg

1.5.0

Flume

1.11.0

Kyuubi

1.9.2

YARN

3.2.1

OpenLDAP

2.4.46

Ranger

2.3.0

Ranger-plugin

1.0.0

DLF-Auth

2.0.2

Presto

0.283

StarRocks2

2.5.22

StarRocks3

3.2.11

Zookeeper

3.8.4

Sqoop

1.4.7

Knox

1.5.0

Celeborn

0.5.2

Flink

1.17.2

HBase

2.6.3

JindoCache

6.10.1

Paimon

1-ali-16.3

Phoenix

5.2.1

EMR-5.20.x

Release date

Version

Date

EMR-5.20.0

July 10, 2025

Updates

Service

Change

Hive

Optimizes the performance of adding fields to partitioned tables.

YARN

Optimizes global scheduling performance to prevent certain application behaviors from degrading cluster scheduling performance.

Release version information

DataLake cluster

Service

Version

Hadoop-Common

3.2.1

HDFS

3.2.1

OSS-HDFS

1.0.0

Hive

3.1.3

Spark2

2.4.8

Spark3

3.5.3

Tez

0.10.2

Trino

422

Hudi

0.15.0

Iceberg

1.5.0

Flume

1.11.0

Kyuubi

1.9.2

YARN

3.2.1

OpenLDAP

2.4.46

Ranger

2.3.0

Ranger-plugin

1.0.0

DLF-Auth

2.0.2

Presto

0.283

Zookeeper

3.8.4

Sqoop

1.4.7

Knox

1.5.0

Celeborn

0.5.2

JindoCache

6.8.2

Paimon

1-ali-6.2

OLAP clusters

Service

Version

StarRocks2

2.5.22

StarRocks3

3.2.11

Doris

2.1.4

ClickHouse

23.3.13.6

Zookeeper

3.8.4

DataFlow cluster

Service

Version

Hadoop-Common

3.2.1

HDFS

3.2.1

OSS-HDFS

1.0.0

YARN

3.2.1

OpenLDAP

2.4.46

Zookeeper

3.8.4

Knox

1.5.0

Flink

1.17.2

Paimon

1-ali-6.2

DataServing cluster

Service

Version

Hadoop-Common

3.2.1

HDFS

3.2.1

OSS-HDFS

1.0.0

OpenLDAP

2.4.46

Ranger

2.3.0

Ranger-plugin

1.0.0

Zookeeper

3.8.4

Knox

1.5.0

HBase

2.6.3

JindoCache

6.8.2

Phoenix

5.2.1

Custom cluster

Service

Version

Hadoop-Common

3.2.1

HDFS

3.2.1

OSS-HDFS

1.0.0

Hive

3.1.3

Spark2

2.4.8

Spark3

3.5.3

Tez

0.10.2

Trino

422

Hudi

0.15.0

Iceberg

1.5.0

Flume

1.11.0

Kyuubi

1.9.2

YARN

3.2.1

OpenLDAP

2.4.46

Ranger

2.3.0

Ranger-plugin

1.0.0

DLF-Auth

2.0.2

Presto

0.283

StarRocks2

2.5.22

StarRocks3

3.2.11

Zookeeper

3.8.4

Sqoop

1.4.7

Knox

1.5.0

Celeborn

0.5.2

Flink

1.17.2

HBase

2.6.3

JindoCache

6.8.2

Paimon

1-ali-6.2

Phoenix

5.2.1

EMR-5.19.x

Release date

Version

Date

EMR-5.19.0

April 24, 2025

Updates

Service

Change

Trino

Fixes an issue where LDAP is unavailable.

YARN

  • Improves resource allocation efficiency through global scheduling optimization.

  • Adds metric monitoring for HTTP services.

  • Fixes an open source bug (YARN-10213).

HBase

  • Upgraded to version 2.6.3.

  • Changes the default runtime environment to Java 11.

  • Changes the default garbage collector to G1.

Phoenix

Upgraded to version 5.2.1.

JindoCache

Upgraded to version 6.8.2.

StarRocks

Supports the creation of clusters with decoupled storage and compute.

EMRHOOK

  • Adds support for Spark 3.5.

  • Supports data lineage tracking for Paimon tables.

  • Enhances stability.

Release version information

DataLake cluster

Service

Version

Hadoop-Common

3.2.1

HDFS

3.2.1

OSS-HDFS

1.0.0

Hive

3.1.3

Spark2

2.4.8

Spark3

3.5.3

Tez

0.10.2

Trino

422

Hudi

0.15.0

Iceberg

1.5.0

Flume

1.11.0

Kyuubi

1.9.2

YARN

3.2.1

OpenLDAP

2.4.46

Ranger

2.3.0

Ranger-plugin

1.0.0

DLF-Auth

2.0.2

Presto

0.283

Zookeeper

3.8.4

Sqoop

1.4.7

Knox

1.5.0

Celeborn

0.5.2

JindoCache

6.8.2

Paimon

1-ali-6.2

OLAP clusters

Service

Version

StarRocks2

2.5.22

StarRocks3

3.2.11

Doris

2.1.4

ClickHouse

23.3.13.6

Zookeeper

3.8.4

DataFlow cluster

Service

Version

Hadoop-Common

3.2.1

HDFS

3.2.1

OSS-HDFS

1.0.0

YARN

3.2.1

OpenLDAP

2.4.46

Zookeeper

3.8.4

Knox

1.5.0

Flink

1.17.2

Paimon

1-ali-6.2

DataServing cluster

Service

Version

Hadoop-Common

3.2.1

HDFS

3.2.1

OSS-HDFS

1.0.0

OpenLDAP

2.4.46

Ranger

2.3.0

Ranger-plugin

1.0.0

Zookeeper

3.8.4

Knox

1.5.0

HBase

2.6.3

JindoCache

6.8.2

Phoenix

5.2.1

Custom cluster

Service

Version

Hadoop-Common

3.2.1

HDFS

3.2.1

OSS-HDFS

1.0.0

Hive

3.1.3

Spark2

2.4.8

Spark3

3.5.3

Tez

0.10.2

Trino

422

Hudi

0.15.0

Iceberg

1.5.0

Flume

1.11.0

Kyuubi

1.9.2

YARN

3.2.1

OpenLDAP

2.4.46

Ranger

2.3.0

Ranger-plugin

1.0.0

DLF-Auth

2.0.2

Presto

0.283

StarRocks2

2.5.22

StarRocks3

3.2.11

Zookeeper

3.8.4

Sqoop

1.4.7

Knox

1.5.0

Celeborn

0.5.2

Flink

1.17.2

HBase

2.6.3

JindoCache

6.8.2

Paimon

1-ali-6.2

Phoenix

5.2.1

EMR-5.18.x

Release dates

Version

Date

EMR-5.18.1

December 18, 2024

EMR-5.18.0 (New purchases not supported)

December 4, 2024

Updates

Service

Change

Spark3

  • Upgraded to version 3.5.3.

  • Fixes a configuration issue that occurs during Spark scale-out.

Trino, Presto

Fixes an issue where connections fail after LDAP is enabled.

Zookeeper

Supports adding custom configurations.

Ranger

Replaces the existing Spark 3 Ranger plugin with the version provided by the open source Kyuubi project.

Hudi

Upgraded to version 0.15.0.

Celeborn

Upgraded to version 0.5.2.

Paimon

Upgraded to version 1.0-ali-1.

JindoCache

Upgraded to version 6.5.3.

StarRocks3

Upgraded to version 3.2.11.

StarRocks2

Upgraded to version 2.5.22.

Impala, Kudu, Kafka, Kafka-Manager

These services are offline. Use a recommended service as an alternative, or install the corresponding service yourself.

For Impala, use Presto, Trino, ClickHouse, or StarRocks as an alternative.

EMR-5.17.x

Release dates

Version

Date

EMR-5.17.4

December 18, 2024

EMR-5.17.3 (New purchases not supported)

November 29, 2024

EMR-5.17.2 (New purchases not supported)

August 29, 2024

EMR-5.17.1 (New purchases not supported)

June 21, 2024

EMR-5.17.0 (New purchases not supported)

April 23, 2024

Updates

EMR-5.17.4

Service

Change

JindoCache

Upgraded to version 6.5.3.

StarRocks2

Upgraded to version 2.5.22.

StarRocks3

Upgraded to version 3.2.11.

EMR-5.17.3

Service

Change

JindoSDK

Upgrades JindoSDK to resolve a deadlock issue.

EMR-5.17.2

Service

Change

JindoCache

  • Upgraded to version 6.5.1.

  • Improves the read and write performance of Distributed Hash Table (DHT).

Spark

  • Fixes an issue where partition folders cannot be deleted.

  • Fixes a Hive package dependency issue to ensure that client operations do not interrupt the connection to metaStoreClient.

Trino

  • Fixes an issue where some modified configurations might be unexpectedly restored during scale-out.

  • Supports querying data on high-security OSS-HDFS.

  • Fixes a service exception that occurs after DLF-Auth is enabled.

Presto

Supports querying data on high-security OSS-HDFS.

HDFS, HBaseHDFS

Fixes an issue where the memory of NameNode and DataNode cannot be modified.

YARN

  • ResourceManager supports sending timeline events in batches to improve processing capabilities.

  • Fixes a logic issue in container and resource processing in ResourceManager.

Zookeeper

  • Fixes an issue where the memory configuration of a node group cannot be modified.

  • Supports refactoring the log configuration file.

Impala

Fixes an issue where customer configurations are modified during elastic scaling.

Ranger

Supports the new JindoSDK kernel to effectively reduce CPU utilization.

Knox

Fixes an issue where component URL access fails when there is only one Master-Extend node.

Kafka

Fixes a startup issue with Kafka Connect clusters.

StarRocks

Fixes an issue where new BE nodes are not visible after a scale-out.

Doris

Upgraded to version 2.1.4.

Paimon

Upgraded to version 0.9-ali-7.

EMRHOOK

Supports parsing data lineage for MaxCompute tables.

EMR-5.17.1

Service

Change

Spark, Hive, Kyuubi

Supports deploying Master-Extend node groups.

Paimon

Replaces the Flink dependency from the VVR version with the community version and supports DLF Catalog.

Knox

Uses JDK 8 for packaging.

Flink

Restores the DLF configurations and dependencies that were removed in EMR-5.17.0.

EMR-5.17.0

Service

Change

Spark

Spark3 is upgraded to version 3.4.2.

Celeborn

Upgraded to version 0.4.0.

Doris

Upgraded to version 2.1.0.

StarRocks

  • StarRocks2 is upgraded to version 2.5.18.

  • StarRocks3 is upgraded to version 3.2.4.

DeltaLake

Upgraded to version 3.0.0.

Iceberg

Upgraded to version 1.5.0.

Zookeeper

Upgraded to version 3.8.4.

JindoCache

Upgraded to version 6.2.5.

Flink

Upgraded to version 1.17.2.

EMR-5.16.x

Release date

Version

Date

EMR-5.16.0

February 19, 2024

Updates

Service

Change

Hudi

Upgraded to version 0.14.0.

Flume

Upgraded to version 1.11.0.

Kyuubi

Upgraded to version 1.7.3.

Impala

Upgraded to version 4.3.0.

Celeborn

Upgraded to version 0.3.2.

JindoCache

Upgraded to version 6.2.0.

Paimon

Upgraded to version 0.7-ali-1.

Kafka

Upgraded to version 3.6.1.

StarRocks

  • StarRocks2 is upgraded to version 2.5.13.

  • StarRocks3 is upgraded to version 3.1.5.

Spark

Fixes the Commons Text vulnerability.

Ranger

  • Fixes vulnerabilities in the Commons Text library.

  • Fixes the path matching permission bypass vulnerability in the Spring Security framework.

  • Fixes the forward/include authentication bypass vulnerability in the Spring Security framework.

  • Fixes the identity authentication bypass vulnerability in a special matching mode in Spring Framework.

  • Supports modifying the interval at which Ranger synchronizes user information from the Lightweight Directory Access Protocol (LDAP) server.

EMR-5.15.x

Release dates

Version

Date

EMR-5.15.1

November 16, 2023

EMR-5.15.0 (New purchases not supported)

October 27, 2023

Updates

Service

Change

JindoCache

Adds the service. The version is 6.1.1.

JindoData

JindoData can no longer be selected. Use the new JindoCache service for caching and the DLF-Auth service for authentication.

Spark

Removes jdo-related configurations from hive-site.xml.

HBase

Adds a configuration item to let you select the HBase Thrift Server version, v1 or v2, as needed.

StarRocks

Upgrades StarRocks2 to version 2.5.10.

Doris

Upgrades Doris to version 1.2.7.

Celeborn

Upgrades Celeborn to version 0.3.1.

Paimon

Upgrades Paimon to version 0.6-ali-2.

ClickHouse

Upgrades ClickHouse to version 23.3.13.6.

EMR-5.14.x

Release date

Version

Date

EMR-5.14.2

August 17, 2023

Updates

Service

Change

Trino

  • Fixes an issue where the Paimon connector fails to query HDFS tables.

  • Fixes an issue where worker monitoring metrics cannot be read.

Presto

  • Upgraded to version 0.283.

  • Fixes an issue where worker monitoring metrics cannot be read.

ClickHouse

Grants all permissions to the default user by default.

StarRocks

  • Renames the previous StarRocks version to StarRocks2.

  • Adds StarRocks3. The version is 3.1.2. By default, clusters are created in coupled storage and compute mode. Decoupled storage and compute mode is not supported.

Celeborn

Upgraded to version 0.3.0.

EMR-5.13.x

Release date

Version

Date

EMR-5.13.0

August 3, 2023

Updates

Service

Change

Hudi

Upgraded to version 0.13.1.

Paimon

Upgraded to version 0.5-ali-1.

StarRocks

Upgraded to version 2.5.8.

JindoData

Upgraded to version 4.6.11.

Trino

  • Upgraded to version 422.

  • The Hudi connector supports querying Merge On Read (MOR) tables.

  • Improves the error message for dynamic UDF loading.

EMR-5.12.x

Release dates

Version

Date

EMR-5.12.1

July 13, 2023

EMR-5.12.0 (New purchases not supported)

June 1, 2023

Updates

EMR-5.12.1

Service

Change

Spark

  • Spark History Server supports using OSS-HDFS for storage by default.

  • The Spark 3 native engine supports using OSS and OSS-HDFS for storage.

Hive

Hive warehouse supports using OSS-HDFS for storage by default.

OSS-HDFS

Adds the service.

YARN

Supports using OSS-HDFS for storage by default.

HBase

  • HBase HFile data supports using OSS-HDFS for storage by default.

  • HBase WAL logs support using OSS-HDFS for storage.

EMR-5.12.0

Service

Change

Kyuubi

Upgraded to version 1.7.1.

Celeborn

Upgraded to version 0.2.2.

Paimon

  • Flink-Table-Store is renamed to Paimon.

  • Upgraded to version 0.4-ali-1.

StarRocks

Upgraded to version 2.5.5.

Doris

Upgraded to version 1.2.4.

ClickHouse

Upgraded to version 23.3.2.37.

Trino

Provides a simple event listener by default to obtain audit logs.

Phoenix

Supports Hive on Phoenix.

EMR-5.11.x

Release dates

Version

Date

EMR-5.11.1

April 3, 2023

EMR-5.11.0 (New purchases not supported)

February 28, 2023

Updates

EMR-5.11.1

Service

Change

ClickHouse

Upgraded to version 22.8.14.53.

Trino

Adds the odps.properties connector to support queries on MaxCompute.

JindoData

Upgraded to version 4.6.5.

JindoSDK

Upgraded to version 4.6.5.

Flink-Table-Store

Upgraded to version 0.3-ali-2.

YARN

Supports Node Labels management.

EMR-5.11.0

Service

Change

Iceberg

Upgraded to version 1.1.0.

Hudi

  • Upgraded to version 0.12.2.

  • Supports CDC.

DeltaLake

  • Upgraded to version 2.2.0.

  • Supports recording Vacuum operations in the transaction log.

Kudu

Upgraded to version 1.16.0.

ClickHouse

The ZooKeeper service must be selected when you install the ClickHouse service.

Celeborn

  • RSS is renamed to Celeborn.

  • The version of Celeborn is 0.2.0.

Presto

Adds the service. The kernel is community Facebook PrestoDB 0.278.3. The default HTTP port is 8889, and the default HTTPS port is 7779.

StarRocks

Upgraded to version 2.5.1.

Doris

Upgraded to version 1.2.1.

Kafka-Manager

Upgraded to version 3.0.0.6.

Impala

Upgraded to version 4.2.0.

OpenLDAP

Upgraded to version 2.4.46.

HBase

  • Supports JDK 11.

  • Supports ThriftServer2.

  • The default value of the hbase.block.data.cachecompressed parameter is changed to true.
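
If a workload depends on the previous behavior, the parameter can be set explicitly. The sketch below only renders the Hadoop-style property element for `hbase.block.data.cachecompressed`. The helper name `hbase_property` is hypothetical, the value `false` is assumed to be the previous default because the notes say only that the default changed to true, and how the setting is applied on a cluster (for example, through the console) is outside the scope of these notes.

```python
import xml.etree.ElementTree as ET


def hbase_property(name: str, value: str) -> str:
    """Render one Hadoop-style <property> element as a string."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    return ET.tostring(prop, encoding="unicode")


# Assumption: "false" restores the behavior from before this release.
snippet = hbase_property("hbase.block.data.cachecompressed", "false")
print(snippet)
```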

Flink-Table-Store

Adds the service. The version is based on community version 0.3.

JindoData

Upgraded to version 4.6.4.

EMR-5.10.x

Release date

Version

Date

EMR-5.10.0

December 1, 2022

Updates

Service

Change

Iceberg

Upgraded to version 0.14.1.

Flink

Upgraded to Flink 1.15-vvr-6.0.2, which corresponds to the community Flink 1.15 major version.

Kafka

  • Supports LDAP user logon authentication and authorization.

  • Supports user group authorization.

Trino

  • EMR Presto is renamed to its official community name, Trino.

  • Supports Ranger and DLF-Auth.

  • Fixes an issue where connections to worker nodes fail after one-click LDAP enablement.

JindoSDK

Upgraded to version 4.6.2.

JindoData

Upgraded to version 4.6.2.

HBase

  • Supports Ranger.

  • Fixes an issue where OSS-HDFS cannot be selected as the storage mode when adding the service.

YARN

ACLs are enabled by default in high-security mode.

StarRocks

Upgraded to version 2.4.1.

Doris

Upgraded to version 1.1.5.

Hudi

The console supports configuring hudi-defaults.conf.

Ranger

  • Upgraded to version 2.3.0.

  • Supports integration with Trino, YARN, HBase, and Kafka.

DLF-Auth

  • Upgraded to version 2.0.2.

  • Supports Trino and Impala.

OpenLDAP

Integrates with the Nslcd component.

Kudu

Kudu Tserver can no longer be installed in Task node groups.

Spark

Upgraded to version 3.3.1.

Tez

Upgraded to version 0.10.2.

Kyuubi

Upgraded to version 1.6.0.

EMR-5.9.x

Release dates

Version

Date

EMR-5.9.1

November 8, 2022

EMR-5.9.0 (New purchases not supported)

October 14, 2022

Updates

EMR-5.9.1

Service

Change

Kerberos

Supports connecting to an external KDC on EMR.

Kafka

Adds a configuration item for startup commands that allows users to customize the startup parameters for the service.

JindoData

  • Upgraded to version 4.6.0.

  • Supports rewriting OSS-HDFS access paths.

Flink

Upgraded to version 1.13_vvr_4.0.15.

RSS

Upgraded to version 0.1.4.

EMR-5.9.0

Service

Change

Spark

  • Upgraded to version 3.3.

  • Supports enabling Kerberos authentication.

Hudi

  • Upgraded to version 0.12.0.

  • Supports Spark 3.3.

  • Supports using a cloud-based MetaStore to host metadata and enabling the acceleration feature. For more information, see Instructions on how to use Hudi MetaStore.

Flink

  • Supports enabling Kerberos authentication.

  • Supports automatic connection with Data Lake Formation (DLF).

Iceberg

  • Upgraded to version 0.14.0.

  • Supports Spark 3.3.

  • Supports enabling Kerberos authentication.

JindoData

  • Upgraded to version 4.5.1.

  • Supports AccessKey-free access to Alibaba Cloud resources.

Hadoop-Common and HDFS

  • Supports enabling Kerberos authentication.

  • Fixes security vulnerability CVE-2022-25168.

Knox

Integrates with Ranger. You can access the Ranger UI from the Access Links And Ports tab.

HBase

  • Upgraded to version 2.4.9.

  • Supports enabling Kerberos authentication.

  • Supports group configuration.

RSS

  • Upgraded to version 0.1.2.

  • Supports enabling Kerberos authentication.

Doris

  • Upgraded to version 1.1.2.

  • Supports enabling Kerberos authentication.

StarRocks

  • Upgraded to version 2.3.2.

  • Supports enabling Kerberos authentication.

Kafka

  • Upgraded to version 2.13_3.2.1.

  • Supports enabling Kerberos authentication.

DeltaLake

  • Upgraded to version 2.1.0.

  • Supports Spark 3.3.

  • Supports enabling Kerberos authentication.

Impala

  • Supports creating views in DLF.

  • Supports enabling Kerberos authentication.

Kudu

Adds the component. The version is 1.14.0.

YARN, Ranger, Hive, Kyuubi, Tez, Zookeeper, DLF-Auth, Phoenix, Sqoop, and Presto

Support enabling Kerberos authentication.

EMR-5.8.x

Release date

Version

Date

EMR-5.8.0

August 5, 2022

Updates

Service

Change

Spark

Supports one-click integration with LDAP.

Hive

Supports one-click integration with LDAP.

Presto

  • Upgraded to community version 389.

    Uses the standalone Delta Lake and Hudi connectors provided by the community.

    • The Delta Lake connector in this version does not support Time Travel and Z-Order.

    • The Hudi connector in this version does not support querying MOR tables.

  • Supports one-click integration with LDAP.

DeltaLake

  • Integrates with DLF for automated lake table management.

  • Fixes an issue where partition information cannot be automatically synchronized in CTAS scenarios.

  • The optimize and vacuum commands support returning metric information.

Hudi

Upgraded to version 0.11.1.

Hadoop-Common

Adds the component. This resolves the issue where HDFS, YARN, and JindoSDK configurations overwrite each other.

YARN

Enhances the elastic scaling feature.

Ranger

  • Supports both Spark 2 and Spark 3.

  • Ranger Usersync supports one-click integration with LDAP.

Kafka

Adds the component. The version is 2.12-2.4.1.

HBase

Adds the component. The version is 2.3.4.

Phoenix

Adds the component. The version is 5.1.2.

Doris

Upgraded to version 1.1.1.

StarRocks

  • Upgraded to version 2.3.0.

  • The primary key model supports the complete DELETE WHERE syntax and persistence of the primary key index to reduce memory usage.

ClickHouse

  • Upgraded to version 22.3.8.39.

  • Fixes an out-of-memory issue when reading large files from OSS.

EMR-5.6.x

Release date

Version

Date

EMR-5.6.0

April 21, 2022

Updates

Service

Change

JindoData

Adds the component. The version is 4.3.0.

JindoSDK

Upgraded to version 4.3.0.

Spark

Upgraded to version 3.2.1.

Hive

Fixes a bug where commits are repeated after Speculation is enabled in Tez.

Presto

Fixes a bug where the Presto service fails to start after it is added to a Hadoop cluster that has been initialized.

DeltaLake

DML supports subqueries.

Hudi

Upgraded to version 0.10.1.

Iceberg

Upgraded to version 0.13.1.

YARN

Adds a feature configuration to restrict ApplicationMasters (AMs) to run only on CORE group nodes.

HBase

Fixes a bug in the HBase 2.3.4 kernel.

Zookeeper

Optimizes JVM parameter configurations.

Impala

Adapts to JindoSDK 4.3.0.

Sqoop

Upgrades the PostgreSQL version.

Zeppelin

Fixes an issue where the JDBC Interpreter fails to start.

Ranger

The Ranger 1.2.0 Spark plugin supports Delta and Hudi.

Flume

Adapts to JindoSDK 4.3.0.

Oozie

Upgrades Log4j to version 2.17.2.

DLF-Auth

Upgraded to version 2.0.0.

EMR-5.5.x

Release dates

Version

Date

EMR-5.5.1

March 25, 2022

EMR-5.5.0 (New purchases not supported)

February 15, 2022

Updates

EMR-5.5.1

Note

Only OLAP clusters in the new console support this version.

Service

Change

ClickHouse

Modifies the default values of some parameters.

StarRocks

Upgraded to version 2.1.1.

EMR-5.5.0

Service

Change

SmartData, BIGBOOT

The components are offline.

RSS

  • Upgrades the ESS service to RSS.

  • Enhances the features and stability of the service.

JindoSDK

  • Upgrades the architecture to JindoData.

  • EMR integrates with JindoSDK 4.0 for the first time, supporting services such as OSS and OSS-HDFS.

Spark

  • COUNT DISTINCT supports IF expressions, and its handling of CASE WHEN is optimized.

    To enable this optimization, set the spark.sql.optimizer.rewriteConditionalDistinctAggregates parameter to true.

  • Shuffle Hash Join supports fallback to Sort Merge Join.

    To enable this fallback, set the spark.sql.join.preferSortMergeJoin parameter to false and the spark.sql.join.enableShuffledHashJoinFallback parameter to true.

  • Supports automatic merging of small files for non-dynamic partitions.

    To enable this feature, set the spark.sql.adaptive.merge.output.small.files.enabled parameter to true.

  • Automatically adjusts concurrency for scenarios such as GroupingSet and Distinct.

    To enable this feature, set the spark.sql.execution.optimizeExpand parameter to true.

  • Optimizes Hive on Spark.

  • Supports Time Travel syntax.

  • Adapts to JindoSDK.
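
The parameters listed for Spark in this release can be passed at submission time. The sketch below builds `--conf key=value` arguments for `spark-submit` from those parameter names and values, which are copied from the notes above. The grouping labels (such as `merge_small_files`) and the helper `spark_submit_conf` are hypothetical. Most of these parameters appear to be specific to the EMR Spark build, so they may have no effect on other Spark distributions.

```python
from typing import Dict, List

# Parameter names and values come from the EMR-5.5.0 notes; the dictionary keys
# are hypothetical labels used only in this example.
EMR_550_SPARK_FLAGS: Dict[str, Dict[str, str]] = {
    "conditional_distinct": {
        "spark.sql.optimizer.rewriteConditionalDistinctAggregates": "true",
    },
    "shuffled_hash_join_fallback": {
        "spark.sql.join.preferSortMergeJoin": "false",
        "spark.sql.join.enableShuffledHashJoinFallback": "true",
    },
    "merge_small_files": {
        "spark.sql.adaptive.merge.output.small.files.enabled": "true",
    },
    "optimize_expand": {
        "spark.sql.execution.optimizeExpand": "true",
    },
}


def spark_submit_conf(features: List[str]) -> List[str]:
    """Return `--conf key=value` arguments for the selected features."""
    args: List[str] = []
    for feature in features:
        for key, value in EMR_550_SPARK_FLAGS[feature].items():
            args += ["--conf", f"{key}={value}"]
    return args


print(" ".join(["spark-submit"] + spark_submit_conf(["merge_small_files"])))
```

The same key-value pairs could instead be placed in the cluster's Spark default configuration if they should apply to every job.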

Tez

Adapts to JindoSDK.

Hive

  • Optimizes the batch deletion of Hive Jindo.

  • Mitigates out-of-memory (OOM) errors in HiveServer2.

  • Optimizes Hive on Spark.

  • Adapts to JindoSDK.

Presto

  • Upgrades Presto to community version 358.

  • Adds MySQL, Iceberg, Hudi, Phoenix, Kudu, and Delta connectors by default and updates the default configurations.

  • Supports data lake analytics.

  • Supports dynamic UDF loading.

  • Adapts to JindoSDK.

Delta Lake

  • Version upgrade

    • Upgraded to version 1.1.0, which is compatible with Spark 3.2.0.

    • All commercial features are migrated to version 1.1.0.

  • Metadata management

    • Optimizes the synchronization of metadata modifications to the metastore.

    • Automatically reports table statistics (dataProfiling) to the metastore.

  • SQL

    • Supports Time Travel syntax.

    • Supports DropPartition SQL syntax.

    • Supports dynamic partition overwrites using SQL.

    • Supports ADD COLUMN operations at specified positions (FIRST and AFTER).

  • Table management enhancements

    • Supports and enables dynamic adjustment of file sizes based on table sizes by default.

    • Supports and enables automatic Vacuum by default. Concurrent Vacuum is also supported.

    • Optimizes the logic for automatic compaction. This feature is disabled by default.

    • Adds Z-order syntax and accelerates the Z-order process.

Hudi

  • Upgraded to version 0.10.0.

  • Supports Spark 3.2.0.

  • Supports JindoFS Block mode.

HDFS

Adapts to JindoSDK.

YARN

  • Adapts to RSS memory configurations.

  • Adapts to JindoSDK.

Flume

Adapts to JindoSDK.

Impala

Adapts to JindoSDK.

Ranger

  • Supports Spark 3.2.0.

  • Supports Presto 358.

HBase

  • Fixes issues with default parameters.

  • Fixes an issue with the GC log date format.

ClickHouse

Iceberg

  • Upgraded to version 0.13.0.

  • Supports Presto 358.

DLF-Auth

  • Supports Spark 3.2.0.

  • Supports Presto 358.

EMR-5.4.x

Release dates

Version

Date

EMR-5.4.3

December 2021

EMR-5.4.2 (New purchases not supported)

December 2021

EMR-5.4.1 (New purchases not supported)

November 2021

EMR-5.4.0 (New purchases not supported)

October 2021

Updates

EMR-5.4.3

This release fixes the Log4j security vulnerabilities in all related components. For more information, see Vulnerability Announcement | Apache Log4j2 Arbitrary Code Execution Vulnerability.

Service

Change

Presto

Fixes the Log4j vulnerability in the Elasticsearch connector.

DLF Metastore

  • Changes the default setting for Metastore logs from enabled to disabled.

  • Fixes an error caused by an excessively long URI for Metastore gettablestats.

Delta Lake

Fixes an issue with synchronizing schema changes to the Metastore.

Sqoop

Fixes an issue where precision is lost for the Decimal type when Sqoop imports HCatalog tables.

EMR-5.4.2

Service

Change

SmartData

  • SmartData is updated to 3.8.0. For more information, see SmartData 3.8.X overview.

  • Authentication and authorization based on Kerberos and Ranger can be used to manage permissions on data in OSS.

EMR-5.4.1

Service

Change

SmartData

Upgrades SmartData to version 3.7.3. For more information, see Introduction to SmartData 3.7.x.

Oozie

Fixes an issue where the Jetty server for Oozie fails to start due to a JAR package conflict in an HA environment.

Impala

Fixes a "no such method" error that occurs when you query DLF metadata tables.

DLF-Auth

Upgrades DLF-Auth to version 1.0.1.

EMR-5.4.0

Service

Change

SmartData

Upgrades SmartData to version 3.7.2. For more information, see Introduction to SmartData 3.7.x.

Spark

  • Upgrades Spark to version 3.1.2.

  • Optimizes Distinct computing performance for Spark SQL in Spark 3.x. The optimization is triggered if an aggregation operator contains multiple count(distinct case ... when ...) expressions.

  • Fixes an array index out of bounds error that is returned when some statistics required by Adaptive Query Execution (AQE) are missing.

  • Fixes errors related to AQE and data caching in specific scenarios.

Hive

The batch metadata optimization feature is supported for Hive on JindoFS (Block). This feature is disabled by default.

Presto

Delta tables support StorageHandler queries.

DeltaLake

  • Upgrades DeltaLake to version 1.0.0.

  • Unifies delta-connectors for Hive 2 and Hive 3.

  • Fixes an error that occurs when delta-connectors query multi-level partitioned tables.

  • Supports SQL syntax for multiple features, such as DataSkipping, Optimize, and Zorder.

  • Supports synchronizing metadata to the MetaStore.

Hudi

  • Upgrades Hudi to version 0.9.0.

  • Fixes a sql.extension compatibility issue between Delta Lake and Hudi.

Note

Supports Spark 3.1.2.

HDFS

The default parameter for NameNode reserved capacity is automatically increased. This ensures that the NameNode promptly enters safe mode when disk space is insufficient.

Storm

The component is offline.

Zeppelin

Upgrades Zeppelin to community version 0.10.0.

Hue

  • Fixes an issue where the YARN Job Browser fails to display or stop jobs in some cases.

  • Enables the YARN Job Browser in the default configurations.

  • Supports the Presto protocol in the default configurations.

Druid

Fixes an issue where a node fails to restart after an unexpected server shutdown because a PID file is not deleted.

ClickHouse

  • Updates the default configurations.

  • Supports cluster scale-out.

  • Supports the MetaChecker feature.

  • Supports reading data using the OSS table engine and OSS table function.

Iceberg

  • Upgrades Iceberg to version 0.12.0-1.0.1.

  • Fixes an error with Hive Runtime dependencies.

Knox

Fixes an issue where the first access to the Spark UI fails.

DLF-Auth

Adds the service. The version is 1.0.0.

You can configure permissions for Hive or Spark to access DLF.

EMR-5.3.x

Release dates

Version

Date

EMR-5.3.1

September 2021

EMR-5.3.0 (New purchases not supported)

August 2021

Updates

EMR-5.3.1

Service

Change

SmartData

Upgraded SmartData to version 3.7.1.

Hue

Fixed an issue where Impala could not be used in high-security clusters.

Kudu

Added support for Kerberos.

HBase

  • Fixed an issue where restarting HBase in high-security clusters took too long.

  • Fixed an issue where the integration of Spark 3.1.1 with HBase failed.

  • Optimized the graceful stop process.

EMR-5.3.0

Service

Change

SmartData

Upgraded SmartData to version 3.7.0.

Spark

Fixed a compatibility issue with Delta Lake.

Hive

Hive on JindoFS (Block mode) supports the batch metadata optimization feature.

This feature is disabled by default.

Delta Lake

  • Added support for the Delta Lake partition feature.

  • Fixed a compatibility issue between the desc detail command and Spark version 3.1.1.

YARN

  • Added appId, CPU, and Memory resource usage information to the node Containers REST API.

  • Fixed an issue where ApplicationMaster (AM) logs on released Auto Scaling nodes could not be viewed.

  • Fixed an issue where historical data in the State Store caused the cluster to become unavailable.

  • Added support for cleaning up released nodes after they are decommissioned by Auto Scaling.

  • Improved the graceful decommission logic for Auto Scaling. A node is now marked as offline only after the NodeManager (NM) process ends.
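
As a sketch of how the extra container fields might be consumed: stock Hadoop serves per-node container information from the NodeManager at /ws/v1/node/containers, wrapped as {"containers": {"container": [...]}}. The release note does not give the names of the added fields, so the keys appId, cpuUsage, and memoryUsageMB in the sample payload below are hypothetical placeholders. Check the actual response from your cluster before relying on them.

```python
import json
from collections import defaultdict

# Sample payload shaped like the stock NodeManager response.
# The keys "appId", "cpuUsage", and "memoryUsageMB" are HYPOTHETICAL
# stand-ins for the fields that this release adds.
SAMPLE = """
{"containers": {"container": [
  {"id": "container_1_0001_01_000001", "appId": "application_1_0001",
   "cpuUsage": 0.5, "memoryUsageMB": 1024},
  {"id": "container_1_0001_01_000002", "appId": "application_1_0001",
   "cpuUsage": 1.5, "memoryUsageMB": 2048},
  {"id": "container_1_0002_01_000001", "appId": "application_1_0002",
   "cpuUsage": 1.0, "memoryUsageMB": 512}
]}}
"""


def usage_by_app(payload: str) -> dict:
    """Sum the (hypothetical) CPU and memory fields per application ID."""
    body = json.loads(payload)
    # "containers" can be null when the node runs no containers.
    containers = (body.get("containers") or {}).get("container") or []
    totals = defaultdict(lambda: {"cpu": 0.0, "memory_mb": 0})
    for container in containers:
        entry = totals[container["appId"]]
        entry["cpu"] += container["cpuUsage"]
        entry["memory_mb"] += container["memoryUsageMB"]
    return dict(totals)


if __name__ == "__main__":
    for app_id, usage in sorted(usage_by_app(SAMPLE).items()):
        print(app_id, usage)
```

Grouping by application ID on the client side is only one possible use. The point is that the application a container belongs to no longer has to be derived from the container ID.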

ZooKeeper

Upgraded to community version 3.6.3.

Flink

  • Added the SmartData component.

  • Fixed an issue where password-free access to OSS failed when jobs were submitted to a DataFlow-Flink cluster over Secure Shell (SSH).

Impala

Fixed an issue where deleting an OSS partition directory directly caused a directory listing loop.

Hue

Fixed a user interface (UI) display issue that occurred when Hue was integrated with Oozie.

Kudu

Upgraded to community version 1.14.0.

ClickHouse

The component version is 21.3.13.9.

Iceberg

Added the Iceberg component. The component version is 0.12.0.

EMR-5.2.x

Release date

Version

Date

EMR-5.2.1

July 16, 2021

What's new

Service

Change

SmartData

Upgraded SmartData to version 3.6.1. For more information, see Introduction to SmartData 3.6.x.

Hive

  • Fixed an issue where the show create table command returned incorrect results when Data Lake Formation (DLF) metadata was used.

  • Optimized default Hive parameters to improve job performance.

  • Changed the names of configuration items on the hive-env tab of the Hive service Configuration page in the E-MapReduce console to uppercase for ease of use.

  • Fixed a memory leak in HiveServer2 caused by a User-Defined Function (UDF).

  • Improved the error message that is displayed when you write data to a Hive table if the file system is inconsistent with the MetaStore.

HDFS

Added support for the Zstandard (ZSTD) compression format.

Delta Lake

  • Upgraded Delta Lake to version 0.8.0.

  • Added support for Spark 3.

Flink

Upgraded Flink to version 1.12-vvr-3.0.2.

Hudi

  • Upgraded Hudi to version 0.8.0.

  • Added support for integration with Spark SQL.

Spark

Important

Spark 3.1.1 in EMR-5.2.1 is not compatible with Kudu 1.11.1.

  • Added support for the Delta Lake and Hudi data lake formats.

  • Added support for Remote Shuffle Service.

  • Added support for Livy.

  • Optimized the names of configuration items on the spark-defaults tab of the Spark service Configuration page in the E-MapReduce console.

  • Optimized features such as Cost-Based Optimization (CBO), Dynamic Partition Pruning (DPP), and Z-Order. Performance is improved by 50% compared with the open source Spark 3 version.

  • Added support for data sources such as Alibaba Cloud Log Service, DataHub, and Message Queue for Apache RocketMQ (ONS).
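
Of the optimizations listed above, Z-Order is the easiest to illustrate in isolation. The sketch below shows the general bit-interleaving idea (a Morton code) for two non-negative integer columns. It is not the EMR Spark implementation. The function name and the 16-bit width are assumptions chosen for the example.

```python
def z_order_key(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one sortable key.

    Bit i of x goes to position 2*i, and bit i of y goes to position 2*i + 1.
    """
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


if __name__ == "__main__":
    points = [(3, 3), (0, 1), (2, 0), (1, 1), (0, 0), (1, 0)]
    # Sorting by the interleaved key clusters rows that are close in BOTH columns.
    for point in sorted(points, key=lambda p: z_order_key(*p)):
        print(point, z_order_key(*point))
```

When rows are written in this order, each file tends to cover a narrow range of both columns, so min/max statistics can skip files for filters on either column.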

Tez

Optimized default Tez parameters to improve job performance.

Ranger

  • Fixed an issue where warnings were written to Spark logs when Ranger was enabled.

  • Fixed an issue where automatic user synchronization failed after connecting to LDAP.

Knox

  • Added support for the Kudu component.

  • Added support for the HBase component.

Kafka

  • Added support for the Cruise Control component to provide a balancing feature for Kafka clusters.

  • Introduced a hot-swapping feature for Kafka disks. You can replace faulty disks without restarting a broker.

  • Modified the default values of some parameters.

Phoenix

Fixed an issue where a "JDBC Driver not found" error was reported when Hive and Spark SQL were used to access Phoenix tables.

ESS (EMR Remote Shuffle Service)

Added support for Spark 3.