Release notes (EMR 5.x series)

This topic describes the release dates and updates for the EMR 5.x series. For more information about the components supported in each version, see Distributions.

EMR-5.21.x

Release date

Version	Date
EMR-5.21.0	October 27, 2025

Updates

Service	Change
Hive	Adds a profile mechanism. This mechanism automatically detects the file format of data lake storage, such as ORC, and applies the optimized buffer and pre-read parameters of JindoSDK. Introduces an ORC stripe prefetch mechanism. This mechanism enables parallel computing and I/O operations when you process medium to large ORC files. It asynchronously prefetches subsequent stripes while processing the current stripe to improve throughput. Supports ORC vectorized read. When you read index data from ORC files or perform predicate pushdown, many scattered and non-consecutive file ranges are generated. Vectorized read sends batch requests to significantly improve throughput. Integrates the JindoSDK batch metadata API. This API processes metadata requests, such as getFileStatus, in batches to improve metadata request throughput.
Spark	Adds a profile mechanism. This mechanism automatically detects the file format of data lake storage, such as ORC, and applies the optimized buffer and pre-read parameters of JindoSDK. Introduces an ORC stripe prefetch mechanism. This mechanism enables parallel computing and I/O operations when you process medium to large ORC files. It asynchronously prefetches subsequent stripes while processing the current stripe to improve throughput. Supports parallel pre-open for small files. This feature automatically detects small file query scenarios and pre-opens a batch of files in parallel. This greatly reduces I/O latency caused by frequent open operations. Supports ORC vectorized read. When you read index data from ORC files or perform predicate pushdown, many scattered and non-consecutive file ranges are generated. Vectorized read sends batch requests to significantly improve throughput.
Tez	Supports parallel pre-open for small files. This feature automatically detects small file query scenarios and pre-opens a batch of files in parallel. This greatly reduces I/O latency caused by frequent open operations.
Ranger	Jindoauth Server supports custom RAM roles for client users to access OSS. Fixes a missing dependency issue in the Ranger-yarn-plugin.
Paimon	Upgraded to version 1-ali-16.3.
JindoCache	Upgraded to version 6.10.1.
Deltalake	Added the component. The version is 3.2.1.
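
The ORC stripe prefetch behavior described in the Hive and Spark rows above (load later stripes in the background while the current stripe is processed) can be illustrated with a minimal sketch. This is not the JindoSDK implementation. The names `read_with_prefetch`, `fetch`, and `depth` are hypothetical and exist only for this illustration.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Deque, Iterator, Sequence, TypeVar

S = TypeVar("S")  # hypothetical stripe descriptor (for example, an offset/length pair)
D = TypeVar("D")  # hypothetical decoded stripe data


def read_with_prefetch(
    stripes: Sequence[S],
    fetch: Callable[[S], D],
    depth: int = 2,
) -> Iterator[D]:
    """Yield stripe data in order while up to `depth` later stripes load in the background."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    with ThreadPoolExecutor(max_workers=depth) as pool:
        # Prime the pipeline with the first `depth` stripes.
        pending: Deque = deque(pool.submit(fetch, s) for s in stripes[:depth])
        next_index = len(pending)
        while pending:
            # Wait for the oldest request; later requests keep running meanwhile.
            data = pending.popleft().result()
            # Schedule one more stripe before handing the current one to the caller,
            # so I/O for later stripes overlaps with the caller's processing.
            if next_index < len(stripes):
                pending.append(pool.submit(fetch, stripes[next_index]))
                next_index += 1
            yield data


if __name__ == "__main__":
    # Stand-in for real I/O: each "stripe" is just an integer ID.
    for chunk in read_with_prefetch(list(range(4)), lambda i: f"stripe-{i}"):
        print(chunk)
```

The sketch preserves read order and bounds the number of in-flight requests, which is the trade-off a prefetch depth usually controls: a larger depth hides more I/O latency but holds more data in memory.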

Release version information

DataLake cluster

Service	Version
Hadoop-Common	3.2.1
HDFS	3.2.1
OSS-HDFS	1.0.0
Hive	3.1.3
Spark2	2.4.8
Spark3	3.5.3
Tez	0.10.2
Trino	422
Deltalake	3.2.1
Hudi	0.15.0
Iceberg	1.5.0
Flume	1.11.0
Kyuubi	1.9.2
YARN	3.2.1
OpenLDAP	2.4.46
Ranger	2.3.0
Ranger-plugin	1.0.0
DLF-Auth	2.0.2
Presto	0.283
Zookeeper	3.8.4
Sqoop	1.4.7
Knox	1.5.0
Celeborn	0.5.2
JindoCache	6.10.1
Paimon	1-ali-16.3

OLAP clusters

Service	Version
StarRocks2	2.5.22
StarRocks3	3.2.11
Doris	2.1.4
ClickHouse	23.3.13.6
Zookeeper	3.8.4

DataFlow cluster

Service	Version
Hadoop-Common	3.2.1
HDFS	3.2.1
OSS-HDFS	1.0.0
YARN	3.2.1
OpenLDAP	2.4.46
Zookeeper	3.8.4
Knox	1.5.0
Flink	1.17.2
Paimon	1-ali-6.2

DataServing cluster

Service	Version
Hadoop-Common	3.2.1
HDFS	3.2.1
OSS-HDFS	1.0.0
OpenLDAP	2.4.46
Ranger	2.3.0
Ranger-plugin	1.0.0
Zookeeper	3.8.4
Knox	1.5.0
HBase	2.6.3
JindoCache	6.8.2
Phoenix	5.2.1

Custom cluster

Service	Version
Hadoop-Common	3.2.1
HDFS	3.2.1
OSS-HDFS	1.0.0
Hive	3.1.3
Spark2	2.4.8
Spark3	3.5.3
Tez	0.10.2
Trino	422
Deltalake	3.2.1
Hudi	0.15.0
Iceberg	1.5.0
Flume	1.11.0
Kyuubi	1.9.2
YARN	3.2.1
OpenLDAP	2.4.46
Ranger	2.3.0
Ranger-plugin	1.0.0
DLF-Auth	2.0.2
Presto	0.283
StarRocks2	2.5.22
StarRocks3	3.2.11
Zookeeper	3.8.4
Sqoop	1.4.7
Knox	1.5.0
Celeborn	0.5.2
Flink	1.17.2
HBase	2.6.3
JindoCache	6.10.1
Paimon	1-ali-16.3
Phoenix	5.2.1

EMR-5.20.x

Release date

Version	Date
EMR-5.20.0	July 10, 2025

Updates

Service	Change
Hive	Optimizes the performance of adding fields to partitioned tables.
YARN	Optimizes global scheduling performance to prevent certain application behaviors from degrading cluster scheduling performance.

Release version information

DataLake cluster

Service	Version
Hadoop-Common	3.2.1
HDFS	3.2.1
OSS-HDFS	1.0.0
Hive	3.1.3
Spark2	2.4.8
Spark3	3.5.3
Tez	0.10.2
Trino	422
Hudi	0.15.0
Iceberg	1.5.0
Flume	1.11.0
Kyuubi	1.9.2
YARN	3.2.1
OpenLDAP	2.4.46
Ranger	2.3.0
Ranger-plugin	1.0.0
DLF-Auth	2.0.2
Presto	0.283
Zookeeper	3.8.4
Sqoop	1.4.7
Knox	1.5.0
Celeborn	0.5.2
JindoCache	6.8.2
Paimon	1-ali-6.2

OLAP clusters

Service	Version
StarRocks2	2.5.22
StarRocks3	3.2.11
Doris	2.1.4
ClickHouse	23.3.13.6
Zookeeper	3.8.4

DataFlow cluster

Service	Version
Hadoop-Common	3.2.1
HDFS	3.2.1
OSS-HDFS	1.0.0
YARN	3.2.1
OpenLDAP	2.4.46
Zookeeper	3.8.4
Knox	1.5.0
Flink	1.17.2
Paimon	1-ali-6.2

DataServing cluster

Service	Version
Hadoop-Common	3.2.1
HDFS	3.2.1
OSS-HDFS	1.0.0
OpenLDAP	2.4.46
Ranger	2.3.0
Ranger-plugin	1.0.0
Zookeeper	3.8.4
Knox	1.5.0
HBase	2.6.3
JindoCache	6.8.2
Phoenix	5.2.1

Custom cluster

Service	Version
Hadoop-Common	3.2.1
HDFS	3.2.1
OSS-HDFS	1.0.0
Hive	3.1.3
Spark2	2.4.8
Spark3	3.5.3
Tez	0.10.2
Trino	422
Hudi	0.15.0
Iceberg	1.5.0
Flume	1.11.0
Kyuubi	1.9.2
YARN	3.2.1
OpenLDAP	2.4.46
Ranger	2.3.0
Ranger-plugin	1.0.0
DLF-Auth	2.0.2
Presto	0.283
StarRocks2	2.5.22
StarRocks3	3.2.11
Zookeeper	3.8.4
Sqoop	1.4.7
Knox	1.5.0
Celeborn	0.5.2
Flink	1.17.2
HBase	2.6.3
JindoCache	6.8.2
Paimon	1-ali-6.2
Phoenix	5.2.1

EMR-5.19.x

Release date

Version	Date
EMR-5.19.0	April 24, 2025

Updates

Service	Change
Trino	Fixes an issue where LDAP is unavailable.
YARN	Improves resource allocation efficiency through global scheduling optimization. Adds metric monitoring for HTTP services. Fixes an open source bug (YARN-10213).
HBase	Upgraded to version 2.6.3. Changes the default runtime environment to Java 11. Changes the default garbage collector to G1.
Phoenix	Upgraded to version 5.2.1.
JindoCache	Upgraded to version 6.8.2.
StarRocks	Supports the creation of clusters with decoupled storage and compute.
EMRHOOK	Adds support for Spark 3.5. Supports data lineage tracking for Paimon tables. Enhances stability.

Release version information

DataLake cluster

Service	Version
Hadoop-Common	3.2.1
HDFS	3.2.1
OSS-HDFS	1.0.0
Hive	3.1.3
Spark2	2.4.8
Spark3	3.5.3
Tez	0.10.2
Trino	422
Hudi	0.15.0
Iceberg	1.5.0
Flume	1.11.0
Kyuubi	1.9.2
YARN	3.2.1
OpenLDAP	2.4.46
Ranger	2.3.0
Ranger-plugin	1.0.0
DLF-Auth	2.0.2
Presto	0.283
Zookeeper	3.8.4
Sqoop	1.4.7
Knox	1.5.0
Celeborn	0.5.2
JindoCache	6.8.2
Paimon	1-ali-6.2

OLAP clusters

Service	Version
StarRocks2	2.5.22
StarRocks3	3.2.11
Doris	2.1.4
ClickHouse	23.3.13.6
Zookeeper	3.8.4

DataFlow cluster

Service	Version
Hadoop-Common	3.2.1
HDFS	3.2.1
OSS-HDFS	1.0.0
YARN	3.2.1
OpenLDAP	2.4.46
Zookeeper	3.8.4
Knox	1.5.0
Flink	1.17.2
Paimon	1-ali-6.2

DataServing cluster

Service	Version
Hadoop-Common	3.2.1
HDFS	3.2.1
OSS-HDFS	1.0.0
OpenLDAP	2.4.46
Ranger	2.3.0
Ranger-plugin	1.0.0
Zookeeper	3.8.4
Knox	1.5.0
HBase	2.6.3
JindoCache	6.8.2
Phoenix	5.2.1

Custom cluster

Service	Version
Hadoop-Common	3.2.1
HDFS	3.2.1
OSS-HDFS	1.0.0
Hive	3.1.3
Spark2	2.4.8
Spark3	3.5.3
Tez	0.10.2
Trino	422
Hudi	0.15.0
Iceberg	1.5.0
Flume	1.11.0
Kyuubi	1.9.2
YARN	3.2.1
OpenLDAP	2.4.46
Ranger	2.3.0
Ranger-plugin	1.0.0
DLF-Auth	2.0.2
Presto	0.283
StarRocks2	2.5.22
StarRocks3	3.2.11
Zookeeper	3.8.4
Sqoop	1.4.7
Knox	1.5.0
Celeborn	0.5.2
Flink	1.17.2
HBase	2.6.3
JindoCache	6.8.2
Paimon	1-ali-6.2
Phoenix	5.2.1

EMR-5.18.x

Release dates

Version	Date
EMR-5.18.1	December 18, 2024
EMR-5.18.0 (New purchases not supported)	December 4, 2024

Updates

Service	Change
Spark3	Upgraded to version 3.5.3. Fixes a configuration issue that occurs during Spark scale-out.
Trino and Presto	Fixes an issue where connections fail after LDAP is enabled.
Zookeeper	Supports adding custom configurations.
Ranger	Replaces the existing Spark 3 Ranger plugin with the version provided by the open source Kyuubi project.
Hudi	Upgraded to version 0.15.0.
Celeborn	Upgraded to version 0.5.2.
Paimon	Upgraded to version 1.0-ali-1.
JindoCache	Upgraded to version 6.5.3.
StarRocks3	Upgraded to version 3.2.11.
StarRocks2	Upgraded to version 2.5.22.
Impala, Kudu, Kafka, and Kafka-Manager	The services are offline. Use a recommended service as an alternative, or install the corresponding service yourself. For Impala, use Presto, Trino, ClickHouse, or StarRocks as an alternative.

EMR-5.17.x

Release dates

Version	Date
EMR-5.17.4	December 18, 2024
EMR-5.17.3 (New purchases not supported)	November 29, 2024
EMR-5.17.2 (New purchases not supported)	August 29, 2024
EMR-5.17.1 (New purchases not supported)	June 21, 2024
EMR-5.17.0 (New purchases not supported)	April 23, 2024

Updates

EMR-5.17.4

Service	Change
JindoCache	Upgraded to version 6.5.3.
StarRocks2	Upgraded to version 2.5.22.
StarRocks3	Upgraded to version 3.2.11.

EMR-5.17.3

Service	Change
JindoSDK	Upgrades JindoSDK to resolve a deadlock issue.

EMR-5.17.2

Service	Change
JindoCache	Upgraded to version 6.5.1. Improves the read and write performance of Distributed Hash Table (DHT).
Spark	Fixes an issue where partition folders cannot be deleted. Fixes a Hive package dependency issue to ensure that client operations do not interrupt the connection to metaStoreClient.
Trino	Fixes an issue where some modified configurations might be unexpectedly restored during scale-out. Supports querying data on high-security OSS-HDFS. Fixes a service abnormality issue that occurs after DLF-AUTH is enabled.
Presto	Supports querying data on high-security OSS-HDFS.
HDFS and HBaseHDFS	Fixes an issue where the memory of NameNode and DataNode cannot be modified.
YARN	ResourceManager supports sending timeline events in batches to improve processing capabilities. Fixes a logic issue in container and resource processing in ResourceManager.
Zookeeper	Fixes an issue where the memory configuration of a node group cannot be modified. Supports refactoring the log configuration file.
Impala	Fixes an issue where customer configurations are modified during elastic scaling.
Ranger	Supports the new JindoSDK kernel to effectively reduce CPU utilization.
Knox	Fixes an issue where component URL access fails when there is only one Master-Extend node.
Kafka	Fixes a startup issue with Kafka Connect clusters.
StarRocks	Fixes an issue where new BE nodes are not visible after a scale-out.
Doris	Upgraded to version 2.1.4.
Paimon	Upgraded to version 0.9-ali-7.
EMRHOOK	Supports parsing data lineage for MaxCompute tables.

EMR-5.17.1

Service	Change
Spark, Hive, and Kyuubi	Support deploying Master-Extend node groups.
Paimon	Replaces the Flink dependency from the VVR version with the community version and supports DLF Catalog.
Knox	Uses JDK 8 for packaging.
Flink	Restores the DLF configurations and dependencies that were removed in EMR-5.17.0.

EMR-5.17.0

Service	Change
Spark	Spark3 is upgraded to version 3.4.2.
Celeborn	Upgraded to version 0.4.0.
Doris	Upgraded to version 2.1.0.
StarRocks	StarRocks2 is upgraded to version 2.5.18. StarRocks3 is upgraded to version 3.2.4.
DeltaLake	Upgraded to version 3.0.0.
Iceberg	Upgraded to version 1.5.0.
Zookeeper	Upgraded to version 3.8.4.
JindoCache	Upgraded to version 6.2.5.
Flink	Upgraded to version 1.17.2.

EMR-5.16.x

Release date

Version	Date
EMR-5.16.0	February 19, 2024

Updates

Service	Change
Hudi	Upgraded to version 0.14.0.
Flume	Upgraded to version 1.11.0.
Kyuubi	Upgraded to version 1.7.3.
Impala	Upgraded to version 4.3.0.
Celeborn	Upgraded to version 0.3.2.
JindoCache	Upgraded to version 6.2.0.
Paimon	Upgraded to version 0.7-ali-1.
Kafka	Upgraded to version 3.6.1.
StarRocks	StarRocks2 is upgraded to version 2.5.13. StarRocks3 is upgraded to version 3.1.5.
Spark	Fixes the Commons Text vulnerability.
Ranger	Fixes vulnerabilities in the Commons Text library. Fixes the path matching permission bypass vulnerability in the Spring Security framework. Fixes the forward/include authentication bypass vulnerability in the Spring Security framework. Fixes the identity authentication bypass vulnerability in a special matching mode in Spring Framework. Supports modifying the interval at which Ranger obtains user information from the Lightweight Directory Access Protocol (LDAP) server and updates the user information.

EMR-5.15.x

Release dates

Version	Date
EMR-5.15.1	November 16, 2023
EMR-5.15.0 (New purchases not supported)	October 27, 2023

Updates

Service	Change
JindoCache	Adds the service. The version is 6.1.1.
JindoData	JindoData cannot be selected. Use the new JindoCache service for caching and the DLF-Auth service for authentication.
Spark	Removes `jdo`-related configurations from hive-site.xml.
HBase	Adds a configuration item to let you select the HBase Thrift Server version, v1 or v2, as needed.
StarRocks	Upgrades StarRocks2 to version 2.5.10.
Doris	Upgrades Doris to version 1.2.7.
Celeborn	Upgrades Celeborn to version 0.3.1.
Paimon	Upgrades Paimon to version 0.6-ali-2.
ClickHouse	Upgrades ClickHouse to version 23.3.13.6.

EMR-5.14.x

Release date

Version	Date
EMR-5.14.2	August 17, 2023

Updates

Service	Change
Trino	Fixes an issue where the Paimon connector fails to query HDFS tables. Fixes an issue where worker monitoring metrics cannot be read.
Presto	Upgraded to version 0.283. Fixes an issue where worker monitoring metrics cannot be read.
ClickHouse	Grants all permissions to the default user by default.
StarRocks	Renames the previous StarRocks version to StarRocks2. Adds StarRocks3. The version is 3.1.2. By default, clusters are created in coupled storage and compute mode. Decoupled storage and compute mode is not supported.
Celeborn	Upgraded to version 0.3.0.

EMR-5.13.x

Release date

Version	Date
EMR-5.13.0	August 3, 2023

Updates

Service	Change
Hudi	Upgraded to version 0.13.1.
Paimon	Upgraded to version 0.5-ali-1.
StarRocks	Upgraded to version 2.5.8.
JindoData	Upgraded to version 4.6.11.
Trino	Upgraded to version 422. The Hudi connector supports querying Merge On Read (MOR) tables. Improves the error message for dynamic UDF loading.

EMR-5.12.x

Release dates

Version	Date
EMR-5.12.1	July 13, 2023
EMR-5.12.0 (New purchases not supported)	June 1, 2023

Updates

EMR-5.12.1

Service	Change
Spark	Spark History Server supports using OSS-HDFS for storage by default. The Spark 3 native engine supports using OSS and OSS-HDFS for storage.
Hive	Hive warehouse supports using OSS-HDFS for storage by default.
OSS-HDFS	Adds the service.
YARN	Supports using OSS-HDFS for storage by default.
HBase	HBase HFile data supports using OSS-HDFS for storage by default. HBase WAL logs support using OSS-HDFS for storage.

EMR-5.12.0

Service	Change
Kyuubi	Upgraded to version 1.7.1.
Celeborn	Upgraded to version 0.2.2.
Paimon	Flink-Table-Store is renamed to Paimon. Upgraded to version 0.4-ali-1.
StarRocks	Upgraded to version 2.5.5.
Doris	Upgraded to version 1.2.4.
ClickHouse	Upgraded to version 23.3.2.37.
Trino	Provides a simple event listener by default to obtain audit logs.
Phoenix	Supports Hive on Phoenix.

EMR-5.11.x

Release dates

Version	Date
EMR-5.11.1	April 3, 2023
EMR-5.11.0 (New purchases not supported)	February 28, 2023

Updates

EMR-5.11.1

Service	Change
ClickHouse	Upgraded to version 22.8.14.53.
Trino	Adds the odps.properties connector to support queries on MaxCompute.
JindoData	Upgraded to version 4.6.5.
JindoSDK	Upgraded to version 4.6.5.
Flink-Table-Store	Upgraded to version 0.3-ali-2.
YARN	Supports Node Labels management.

EMR-5.11.0

Service	Change
Iceberg	Upgraded to version 1.1.0.
Hudi	Upgraded to version 0.12.2. Supports CDC.
DeltaLake	Upgraded to version 2.2.0. Supports recording Vacuum operations in the transaction log.
Kudu	Upgraded to version 1.16.0.
ClickHouse	The ZooKeeper service must be selected when you install the ClickHouse service.
Celeborn	RSS is renamed to Celeborn. The version of Celeborn is 0.2.0.
Presto	Adds the service. The kernel is community Facebook PrestoDB 0.278.3. The default HTTP port is 8889, and the default HTTPS port is 7779.
StarRocks	Upgraded to version 2.5.1.
Doris	Upgraded to version 1.2.1.
Kafka-Manager	Upgraded to version 3.0.0.6.
Impala	Upgraded to version 4.2.0.
OpenLDAP	Upgraded to version 2.4.46.
HBase	Supports JDK 11. Supports ThriftServer2. The default value of the hbase.block.data.cachecompressed parameter is changed to true.
Flink-Table-Store	Adds the service. The version is based on community version 0.3.
JindoData	Upgraded to version 4.6.4.

EMR-5.10.x

Release date

Version	Date
EMR-5.10.0	December 1, 2022

Updates

Service	Change
Iceberg	Upgraded to version 0.14.1.
Flink	Upgraded to Flink 1.15-vvr-6.0.2, which corresponds to the community Flink 1.15 major version.
Kafka	Supports LDAP user logon authentication and authorization. Supports user group authorization.
Trino	EMR Presto is renamed to its official community name, Trino. Supports Ranger and DLF AUTH. Fixes an issue where connections to worker nodes fail after one-click LDAP enablement.
JindoSDK	Upgraded to version 4.6.2.
JindoData	Upgraded to version 4.6.2.
HBase	Supports Ranger. Fixes an issue where OSS-HDFS cannot be selected as the storage mode when adding the service.
YARN	ACLs are enabled by default in high-security mode.
StarRocks	Upgraded to version 2.4.1.
Doris	Upgraded to version 1.1.5.
Hudi	The console supports configuring hudi-defaults.conf.
Ranger	Upgraded to version 2.3.0. Supports integration with Trino, YARN, HBase, and Kafka.
DLF-Auth	Upgraded to version 2.0.2. Supports Trino and Impala.
OpenLDAP	Integrates with the Nslcd component.
Kudu	Kudu Tserver can no longer be installed in Task node groups.
Spark	Upgraded to version 3.3.1.
Tez	Upgraded to version 0.10.2.
Kyuubi	Upgraded to version 1.6.0.

EMR-5.9.x

Release dates

Version	Date
EMR-5.9.1	November 8, 2022
EMR-5.9.0 (New purchases not supported)	October 14, 2022

Updates

EMR-5.9.1

Service	Change
Kerberos	Supports connecting to an external KDC on EMR.
Kafka	Adds a configuration item for startup commands that allows users to customize the startup parameters for the service.
JindoData	Upgraded to version 4.6.0. Supports rewriting OSS-HDFS access paths.
Flink	Upgraded to version 1.13_vvr_4.0.15.
RSS	Upgraded to version 0.1.4.

EMR-5.9.0

Service	Change
Spark	Upgraded to version 3.3. Supports enabling Kerberos authentication.
Hudi	Upgraded to version 0.12.0. Supports Spark 3.3. Supports using a cloud-based MetaStore to host metadata and enabling the acceleration feature. For more information, see Instructions on how to use Hudi MetaStore.
Flink	Supports enabling Kerberos authentication. Supports automatic connection with Data Lake Formation (DLF).
Iceberg	Upgraded to version 0.14.0. Supports Spark 3.3. Supports enabling Kerberos authentication.
JindoData	Upgraded to version 4.5.1. Supports AccessKey-free access to Alibaba Cloud resources.
Hadoop-Common and HDFS	Supports enabling Kerberos authentication. Fixes security vulnerability CVE-2022-25168.
Knox	Integrates with Ranger. You can access the Ranger UI from the Access Links And Ports tab.
HBase	Upgraded to version 2.4.9. Supports enabling Kerberos authentication. Supports group configuration.
RSS	Upgraded to version 0.1.2. Supports enabling Kerberos authentication.
Doris	Upgraded to version 1.1.2. Supports enabling Kerberos authentication.
StarRocks	Upgraded to version 2.3.2. Supports enabling Kerberos authentication.
Kafka	Upgraded to version 2.13_3.2.1. Supports enabling Kerberos authentication.
DeltaLake	Upgraded to version 2.1.0. Supports Spark 3.3. Supports enabling Kerberos authentication.
Impala	Supports creating views in DLF. Supports enabling Kerberos authentication.
Kudu	Adds the component. The version is 1.14.0.
YARN, Ranger, Hive, Kyuubi, Tez, Zookeeper, DLF-Auth, Phoenix, Sqoop, and Presto	Support enabling Kerberos authentication.

EMR-5.8.x

Release date

Version	Date
EMR-5.8.0	August 5, 2022

Updates

Service	Change
Spark	Supports one-click integration with LDAP.
Hive	Supports one-click integration with LDAP.
Presto	Upgraded to community version 389. Uses the standalone Delta Lake and Hudi connectors provided by the community. The Delta Lake connector in this version does not support Time Travel and Z-Order. The Hudi connector in this version does not support querying MOR tables. Supports one-click integration with LDAP.
DeltaLake	Integrates with DLF for automated lake table management. Fixes an issue where partition information cannot be automatically synchronized in CTAS scenarios. The optimize and vacuum commands support returning metric information.
Hudi	Upgraded to version 0.11.1.
HadoopCommon	Adds the component. This resolves the issue where HDFS, YARN, and JindoSDK configurations overwrite each other.
YARN	Enhances the elastic scaling feature.
Ranger	Supports both Spark 2 and Spark 3. Ranger Usersync supports one-click integration with LDAP.
Kafka	Adds the component. The version is 2.12-2.4.1.
HBase	Adds the component. The version is 2.3.4.
Phoenix	Adds the component. The version is 5.1.2.
Doris	Upgraded to version 1.1.1.
StarRocks	Upgraded to version 2.3.0. The primary key model supports the complete `DELETE WHERE` syntax and persistence of the primary key index to reduce memory usage.
ClickHouse	Upgraded to version 22.3.8.39. Fixes an out-of-memory issue when reading large files from OSS.

EMR-5.6.x

Release date

Version	Date
EMR-5.6.0	April 21, 2022

Updates

Service	Change
JindoData	Adds the component. The version is 4.3.0.
JindoSDK	Upgraded to version 4.3.0.
Spark	Upgraded to version 3.2.1.
Hive	Fixes a bug where commits are repeated after Speculation is enabled in Tez.
Presto	Fixes a bug where the Presto service fails to start after it is added to a Hadoop cluster that has been initialized.
DeltaLake	DML supports subqueries.
Hudi	Upgraded to version 0.10.1.
Iceberg	Upgraded to version 0.13.1.
YARN	Adds a feature configuration to restrict ApplicationMasters (AMs) to run only on CORE group nodes.
HBase	Fixes a bug in the HBase 2.3.4 kernel.
Zookeeper	Optimizes JVM parameter configurations.
Impala	Adapts to JindoSDK 4.3.0.
Sqoop	Upgrades the PostgreSQL version.
Zeppelin	Fixes an issue where the JDBC Interpreter fails to start.
Ranger	The Ranger 1.2.0 Spark plugin supports Delta and Hudi.
Flume	Adapts to JindoSDK 4.3.0.
Oozie	Upgrades Log4j to version 2.17.2.
DLF-Auth	Upgraded to version 2.0.0.

EMR-5.5.x

Release dates

Version	Date
EMR-5.5.1	March 25, 2022
EMR-5.5.0 (New purchases not supported)	February 15, 2022

Updates

EMR-5.5.1

Note: Only OLAP clusters in the new console support this version.

Service	Change
ClickHouse	Modifies the default values of some parameters.
StarRocks	Upgraded to version 2.1.1.

EMR-5.5.0

Service	Change
SmartData	The component is offline.
BIGBOOT	The component is offline.
RSS	Upgrades the ESS service to RSS. Enhances the features and stability of the service.
JindoSDK	Upgrades the architecture to JindoData. EMR integrates with JindoSDK 4.0 for the first time, supporting services such as OSS and OSS-HDFS.
Spark	The COUNT DISTINCT function supports IF statements, and the handling of CASE WHEN is optimized. To enable this feature, set the spark.sql.optimizer.rewriteConditionalDistinctAggregates parameter to true. Shuffle Hash Join supports fallback to Sort Merge Join. To enable this feature, set the spark.sql.join.preferSortMergeJoin parameter to false and the spark.sql.join.enableShuffledHashJoinFallback parameter to true. Supports automatic merging of small files for non-dynamic partitions. To enable this feature, set the spark.sql.adaptive.merge.output.small.files.enabled parameter to true. Automatically adjusts the concurrency for scenarios such as GroupingSet and Distinct. To enable this feature, set the spark.sql.execution.optimizeExpand parameter to true. Optimizes Hive on Spark. Supports Time Travel syntax. Adapts to JindoSDK.
Tez	Adapts to JindoSDK.
Hive	Optimizes the batch deletion of Hive Jindo. Optimizes the HiveServer2 OOM issue. Optimizes Hive on Spark. Adapts to JindoSDK.
Presto	Upgrades Presto to community version 358. Adds MySQL, Iceberg, Hudi, Phoenix, Kudu, and Delta connectors by default and updates the default configurations. Supports data lake analytics. Supports dynamic UDF loading. Adapts to JindoSDK.
Delta Lake	Version upgrade: Upgraded to version 1.1.0, which is compatible with Spark 3.2.0. All commercial features are migrated to version 1.1.0. Metadata management: Optimizes the synchronization of metadata modifications to the metastore. Automatically reports table statistics (dataProfiling) to the metastore. SQL: Supports Time Travel syntax. Supports DropPartition SQL syntax. Supports dynamic partition overwrites using SQL. Supports ADD COLUMN operations at specified positions (FIRST and AFTER). Table management enhancements: Supports and enables dynamic adjustment of file sizes based on table sizes by default. Supports and enables automatic Vacuum by default. Concurrent Vacuum is also supported. Optimizes the logic for automatic compaction. This feature is disabled by default. Adds Z-order syntax and accelerates the Z-order process.
Hudi	Upgraded to version 0.10.0. Supports Spark 3.2.0. Supports JindoFS Block mode.
HDFS	Adapts to JindoSDK.
YARN	Adapts to RSS memory configurations. Adapts to JindoSDK.
Flume	Adapts to JindoSDK.
Impala	Adapts to JindoSDK.
Ranger	Supports Spark 3.2.0. Supports Presto 358.
HBase	Fixes issues with default parameters. Fixes an issue with the GC log date format.
ClickHouse	Adds HDFS and OSS disk types to support hot and cold data separation. For more information, see Use HDFS for hot and cold data separation and Use OSS for hot and cold data separation. In Replicated*MergeTree scenarios, Zero Copy is supported for OSS, HDFS, and S3 disk types. Optimizes the processing logic when the ClickHouse component is stopped.
Iceberg	Upgraded to version 0.13.0. Supports Presto 358.
DLF-Auth	Supports Spark 3.2.0. Supports Presto 358.
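
The Spark row in the EMR-5.5.0 table above names five parameters that switch on the listed optimizations. As a minimal sketch, the following helper turns those settings into `spark-submit --conf` arguments or into `spark-defaults.conf` lines. The parameter names and values come from the table. The helper names `to_submit_args` and `to_defaults_lines` are hypothetical, and the sketch does not check whether a given cluster version accepts each parameter.

```python
from typing import Dict, List

# Parameter names and values as listed in the Spark row of the EMR-5.5.0 table.
# Whether they are recognized outside the EMR Spark build is not assumed here.
EMR_SPARK_OPTIMIZATIONS: Dict[str, str] = {
    "spark.sql.optimizer.rewriteConditionalDistinctAggregates": "true",
    "spark.sql.join.preferSortMergeJoin": "false",
    "spark.sql.join.enableShuffledHashJoinFallback": "true",
    "spark.sql.adaptive.merge.output.small.files.enabled": "true",
    "spark.sql.execution.optimizeExpand": "true",
}


def to_submit_args(conf: Dict[str, str]) -> List[str]:
    """Render settings as repeated `--conf key=value` arguments for spark-submit."""
    args: List[str] = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


def to_defaults_lines(conf: Dict[str, str]) -> List[str]:
    """Render settings as whitespace-separated `key value` lines for spark-defaults.conf."""
    return [f"{key} {value}" for key, value in conf.items()]


if __name__ == "__main__":
    print(" ".join(["spark-submit"] + to_submit_args(EMR_SPARK_OPTIMIZATIONS)))
    print("\n".join(to_defaults_lines(EMR_SPARK_OPTIMIZATIONS)))
```

Passing the settings per job with `--conf` limits their effect to that job, whereas adding them to spark-defaults.conf applies them to every job submitted from that configuration.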

EMR-5.4.x

Release dates

Version	Date
EMR-5.4.3	December 2021
EMR-5.4.2 (New purchases not supported)	December 2021
EMR-5.4.1 (New purchases not supported)	November 2021
EMR-5.4.0 (New purchases not supported)	October 2021

Updates

EMR-5.4.3

This release fixes the Log4j security vulnerabilities in all related components. For more information, see Vulnerability Announcement | Apache Log4j2 Arbitrary Code Execution Vulnerability.

Service	Change
Presto	Fixes the Log4j vulnerability in the Elasticsearch connector.
DLF Metastore	Changes the default setting for Metastore logs from enabled to disabled. Fixes an error caused by an excessively long URI for Metastore gettablestats.
Delta Lake	Fixes an issue with synchronizing schema changes to the Metastore.
Sqoop	Fixes an issue where precision is lost for the Decimal type when Sqoop imports HCatalog tables.

EMR-5.4.2

Service	Change
SmartData	Upgrades SmartData to version 3.8.0. For more information, see SmartData 3.8.X overview. Supports authentication and authorization based on Kerberos and Ranger to manage permissions on data in OSS.

EMR-5.4.1

Service	Change
SmartData	Upgrades SmartData to version 3.7.3. For more information, see Introduction to SmartData 3.7.x.
Oozie	Fixes an issue where the Jetty server for Oozie fails to start due to a JAR package conflict in an HA environment.
Impala	Fixes a `no such method error` that occurs when querying DLF metadata tables.
DLF-Auth	Upgrades DLF-Auth to version 1.0.1.

EMR-5.4.0

Service	Change
SmartData	Upgrades SmartData to version 3.7.2. For more information, see Introduction to SmartData 3.7.x.
Spark	Upgrades Spark to version 3.1.2. Optimizes Distinct computing performance for Spark SQL in Spark 3.x. The optimization is triggered when an aggregation operator contains multiple `count(distinct case ... when ...)` expressions. Fixes an array index out-of-bounds error that occurs when some statistics required by Adaptive Query Execution (AQE) are missing. Fixes errors related to AQE and data caching in specific scenarios.
Hive	Hive on JindoFS (Block mode) supports the batch metadata optimization feature. This feature is disabled by default.
Presto	Delta tables support StorageHandler queries.
DeltaLake	Upgrades DeltaLake to version 1.0.0. Unifies delta-connectors for Hive 2 and Hive 3. Fixes an error that occurs when delta-connectors query multi-level partitioned tables. Supports SQL syntax for multiple features, such as DataSkipping, Optimize, and Zorder. Supports synchronizing metadata to the MetaStore.
Hudi	Upgrades Hudi to version 0.9.0. Fixes a compatibility issue with sql.extension between Delta Lake and Hudi. Note: Supports Spark 3.1.2.
HDFS	The default parameter for NameNode reserved capacity is automatically increased. This ensures that the NameNode promptly enters safe mode when disk space is insufficient.
Storm	The component is offline.
Zeppelin	Upgrades Zeppelin to community version 0.10.0.
Hue	Fixes an issue where the YARN Job Browser fails to display or stop jobs in some cases. Enables the YARN Job Browser in the default configurations. Supports the Presto protocol in the default configurations.
Druid	Fixes an issue where a node fails to restart after an unexpected server shutdown because a PID file is not deleted.
ClickHouse	Updates the default configurations. Supports cluster scale-out. Supports the MetaChecker feature. Supports reading data using the OSS table engine and OSS table function.
Iceberg	Upgrades Iceberg to version 0.12.0-1.0.1. Fixes an error with Hive Runtime dependencies.
Knox	Fixes an issue where the first access to the Spark UI fails.
DLF-Auth	Adds the service. The version is 1.0.0. Supports configuring permissions for Hive and Spark to access DLF.

EMR-5.3.x

Release dates

Version	Date
EMR-5.3.1	September 2021
EMR-5.3.0 (New purchases not supported)	August 2021

Updates

EMR-5.3.1

Service	Change
SmartData	Upgraded SmartData to version 3.7.1.
Hue	Fixed an issue where Impala could not be used in high-security clusters.
Kudu	Added support for Kerberos.
HBase	Fixed an issue where restarting HBase in high-security clusters took too long. Fixed an issue where the integration of Spark 3.1.1 with HBase failed. Optimized the graceful stop process.

EMR-5.3.0

Service	Change
SmartData	Upgraded SmartData to version 3.7.0.
Spark	Fixed a compatibility issue with Delta Lake.
Hive	Hive on JindoFS (Block mode) supports the batch metadata optimization feature. This feature is disabled by default.
DeltaLake	Added support for the DeltaLake partition feature. Fixed a compatibility issue between the `desc detail` command and Spark version 3.1.1.
YARN	Added appId, CPU, and Memory resource usage information to the node Containers REST API. Fixed an issue where ApplicationMaster (AM) logs on released Auto Scaling nodes could not be viewed. Fixed an issue where historical data in the State Store caused the cluster to become unavailable. Added support for cleaning up released nodes after they are decommissioned by Auto Scaling. Improved the graceful decommission logic for Auto Scaling. A node is now marked as offline only after the NodeManager (NM) process ends.
Zookeeper	Upgraded to community version 3.6.3.
Flink	Added the SmartData component. Fixed an issue where password-free access to OSS was not possible when you submit jobs to a DataFlow-Flink cluster using Secure Shell (SSH).
Impala	Fixed an issue where deleting an OSS partition directory directly caused a directory listing loop.
Hue	Fixed a user interface (UI) display issue that occurred when Hue was integrated with Oozie.
Kudu	Upgraded to community version 1.14.0.
ClickHouse	The component version is 21.3.13.9.
Iceberg	Added the Iceberg component. The component version is 0.12.0.

EMR-5.2.x

Release date

Version	Date
EMR-5.2.1	July 16, 2021

Updates

Service	Change
SmartData	Upgraded SmartData to version 3.6.1. For more information, see Introduction to SmartData 3.6.x.
Hive	Fixed an issue where the `show create table` command returned incorrect results when Data Lake Formation (DLF) metadata was used. Optimized default Hive parameters to improve job performance. Changed the names of configuration items on the hive-env tab of the Hive service Configuration page in the E-MapReduce console to uppercase for ease of use. Fixed a memory leak in HiveServer2 caused by a User-Defined Function (UDF). Improved the error message that is displayed when you write data to a Hive table if the file system is inconsistent with the MetaStore.
HDFS	Added support for the Zstandard (ZSTD) compression format.
Delta Lake	Upgraded Delta Lake to version 0.8.0. Added support for Spark 3.
Flink	Upgraded Flink to version 1.12-vvr-3.0.2.
Hudi	Upgraded Hudi to version 0.8.0. Added support for integration with Spark SQL.
Spark	Important: Spark 3.1.1 in EMR-5.2.1 is not compatible with Kudu 1.11.1. Added support for the Delta Lake and Hudi data lake formats. Added support for Remote Shuffle Service. Added support for Livy. Optimized the names of configuration items on the spark-defaults tab of the Spark service Configuration page in the E-MapReduce console. Optimized features such as Cost-Based Optimization (CBO), Dynamic Partition Pruning (DPP), and Z-Order. Performance is improved by 50% compared with the open source Spark 3 version. Added support for data sources such as Alibaba Cloud Log Service, DataHub, and Message Queue for Apache RocketMQ (ONS).
Tez	Optimized default Tez parameters to improve job performance.
Ranger	Fixed a warning error in Spark logs that occurred when Ranger was enabled. Fixed an issue where automatic user synchronization failed after connecting to LDAP.
Knox	Added support for the Kudu component. Added support for the HBase component.
Kafka	Added support for the Cruise Control component to provide a balancing feature for Kafka clusters. Introduced a hot-swapping feature for Kafka disks. You can replace faulty disks without stopping or starting a broker. Modified the default values of some parameters.
Phoenix	Fixed an issue where a "JDBC Driver not found" error was reported when Hive and Spark SQL were used to access Phoenix tables.
ESS (EMR Remote Shuffle Service)	Added support for Spark 3.