E-MapReduce: Release notes for EMR 5.x series

Last Updated: Mar 26, 2026

This topic covers the release dates and component updates for E-MapReduce (EMR) 5.x. For the components supported in each version, see Distributions.

EMR-5.21.x

Release date

Version Date
EMR-5.21.0 October 27, 2025

Updates

Service Change
Hive Adds a profile mechanism that automatically detects the file format of data in data lake storage (such as ORC) and applies the optimized JindoSDK buffer and pre-read parameters. Adds ORC stripe prefetch, which overlaps computation with I/O on medium to large ORC files by asynchronously prefetching subsequent stripes while the current stripe is processed. Supports ORC vectorized read: reading ORC index data or applying predicate pushdown produces many scattered, non-contiguous file ranges, and vectorized read sends them as batch requests to significantly improve throughput. Integrates the JindoSDK batch metadata API to process metadata requests (such as getFileStatus) in batches, which improves metadata request throughput.
Spark Adds a profile mechanism that automatically detects the file format of data in data lake storage (such as ORC) and applies the optimized JindoSDK buffer and pre-read parameters. Adds ORC stripe prefetch, which overlaps computation with I/O on medium to large ORC files by asynchronously prefetching subsequent stripes while the current stripe is processed. Supports parallel pre-open for small files: automatically detects small file query scenarios and pre-opens a batch of files in parallel, which greatly reduces the I/O latency caused by frequent open operations. Supports ORC vectorized read, which significantly improves throughput by sending batch requests.
Tez Supports parallel pre-open for small files, automatically detecting small file query scenarios and pre-opening files in parallel to reduce I/O latency.
Ranger JindoAuth Server supports custom RAM roles for client users to access Object Storage Service (OSS). Fixes a missing dependency issue in the Ranger-yarn-plugin.
Paimon Upgraded to version 1-ali-16.3.
JindoCache Upgraded to version 6.10.1.
Delta Lake Added the component. Version: 3.2.1.
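
The ORC stripe prefetch described in the Hive and Spark rows above overlaps I/O with computation. The following Python sketch illustrates the general idea only; it is not JindoSDK, Hive, or Spark code. The function name read_with_prefetch, the fetch callback, and the depth parameter are hypothetical names chosen for this illustration.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, Sequence, TypeVar

S = TypeVar("S")  # stripe descriptor (hypothetical, for example an offset/length pair)
D = TypeVar("D")  # data returned for one stripe


def read_with_prefetch(
    stripes: Sequence[S],
    fetch: Callable[[S], D],
    depth: int = 1,
) -> Iterator[D]:
    """Yield stripe data in order while up to `depth` later stripes load in the background."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    with ThreadPoolExecutor(max_workers=depth) as pool:
        # Start reading the first `depth` stripes right away.
        pending = deque(pool.submit(fetch, s) for s in stripes[:depth])
        next_index = len(pending)
        while pending:
            data = pending.popleft().result()  # block until the current stripe is read
            if next_index < len(stripes):
                # Queue the next read before handing the current stripe to the caller,
                # so that the read runs while the caller processes `data`.
                pending.append(pool.submit(fetch, stripes[next_index]))
                next_index += 1
            yield data
```

A caller would loop over read_with_prefetch(stripes, fetch) and process each result. A larger depth trades memory for more read-ahead, which is one reason such a mechanism would target medium to large files rather than small ones.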

Release version information

DataLake cluster

Service Version
Hadoop-Common 3.2.1
HDFS 3.2.1
OSS-HDFS 1.0.0
Hive 3.1.3
Spark2 2.4.8
Spark3 3.5.3
Tez 0.10.2
Trino 422
Delta Lake 3.2.1
Hudi 0.15.0
Iceberg 1.5.0
Flume 1.11.0
Kyuubi 1.9.2
YARN 3.2.1
OpenLDAP 2.4.46
Ranger 2.3.0
Ranger-plugin 1.0.0
DLF-Auth 2.0.2
Presto 0.283
ZooKeeper 3.8.4
Sqoop 1.4.7
Knox 1.5.0
Celeborn 0.5.2
JindoCache 6.10.1
Paimon 1-ali-16.3

OLAP clusters

Service Version
StarRocks2 2.5.22
StarRocks3 3.2.11
Doris 2.1.4
ClickHouse 23.3.13.6
ZooKeeper 3.8.4

DataFlow cluster

Service Version
Hadoop-Common 3.2.1
HDFS 3.2.1
OSS-HDFS 1.0.0
YARN 3.2.1
OpenLDAP 2.4.46
ZooKeeper 3.8.4
Knox 1.5.0
Flink 1.17.2
Paimon 1-ali-6.2

DataServing cluster

Service Version
Hadoop-Common 3.2.1
HDFS 3.2.1
OSS-HDFS 1.0.0
OpenLDAP 2.4.46
Ranger 2.3.0
Ranger-plugin 1.0.0
ZooKeeper 3.8.4
Knox 1.5.0
HBase 2.6.3
JindoCache 6.8.2
Phoenix 5.2.1

Custom cluster

Service Version
Hadoop-Common 3.2.1
HDFS 3.2.1
OSS-HDFS 1.0.0
Hive 3.1.3
Spark2 2.4.8
Spark3 3.5.3
Tez 0.10.2
Trino 422
Delta Lake 3.2.1
Hudi 0.15.0
Iceberg 1.5.0
Flume 1.11.0
Kyuubi 1.9.2
YARN 3.2.1
OpenLDAP 2.4.46
Ranger 2.3.0
Ranger-plugin 1.0.0
DLF-Auth 2.0.2
Presto 0.283
StarRocks2 2.5.22
StarRocks3 3.2.11
ZooKeeper 3.8.4
Sqoop 1.4.7
Knox 1.5.0
Celeborn 0.5.2
Flink 1.17.2
HBase 2.6.3
JindoCache 6.10.1
Paimon 1-ali-16.3
Phoenix 5.2.1

EMR-5.20.x

Release date

Version Date
EMR-5.20.0 July 10, 2025

Updates

Service Change
Hive Optimizes the performance of adding fields to partitioned tables.
YARN Optimizes global scheduling performance to prevent certain application behaviors from degrading cluster scheduling performance.

Release version information

DataLake cluster

Service Version
Hadoop-Common 3.2.1
HDFS 3.2.1
OSS-HDFS 1.0.0
Hive 3.1.3
Spark2 2.4.8
Spark3 3.5.3
Tez 0.10.2
Trino 422
Hudi 0.15.0
Iceberg 1.5.0
Flume 1.11.0
Kyuubi 1.9.2
YARN 3.2.1
OpenLDAP 2.4.46
Ranger 2.3.0
Ranger-plugin 1.0.0
DLF-Auth 2.0.2
Presto 0.283
ZooKeeper 3.8.4
Sqoop 1.4.7
Knox 1.5.0
Celeborn 0.5.2
JindoCache 6.8.2
Paimon 1-ali-6.2
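
The DataLake tables for EMR-5.21.0 and EMR-5.20.0 differ in only a few rows. The following Python sketch shows one way to compare two "Service Version" tables copied from this page. The helper names parse_versions and diff_versions are hypothetical, and the sample data is an abbreviated subset of the rows listed in the two DataLake tables.

```python
# Abbreviated rows copied from the DataLake tables of the two releases.
EMR_5_20_0_DATALAKE = """
Hive 3.1.3
Spark3 3.5.3
JindoCache 6.8.2
Paimon 1-ali-6.2
"""

EMR_5_21_0_DATALAKE = """
Hive 3.1.3
Spark3 3.5.3
Delta Lake 3.2.1
JindoCache 6.10.1
Paimon 1-ali-16.3
"""


def parse_versions(rows):
    """Map service name to version; the version is the last whitespace-separated token."""
    table = {}
    for row in rows.splitlines():
        if not row.strip():
            continue  # skip blank lines
        service, version = row.rsplit(None, 1)
        table[service] = version
    return table


def diff_versions(old, new):
    """Return {service: (old_version, new_version)} for added, removed, or changed rows."""
    changes = {}
    for service in sorted(old.keys() | new.keys()):
        before, after = old.get(service), new.get(service)
        if before != after:
            changes[service] = (before, after)
    return changes


if __name__ == "__main__":
    old = parse_versions(EMR_5_20_0_DATALAKE)
    new = parse_versions(EMR_5_21_0_DATALAKE)
    for service, (before, after) in diff_versions(old, new).items():
        print(f"{service}: {before} -> {after}")
```

With the sample rows, the comparison reports Delta Lake as newly added and JindoCache and Paimon as upgraded, which matches the EMR-5.21.0 update list.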

OLAP clusters

Service Version
StarRocks2 2.5.22
StarRocks3 3.2.11
Doris 2.1.4
ClickHouse 23.3.13.6
ZooKeeper 3.8.4

DataFlow cluster

Service Version
Hadoop-Common 3.2.1
HDFS 3.2.1
OSS-HDFS 1.0.0
YARN 3.2.1
OpenLDAP 2.4.46
ZooKeeper 3.8.4
Knox 1.5.0
Flink 1.17.2
Paimon 1-ali-6.2

DataServing cluster

Service Version
Hadoop-Common 3.2.1
HDFS 3.2.1
OSS-HDFS 1.0.0
OpenLDAP 2.4.46
Ranger 2.3.0
Ranger-plugin 1.0.0
ZooKeeper 3.8.4
Knox 1.5.0
HBase 2.6.3
JindoCache 6.8.2
Phoenix 5.2.1

Custom cluster

Service Version
Hadoop-Common 3.2.1
HDFS 3.2.1
OSS-HDFS 1.0.0
Hive 3.1.3
Spark2 2.4.8
Spark3 3.5.3
Tez 0.10.2
Trino 422
Hudi 0.15.0
Iceberg 1.5.0
Flume 1.11.0
Kyuubi 1.9.2
YARN 3.2.1
OpenLDAP 2.4.46
Ranger 2.3.0
Ranger-plugin 1.0.0
DLF-Auth 2.0.2
Presto 0.283
StarRocks2 2.5.22
StarRocks3 3.2.11
ZooKeeper 3.8.4
Sqoop 1.4.7
Knox 1.5.0
Celeborn 0.5.2
Flink 1.17.2
HBase 2.6.3
JindoCache 6.8.2
Paimon 1-ali-6.2
Phoenix 5.2.1

EMR-5.19.x

Release date

Version Date
EMR-5.19.0 April 24, 2025

Updates

Service Change
Trino Fixes an issue where LDAP is unavailable.
YARN Improves resource allocation efficiency through global scheduling optimization. Adds metric monitoring for HTTP services. Fixes open source bug YARN-10213.
HBase Upgraded to version 2.6.3. Changes the default runtime environment to Java 11. Changes the default garbage collector to G1.
Phoenix Upgraded to version 5.2.1.
JindoCache Upgraded to version 6.8.2.
StarRocks Supports creating clusters with decoupled storage and compute.
EMRHOOK Adds support for Spark 3.5. Supports data lineage tracking for Paimon tables. Enhances stability.

Release version information

DataLake cluster

Service Version
Hadoop-Common 3.2.1
HDFS 3.2.1
OSS-HDFS 1.0.0
Hive 3.1.3
Spark2 2.4.8
Spark3 3.5.3
Tez 0.10.2
Trino 422
Hudi 0.15.0
Iceberg 1.5.0
Flume 1.11.0
Kyuubi 1.9.2
YARN 3.2.1
OpenLDAP 2.4.46
Ranger 2.3.0
Ranger-plugin 1.0.0
DLF-Auth 2.0.2
Presto 0.283
ZooKeeper 3.8.4
Sqoop 1.4.7
Knox 1.5.0
Celeborn 0.5.2
JindoCache 6.8.2
Paimon 1-ali-6.2

OLAP clusters

Service Version
StarRocks2 2.5.22
StarRocks3 3.2.11
Doris 2.1.4
ClickHouse 23.3.13.6
ZooKeeper 3.8.4

DataFlow cluster

Service Version
Hadoop-Common 3.2.1
HDFS 3.2.1
OSS-HDFS 1.0.0
YARN 3.2.1
OpenLDAP 2.4.46
ZooKeeper 3.8.4
Knox 1.5.0
Flink 1.17.2
Paimon 1-ali-6.2

DataServing cluster

Service Version
Hadoop-Common 3.2.1
HDFS 3.2.1
OSS-HDFS 1.0.0
OpenLDAP 2.4.46
Ranger 2.3.0
Ranger-plugin 1.0.0
ZooKeeper 3.8.4
Knox 1.5.0
HBase 2.6.3
JindoCache 6.8.2
Phoenix 5.2.1

Custom cluster

Service Version
Hadoop-Common 3.2.1
HDFS 3.2.1
OSS-HDFS 1.0.0
Hive 3.1.3
Spark2 2.4.8
Spark3 3.5.3
Tez 0.10.2
Trino 422
Hudi 0.15.0
Iceberg 1.5.0
Flume 1.11.0
Kyuubi 1.9.2
YARN 3.2.1
OpenLDAP 2.4.46
Ranger 2.3.0
Ranger-plugin 1.0.0
DLF-Auth 2.0.2
Presto 0.283
StarRocks2 2.5.22
StarRocks3 3.2.11
ZooKeeper 3.8.4
Sqoop 1.4.7
Knox 1.5.0
Celeborn 0.5.2
Flink 1.17.2
HBase 2.6.3
JindoCache 6.8.2
Paimon 1-ali-6.2
Phoenix 5.2.1

EMR-5.18.x

Release dates

Version Date
EMR-5.18.1 December 18, 2024
EMR-5.18.0 (New purchases not supported) December 4, 2024

Updates

Service Change
Spark3 Upgraded to version 3.5.3. Fixes a configuration issue that occurs during Spark scale-out.
Trino Fixes an issue where connections fail after LDAP is enabled.
ZooKeeper Supports adding custom configurations.
Ranger Replaces the existing Spark 3 Ranger plugin with the version provided by the open source Kyuubi project.
Hudi Upgraded to version 0.15.0.
Celeborn Upgraded to version 0.5.2.
Paimon Upgraded to version 1.0-ali-1.
JindoCache Upgraded to version 6.5.3.
StarRocks3 Upgraded to version 3.2.11.
StarRocks2 Upgraded to version 2.5.22.
Impala The service is offline. Use Presto, Trino, ClickHouse, or StarRocks as an alternative, or install Impala yourself.
Kudu The service is offline.
Kafka The service is offline.
Kafka-Manager The service is offline.

EMR-5.17.x

Release dates

Version Date
EMR-5.17.4 December 18, 2024
EMR-5.17.3 (New purchases not supported) November 29, 2024
EMR-5.17.2 (New purchases not supported) August 29, 2024
EMR-5.17.1 (New purchases not supported) June 21, 2024
EMR-5.17.0 (New purchases not supported) April 23, 2024

Updates

EMR-5.17.4

Service Change
JindoCache Upgraded to version 6.5.3.
StarRocks2 Upgraded to version 2.5.22.
StarRocks3 Upgraded to version 3.2.11.

EMR-5.17.3

Service Change
JindoSDK Upgraded to fix a deadlock issue.

EMR-5.17.2

Service Change
JindoCache Upgraded to version 6.5.1. Improves the read and write performance of Distributed Hash Table (DHT).
Spark Fixes an issue where partition folders cannot be deleted. Fixes a Hive package dependency issue to ensure that client operations do not interrupt the connection to metaStoreClient.
Trino Fixes an issue where some modified configurations might be unexpectedly restored during scale-out. Supports querying data on high-security OSS-HDFS. Fixes a service abnormality that occurs after DLF-Auth is enabled.
Presto Supports querying data on high-security OSS-HDFS.
HDFS Fixes an issue where the memory of NameNode and DataNode cannot be modified.
YARN ResourceManager supports sending timeline events in batches to improve processing throughput. Fixes a logic issue in container and resource processing in ResourceManager.
ZooKeeper Fixes an issue where the memory configuration of a node group cannot be modified. Supports refactoring the log configuration file.
Impala Fixes an issue where customer configurations are modified during elastic scaling.
Ranger Supports the new JindoSDK kernel to reduce CPU utilization.
Knox Fixes an issue where component URL access fails when there is only one Master-Extend node.
Kafka Fixes a startup issue with Kafka Connect clusters.
StarRocks Fixes an issue where new BE nodes are not visible after a scale-out.
Doris Upgraded to version 2.1.4.
Paimon Upgraded to version 0.9-ali-7.
EMRHOOK Supports parsing data lineage for MaxCompute tables.

EMR-5.17.1

Service Change
Spark Supports deploying Master-Extend node groups.
Paimon Replaces the Flink dependency from the VVR version with the community version. Supports Data Lake Formation (DLF) Catalog.
Knox Uses JDK 8 for packaging.
Flink Restores the DLF configurations and dependencies that were removed in EMR-5.17.0.

EMR-5.17.0

Service Change
Spark Spark3 upgraded to version 3.4.2.
Celeborn Upgraded to version 0.4.0.
Doris Upgraded to version 2.1.0.
StarRocks StarRocks2 upgraded to version 2.5.18. StarRocks3 upgraded to version 3.2.4.
Delta Lake Upgraded to version 3.0.0.
Iceberg Upgraded to version 1.5.0.
ZooKeeper Upgraded to version 3.8.4.
JindoCache Upgraded to version 6.2.5.
Flink Upgraded to version 1.17.2.

EMR-5.16.x

Release date

Version Date
EMR-5.16.0 February 19, 2024

Updates

Service Change
Hudi Upgraded to version 0.14.0.
Flume Upgraded to version 1.11.0.
Kyuubi Upgraded to version 1.7.3.
Impala Upgraded to version 4.3.0.
Celeborn Upgraded to version 0.3.2.
JindoCache Upgraded to version 6.2.0.
Paimon Upgraded to version 0.7-ali-1.
Kafka Upgraded to version 3.6.1.
StarRocks StarRocks2 upgraded to version 2.5.13. StarRocks3 upgraded to version 3.1.5.
Spark Fixes the Commons Text vulnerability.
Ranger Fixes vulnerabilities in the Commons Text library. Fixes the path matching permission bypass vulnerability in the Spring Security framework. Fixes the forward/include authentication bypass vulnerability in the Spring Security framework. Fixes the identity authentication bypass vulnerability in a special matching mode in Spring Framework. Supports configuring the interval at which Ranger synchronizes user information from the Lightweight Directory Access Protocol (LDAP) server.

EMR-5.15.x

Release dates

Version Date
EMR-5.15.1 November 16, 2023
EMR-5.15.0 (New purchases not supported) October 27, 2023

Updates

Service Change
JindoCache Added the service. Version: 6.1.1.
JindoData Can no longer be selected. Use JindoCache for caching and DLF-Auth for authentication.
Spark Removes jdo-related configurations from hive-site.xml.
HBase Adds a configuration item to select the HBase Thrift Server version (v1 or v2).
StarRocks StarRocks2 upgraded to version 2.5.10.
Doris Upgraded to version 1.2.7.
Celeborn Upgraded to version 0.3.1.
Paimon Upgraded to version 0.6-ali-2.
ClickHouse Upgraded to version 23.3.13.6.

EMR-5.14.x

Release date

Version Date
EMR-5.14.2 August 17, 2023

Updates

Service Change
Trino Fixes an issue where the Paimon connector fails to query Hadoop Distributed File System (HDFS) tables. Fixes an issue where worker monitoring metrics cannot be read.
Presto Upgraded to version 0.283. Fixes an issue where worker monitoring metrics cannot be read.
ClickHouse Grants all permissions to the default user by default.
StarRocks Renames the previous StarRocks version to StarRocks2. Adds StarRocks3 at version 3.1.2 — clusters are created in coupled storage and compute mode by default; decoupled storage and compute mode is not supported.
Celeborn Upgraded to version 0.3.0.

EMR-5.13.x

Release date

Version Date
EMR-5.13.0 August 3, 2023

Updates

Service Change
Hudi Upgraded to version 0.13.1.
Paimon Upgraded to version 0.5-ali-1.
StarRocks Upgraded to version 2.5.8.
JindoData Upgraded to version 4.6.11.
Trino Upgraded to version 422. The Hudi connector supports querying Merge On Read (MOR) tables. Improves the error message for dynamic UDF loading.

EMR-5.12.x

Release dates

Version Date
EMR-5.12.1 July 13, 2023
EMR-5.12.0 (New purchases not supported) June 1, 2023

Updates

EMR-5.12.1

Service Change
Spark Spark History Server supports OSS-HDFS for storage by default. The Spark 3 native engine supports OSS and OSS-HDFS for storage.
Hive Hive warehouse supports OSS-HDFS for storage by default.
OSS-HDFS Added the service.
YARN Supports OSS-HDFS for storage by default.
HBase HBase HFile data supports OSS-HDFS for storage by default. HBase WAL logs support OSS-HDFS for storage.

EMR-5.12.0

Service Change
Kyuubi Upgraded to version 1.7.1.
Celeborn Upgraded to version 0.2.2.
Paimon Flink-Table-Store renamed to Paimon. Upgraded to version 0.4-ali-1.
StarRocks Upgraded to version 2.5.5.
Doris Upgraded to version 1.2.4.
ClickHouse Upgraded to version 23.3.2.37.
Trino Provides a simple event listener by default to obtain audit logs.
Phoenix Supports Hive on Phoenix.

EMR-5.11.x

Release dates

Version Date
EMR-5.11.1 April 3, 2023
EMR-5.11.0 (New purchases not supported) February 28, 2023

Updates

EMR-5.11.1

Service Change
ClickHouse Upgraded to version 22.8.14.53.
Trino Adds the odps.properties connector to support queries on MaxCompute.
JindoData Upgraded to version 4.6.5.
JindoSDK Upgraded to version 4.6.5.
Flink-Table-Store Upgraded to version 0.3-ali-2.
YARN Supports Node Labels management.

EMR-5.11.0

Service Change
Iceberg Upgraded to version 1.1.0.
Hudi Upgraded to version 0.12.2. Supports CDC.
Delta Lake Upgraded to version 2.2.0. Supports recording Vacuum operations in the transaction log.
Kudu Upgraded to version 1.16.0.
ClickHouse Requires ZooKeeper to be selected when you install the ClickHouse service.
Celeborn RSS renamed to Celeborn. Version: 0.2.0.
Presto Added the service. Kernel: community Facebook PrestoDB 0.278.3. Default HTTP port: 8889. Default HTTPS port: 7779.
StarRocks Upgraded to version 2.5.1.
Doris Upgraded to version 1.2.1.
Kafka-Manager Upgraded to version 3.0.0.6.
Impala Upgraded to version 4.2.0.
OpenLDAP Upgraded to version 2.4.46.
HBase Supports JDK 11. Supports ThriftServer2. Changes the default value of hbase.block.data.cachecompressed to true.
Flink-Table-Store Added the service. Version based on community version 0.3.
JindoData Upgraded to version 4.6.4.

EMR-5.10.x

Release date

Version Date
EMR-5.10.0 December 1, 2022

Updates

Service Change
Iceberg Upgraded to version 0.14.1.
Flink Upgraded to Flink 1.15-vvr-6.0.2, corresponding to the community Flink 1.15 major version.
Kafka Supports LDAP user logon authentication and authorization. Supports user group authorization.
Trino Renames EMR Presto to Trino, the official community name. Supports Ranger and DLF-Auth. Fixes an issue where connections to worker nodes fail after one-click LDAP enablement.
JindoSDK Upgraded to version 4.6.2.
JindoData Upgraded to version 4.6.2.
HBase Supports Ranger. Fixes an issue where OSS-HDFS cannot be selected as the storage mode when adding the service.
YARN ACLs are enabled by default in high-security mode.
StarRocks Upgraded to version 2.4.1.
Doris Upgraded to version 1.1.5.
Hudi The console supports configuring hudi-defaults.conf.
Ranger Upgraded to version 2.3.0. Supports integration with Trino, YARN, HBase, and Kafka.
DLF-Auth Upgraded to version 2.0.2. Supports Trino and Impala.
OpenLDAP Integrates with the Nslcd component.
Kudu Kudu Tserver can no longer be installed in Task node groups.
Spark Upgraded to version 3.3.1.
Tez Upgraded to version 0.10.2.
Kyuubi Upgraded to version 1.6.0.

EMR-5.9.x

Release dates

Version Date
EMR-5.9.1 November 8, 2022
EMR-5.9.0 (New purchases not supported) October 14, 2022

Updates

EMR-5.9.1

Service Change
Kerberos Supports connecting to an external KDC on EMR.
Kafka Adds a configuration item for startup commands, letting you customize startup parameters for the service.
JindoData Upgraded to version 4.6.0. Supports rewriting OSS-HDFS access paths.
Flink Upgraded to version 1.13_vvr_4.0.15.
RSS Upgraded to version 0.1.4.

EMR-5.9.0

Service Change
Spark Upgraded to version 3.3. Supports Kerberos authentication.
Hudi Upgraded to version 0.12.0. Supports Spark 3.3. Supports using a cloud-based MetaStore to host metadata and enabling the acceleration feature. For more information, see Instructions on how to use Hudi MetaStore.
Flink Supports Kerberos authentication. Supports automatic connection with Data Lake Formation (DLF).
Iceberg Upgraded to version 0.14.0. Supports Spark 3.3. Supports Kerberos authentication.
JindoData Upgraded to version 4.5.1. Supports AccessKey-free access to Alibaba Cloud resources.
Hadoop-Common and HDFS Support Kerberos authentication. Fix security vulnerability CVE-2022-25168.
Knox Integrates with Ranger. Access the Ranger UI from the Access Links And Ports tab.
HBase Upgraded to version 2.4.9. Supports Kerberos authentication. Supports group configuration.
RSS Upgraded to version 0.1.2. Supports Kerberos authentication.
Doris Upgraded to version 1.1.2. Supports Kerberos authentication.
StarRocks Upgraded to version 2.3.2. Supports Kerberos authentication.
Kafka Upgraded to version 2.13_3.2.1. Supports Kerberos authentication.
Delta Lake Supports upgrading to version 2.1.0. Supports Spark 3.3. Supports Kerberos authentication.
Impala Supports creating views in DLF. Supports Kerberos authentication.
Kudu Added the component. Version: 1.14.0.
YARN, Ranger, Hive, Kyuubi, Tez, ZooKeeper, DLF-Auth, Phoenix, Sqoop, and Presto Support Kerberos authentication.

EMR-5.8.x

Release date

Version Date
EMR-5.8.0 August 5, 2022

Updates

Service Change
Spark Supports one-click integration with LDAP.
Hive Supports one-click integration with LDAP.
Presto Upgraded to community version 389, using the standalone Delta Lake and Hudi connectors provided by the community. Note: the Delta Lake connector does not support Time Travel or Z-Order; the Hudi connector does not support querying MOR tables. Supports one-click integration with LDAP.
Delta Lake Integrates with DLF for automated lake table management. Fixes an issue where partition information cannot be automatically synchronized in CTAS scenarios. The optimize and vacuum commands support returning metric information.
Hudi Upgraded to version 0.11.1.
Hadoop-Common Added the component. Resolves the issue where HDFS, YARN, and JindoSDK configurations overwrite each other.
YARN Enhances the elastic scaling feature.
Ranger Supports both Spark 2 and Spark 3. Ranger Usersync supports one-click integration with LDAP.
Kafka Added the component. Version: 2.12-2.4.1.
HBase Added the component. Version: 2.3.4.
Phoenix Added the component. Version: 5.1.2.
Doris Upgraded to version 1.1.1.
StarRocks Upgraded to version 2.3.0. The primary key model supports the complete DELETE WHERE syntax and persistence of the primary key index to reduce memory usage.
ClickHouse Upgraded to version 22.3.8.39. Fixes an out-of-memory issue when reading large files from OSS.

EMR-5.6.x

Release date

Version Date
EMR-5.6.0 April 21, 2022

Updates

Service Change
JindoData Added the component. Version: 4.3.0.
JindoSDK Upgraded to version 4.3.0.
Spark Upgraded to version 3.2.1.
Hive Fixes a bug where commits are repeated after Speculation is enabled in Tez.
Presto Fixes a bug where the Presto service fails to start after being added to a Hadoop cluster that has already been initialized.
Delta Lake DML supports subqueries.
Hudi Upgraded to version 0.10.1.
Iceberg Upgraded to version 0.13.1.
YARN Adds a feature configuration to restrict ApplicationMasters (AMs) to run only on CORE group nodes.
HBase Fixes a bug in the HBase 2.3.4 kernel.
ZooKeeper Optimizes JVM parameter configurations.
Impala Adapts to JindoSDK 4.3.0.
Sqoop Upgrades the PostgreSQL version.
Zeppelin Fixes an issue where the JDBC Interpreter fails to start.
Ranger The Ranger 1.2.0 Spark plugin supports Delta Lake and Hudi.
Flume Adapts to JindoSDK 4.3.0.
Oozie Upgrades Log4j to version 2.17.2.
DLF-Auth Upgraded to version 2.0.0.

EMR-5.5.x

Release dates

Version Date
EMR-5.5.1 March 25, 2022
EMR-5.5.0 (New purchases not supported) February 15, 2022

Updates

EMR-5.5.1

Only OLAP clusters in the new console support this version.

Service Change
ClickHouse Modifies the default values of some parameters.
StarRocks Upgraded to version 2.1.1.

EMR-5.5.0

Service Change
SmartData The component is offline.
RSS Upgrades the ESS service to RSS. Enhances features and stability.
JindoSDK Upgrades the architecture to JindoData. EMR integrates with JindoSDK 4.0 for the first time, supporting services such as OSS and OSS-HDFS.
Spark The COUNT DISTINCT function supports IF statements and optimizes the use of CASE WHEN (set spark.sql.optimizer.rewriteConditionalDistinctAggregates to true). Shuffle Hash Join supports fallback to Sort Merge Join (set spark.sql.join.preferSortMergeJoin to false and spark.sql.join.enableShuffledHashJoinFallback to true). Supports automatic merging of small files for non-dynamic partitions (set spark.sql.adaptive.merge.output.small.files.enabled to true). Automatically adjusts concurrency for GroupingSet and Distinct scenarios (set spark.sql.execution.optimizeExpand to true). Optimizes Hive on Spark. Supports Time Travel syntax. Adapts to JindoSDK.
Tez Adapts to JindoSDK.
Hive Optimizes the batch deletion of Hive Jindo. Optimizes the HiveServer2 out-of-memory issue. Optimizes Hive on Spark. Adapts to JindoSDK.
Presto Upgraded to community version 358. Adds MySQL, Iceberg, Hudi, Phoenix, Kudu, and Delta connectors by default and updates the default configurations. Supports data lake analytics. Supports dynamic UDF loading. Adapts to JindoSDK.
Delta Lake Upgraded to version 1.1.0, compatible with Spark 3.2.0. All commercial features migrated to version 1.1.0. Optimizes synchronization of metadata modifications to the metastore. Automatically reports table statistics (dataProfiling) to the metastore. Supports Time Travel syntax. Supports DropPartition SQL syntax. Supports dynamic partition overwrites using SQL. Supports ADD COLUMN operations at specified positions (FIRST and AFTER). Supports and enables dynamic adjustment of file sizes based on table sizes by default. Supports and enables automatic Vacuum by default (concurrent Vacuum also supported). Optimizes the logic for automatic compaction (disabled by default). Adds Z-order syntax and accelerates the Z-order process.
Hudi Upgraded to version 0.10.0. Supports Spark 3.2.0. Supports JindoFS Block mode.
HDFS Adapts to JindoSDK.
YARN Adapts to RSS memory configurations. Adapts to JindoSDK.
Flume Adapts to JindoSDK.
Impala Adapts to JindoSDK.
Ranger Supports Spark 3.2.0. Supports Presto 358.
HBase Fixes issues with default parameters. Fixes an issue with the GC log date format.
ClickHouse Adds HDFS and OSS disk types to support hot and cold data separation (see Use HDFS for hot and cold data separation and Use OSS for hot and cold data separation). In Replicated*MergeTree scenarios, Zero Copy is supported for OSS, HDFS, and S3 disk types. Optimizes the processing logic when the ClickHouse component is stopped.
Iceberg Upgraded to version 0.13.0. Supports Presto 358.
DLF-Auth Supports Spark 3.2.0. Supports Presto 358.
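
The Spark row for EMR-5.5.0 names several properties that switch individual optimizations on. The following Python sketch collects them and turns a selection into spark-submit `--conf key=value` arguments. The property names and values are taken from that row; the feature labels and the function name spark_submit_conf_args are hypothetical, and whether a property takes effect depends on the EMR Spark build in use.

```python
# Property names and values as listed in the EMR-5.5.0 Spark notes.
# The dictionary keys are hypothetical labels used only in this sketch.
EMR_5_5_0_SPARK_FEATURES = {
    "conditional_distinct_rewrite": {
        "spark.sql.optimizer.rewriteConditionalDistinctAggregates": "true",
    },
    "shuffled_hash_join_fallback": {
        "spark.sql.join.preferSortMergeJoin": "false",
        "spark.sql.join.enableShuffledHashJoinFallback": "true",
    },
    "merge_small_output_files": {
        "spark.sql.adaptive.merge.output.small.files.enabled": "true",
    },
    "optimize_expand": {
        "spark.sql.execution.optimizeExpand": "true",
    },
}


def spark_submit_conf_args(features):
    """Build a flat list of `--conf key=value` arguments for the selected features."""
    args = []
    for name in features:
        # An unknown label raises KeyError, so that typos are not silently ignored.
        for key, value in EMR_5_5_0_SPARK_FEATURES[name].items():
            args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    selected = ["shuffled_hash_join_fallback", "optimize_expand"]
    print(" ".join(["spark-submit"] + spark_submit_conf_args(selected)))
```

The same key and value pairs can also be set in spark-defaults instead of on the command line.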

EMR-5.4.x

Release dates

Version Date
EMR-5.4.3 December 2021
EMR-5.4.2 (New purchases not supported) December 2021
EMR-5.4.1 (New purchases not supported) November 2021
EMR-5.4.0 (New purchases not supported) October 2021

Updates

EMR-5.4.3

This release fixes the Log4j security vulnerabilities in all related components. For more information, see Vulnerability Announcement — Apache Log4j2 Arbitrary Code Execution Vulnerability.

Service Change
Presto Fixes the Log4j vulnerability in the Elasticsearch connector.
DLF Metastore Changes the default setting for Metastore logs from enabled to disabled. Fixes an error caused by an excessively long URI for Metastore gettablestats.
Delta Lake Fixes an issue with synchronizing schema changes to the Metastore.
Sqoop Fixes an issue where precision is lost for the Decimal type when Sqoop imports HCatalog tables.

EMR-5.4.2

Service Change
SmartData Updated to version 3.8.0. For more information, see SmartData 3.8.X overview. Authentication and authorization based on Kerberos and Ranger can be used to manage permissions on data in OSS.

EMR-5.4.1

Service Change
SmartData Upgraded to version 3.7.3. For more information, see Introduction to SmartData 3.7.x.
Oozie Fixes an issue where the Jetty server for Oozie fails to start due to a JAR package conflict in an HA environment.
Impala Fixes a no such method error that occurs when querying DLF metadata tables.
DLF-Auth Upgraded to version 1.0.1.

EMR-5.4.0

Service Change
SmartData Upgraded to version 3.7.2. For more information, see Introduction to SmartData 3.7.x.
Spark Upgraded to version 3.1.2. Optimizes Distinct computing performance for Spark SQL when an aggregation operator contains multiple count(distinct case ... when ...) methods. Fixes the array-index out of bounds error when some required statistics for Adaptive Query Execution (AQE) are missing. Fixes errors related to AQE and data caching in specific scenarios.
Hive The batch metadata optimization feature is supported for Hive on JindoFS (Block mode). Disabled by default.
Presto Delta tables support StorageHandler queries.
Delta Lake Upgraded to version 1.0.0. Unifies delta-connectors for Hive 2 and Hive 3. Fixes an error when delta-connectors query multi-level partitioned tables. Supports SQL syntax for DataSkipping, Optimize, and Zorder. Supports synchronizing metadata to the MetaStore.
Hudi Updated to version 0.9.0. Fixes the compatibility issue of sql.extension between Delta Lake and Hudi. Supports Spark 3.1.2.
HDFS The default parameter for NameNode reserved capacity is automatically increased to ensure NameNode promptly enters safe mode when disk space is insufficient.
Storm The component is offline.
Zeppelin Upgraded to community version 0.10.0.
Hue Fixes an issue where the YARN Job Browser fails to display or stop jobs in some cases. Enables the YARN Job Browser in default configurations. Supports the Presto protocol in default configurations.
Druid Fixes an issue where a node fails to restart after an unexpected server shutdown because a PID file is not deleted.
ClickHouse Updates the default configurations. Supports cluster scale-out. Supports the MetaChecker feature. Supports reading data using the OSS table engine and OSS table function.
Iceberg Upgraded to version 0.12.0-1.0.1. Fixes an error with Hive Runtime dependencies.
Knox Fixes an issue where the first access to the Spark UI fails.
DLF-Auth Added the service at version 1.0.0. Supports configuring Hive or Spark permissions to access DLF.

EMR-5.3.x

Release dates

Version Date
EMR-5.3.1 September 2021
EMR-5.3.0 (New purchases not supported) August 2021

Updates

EMR-5.3.1

Service Change
SmartData Upgraded to version 3.7.1.
Hue Fixed an issue where Impala could not be used in high-security clusters.
Kudu Added support for Kerberos.
HBase Fixed an issue where restarting HBase in high-security clusters took too long. Fixed an issue where the integration of Spark 3.1.1 with HBase failed. Optimized the graceful stop process.

EMR-5.3.0

Service Change
SmartData Upgraded to version 3.7.0.
Spark Fixed a compatibility issue with Delta Lake.
Hive Hive on JindoFS (Block mode) supports the batch metadata optimization feature. Disabled by default.
Delta Lake Added support for the DeltaLake partition feature. Fixed a compatibility issue between the desc detail command and Spark 3.1.1.
YARN Added appId, CPU, and memory resource usage information to the node Containers REST API. Fixed an issue where ApplicationMaster (AM) logs on released Auto Scaling nodes could not be viewed. Fixed an issue where historical data in the State Store caused the cluster to become unavailable. Added support for cleaning up released nodes after they are decommissioned by Auto Scaling. Improved the graceful decommission logic for Auto Scaling — a node is marked as offline only after the NodeManager (NM) process ends.
ZooKeeper Upgraded to community version 3.6.3.
Flink Added the SmartData component. Fixed an issue where password-free access to OSS was not possible when submitting jobs to a DataFlow-Flink cluster via SSH.
Impala Fixed an issue where deleting an OSS partition directory directly caused a directory listing loop.
Hue Fixed a UI display issue that occurred when Hue was integrated with Oozie.
Kudu Upgraded to community version 1.14.0.
ClickHouse The component version is 21.3.13.9.
Iceberg Added the Iceberg component. Version: 0.12.0.

EMR-5.2.x

Release date

Version Date
EMR-5.2.1 July 16, 2021

Updates

Service Change
SmartData Upgraded to version 3.6.1. For more information, see Introduction to SmartData 3.6.x.
Hive Fixed an issue where the show create table command returned incorrect results when DLF metadata was used. Optimized default Hive parameters to improve job performance. Changed the names of configuration items on the hive-env tab to uppercase for ease of use. Fixed a memory leak in HiveServer2 caused by a User-Defined Function (UDF). Improved the error message displayed when writing data to a Hive table if the file system is inconsistent with the MetaStore.
HDFS Added support for the Zstandard (ZSTD) compression format.
Delta Lake Upgraded to version 0.8.0. Added support for Spark 3.
Flink Upgraded to version 1.12-vvr-3.0.2.
Hudi Upgraded to version 0.8.0. Added support for integration with Spark SQL.
Spark Important: Spark 3.1.1 in EMR-5.2.1 is not compatible with Kudu 1.11.1. Supports the Delta Lake and Hudi data lake formats. Supports Remote Shuffle Service. Supports Livy. Optimized the names of configuration items on the spark-defaults tab. Optimized Cost-Based Optimization (CBO), Dynamic Partition Pruning (DPP), and Z-Order; performance is improved by 50% compared with the open source Spark 3 version. Added support for data sources such as Alibaba Cloud Log Service, DataHub, and Message Queue for Apache RocketMQ (ONS).
Tez Optimized default Tez parameters to improve job performance.
Ranger Fixed a warning error in Spark logs that occurred when Ranger was enabled. Fixed an issue where automatic user synchronization failed after connecting to LDAP.
Knox Added support for the Kudu component. Added support for the HBase component.
Kafka Added support for the Cruise Control component for Kafka cluster balancing. Introduced a hot-swapping feature for Kafka disks — replace faulty disks without stopping or starting a broker. Modified the default values of some parameters.
Phoenix Fixed an issue where a "JDBC Driver not found" error was reported when Hive and Spark SQL were used to access Phoenix tables.
ESS (EMR Remote Shuffle Service) Added support for Spark 3.