E-MapReduce: EMR 3.x series release notes

Last updated: Oct 28, 2025

This topic describes the release dates and update details for the EMR 3.x series. For more information about the components that are supported in each version, see Release versions.

EMR 3.55.x

Release date

EMR-3.55.0: October 27, 2025

Update details

Service

Change

Ranger

  • JindoAuth Server supports custom RAM roles that client users can use to access OSS.

  • Fixed a missing dependency in Ranger-yarn-plugin.

Paimon

Upgraded to version 1-ali-16.3.

JindoCache

Upgraded to version 6.10.1.

Release version information

DataLake cluster

Hadoop-Common: 2.8.5
HDFS: 2.8.5
OSS-HDFS: 1.0.0
Hive: 2.3.9
Spark2: 2.4.8
Spark3: 3.4.2
YARN: 2.8.5
Trino: 422
DeltaLake: 3.0.0
Hudi: 0.15.0
Iceberg: 1.5.0
Flume: 1.11.0
Kyuubi: 1.9.2
Tez: 0.10.2
OpenLDAP: 2.4.46
Ranger: 2.3.0
Ranger-plugin: 1.0.0
Sqoop: 1.4.7
DLF-Auth: 2.0.2
Presto: 0.283
Zookeeper: 3.8.4
Knox: 1.5.0
Celeborn: 0.5.2
JindoCache: 6.10.1
Paimon: 1-ali-16.3

OLAP cluster

StarRocks2: 2.5.22
StarRocks3: 3.2.11
Doris: 2.1.4
ClickHouse: 23.8.2.7
Zookeeper: 3.8.4

DataFlow cluster

Hadoop-Common: 2.8.5
HDFS: 2.8.5
OSS-HDFS: 1.0.0
YARN: 2.8.5
OpenLDAP: 2.4.46
Ranger: 2.3.0
Ranger-plugin: 1.0.0
Zookeeper: 3.8.4
Knox: 1.5.0
Flink: 1.17.2
Paimon: 1-ali-6.2

DataServing cluster

Hadoop-Common: 2.8.5
HDFS: 2.8.5
OSS-HDFS: 1.0.0
OpenLDAP: 2.4.46
Ranger: 2.3.0
Ranger-plugin: 1.0.0
Zookeeper: 3.8.4
Knox: 1.5.0
HBase: 1.7.1
JindoCache: 6.8.2
Phoenix: 4.16.1

Custom cluster

Hadoop-Common: 2.8.5
HDFS: 2.8.5
OSS-HDFS: 1.0.0
Hive: 2.3.9
Spark2: 2.4.8
Spark3: 3.4.2
YARN: 2.8.5
Trino: 422
DeltaLake: 3.0.0
Hudi: 0.15.0
Iceberg: 1.5.0
Flume: 1.11.0
Kyuubi: 1.9.2
Tez: 0.10.2
OpenLDAP: 2.4.46
Ranger: 2.3.0
Ranger-plugin: 1.0.0
Sqoop: 1.4.7
DLF-Auth: 2.0.2
Presto: 0.283
StarRocks2: 2.5.22
StarRocks3: 3.2.11
Zookeeper: 3.8.4
Knox: 1.5.0
Celeborn: 0.5.2
Flink: 1.17.2
HBase: 1.7.1
JindoCache: 6.10.1
Paimon: 1-ali-16.3
Phoenix: 4.16.1

EMR 3.54.x

Release date

EMR-3.54.0: July 10, 2025

Update details

Service

Change

Hive

Fixed some known bugs.

Tez

Fixed community bugs to improve performance and stability.

Release version information

DataLake cluster

Hadoop-Common: 2.8.5
HDFS: 2.8.5
OSS-HDFS: 1.0.0
Hive: 2.3.9
Spark2: 2.4.8
Spark3: 3.4.2
YARN: 2.8.5
Trino: 422
DeltaLake: 3.0.0
Hudi: 0.15.0
Iceberg: 1.5.0
Flume: 1.11.0
Kyuubi: 1.9.2
Tez: 0.10.2
OpenLDAP: 2.4.46
Ranger: 2.3.0
Ranger-plugin: 1.0.0
Sqoop: 1.4.7
DLF-Auth: 2.0.2
Presto: 0.283
Zookeeper: 3.8.4
Knox: 1.5.0
Celeborn: 0.5.2
JindoCache: 6.8.2
Paimon: 1-ali-6.2

OLAP cluster

StarRocks2: 2.5.22
StarRocks3: 3.2.11
Doris: 2.1.4
ClickHouse: 23.8.2.7
Zookeeper: 3.8.4

DataFlow cluster

Hadoop-Common: 2.8.5
HDFS: 2.8.5
OSS-HDFS: 1.0.0
YARN: 2.8.5
OpenLDAP: 2.4.46
Ranger: 2.3.0
Ranger-plugin: 1.0.0
Zookeeper: 3.8.4
Knox: 1.5.0
Flink: 1.17.2
Paimon: 1-ali-6.2

DataServing cluster

Hadoop-Common: 2.8.5
HDFS: 2.8.5
OSS-HDFS: 1.0.0
OpenLDAP: 2.4.46
Ranger: 2.3.0
Ranger-plugin: 1.0.0
Zookeeper: 3.8.4
Knox: 1.5.0
HBase: 1.7.1
JindoCache: 6.8.2
Phoenix: 4.16.1

Custom cluster

Hadoop-Common: 2.8.5
HDFS: 2.8.5
OSS-HDFS: 1.0.0
Hive: 2.3.9
Spark2: 2.4.8
Spark3: 3.4.2
YARN: 2.8.5
Trino: 422
DeltaLake: 3.0.0
Hudi: 0.15.0
Iceberg: 1.5.0
Flume: 1.11.0
Kyuubi: 1.9.2
Tez: 0.10.2
OpenLDAP: 2.4.46
Ranger: 2.3.0
Ranger-plugin: 1.0.0
Sqoop: 1.4.7
DLF-Auth: 2.0.2
Presto: 0.283
StarRocks2: 2.5.22
StarRocks3: 3.2.11
Zookeeper: 3.8.4
Knox: 1.5.0
Celeborn: 0.5.2
Flink: 1.17.2
HBase: 1.7.1
JindoCache: 6.8.2
Paimon: 1-ali-6.2
Phoenix: 4.16.1

EMR 3.53.x

Release date

EMR-3.53.0: April 24, 2025

Update details

Service

Change

Trino

Fixed an issue where LDAP was unavailable.

YARN

Fixed open source bugs (YARN-10213, YARN-6207, and YARN-9339).

StarRocks

Supports the creation of clusters with separated storage and compute resources.

JindoCache

Upgraded to version 6.8.2.

EMRHOOK

Enhanced stability.

Release version information

DataLake cluster

Hadoop-Common: 2.8.5
HDFS: 2.8.5
OSS-HDFS: 1.0.0
Hive: 2.3.9
Spark2: 2.4.8
Spark3: 3.4.2
YARN: 2.8.5
Trino: 422
DeltaLake: 3.0.0
Hudi: 0.15.0
Iceberg: 1.5.0
Flume: 1.11.0
Kyuubi: 1.9.2
Tez: 0.10.2
OpenLDAP: 2.4.46
Ranger: 2.3.0
Ranger-plugin: 1.0.0
Sqoop: 1.4.7
DLF-Auth: 2.0.2
Presto: 0.283
Zookeeper: 3.8.4
Knox: 1.5.0
Celeborn: 0.5.2
JindoCache: 6.8.2
Paimon: 1-ali-6.2

OLAP cluster

StarRocks2: 2.5.22
StarRocks3: 3.2.11
Doris: 2.1.4
ClickHouse: 23.8.2.7
Zookeeper: 3.8.4

DataFlow cluster

Hadoop-Common: 2.8.5
HDFS: 2.8.5
OSS-HDFS: 1.0.0
YARN: 2.8.5
OpenLDAP: 2.4.46
Ranger: 2.3.0
Ranger-plugin: 1.0.0
Zookeeper: 3.8.4
Knox: 1.5.0
Flink: 1.17.2
Paimon: 1-ali-6.2

DataServing cluster

Hadoop-Common: 2.8.5
HDFS: 2.8.5
OSS-HDFS: 1.0.0
OpenLDAP: 2.4.46
Ranger: 2.3.0
Ranger-plugin: 1.0.0
Zookeeper: 3.8.4
Knox: 1.5.0
HBase: 1.7.1
JindoCache: 6.8.2
Phoenix: 4.16.1

Custom cluster

Hadoop-Common: 2.8.5
HDFS: 2.8.5
OSS-HDFS: 1.0.0
Hive: 2.3.9
Spark2: 2.4.8
Spark3: 3.4.2
YARN: 2.8.5
Trino: 422
DeltaLake: 3.0.0
Hudi: 0.15.0
Iceberg: 1.5.0
Flume: 1.11.0
Kyuubi: 1.9.2
Tez: 0.10.2
OpenLDAP: 2.4.46
Ranger: 2.3.0
Ranger-plugin: 1.0.0
Sqoop: 1.4.7
DLF-Auth: 2.0.2
Presto: 0.283
StarRocks2: 2.5.22
StarRocks3: 3.2.11
Zookeeper: 3.8.4
Knox: 1.5.0
Celeborn: 0.5.2
Flink: 1.17.2
HBase: 1.7.1
JindoCache: 6.8.2
Paimon: 1-ali-6.2
Phoenix: 4.16.1

EMR 3.52.x

Release date

EMR-3.52.1: December 18, 2024
EMR-3.52.0: December 4, 2024 (new purchases are not supported)

Update details

Service

Change

Spark

  • Fixed a configuration issue that occurred during scale-out.

  • Fixed an issue where SASL connections occasionally failed in Kerberos clusters.

Hive

Fixed a configuration issue that occurred during scale-out.

Trino, Presto

Resolved an issue where connections failed after LDAP was enabled.

Zookeeper

Supports adding custom configurations.

Ranger

Replaced the existing Spark 3 Ranger plugin with the version provided by the open source Kyuubi project.

Hudi

Upgraded to version 0.15.0.

Celeborn

Upgraded to version 0.5.2.

JindoCache

Upgraded to version 6.5.3.

StarRocks3

Upgraded to version 3.2.11.

Kyuubi

Upgraded to version 1.9.2.

StarRocks2

Upgraded to version 2.5.22.

Impala, Kudu, Kafka, Kafka-Manager

These services are no longer available. You can use a recommended alternative service or manually install the corresponding service.

You can replace Impala with Presto, Trino, ClickHouse, or StarRocks.

EMR 3.51.x

Release date

EMR-3.51.4: December 18, 2024
EMR-3.51.3: November 29, 2024 (new purchases are not supported)
EMR-3.51.2: August 29, 2024 (new purchases are not supported)
EMR-3.51.1: June 21, 2024 (new purchases are not supported)
EMR-3.51.0: April 23, 2024 (new purchases are not supported)

Update details

EMR-3.51.4

Service

Change

JindoCache

Upgraded to version 6.5.3.

StarRocks2

Upgraded to version 2.5.22.

StarRocks3

Upgraded to version 3.2.11.

EMR-3.51.3

Service

Description

JindoSDK

Updated JindoSDK to fix an issue that caused deadlocks.

EMR-3.51.2

Service

Description

JindoCache

  • Upgraded JindoCache to version 6.5.1.

  • Improved the performance of reading from and writing to distributed hash tables.

Spark

  • Fixed an issue where partition directories could not be deleted.

  • Fixed a Hive package dependency issue so that the connection between Spark and the Metastore client is not interrupted.

Trino

  • Fixed an issue where some modified configurations could be unexpectedly reverted to their original values during scale-out.

  • Supports querying OSS-HDFS data in high-security clusters.

  • Fixed an issue where exceptions occurred in Trino after DLF-Auth was enabled.

Presto

Supports querying OSS-HDFS data in high-security clusters.

HDFS, HBase-HDFS

Fixed an issue where the memory size of NameNodes and DataNodes could not be modified.

YARN

  • The ResourceManager can send multiple timeline events at a time, which improves processing capacity.

  • Fixed a logic issue in how the ResourceManager processes containers and resources.

ZooKeeper

  • Fixed an issue where the memory configuration of a node group could not be modified.

  • The log configuration files can be reconstructed.

Impala

Fixed an issue where client configurations were unexpectedly modified during auto scaling.

Ranger

Supports the latest version of JindoSDK, which reduces CPU load.

Knox

Fixed an issue where the Knox URL could not be accessed when a cluster had only one Master Extend node group.

Kafka

Fixed an issue where an EMR cluster with Kafka Connect deployed failed to start.

StarRocks

Fixed an issue where newly added BE nodes were not displayed after scale-out.

Doris

Upgraded Doris to version 2.1.4.

Paimon

Upgraded Paimon to version 0.9-ali-7.

EMR-HOOK

Supports parsing the lineage information of MaxCompute tables.

EMR-3.51.1

Service

Change

Spark, Hive, Kyuubi

Supports deploying the Master-Extend node group.

Paimon

Replaced the Flink dependency from the VVR version to the community version and added support for DLF Catalog.

Knox

Packaged using JDK 8.

Flink

Restored the DLF configurations and dependencies that were removed in EMR-3.51.0.

EMR-3.51.0

Service

Change

Spark

Upgraded Spark3 to version 3.4.2.

Celeborn

Upgraded to version 0.4.0.

Doris

Upgraded to version 2.1.0.

StarRocks

  • Upgraded StarRocks2 to version 2.5.18.

  • Upgraded StarRocks3 to version 3.2.4.

DeltaLake

Upgraded to version 3.0.0.

Iceberg

Upgraded to version 1.5.0.

Zookeeper

Upgraded to version 3.8.4.

JindoCache

Upgraded to version 6.2.5.

Flink

Upgraded to version 1.17.2.

EMR 3.50.x

Release date

EMR-3.50.0: February 19, 2024

Update details

Service

Change

Hudi

Upgraded to version 0.14.0.

Flume

Upgraded to version 1.11.0.

Kyuubi

Upgraded to version 1.7.3.

Impala

Upgraded to version 4.3.0.

Celeborn

Upgraded to version 0.3.2.

JindoCache

Upgraded to version 6.2.0.

Paimon

Upgraded to version 0.7-ali-1.

Kafka

  • Upgraded to version 3.6.1.

  • Fixed a SASL security authentication vulnerability in the Kafka Connect component.

Spark

Fixed the Commons Text vulnerability.

StarRocks

  • Upgraded StarRocks2 to version 2.5.13.

  • Upgraded StarRocks3 to version 3.1.5.

Ranger

  • Fixed the Commons Text vulnerability.

  • Fixed the Spring Security path matching permission bypass vulnerability.

  • Fixed the Spring Security forward/include authentication bypass vulnerability.

  • Fixed the Spring Framework identity authentication bypass vulnerability under a special matching pattern.

  • Supports modifying the period for Ranger to synchronize LDAP users.

EMR 3.49.x

Release date

EMR-3.49.1: November 16, 2023
EMR-3.49.0: October 27, 2023 (new purchases are not supported)

Update details

Service

Change

JindoCache

Added the component. The version is 6.1.1.

JindoData

JindoData is unavailable. You can use JindoCache for data caching and DLF-Auth for authentication.

Spark

Removed JDO-related configurations from hive-site.xml.

HBase

Added a configuration item for selecting the HBase Thrift Server version (v1 or v2).

StarRocks

Upgraded StarRocks2 to version 2.5.10.

Doris

Upgraded Doris to version 1.2.7.

Celeborn

Upgraded Celeborn to version 0.3.1.

Paimon

Upgraded Paimon to version 0.6-ali-2.

ClickHouse

Upgraded ClickHouse to version 23.8.2.7.

EMR 3.48.x

Release date

EMR-3.48.2: August 17, 2023

Update details

Service

Change

Trino

  • Fixed an issue where the Paimon connector could not successfully query HDFS tables.

  • Fixed an issue where worker monitoring metrics could not be read.

Presto

  • Upgraded to version 0.283.

  • Fixed an issue where worker monitoring metrics could not be read.

ClickHouse

Granted all permissions to the default user by default.

StarRocks

  • Renamed the previous StarRocks to StarRocks2.

  • Added StarRocks3, version 3.1.2. By default, it is created as a storage-compute coupled version. Storage-compute separated versions are not supported.

Celeborn

Upgraded to version 0.3.0.

EMR 3.47.x

Release date

EMR-3.47.0: August 3, 2023

Update details

Service

Change

Hudi

Upgraded to version 0.13.1.

Paimon

Upgraded to version 0.5-ali-1.

StarRocks

Upgraded to version 2.5.8.

JindoData

Upgraded to version 4.6.11.

Trino

  • Upgraded to version 422.

  • The Hudi connector supports querying Merge On Read (MOR) tables.

  • Optimized error messages for dynamic UDF loading.

EMR 3.46.x

Release date

EMR-3.46.1: July 13, 2023
EMR-3.46.0: June 1, 2023 (new purchases are not supported)

Update details

EMR-3.46.1

Service

Description

Spark

  • By default, Spark History Server data is stored in OSS-HDFS.

  • Spark3 Native Engine data is stored in OSS or OSS-HDFS.

Hive

By default, Hive warehouse files are stored in OSS-HDFS.

OSS-HDFS

Added the OSS-HDFS service.

YARN

By default, YARN data is stored in OSS-HDFS.

HBase

  • By default, HBase data in HFile format is stored in OSS-HDFS.

  • HBase write-ahead logs (WALs) are stored in OSS-HDFS.

EMR-3.46.0

Service

Change

Kyuubi

Upgraded to version 1.7.1.

Celeborn

Upgraded to version 0.2.2.

Paimon

  • Renamed Flink-Table-Store to Paimon.

  • Upgraded to version 0.4-ali-1.

StarRocks

Upgraded to version 2.5.5.

Doris

Upgraded to version 1.2.4.

ClickHouse

Upgraded to version 22.8.17.17.

Trino

Provided a simple Event Listener by default to obtain audit logs.

Phoenix

Supports Hive on Phoenix.

EMR 3.45.x

Release date

EMR-3.45.1: April 3, 2023
EMR-3.45.0: February 28, 2023 (new purchases are not supported)

Update details

EMR-3.45.1

Service

Description

ClickHouse

Upgraded to version 22.8.14.53.

Trino

Added the odps.properties connector, which lets you query MaxCompute data.

JindoData

Upgraded to version 4.6.5.

JindoSDK

Upgraded to version 4.6.5.

Flink Table Store

Upgraded to version 0.3-ali-2.

YARN

Supports the Node Labels feature.

EMR-3.45.0

Service

Change

Iceberg

Upgraded to version 1.1.0.

Hudi

  • Upgraded to version 0.12.2.

  • Supports the CDC feature.

Kudu

Upgraded to version 1.16.0.

ClickHouse

  • Upgraded to version 22.3.8.39.

  • The ZooKeeper service must be selected when you install the ClickHouse service.

Celeborn

  • Renamed RSS to Celeborn.

  • The version of Celeborn is 0.2.0.

Presto

Added the service, based on the open source PrestoDB 0.278.3 kernel originally developed by Facebook. The default HTTP port is 8889, and the default HTTPS port is 7779.

DeltaLake

Upgraded to version 2.2.0.

StarRocks

Upgraded to version 2.4.3.

Doris

Upgraded to version 1.2.1.

Kafka-Manager

Upgraded to version 3.0.0.6.

Impala

The service is offline.

OpenLDAP

Upgraded to version 2.4.46.

Kyuubi

Upgraded to version 1.6.1.

Ranger

Upgraded to version 2.3.0.

HBase

  • Supports ThriftServer2.

  • The default value of the hbase.block.data.cachecompressed parameter is changed to true.

Flink-Table-Store

Added the service, based on community version 0.3.

JindoData

Upgraded to version 4.6.4.

EMR 3.44.x

Release date

EMR-3.44.0 was released on December 1, 2022.

Update details

Service

Change

Iceberg

Upgraded to version 0.14.1.

Flink

Upgraded to Flink 1.15-vvr-6.0.2, which corresponds to the community Flink 1.15 major version.

Kafka

  • Supports LDAP user logon authentication and authorization.

  • Supports user group authorization.

Trino

  • EMR Presto was renamed to its official community name, Trino.

  • Supports Ranger and DLF-Auth.

  • Fixed an issue where connections to worker nodes failed after LDAP was enabled with a single click.

JindoSDK

Upgraded to version 4.6.2.

JindoData

Upgraded to version 4.6.2.

HBase

  • Supports Ranger.

  • Fixed an issue where OSS-HDFS could not be selected as the storage mode when adding a service.

YARN

ACLs are enabled by default in high-security mode.

StarRocks

Upgraded to version 2.3.4.

Doris

Upgraded to version 1.1.5.

Hudi

The console supports configuring hudi-defaults.conf.

Ranger

Supports integration with Trino, YARN, HBase, and Kafka.

DLF-Auth

  • Upgraded to version 2.0.2.

  • Supports Trino and Impala.

OpenLDAP

Integrated with the nslcd component.

Kudu

Kudu Tserver can no longer be installed in the Task node group.

Spark

Upgraded to version 3.3.1.

Tez

Upgraded to version 0.10.2.

Kyuubi

Upgraded to version 1.6.0.

EMR 3.43.x

Release date

EMR-3.43.1: November 8, 2022
EMR-3.43.0: October 14, 2022 (new purchases are not supported)

Update details

EMR-3.43.1

Service

Change

Kerberos

Supports connecting to an external KDC on EMR.

Kafka

Supports adding a startup command configuration item to customize service startup parameters.

JindoData

  • Upgraded to version 4.6.0.

  • Supports rewriting OSS-HDFS access paths.

Flink

Upgraded to version 1.13_vvr_4.0.15.

RSS

Upgraded to version 0.1.4.

EMR-3.43.0

Service

Change

Spark

  • Upgraded to version 3.3.

  • Supports enabling Kerberos identity authentication.

Hudi

  • Upgraded to version 0.12.0.

  • Supports Spark 3.3.

  • Supports using a cloud MetaStore to host metadata and enabling the acceleration feature. For more information, see Hudi MetaStore usage guide.

Flink

  • Supports enabling Kerberos identity authentication.

  • Supports automatic connection with Data Lake Formation (DLF).

Iceberg

  • Upgraded to version 0.14.0.

  • Supports Spark 3.3.

  • Supports enabling Kerberos identity authentication.

JindoData

  • Upgraded to version 4.5.1.

  • Supports accessing Alibaba Cloud resources without plaintext AccessKeys.

Hadoop-Common and HDFS

  • Supports enabling Kerberos identity authentication.

  • Fixed security vulnerability CVE-2022-25168.

Knox

Integrated with Ranger. The Ranger UI can be accessed from the Access Links And Ports tab.

HBase

  • Upgraded to version 1.7.1.

  • Supports enabling Kerberos identity authentication.

  • Supports group-based configuration.

RSS

  • Upgraded to version 0.1.2.

  • Supports enabling Kerberos identity authentication.

Doris

  • Upgraded to version 1.1.2.

  • Supports enabling Kerberos identity authentication.

StarRocks

  • Upgraded to version 2.2.6.

  • Supports enabling Kerberos identity authentication.

Kafka

  • Upgraded to version 2.13_3.2.1.

  • Supports enabling Kerberos identity authentication.

DeltaLake

  • Upgraded to version 2.1.0.

  • Supports Spark 3.3.

  • Supports enabling Kerberos identity authentication.

Kudu

Added the component. The version is 1.14.0.

Impala

  • Supports creating views in DLF.

  • Supports enabling Kerberos identity authentication.

YARN, Impala, Ranger, Hive, Kyuubi, Tez, Kafka, Zookeeper, DLF-Auth, Phoenix, Sqoop, Presto

Supports enabling Kerberos identity authentication.

EMR 3.42.x

Release date

EMR-3.42.0 was released on August 5, 2022.

Update details

Service

Change

Hive

Supports one-click integration with LDAP.

Presto

  • Upgraded to community version 389.

    Uses the standalone Delta Lake and Hudi connectors provided by the community.

    • This version of the Delta Lake connector does not support Time Travel and Z-Order.

    • This version of the Hudi connector does not support querying MOR tables.

  • Supports one-click integration with LDAP.

DeltaLake

  • Integrated with DLF for automated lake table management.

  • Supports Ranger authorization.

  • Fixed an issue where statistics could not be collected for timestamp fields.

  • The optimize and vacuum commands now support returning metric information.

Hudi

Upgraded to version 0.11.1.

HadoopCommon

Added a new component to resolve the issue of HDFS, YARN, and JindoSDK configurations overwriting each other.

YARN

Enhanced elastic scaling features.

Ranger

  • Supports both Spark2 and Spark3.

  • Ranger Usersync supports one-click integration with LDAP.

Kafka

CruiseControl automatically creates related topics on startup.

HBase

Added the component. The version is 1.4.9.

Phoenix

Added the component. The version is 4.14.1.

Doris

Upgraded to version 1.1.1.

StarRocks

Upgraded to version 2.2.3.

ClickHouse

Fixed a memory overflow issue when reading large files from OSS.

EMR 3.40.x

Release date

EMR-3.40.0 was released on April 21, 2022.

Update details

Service

Change

JindoData

Added the component. The version is 4.3.0.

JindoSDK

Upgraded to version 4.3.0.

Spark

Upgraded to version 3.2.1.

Hive

  • Fixed a bug where Tez committed repeatedly when speculative execution was enabled.

  • Fixed a bug where UDFs could only be called after reloading the function.

Presto

Fixed a bug where the Presto service could not start if it was added during Hadoop cluster initialization.

DeltaLake

Fixed a compatibility issue with Streaming SQL.

Hudi

Upgraded to version 0.10.1.

Iceberg

Upgraded to version 0.13.1.

YARN

  • Added a feature to restrict ApplicationMasters (AMs) to run only on CORE group nodes.

  • Fixed an issue where the taihaodoctor setting was missing from the mapreduce.map.java.opts configuration.

Zookeeper

Optimized JVM parameter configurations.

Flink, Impala, Flume, Druid

Adapted to JindoSDK 4.3.0.

Sqoop

Upgraded the PostgreSQL version.

Zeppelin

Resolved a startup failure issue with the JDBC Interpreter.

Ranger

The Ranger 1.2.0 Spark Plugin supports Hudi.

Oozie

Upgraded Log4j to version 2.17.2.

HBase

Fixed an issue where RegionServer could not be started in HBase 1.4.9.

DLF-Auth

Upgraded to version 2.0.0.

EMR 3.39.x

Release date

EMR-3.39.2: March 25, 2022
EMR-3.39.1: February 15, 2022 (new purchases are not supported)

Update details

EMR-3.39.2

Note

Only OLAP clusters and DataFlow clusters in the new EMR console support this version.

Service

Change

Flink

  • Improved the application performance management (APM) dashboard and added new monitoring metrics, such as sourceIdleTime.

  • Supports CloudMonitor alerts.

Kafka

  • Supports SSL and SASL configurations.

  • Modified the default values of some parameters.

ClickHouse

Modified the default values of some parameters.

EMR-3.39.1

Service

Change

SmartData, BIGBOOT

The components are offline.

RSS

  • Upgraded the ESS service to RSS. For more information, see RSS.

  • Enhanced the features and stability of the service.

JindoSDK

  • Upgraded the architecture to JindoData.

  • EMR integrates JindoSDK 4.0 for the first time, which supports services such as OSS and OSS-HDFS.

Spark

  • Optimized Hive on Spark.

  • Adapted to JindoSDK.

Tez

Adapted to JindoSDK.

Hive

Adapted to JindoSDK.

Presto

  • Supports dynamic UDF loading.

  • Delta Lake tables support Time Travel queries with the `for ... as of` syntax.

  • Added a standalone Delta Lake catalog, provided default Delta connector configurations, and supported Z-Order data skipping optimization based on the standalone catalog.

  • Fixed an issue where the Hudi connector could not query Hudi MOR tables. The Hive connector does not support querying Hudi MOR tables.

  • Adapted to JindoSDK.

Delta Lake

  • Metadata management

    • Used the built-in Spark Catalog instead of the Hive CLI API to synchronize metadata and partition information.

    • Automatically reports table statistics (dataProfiling) to the MetaStore.

  • SQL

    • Supports Time Travel syntax.

    • Supports DropPartition SQL syntax.

    • Supports ADD COLUMN operations at specified positions (FIRST and AFTER).

  • Enhanced table management capabilities

    • Supports dynamically adjusting the file size based on the table size. This feature is enabled by default.

    • Supports automatic vacuum, which is enabled by default, and concurrent vacuum.

    • Optimized the logic for automatic compaction, which is disabled by default.

    • Added Z-Order syntax and accelerated the Z-Order process.

Hudi

Upgraded to version 0.10.0.

HDFS

Adapted to JindoSDK.

YARN

Adapted to JindoSDK.

Flume

Adapted to JindoSDK.

Flink

  • By default, the Flink lib directory is uploaded to the HDFS cluster, so that you can use it with the yarn.provided.lib.dirs parameter.

  • Adapted to JindoSDK.
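The yarn.provided.lib.dirs note above means a job can reuse Flink jars that are already on HDFS instead of uploading them from the client on every submission. The following Python sketch only assembles such a command line and does not run it. It assumes Flink's YARN application mode (`flink run-application -t yarn-application`, available since Flink 1.11), and both HDFS paths are hypothetical placeholders, because these release notes do not state where EMR uploads the lib directory.

```python
import shlex
from typing import List

# Hypothetical locations: replace them with the HDFS directory that actually holds
# the uploaded Flink lib directory on your cluster and with your own job JAR.
FLINK_LIB_DIR_ON_HDFS = "hdfs:///flink/lib"
APP_JAR = "hdfs:///user/hadoop/jobs/my-app.jar"


def build_flink_application_cmd(lib_dir: str, app_jar: str) -> List[str]:
    """Return a `flink run-application` argument list that sets yarn.provided.lib.dirs.

    yarn.provided.lib.dirs points the YARN deployment at Flink jars that were
    pre-uploaded to remote storage, so the client does not upload its local copies.
    """
    return [
        "flink",
        "run-application",
        "-t",
        "yarn-application",
        f"-Dyarn.provided.lib.dirs={lib_dir}",
        app_jar,
    ]


if __name__ == "__main__":
    cmd = build_flink_application_cmd(FLINK_LIB_DIR_ON_HDFS, APP_JAR)
    # Print a shell-safe command line; this script does not execute it.
    print(shlex.join(cmd))
```

Run the printed command on a node that has the Flink client installed. Keeping the arguments as a list means each `-D` option stays a single token even if a path later needs shell quoting.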

Impala

Adapted to JindoSDK.

Ranger

  • Fixed a startup failure issue with Spark History Server.

  • Adapted to JindoSDK.

HBase

  • Fixed an issue with default parameters.

  • Fixed a GC log date format issue.

  • Fixed a restart issue that occurred when a RegionServer used an IP address.

Druid

Adapted to JindoSDK.

ClickHouse

Optimized the handling logic when the ClickHouse component is stopped.

Iceberg

  • Upgraded to version 0.13.0.

  • Hid default configuration items to improve user experience.

DLF-Auth

Fixed a startup failure issue with Spark History Server.

StarRocks

Added the service to the new console. The version is 2.0.1.

EMR 3.38.x

Release date

EMR-3.38.3: December 2021
EMR-3.38.2: December 2021 (new purchases are not supported)
EMR-3.38.1: November 2021 (new purchases are not supported)
EMR-3.38.0: October 2021 (new purchases are not supported)

Update details

EMR-3.38.3

Fixed the Log4j security vulnerability in all related components. For more information, see Vulnerability announcement | Apache Log4j2 remote code execution vulnerability.

Service

Change

Presto

  • Fixed an error that occurred when Presto queried Hudi tables in a high availability cluster.

  • Fixed the Log4j vulnerability in the Elasticsearch connector.

DLF Metastore

  • Changed the default setting for Metastore logs from enabled to disabled.

  • Fixed an error caused by an excessively long URI in Metastore gettablestats.

Delta Lake

Fixed an issue with synchronizing schema changes to the Metastore.

Flink

  • Upgraded VVR to version 4.0.11. This version supports the following features:

    • Released the commercial Flink CDC feature:

      • Supports Schema Evolution.

      • Supports Flink SQL semantics for full database synchronization.

    • Supports using Gemini Statebackend to store state on OSS.

  • Provided an enterprise edition of the Hudi Connector with built-in DLF for metadata management.

Sqoop

Fixed an issue where precision was lost for the Decimal type when importing HCatalog tables with Sqoop.

EMR-3.38.2

Service

Change

SmartData

  • Upgraded SmartData to version 3.8.0. For more information, see Introduction to SmartData 3.8.x.

  • Supports authentication and authorization management for OSS based on Kerberos and Ranger.

EMR-3.38.1

Service

Change

SmartData

Upgraded SmartData to version 3.7.3. For more information, see Introduction to SmartData 3.7.x.

Spark

  • Removed the invalid Log4j MetricsAppender configuration.

  • Fixed a NullPointerException issue during SparkContext startup.

Presto

  • Fixed an issue in high availability Hadoop clusters where Presto required host configuration to query Hive tables.

  • Fixed a startup failure issue with Presto under default configurations when memory is low.

  • Fixed an issue where modifications to the worker-jvm configuration did not take effect.

  • Supports Ranger.

Impala

Fixed a NoSuchMethodError that occurred when querying DLF metadata tables.

Ranger

  • Supports Presto.

  • Fixed a permission issue with Ranger Spark when inserting data into ORC and PARQUET tables.

  • Fixed an issue where Ranger Hive role permissions did not take effect after Kerberos was enabled.

DLF-Auth

  • Upgraded DLF-Auth to version 1.0.1.

  • Supports controlling Presto permissions through DLF permissions.

  • Fixed an issue with RAM user caching.

EMR-3.38.0

Service

Change

SmartData

Upgraded SmartData to version 3.7.2. For more information, see Introduction to SmartData 3.7.x.

Spark

  • Upgraded Spark to version 2.4.8.

  • Supports both Spark 2.4.8 and Spark 3.1.2.

    Note

    Spark3 does not support Delta or Remote Shuffle Service.

  • For the Spark 3.x series, SparkSQL performance for Distinct calculations is optimized. The optimization is triggered when an aggregate operator contains multiple count(distinct case ... when ...) expressions.

  • Fixed an array index out-of-bounds issue in Adaptive Query Execution (AQE) when statistics were missing.

  • Fixed an error that occurred with AQE and Cache in specific scenarios.

Hive

Upgraded Hive to version 2.3.9.

Presto

  • Released as a standalone Presto cluster.

  • Upgraded Presto to community version 358.

    Important

    This version does not support Ranger.

  • Supports connectors such as Hudi and MySQL by default, and updated the default configurations.

  • Presto clusters support elastic scaling.

  • Supports data lake analytics.

DeltaLake

  • Unified delta-connectors for Hive 2 and Hive 3.

  • Fixed an error that occurred when querying multi-level partitioned tables with delta-connectors.

Hudi

  • Upgraded Hudi to version 0.9.0.

  • Fixed a compatibility issue with sql.extension between DeltaLake and Hudi.

HDFS

The default parameter for NameNode reserved capacity now increases automatically. This ensures that NameNode enters safe mode promptly when disk space is low.

Flink

  • Upgraded Flink to version 1.13-vvr-4.0.10, which corresponds to community Flink 1.13.1.

  • Added commercial Flink Connectors, such as the Hologres connector.

  • Added a corresponding Metric Reporter and integrated it with the APM dashboard for monitoring.

  • For the Kafka Connector, added a Kafka Catalog based on SchemaRegistry. This lets you directly read from and write to existing Kafka topics without using DDL.

Storm

The component is offline.

Zeppelin

Upgraded Zeppelin to community version 0.10.0.

Ranger

When Presto is community version 358, this version of Ranger does not support Presto access control.

Hue

  • Fixed an issue where the YARN Job Browser could not properly display or terminate jobs in some cases.

  • The YARN Job Browser is enabled in the default configurations.

  • The Presto protocol is supported in the default configurations.

Druid

Fixed a node restart failure caused by residual PID files after a server power loss.

ClickHouse

  • Updated the default configurations.

  • Supports cluster scale-out.

  • Supports the MetaChecker feature.

  • Supports reading data using the OSS table engine and OSS table function.

  • Supports custom ZooKeeper addresses at the table level.

Iceberg

Added the component. The version is 0.12.0-1.0.1.

Knox

Fixed an issue where the first access to a Spark task failed.

DLF-Auth

Added the component. The version is 1.0.0.

Supports controlling Hive and Spark permissions through DLF permissions.

ESS

Upgraded ESS to version 1.2.0.

EMR 3.37.x

Release date

EMR-3.37.1: September 2021
EMR-3.37.0: August 2021 (new purchases are not supported)

Update details

EMR-3.37.1

Service

Change

SmartData

Upgraded SmartData to version 3.7.1.

Hue

Fixed an issue where Impala could not be used in high-security clusters.

Kudu

Supports Kerberos.

EMR-3.37.0

Service

Change

SmartData

Upgraded SmartData to version 3.7.0.

Spark

Fixed a compatibility issue with Delta Lake.

Delta Lake

  • Upgraded Delta-Connectors to support creating and querying tables using StorageHandler syntax.

  • Fixed an issue that occurred when using INSERT OVERWRITE on partitioned tables.

  • Fixed an issue where Optimize wrote virtual fields to files in G-SCD scenarios.

YARN

  • Added appId, CPU, and memory resource usage information to the node Containers REST API.

  • Fixed an issue where ApplicationMaster (AM) logs could not be viewed on nodes released by Auto Scaling.

  • Added support for cleaning up released nodes after they are decommissioned by Auto Scaling.

  • Improved the graceful decommission logic for Auto Scaling. Nodes are now marked as offline only after the NodeManager (NM) process ends.

ZooKeeper

Upgraded to community version 3.6.3.

Flink

  • Added the SmartData component.

  • Fixed an issue that prevented password-free access to OSS when submitting jobs to a DataFlow-Flink cluster through Secure Shell (SSH).

Impala

Fixed an issue that caused an infinite loop when listing directories after an OSS partition directory was directly deleted.

Hue

Fixed a display issue in the user interface when Hue is used with Oozie.

Kudu

Upgraded to community version 1.14.0.

ClickHouse

Updated the default configurations.

EMR 3.36.x

Release date

EMR-3.36.1 was released on July 16, 2021.

Updates

Service

Changes

SmartData

Upgraded SmartData to version 3.6.1.

For more information, see Introduction to SmartData 3.6.x.

Hive

  • Upgraded Hive to version 2.3.8.

  • Fixed an issue where the SHOW CREATE TABLE command returned an incorrect result when Data Lake Formation (DLF) metadata was used.

  • Optimized the default parameters of Hive to improve job performance.

  • Changed the names of configuration items on the hive-env tab of the Hive service Configuration page in the E-MapReduce console to uppercase for ease of use.

  • Improved the error message that is reported when writing data to a Hive table fails because the file system is incompatible with the Hive metastore.

HDFS

Added support for the Zstandard (ZSTD) compression format.

Flink

Upgraded Flink to version 1.12-vvr-3.0.2.

Note

Flink is removed from Hadoop clusters.

Hudi

  • Upgraded Hudi to version 0.8.0.

  • Added support for integration with Spark SQL.

Spark

  • Optimized the names of configuration items on the spark-defaults tab of the Spark service Configuration page in the E-MapReduce console.

  • Optimized the performance of log output.

  • Added support for the ZSTD compression format.

Impala

Fixed an issue that caused a core dump when Hadoop Distributed File System (HDFS) was used.

Tez

Optimized the default parameters of Tez to improve job performance.

Knox

  • Added support for the Kudu component.

  • Added support for the Impala component.

  • Added support for the HBase component.

Phoenix

Fixed an issue where a "Java Database Connectivity (JDBC) Driver not found" error was reported when you use Hive or Spark SQL to access Phoenix tables.

ClickHouse

Enabled application performance management (APM) monitoring and alerting.

EMR 3.35.x

Release date

EMR-3.35.0 was released on April 21, 2021.

Updates

Service

Change

SmartData

Upgraded to version 3.5.0.

For version details, see Introduction to SmartData 3.5.x.

Spark

  • Fixed an issue where Adaptive Execution did not take effect in some scenarios.

  • Fixed an issue where the behavior of statistical aggregate functions was inconsistent with that of Hive.

  • Fixed an issue where data of the char type was read incorrectly from Hive ORC tables.

HDFS

Added support for the SM4 national cryptographic algorithm.

Hue

Upgraded Hue to version 4.9.0.

Alluxio

Upgraded Alluxio to version 2.5.0.

Druid

  • Upgraded Druid to version 0.20.1.

  • Enhanced security.

Livy

Upgraded Livy to version 0.7.1.

EMR 3.34.x

Release date

EMR-3.34.0 was released on March 15, 2021.

Updates

Service

Changes

SmartData

Upgraded to version 3.4.0.

For more information, see Introduction to SmartData 3.4.x.

Spark

  • Optimized some default configurations.
  • Performance optimization: Added support for Window TopK pushdown.
  • Enhanced compatibility for reading and writing CSV or JSON tables in Hive.
  • The ANALYZE statement no longer requires you to list all of the table's column names.
  • Added support for enabling or disabling the Lightweight Directory Access Protocol (LDAP) feature with a single click.
  • Improved the usability of the Spark Beeline tool.

Hive

  • Optimized some default configurations.

  • Performance optimization: Enhanced the cost-based optimizer (CBO).

  • Added support for enabling or disabling the LDAP feature with a single click.

  • Upgraded Calcite to version 1.12.0.

  • Added the hive.security.authorization.sqlstd.confwhitelist.append parameter.

Presto

Added support for enabling or disabling the LDAP feature with a single click.

YARN

Fixed a high-risk security vulnerability that allowed unauthorized access to the Hadoop web UI. After this fix, when you access the YARN web UI through a Secure Shell (SSH) tunnel, you must explicitly specify user.name=<name> in the URL.

Zookeeper

Upgraded to version 3.6.2.

Flink

Updated the config.sh file during initialization to fix an issue with HADOOP_CLASSPATH.

Impala

  • Upgraded Impala to version 3.4.0.

  • Upgraded Shiro to version 1.7.0.

  • Added support for Data Lake Formation (DLF) metadata.

  • Added support for querying data in Delta format.

  • Added support for enabling or disabling the LDAP feature with a single click.

Tez

Optimized the default configurations.

HAS

Fixed an issue where the admin.keytab file could not be re-initialized after an error occurred during the HAS installation flow.

Ranger

  • Fixed an issue caused by filter pushdown in Spark.

  • Fixed an issue that prevented Presto from being enabled after it was disabled in Ranger.

  • Added support for enabling or disabling LDAP authentication with a single click.

Knox

Fixed an issue with the Knox link for Druid 0.20.0.

Hue

Added support for enabling or disabling the LDAP feature with a single click.

Hudi

  • Added support for the SQL on Hudi feature.
  • Fixed an accuracy issue that occurred when querying partial data.
  • Added support for partition pruning when you query Copy On Write tables in Hudi using Spark.
  • Added support for a bucketing index mechanism to improve write performance.

Delta Lake

  • Fixed an issue where metadata could not be synchronized to Hive Metastore from an existing Delta table.
  • Fixed an issue where the MERGE command could not parse the * character.
  • Fixed an error that occurred during the creation of table metadata when transforming data from Parquet format to a Delta table.
  • Fixed an issue where the OPTIMIZE command failed when there were no files to compact.
  • The MERGE syntax now supports using a subquery as the source.
  • Introduced a caching mechanism to improve query efficiency when you use Presto to query Delta tables.
  • Added support for querying Delta tables using Impala.

Superset

  • Fixed an issue that prevented the admin user from logging on to the web UI.

  • Datasets are now compatible with Druid clusters.

  • Spark SQL datasets are no longer supported.

Sqoop

Added support for importing files in Parquet format to Object Storage Service (OSS).

Alluxio

Upgraded to version 2.4.1.

Phoenix

Hive on Phoenix now supports backing field settings.

Pig

Removed.
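For the YARN web UI change in EMR-3.34.0, the following Python sketch shows what a URL with an explicit `user.name` query parameter looks like. It assumes an SSH tunnel that forwards a local port to the ResourceManager web UI on 8088 (the Apache Hadoop default; your cluster's port may differ), and it uses `hadoop` as a placeholder user name. The helper name `yarn_ui_url` is hypothetical.

```python
from urllib.parse import urlencode


def yarn_ui_url(user, host="localhost", port=8088, path="/cluster"):
    """Build a YARN web UI URL that passes the user name explicitly.

    Assumptions: an SSH tunnel forwards host:port to the ResourceManager
    web UI, and 8088 is its web port (the Hadoop default).
    """
    query = urlencode({"user.name": user})  # yields "user.name=<user>"
    return f"http://{host}:{port}{path}?{query}"


if __name__ == "__main__":
    # "hadoop" is a placeholder; use the account you are authorized to use.
    print(yarn_ui_url("hadoop"))
```

Using `urlencode` rather than string concatenation keeps user names with special characters correctly escaped in the query string.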

EMR 3.33.x

Release date

EMR-3.33.0 was released on January 15, 2021.

Updates

Service

Changes

SmartData

Upgraded to version 3.2.0.

For more information, see Introduction to SmartData 3.2.x.

Spark

  • Upgraded to version 2.4.7.

  • Upgraded jQuery to version 3.5.1.

  • Added compatibility with Hive to automatically update table and partition sizes.

  • Added support for outputting Spark metadata and job running information to DataWorks.

Hive

  • Upgraded to version 2.3.7.

  • HCatalog now supports Data Lake Formation.

  • Added support for outputting Hive metadata and job running information to DataWorks.

Metastore

  • Added the Hive Statistics feature.

  • HCatalog now supports Data Lake Formation.

  • Optimized the method for obtaining STS tokens.

HDFS

Upgraded jQuery to version 3.5.1.

YARN

  • Upgraded jQuery to version 3.5.1.

  • Adjusted the Fair Scheduler configuration.

  • Optimized Timeline Server.

Zeppelin

Upgraded to version 0.9.0.

Ranger

  • Added audit log configuration for Hive.

  • Added audit configuration for Log4j.

OpenLDAP

  • Added an audit feature.

  • Enabled the SSL port (10636) by default.

  • Added support for one-click startup of Presto.

Knox

  • Fixed a Spring vulnerability.

  • Fixed an issue with viewing the Executors page in the Spark UI.

  • Fixed an issue with the Oozie job status page.

Hue

Added support for Presto.

Druid

Upgraded to version 0.20.0.

EMRHook

  • Added a new software service.

  • hive-hook: Supports outputting Hive metadata and job running information to DataWorks.

  • spark-hook: Supports outputting Spark metadata and job running information to DataWorks.

EMR 3.32.x

Release date

EMR-3.32.0 was released on November 23, 2020.

Updates

Service

Changes

SmartData

Upgraded to version 3.1.0.

For more information, see Introduction to SmartData 3.1.x.

Alluxio

  • Supports Alluxio 2.4.0.

  • Default parameter settings scale with cluster node size.

  • Uses HDFS in the EMR cluster as the default UnderFS. This feature is ready to use out of the box.

  • Enhanced the Alluxio OSS UnderFS to support new features such as OSS multi-versioning.

  • Compatible with engines such as Hadoop, Hive, Spark, and Presto.

Hudi

Supports Hudi 0.6.0.

Spark

JindoTable supports enabling or disabling the data collection feature.

Hive

  • Fixed a connection pool leak issue in HiveServer.

  • JindoTable supports enabling or disabling the data collection feature.

  • Optimized the performance of ADD COLUMN.

  • Fixed an issue where incorrect data was read from Hudi tables.

  • Default parameter settings scale with cluster node size.

HDFS

Supports a larger number of snapshots.

YARN

Default parameter settings scale with cluster node size.

Tez

Default parameter settings scale with cluster node size.

Sqoop

Fixed an issue with importing files in Avro format.

EMR 3.30.x

Release date

EMR-3.30.0 was released on October 26, 2020.

Updates

Service

Updates

SmartData

Upgraded to 3.0.0.

For more information, see Introduction to SmartData 3.0.x.

Spark

  • Added support for Alibaba Cloud Data Lake Formation (DLF) metadata.

  • Upgraded the HAS dependency to 2.0.1.

  • Fixed an issue with backticks in Streaming SQL.

  • Removed the Delta JAR package. Delta is now deployed separately.

  • Modified the log path to write all logs to HDFS.

Hive

  • Added support for Alibaba Cloud DLF metadata.

  • Resolved an issue where a DUMMY file was written when reading an empty directory in a Delta table.

  • Upgraded the HAS dependency to 2.0.1.

Presto

  • Added support for Alibaba Cloud DLF metadata.

  • Resolved an issue that limited the reading of Delta tables.

  • Fixed an issue where the JVM configuration was missing in high-security mode.

  • Upgraded the HAS dependency to 2.0.1.

HDFS

  • Added support for hot-swappable disk mode.

  • Upgraded the HAS dependency to 2.0.1.

YARN

  • Fixed an issue with YARN RMZKStateStore.

  • Added support for SNAPPY files output by SLS.

  • Modified the directory configuration for MapReduce Local mode to resolve a directory permission check issue.

  • Added support for hot-swappable disk mode.

  • Set the log path to write all logs to HDFS.

  • Upgraded the HAS dependency to 2.0.1.

Zookeeper

  • Added support for binding the service port to an internal IP address at startup.

  • Upgraded the HAS dependency to 2.0.1.

Flink-Vvp

  • Upgraded to version 1.11-2.2.2.

  • Added support for SQL and Autopilot features.

Note

Only DataFlow clusters support Flink-Vvp. Hadoop clusters do not support Flink-Vvp at this time.

Flink

  • Added support for writing to OSS in cache mode. This feature, combined with Flink Checkpoints and a resumable Source, achieves EXACTLY_ONCE semantics.

  • Synchronized with Flink community version 1.11.1 features. SQL now supports multiple outputs (MULTI INSERT).

  • Upgraded the HAS dependency to 2.0.1.

Impala

  • Added support for custom configurations of catalogd.flgs, impalad.flgs, and statestored.flgs.

  • Upgraded Shiro to version 1.6.0.

  • Upgraded the HAS dependency to 2.0.1.

Tez

  • Optimized the default memory parameters for the Application Master (AM).

  • Upgraded the HAS dependency to 2.0.1.

HAS, Storm, Zeppelin, Ranger, OpenLDAP, Oozie, Knox, Kafka, Hue, HBase, and Druid

Upgraded the HAS dependency to 2.0.1.

EMR 3.29.x

Release date

EMR-3.29.0 was released on July 29, 2020.

Updates

Service

Changes

Bigboot

  • Upgraded to version 2.7.301.

  • Jindo DistCp now supports writing data to OSS with the Archive or Infrequent Access storage class.

  • Enhanced the FUSE feature to support multiple namespaces.

  • Improved the metadata caching feature in Cache mode.

Spark

  • Upgraded Spark to 2.4.5.2.0.

  • Added support for third-party metastores.

  • Added the datalake metastore-client.

Hive

  • Upgraded Hive to 2.3.5.6.0.

  • Added support for third-party metastores.

  • Added the datalake metastore-client.

Presto

Upgraded to version 338.

Ranger

  • Upgraded the software package to 1.2.0-1.5.0.

  • Added support for Presto 338.

  • Added descriptions to configuration files.

Hadoop Distributed File System (HDFS)

Enabled adaptive configuration for the reserved space size of DataNodes.

Knox

Added support for Impala, later versions of Flink, and PAI.

Druid

Upgraded to version 0.18.1.

SmartData

Upgraded to version 2.7.301.

EMR 3.28.x

Release date

EMR-3.28.0 was released on June 12, 2020.

New features

Service

Changes

Bigboot

  • Released the first version of JindoTable, which provides hotspot statistics for tables and partitions.

  • Added support for complete storage policies in Block mode and tiered storage policies, such as Infrequent Access and Archive.

  • Added the Jindo DistCp data migration tool.

  • Improved Jindo Fuse and fixed issues in it.

  • Improved the integration of the JFS scheme with the Hive engine and Jindo JobCommitter in Cache mode.

  • Added a feature to set a read ratio in Block mode for reading data directly from OSS. This reduces the overhead of reading from the local cache.

  • Decoupled JindoFS software modules into Bigboot (control layer), SmartData (distributed service), and the JindoFS SDK. Each module can be independently upgraded and maintained.

Updates

Service

Changes

Flink

Upgraded open source Flink to Ververica Platform Enterprise Edition. The platform is built on open source Flink 1.10 with extensive customization and provides value-added features, such as the self-developed Gemini storage engine.

Bigboot

Upgraded to version 2.7.0.

Delta

  • Upgraded to version 0.6.0.

  • Decoupled the Delta code from the Spark code.

Spark

  • Upgraded to version 2.4.5.

  • Supports streaming-sql scripts from DataFactory.

  • Supports Delta 0.6.0.

Hive

Supports Delta 0.6.0.

Ranger

  • Supports custom deployments of Hadoop Distributed File System (HDFS), Hive, and Spark.

  • Supports the configuration of ranger-admin-site and ranger-ugsync-site in the console.

HDFS

Now prints DataNode exception information when an HDFS write fails due to no available DataNodes (HDFS-9023).

Hue

  • Supports installing the Hue component on Gateway clusters.

  • Supports deploying multiple Hue instances on a single node.

DataFactory

Supports Delta 0.6.0.

Druid

Upgraded to version 0.18.0.

Knox

  • Upgraded to version 1.1.0-1.0.7.

  • Supports the HBase UI.

EMR 3.27.x

Release dates

Version

Date

EMR-3.27.0

April 29, 2020

EMR-3.27.1 (New purchases are not supported)

May 8, 2020

EMR-3.27.2 (New purchases are not supported)

May 20, 2020

New features

Feature

Change

Custom component deployment

Added support for custom deployment of components on master nodes. The following components are supported:

  • Hadoop

  • Spark

  • Hive

  • Zookeeper

  • Presto

Graceful shutdown for Auto Scaling

When graceful shutdown is enabled, nodes are not released immediately. They are released after tasks are completed within a specified time period.
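Graceful shutdown, as described above, amounts to a release decision for each node that is being scaled in. The Python sketch below is a hypothetical model, not EMR's implementation: it assumes a node is released as soon as its tasks finish, or when the specified time period expires even if tasks are still running. The function and parameter names are illustrative only.

```python
def should_release(running_tasks: int, waited_seconds: float,
                   timeout_seconds: float) -> bool:
    """Decide whether a node that is being scaled in can be released now.

    Hypothetical model of graceful shutdown: release early once no tasks
    are running; otherwise wait until the timeout has elapsed.
    """
    if running_tasks == 0:
        return True  # all tasks finished: safe to release immediately
    return waited_seconds >= timeout_seconds  # timeout reached: release anyway


if __name__ == "__main__":
    # 3 tasks still running after 2 minutes of a 10-minute window: keep waiting.
    print(should_release(3, 120, 600))
```

Without graceful shutdown, the equivalent behavior would be to return `True` unconditionally, which is what can interrupt running tasks during scale-in.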

Updates

Service

Change

Spark

  • CUBE now supports date type partition fields.

  • Increased the stack depth of Spark-Submit.

Delta

  • Enhanced Data Definition Language (DDL) syntax, including commands such as CREATE, SHOW, and DESCRIBE.

  • Delta now supports the Optimize syntax with ZOrder.

Knox

  • Adapted for the Druid User Interface (UI).

  • Multi-master deployment is supported.

Hive

  • HCatalog tables now support the magic committer.

  • Removed some outdated default configurations.

Bigboot

  • Upgraded to version 2.6.3.

  • Multi-master deployment is supported.

SmartData

  • Upgraded to version 2.6.3.

  • Multi-master deployment is supported.

Ranger

  • Ranger now supports the Solr component.

  • Ranger now supports PrestoSQL version 311.

Tez

Tez now supports setting scratchdir on OSS.

Presto

Upgraded to version 331.

Druid

Upgraded to version 0.17.1.

Superset

Upgraded to version 0.35.2.

Sqoop

  • The MySQL Java Database Connectivity (JDBC) JAR package is upgraded to version 5.1.48.

  • The MySQL direct export mode supports setting a custom encoding using --mysql-charset.

EMR 3.26.x

Release dates

Version

Date

EMR-3.26.3 (New purchases are not supported)

April 16, 2020

Updates

Service

Changes

Bigboot, SmartData

  • Upgraded to version 2.6.3.

  • Added support for OTS metadata and Namespace HA.

Hive

HCatalog tables now support the direct committer.

YARN

Changed the default committer to JindoOssCommitter.

HDFS

Upgraded JindoFS-related configurations.

Spark

Changed the default committer to JindoOssCommitter.

EMR 3.25.x

Release date

EMR-3.25.0 was released on January 13, 2020.

New features

Ranger: Added support for Presto operations.

Updates

Service

Changes

Ranger

  • Initialized the RangerAdmin database for high-availability (HA) clusters.

  • Fixed a security issue in the RangerUserSync startup script.

Spark, Delta

  • Added support for configuring Delta-related parameters, such as spark.sql.extensions, in the console.

  • Added support for Hive to read Delta tables without setting the input format.

  • Added support for the ALTER TABLE SET TBLPROPERTIES and UNSET TBLPROPERTIES statements.

Hive

Fixed an issue where MapReduce (MR) task execution failed in automatic local mode.

Presto

  • Upgraded to version 310.

  • Upgraded joda-time to version 2.10.5.

Tez

  • Upgraded to version 0.9.2.

  • Fixed an issue where the application progress was not displayed correctly in the Tez user interface (UI).

  • Fixed an issue where the application history could not be viewed in the Tez UI.

Impala

Fixed an issue where Impala could not access LZO tables.

HDFS

Removed mongo-hadoop related JAR packages.

Zookeeper

Upgraded to version 3.5.6.

YARN

Adapted for the Tez UI. The yarn-site tab now supports adding the configuration item yarn.resourcemanager.system-metrics-publisher.enabled=true.

Bigboot, SmartData

  • Upgraded to version 2.2.3.

  • Added support for rename operations in OSS Cache mode.

Knox

Upgraded dependency package versions.

Oozie

Upgraded dependency package versions.
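For the Tez UI adaptation in EMR-3.25.0, the sketch below renders the `yarn.resourcemanager.system-metrics-publisher.enabled=true` setting in the standard Hadoop `<property>` XML format used by yarn-site.xml. It only illustrates the file format; in EMR you add the item on the yarn-site tab in the console, and the helper name `hadoop_property` is hypothetical.

```python
import xml.etree.ElementTree as ET


def hadoop_property(name: str, value: str) -> str:
    """Render one Hadoop configuration entry as a <property> element."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    # encoding="unicode" returns a str without an XML declaration
    return ET.tostring(prop, encoding="unicode")


if __name__ == "__main__":
    print(hadoop_property(
        "yarn.resourcemanager.system-metrics-publisher.enabled", "true"))
```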

EMR 3.24.x

Release date

EMR-3.24.0 was released on November 18, 2019.

New features

Service

Changes

Delta

  • Supports SQL syntax, including ALTER, CONVERT, CREATE, CTAS, DELETE, DESC, INSERT, MERGE, OPTIMIZE, UPDATE, and VACUUM.

  • Added a built-in, optimized OPTIMIZE command.

  • Supports the Hive connector.

  • Supports other existing open-source features.

Grafana

Added as a new component for standalone Flink clusters. Version: 6.4.2.

Prometheus

Added as a new component for standalone Flink clusters. Version: 2.13.0.

AlertManager

Added as a new component for standalone Flink clusters. Version: 0.19.0.

TensorFlow on spark

  • Supports running TensorFlow on Spark. This deeply integrates Spark with the deep learning framework. The integration includes optimized task scheduling and data exchange. It provides a complete workflow, from data pre-processing to deep learning training.

  • Supports streaming tasks.

Updates

Service

Changes

SmartData

  • Optimized JindoFS usage modes. The usage of Block mode is unchanged. Cache mode now supports its original usage and is also compatible with the original OSS file system usage. It supports data and metadata caching. These features can be enabled or disabled separately through configuration and are disabled by default.

  • Optimized read and write performance for Block mode and Cache mode.

  • Optimized disk cleanup. This provides more accurate statistics and more timely cleanup for hot data cached on local disks. It strictly ensures that disk usage does not exceed the quota.

  • Improved support for Gateway clusters. Block mode and Cache mode can now be used on a Gateway.

  • Supports a deployment mode where one storage cluster is separated from multiple compute clusters.

Spark

  • Added support for Delta-related parameters.

  • Added support for Ranger Spark plugin configuration.

  • Upgraded JindoCube to version 0.3.0.

Hive

  • Added logic for the SQL compatibility check feature.

  • Released a combination of Hive 2.3.5 and Hadoop 2.8.5.

  • When restarting the component, the content of hiveserver2-site.xml is no longer synchronized to hive-site.xml under spark-conf.

  • Supports using the MSCK command to add incremental folders.

  • Fixed a bug that occurred when Hive reused a Tez container.

  • Supports using the MSCK command to optimize column-based folders.

Bigboot

Upgraded to 2.2.1. Fixed issues with native code support on some machine models.

Ranger

  • Refactored the deployment method for the Spark plugin.

  • Fixed a bug where header2 in an HA cluster did not obtain the keytab.

Kudu

Fixed the startup logic.

Zookeeper

Added configuration for ZooKeeper four-letter-word commands. These commands are enabled by default.

HDFS

Added compatibility with JindoFS.

YARN

  • Changed the default value of the yarn.scheduler.capacity.node-locality-delay configuration to -1.

  • Added compatibility with JindoFS.

HAS

Integrated with OpenLDAP as the backend.

OpenLDAP

Added compatibility with HAS.

Presto

Upgraded to version 0.228.

Kafka

Removed D1 bad disks.

Druid

Upgraded to 0.16.0.

Flume

Upgraded to 1.9.0.

Flink

  • Upgraded to 1.9.1.

  • Supports standalone Flink clusters (released to a whitelist).

EMR 3.23.x

Release date

EMR-3.23.0 was released on September 18, 2019.

Updates

Service

Changes

Druid

  • Upgraded to 0.15.1.

  • Added the router component.

  • Upgraded fastjson.

Spark

  • Updated Spark Thrift Server to fix a class loader issue.

  • Refactored Spark transaction code to improve stability.

  • Fixed an issue with reading and writing files in ORC format after the built-in Hive was upgraded to version 2.3.

  • Added support for the MERGE INTO syntax.

  • Added support for the SCAN and STREAM syntax.

  • The Structured Streaming Kafka sink now supports exactly-once semantics (EOS).

  • Updated Delta Lake to 0.4.0.

Hive

  • Removed the old version of the Hive hook.

  • Added an optimization to handle data skew for multiple COUNT(DISTINCT) fields.

  • Fixed an issue where data was lost when joining tables with different bucket versions.

Flink

Upgraded to 1.8.2.

Bigboot

  • Updated the small file tool.

  • Updated the OSS JAR package to fix a non-daemon thread issue.

Kafka

  • Added support for the Deployment Set awareness feature.

  • Removed the fastjson dependency.

HDFS

  • Optimized the deployment logic for the SmartData OSS JAR package.

  • Updated the SmartData OSS JAR package.

Flume

Upgraded fastjson.

TensorFlow on Spark

Added this service.

HAS

Upgraded fastjson.

Livy

Upgraded fastjson.

EMR 3.22.x

Release date

EMR-3.22.0 was released on July 28, 2019.

New features

Service

Change

Kudu

  • Added Kudu as a new component. Kudu fills a gap in the Hadoop ecosystem. It provides fast data inserts and random access similar to HBase, and lets you modify data. It also provides large-scale data analytics and query capabilities similar to Hadoop Distributed File System (HDFS) or Parquet.

    • Provides C++ and Java APIs for custom development.

    • Integrates with Impala, Spark, and Hive Metastore.

  • This version of Kudu is based on Apache Kudu 1.10.0.

OpenLDAP

  • Added OpenLDAP as a new component to replace ApacheDS, which has been removed.

  • Supports high availability (HA).

Updates

Component

Details

JindoFileSystem

  • Multiple storage modes

    • Block mode: Data is stored as blocks in the backend OSS. The local Namespace service maintains metadata. Block mode provides better metadata and data performance. Block mode supports different storage policies, including WARM (local replicas, OSS replicas), COLD (OSS replicas only), HOT (multiple local replicas, OSS replicas), TEMP (local replicas only), and ALL_HDD (multiple local replicas). The default policy is WARM. You can set different storage policies for folders based on your application scenario.

    • Cache mode: This mode is compatible with existing OSS storage methods. In Cache mode, files are stored as objects in OSS. Data and metadata for each file are cached locally based on access frequency. This improves data and metadata access performance. Cache mode provides different metadata synchronization policies to meet the needs of different scenarios.

  • External client support

    • The client software development kit (SDK) lets you access the EMR JindoFS file system from outside an EMR cluster. You can use the client to access the Namespace in Block mode. However, external clients cannot use the data cache built by EMR JindoFS within the EMR cluster. This results in lower performance compared to using it within the EMR cluster.

    • Cache mode retains the original OSS storage semantics. It uses JindoFS to accelerate data caching within the EMR cluster. Therefore, you can directly access data from outside the EMR cluster using an OSS client, such as the OSS SDK or EMR OssFileSystem.

  • Ecosystem component support

    • JindoFS now supports many compute engines on EMR, such as Spark, Flink, Hive, MapReduce, Impala, and Presto.

    • For scenarios that separate computing and storage, you can also store job logs in JindoFS, such as YARN Container logs and Spark Event logs.

    • JindoFS can be used as the HFile backend storage for HBase to expand its storage capacity.

OssFileSystem

  • Added logic to OssFileSystem to automatically detect bad disks. This fixes an issue where cache writes failed during OSS writes due to bad disks.

  • Completed the related configurations for OssFileSystem.

Bigboot

  • Upgraded to version 2.0.0.

  • Includes several major updates, such as support for multiple Namespaces, storing local data blocks as large files, multi-mode storage, and external clients.

  • Fixed an issue where the Bigboot monitor status was incorrect during a machine restart.

  • Added a service spec for the Kudu component.

  • Added correctness checks for all service specs.

Hadoop

  • HDFS

    • Adapted for HDFS Federation. You can now create HDFS Federation clusters using custom configurations and APIs. This avoids the need for a second format operation when creating a Federation cluster.

    • Optimized the bad disk detection logic. For local disk scenarios, you can trigger bad disk detection when a DataNode block report is triggered by dfsadmin.

  • YARN

    Fixed an issue where the MapReduce JobHistory job list did not update when MapReduce job Container logs were stored in JindoFS or OSS.

Spark

  • Relational Cache

    Added support for Relational Cache. Relational Cache uses pre-computation to accelerate user queries. You can create a Relational Cache to pre-compute data. When a user query is executed, the Spark Optimizer automatically finds a suitable cache, rewrites the SQL execution plan, and continues the computation based on the cached data. This improves query speed. This feature is suitable for scenarios such as reports, dashboards, data synchronization, and multidimensional analysis.

    • Use Data Definition Language (DDL) to perform operations such as CACHE, UNCACHE, ALTER, and SHOW. Cached data supports all Spark data sources and data formats.

    • Supports automatic cache data updates and updates using the REFRESH command. Supports incremental updates based on partitions.

    • Supports execution plan optimization based on Relational Cache.

  • Streaming SQL

    • Standardized the parameter configuration for Stream Query Writer.

    • Optimized the schema compatibility check for Kafka data tables.

    • If a Kafka data table schema does not exist, it is automatically created in SchemaRegistry.

    • Optimized the log information for when a Kafka schema is incompatible.

    • Fixed an issue where column names had to be explicitly specified when writing query results to a Kafka table.

    • Removed the restriction that streaming SQL queries only support Kafka and Loghub data sources.

  • Delta

    Added Delta. You can use Spark to create a Delta data source to support scenarios such as streaming data writes, transactional reads and writes, data validation, and data history. For more information, see Delta details.

    • Supports using the DataFrame API to read data from or write data to Delta.

    • Supports using the Structured Streaming API to read from or write to Delta as a source or sink.

    • Supports using the Delta API to perform operations such as update, delete, merge, vacuum, and optimize.

    • Supports using SQL to perform operations such as creating Delta-based tables, importing data to Delta, and reading from Delta tables.

  • Others

    • Added a constraint feature that supports primary keys and foreign keys.

    • Resolved JAR file conflicts, such as for servlets.

Flink

Added Log4j log rolling.

Kafka

  • Added Log4j log rolling.

  • Upgraded fastjson.

Zeppelin

Upgraded the dependent commons-lang3 package to version 3.7. This fixes an issue where PySpark could not write to OSS. For more information, see Spark 2.4 incompatibility with commons-lang3 in Zeppelin.

Ranger

Added support for SHOW GRANTS.

Analytics-Zoo

Fixed a NumPy installation error.

Impala

Now compatible with Apache Kudu 1.10.0.

Presto

Upgraded to version 0.221.

ZooKeeper

Upgraded to version 3.5.5.
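The Block-mode storage policies listed for JindoFileSystem in EMR-3.22.0 can be summarized as a lookup table. The following Python sketch encodes them as described in these notes and resolves a path to the policy of its nearest configured parent folder, falling back to the WARM default. The folder-inheritance rule and all names in the code (`StoragePolicy`, `resolve_policy`) are assumptions made for illustration; this is not the JindoFS API.

```python
import posixpath
from dataclasses import dataclass


@dataclass(frozen=True)
class StoragePolicy:
    name: str
    local: str    # local replica placement, worded as in the release notes
    in_oss: bool  # True if replicas are also stored in the backend OSS


# Block-mode policies as listed in the EMR-3.22.0 notes.
POLICIES = {
    p.name: p
    for p in (
        StoragePolicy("WARM", "local replicas", True),
        StoragePolicy("COLD", "none", True),
        StoragePolicy("HOT", "multiple local replicas", True),
        StoragePolicy("TEMP", "local replicas", False),
        StoragePolicy("ALL_HDD", "multiple local replicas", False),
    )
}
DEFAULT_POLICY = "WARM"


def resolve_policy(folder_policies: dict, path: str) -> StoragePolicy:
    """Return the policy of the nearest configured ancestor folder.

    Assumption (hypothetical): a folder's policy applies to everything
    beneath it, and unconfigured paths use the WARM default.
    """
    if not path.startswith("/"):
        raise ValueError("expected an absolute path")
    # Collapse repeated leading slashes so the walk always ends at "/".
    current = posixpath.normpath("/" + path.lstrip("/"))
    while True:
        if current in folder_policies:
            return POLICIES[folder_policies[current]]
        if current == "/":
            return POLICIES[DEFAULT_POLICY]
        current = posixpath.dirname(current)
```

For example, with `{"/archive": "COLD"}`, a file under `/archive/2019/` resolves to COLD (replicas kept only in OSS), while any path outside configured folders falls back to WARM.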

Versions earlier than EMR-3.22.x

EMR-3.1.1

  • Upgraded the operating system (OS) to CentOS 7.2.

  • Upgraded Spark to version 2.1.1.

  • Upgraded emr-core to version 1.2.6.

  • Fixed a bug related to AccessKey-free operations for OSS.

EMR-3.0.2

  • Upgraded emr-core to version 1.2.5.

  • Extended AccessKey-free support for OSS to more regions.

  • Adjusted the replacement policy for role-based AccessKeys.

  • Fixed some bugs in Hive and Hadoop.

EMR-3.0.1

  • Added support for interactive mode and unified table management. You can now store Hive metadata in an external database. This allows multiple clusters to share the same metadata.

  • Upgraded emr-core to version 1.2.4, which optimizes the read and write performance of Object Storage Service (OSS).

  • Upgraded Spark to version 2.0.2.

Note

This version is fully compatible with EMR-3.0.0.

EMR-3.0.0

Initial release.