E-MapReduce:EMR 3.x series release notes

Last Updated:Mar 24, 2026

This topic describes the release dates and update details for the EMR 3.x series, from EMR-3.0.0 to EMR-3.55.x. For information about the components that are supported in each version, see Release versions.
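If you compare the version labels in this topic in a script, note that plain string sorting misorders them once the minor number has two digits. The following is a minimal sketch, assuming labels of the form EMR-<major>.<minor>.<patch> as used in this topic. The helper name emr_version_key is hypothetical, and "EMR-3.9.0" is only an illustrative label.

```python
import re

# Hypothetical helper: matches labels such as "EMR-3.51.4".
_LABEL = re.compile(r"^EMR-(\d+)\.(\d+)\.(\d+)$")


def emr_version_key(label: str) -> tuple:
    """Return (major, minor, patch) as integers for a label like 'EMR-3.51.4'."""
    match = _LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognized EMR version label: {label!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch)


# "EMR-3.9.0" is an illustrative label used to show the sorting pitfall.
labels = ["EMR-3.9.0", "EMR-3.55.0", "EMR-3.51.4", "EMR-3.52.1"]

# A plain string sort would rank "EMR-3.9.0" above "EMR-3.55.0";
# sorting on the numeric key does not.
newest_first = sorted(labels, key=emr_version_key, reverse=True)
print(newest_first)
```

Labels that carry a suffix, such as "(New purchases are not supported)", would need to be trimmed before they are passed to the helper.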

EMR 3.55.x

Release version information

DataLake cluster

Service | Version
Hadoop-Common | 2.8.5
HDFS | 2.8.5
OSS-HDFS | 1.0.0
Hive | 2.3.9
Spark2 | 2.4.8
Spark3 | 3.4.2
YARN | 2.8.5
Trino | 422
DeltaLake | 3.0.0
Hudi | 0.15.0
Iceberg | 1.5.0
Flume | 1.11.0
Kyuubi | 1.9.2
Tez | 0.10.2
OpenLDAP | 2.4.46
Ranger | 2.3.0
Ranger-plugin | 1.0.0
Sqoop | 1.4.7
DLF-Auth | 2.0.2
Presto | 0.283
Zookeeper | 3.8.4
Knox | 1.5.0
Celeborn | 0.5.2
JindoCache | 6.10.1
Paimon | 1-ali-16.3

OLAP cluster

Service | Version
StarRocks2 | 2.5.22
StarRocks3 | 3.2.11
Doris | 2.1.4
ClickHouse | 23.8.2.7
Zookeeper | 3.8.4

DataFlow cluster

Service | Version
Hadoop-Common | 2.8.5
HDFS | 2.8.5
OSS-HDFS | 1.0.0
YARN | 2.8.5
OpenLDAP | 2.4.46
Ranger | 2.3.0
Ranger-plugin | 1.0.0
Zookeeper | 3.8.4
Knox | 1.5.0
Flink | 1.17.2
Paimon | 1-ali-6.2

DataServing cluster

Service | Version
Hadoop-Common | 2.8.5
HDFS | 2.8.5
OSS-HDFS | 1.0.0
OpenLDAP | 2.4.46
Ranger | 2.3.0
Ranger-plugin | 1.0.0
Zookeeper | 3.8.4
Knox | 1.5.0
HBase | 1.7.1
JindoCache | 6.8.2
Phoenix | 4.16.1

Custom cluster

Service | Version
Hadoop-Common | 2.8.5
HDFS | 2.8.5
OSS-HDFS | 1.0.0
Hive | 2.3.9
Spark2 | 2.4.8
Spark3 | 3.4.2
YARN | 2.8.5
Trino | 422
DeltaLake | 3.0.0
Hudi | 0.15.0
Iceberg | 1.5.0
Flume | 1.11.0
Kyuubi | 1.9.2
Tez | 0.10.2
OpenLDAP | 2.4.46
Ranger | 2.3.0
Ranger-plugin | 1.0.0
Sqoop | 1.4.7
DLF-Auth | 2.0.2
Presto | 0.283
StarRocks2 | 2.5.22
StarRocks3 | 3.2.11
Zookeeper | 3.8.4
Knox | 1.5.0
Celeborn | 0.5.2
Flink | 1.17.2
HBase | 1.7.1
JindoCache | 6.10.1
Paimon | 1-ali-16.3
Phoenix | 4.16.1

Update details

Service | Change
Ranger |
  • Jindoauth Server supports custom RAM roles for client users to access OSS.
  • Fixed a missing dependency in Ranger-yarn-plugin.
Paimon | Upgraded to version 1-ali-16.3.
JindoCache | Upgraded to version 6.10.1.

Release date

Version | Date
EMR-3.55.0 | October 27, 2025

EMR 3.54.x

Release version information

DataLake cluster

Service | Version
Hadoop-Common | 2.8.5
HDFS | 2.8.5
OSS-HDFS | 1.0.0
Hive | 2.3.9
Spark2 | 2.4.8
Spark3 | 3.4.2
YARN | 2.8.5
Trino | 422
DeltaLake | 3.0.0
Hudi | 0.15.0
Iceberg | 1.5.0
Flume | 1.11.0
Kyuubi | 1.9.2
Tez | 0.10.2
OpenLDAP | 2.4.46
Ranger | 2.3.0
Ranger-plugin | 1.0.0
Sqoop | 1.4.7
DLF-Auth | 2.0.2
Presto | 0.283
Zookeeper | 3.8.4
Knox | 1.5.0
Celeborn | 0.5.2
JindoCache | 6.8.2
Paimon | 1-ali-6.2

OLAP cluster

Service | Version
StarRocks2 | 2.5.22
StarRocks3 | 3.2.11
Doris | 2.1.4
ClickHouse | 23.8.2.7
Zookeeper | 3.8.4

DataFlow cluster

Service | Version
Hadoop-Common | 2.8.5
HDFS | 2.8.5
OSS-HDFS | 1.0.0
YARN | 2.8.5
OpenLDAP | 2.4.46
Ranger | 2.3.0
Ranger-plugin | 1.0.0
Zookeeper | 3.8.4
Knox | 1.5.0
Flink | 1.17.2
Paimon | 1-ali-6.2

DataServing cluster

Service | Version
Hadoop-Common | 2.8.5
HDFS | 2.8.5
OSS-HDFS | 1.0.0
OpenLDAP | 2.4.46
Ranger | 2.3.0
Ranger-plugin | 1.0.0
Zookeeper | 3.8.4
Knox | 1.5.0
HBase | 1.7.1
JindoCache | 6.8.2
Phoenix | 4.16.1

Custom cluster

Service | Version
Hadoop-Common | 2.8.5
HDFS | 2.8.5
OSS-HDFS | 1.0.0
Hive | 2.3.9
Spark2 | 2.4.8
Spark3 | 3.4.2
YARN | 2.8.5
Trino | 422
DeltaLake | 3.0.0
Hudi | 0.15.0
Iceberg | 1.5.0
Flume | 1.11.0
Kyuubi | 1.9.2
Tez | 0.10.2
OpenLDAP | 2.4.46
Ranger | 2.3.0
Ranger-plugin | 1.0.0
Sqoop | 1.4.7
DLF-Auth | 2.0.2
Presto | 0.283
StarRocks2 | 2.5.22
StarRocks3 | 3.2.11
Zookeeper | 3.8.4
Knox | 1.5.0
Celeborn | 0.5.2
Flink | 1.17.2
HBase | 1.7.1
JindoCache | 6.8.2
Paimon | 1-ali-6.2
Phoenix | 4.16.1

Update details

Service | Change
Hive | Fixed some known bugs.
Tez | Fixed community bugs to improve performance and stability.

Release date

Version | Date
EMR-3.54.0 | July 10, 2025

EMR 3.53.x

Release version information

DataLake cluster

Service | Version
Hadoop-Common | 2.8.5
HDFS | 2.8.5
OSS-HDFS | 1.0.0
Hive | 2.3.9
Spark2 | 2.4.8
Spark3 | 3.4.2
YARN | 2.8.5
Trino | 422
DeltaLake | 3.0.0
Hudi | 0.15.0
Iceberg | 1.5.0
Flume | 1.11.0
Kyuubi | 1.9.2
Tez | 0.10.2
OpenLDAP | 2.4.46
Ranger | 2.3.0
Ranger-plugin | 1.0.0
Sqoop | 1.4.7
DLF-Auth | 2.0.2
Presto | 0.283
Zookeeper | 3.8.4
Knox | 1.5.0
Celeborn | 0.5.2
JindoCache | 6.8.2
Paimon | 1-ali-6.2

OLAP cluster

Service | Version
StarRocks2 | 2.5.22
StarRocks3 | 3.2.11
Doris | 2.1.4
ClickHouse | 23.8.2.7
Zookeeper | 3.8.4

DataFlow cluster

Service | Version
Hadoop-Common | 2.8.5
HDFS | 2.8.5
OSS-HDFS | 1.0.0
YARN | 2.8.5
OpenLDAP | 2.4.46
Ranger | 2.3.0
Ranger-plugin | 1.0.0
Zookeeper | 3.8.4
Knox | 1.5.0
Flink | 1.17.2
Paimon | 1-ali-6.2

DataServing cluster

Service | Version
Hadoop-Common | 2.8.5
HDFS | 2.8.5
OSS-HDFS | 1.0.0
OpenLDAP | 2.4.46
Ranger | 2.3.0
Ranger-plugin | 1.0.0
Zookeeper | 3.8.4
Knox | 1.5.0
HBase | 1.7.1
JindoCache | 6.8.2
Phoenix | 4.16.1

Custom cluster

Service | Version
Hadoop-Common | 2.8.5
HDFS | 2.8.5
OSS-HDFS | 1.0.0
Hive | 2.3.9
Spark2 | 2.4.8
Spark3 | 3.4.2
YARN | 2.8.5
Trino | 422
DeltaLake | 3.0.0
Hudi | 0.15.0
Iceberg | 1.5.0
Flume | 1.11.0
Kyuubi | 1.9.2
Tez | 0.10.2
OpenLDAP | 2.4.46
Ranger | 2.3.0
Ranger-plugin | 1.0.0
Sqoop | 1.4.7
DLF-Auth | 2.0.2
Presto | 0.283
StarRocks2 | 2.5.22
StarRocks3 | 3.2.11
Zookeeper | 3.8.4
Knox | 1.5.0
Celeborn | 0.5.2
Flink | 1.17.2
HBase | 1.7.1
JindoCache | 6.8.2
Paimon | 1-ali-6.2
Phoenix | 4.16.1

Update details

Service | Change
Trino | Fixed an issue where LDAP was unavailable.
YARN | Fixed open source bugs (YARN-10213, YARN-6207, and YARN-9339).
StarRocks | Supports the creation of clusters with separated storage and compute resources.
JindoCache | Upgraded to version 6.8.2.
EMRHOOK | Enhanced stability.

Release date

Version | Date
EMR-3.53.0 | April 24, 2025

EMR 3.52.x

Update details

Service | Change
Spark |
  • Fixed a configuration issue that occurred during scale-out.
  • Fixed an issue where SASL connections occasionally failed in Kerberos clusters.
Hive | Fixed a configuration issue that occurred during scale-out.
Trino, Presto | Resolved an issue where connections failed after LDAP was enabled.
Zookeeper | Supports adding custom configurations.
Ranger | Replaced the existing Spark 3 Ranger plugin with the version provided by the open source Kyuubi project.
Hudi | Upgraded to version 0.15.0.
Celeborn | Upgraded to version 0.5.2.
JindoCache | Upgraded to version 6.5.3.
StarRocks3 | Upgraded to version 3.2.11.
Kyuubi | Upgraded to version 1.9.2.
StarRocks2 | Upgraded to version 2.5.22.
Impala, Kudu, Kafka-Manager | The service is unavailable. You can use a recommended alternative service or install the service manually. You can replace Impala with Presto, Trino, ClickHouse, or StarRocks.

Release date

Version | Date
EMR-3.52.1 | December 18, 2024
EMR-3.52.0 (New purchases are not supported) | December 4, 2024

EMR 3.51.x

Update details

EMR-3.51.4

Service | Change
JindoCache | Upgraded to version 6.5.3.
StarRocks2 | Upgraded to version 2.5.22.
StarRocks3 | Upgraded to version 3.2.11.

EMR-3.51.3

Service | Description
JindoSDK | Updated JindoSDK to fix an issue that caused deadlocks.

EMR-3.51.2

Service | Description
JindoCache |
  • Upgraded to version 6.5.1.
  • Improved the performance of reading data from and writing data to distributed hash tables.
Spark |
  • Fixed an issue where partition directories could not be deleted.
  • Fixed a Hive package dependency issue so that the connection between Spark and the Metastore client remains uninterrupted.
Trino |
  • Fixed an issue where some modified configurations could be unexpectedly restored to their original values during a scale-out.
  • Supports querying data in the OSS-HDFS service that is deployed in a high-security cluster.
  • Fixed an issue where exceptions occurred on Trino after DLF-Auth was enabled.
Presto | Supports querying data in the OSS-HDFS service that is installed in a high-security cluster.
HDFS, HBase-HDFS | Fixed an issue where the memory size of NameNodes and DataNodes could not be modified.
YARN |
  • The ResourceManager can send multiple timeline events at a time, which improves its processing capability.
  • Fixed a logic issue in how the ResourceManager processes containers and resources.
ZooKeeper |
  • Fixed an issue where the memory configuration of a node group could not be modified.
  • The log configuration files can be reconstructed.
Impala | Fixed an issue where client configurations were unexpectedly modified during an auto scaling activity.
Ranger | Supports the latest version of JindoSDK, which reduces the CPU load.
Knox | Fixed an issue where the Knox URL could not be accessed when a cluster had only one Master Extend node group.
Kafka | Fixed an issue where an EMR cluster in which Kafka Connect is deployed failed to start.
StarRocks | Fixed an issue where added BE nodes were not displayed after a scale-out.
Doris | Upgraded to version 2.1.4.
Paimon | Upgraded to version 0.9-ali-7.
EMR-HOOK | Supports parsing the lineage information of MaxCompute tables.

EMR-3.51.1

Service | Change
Spark, Hive, Kyuubi | Supports deploying the Master-Extend node group.
Paimon | Switched the Flink dependency from the VVR version to the community version and added support for DLF Catalog.
Knox | Packaged using JDK 8.
Flink | Restored the DLF configurations and dependencies that were removed in EMR-3.51.0.

EMR-3.51.0

Service | Change
Spark | Upgraded Spark3 to version 3.4.2.
Celeborn | Upgraded to version 0.4.0.
Doris | Upgraded to version 2.1.0.
StarRocks |
  • Upgraded StarRocks2 to version 2.5.18.
  • Upgraded StarRocks3 to version 3.2.4.
DeltaLake | Upgraded to version 3.0.0.
Iceberg | Upgraded to version 1.5.0.
Zookeeper | Upgraded to version 3.8.4.
JindoCache | Upgraded to version 6.2.5.
Flink | Upgraded to version 1.17.2.

Release date

Version | Date
EMR-3.51.4 | December 18, 2024
EMR-3.51.3 (New purchases are not supported) | November 29, 2024
EMR-3.51.2 (New purchases are not supported) | August 29, 2024
EMR-3.51.1 (New purchases are not supported) | June 21, 2024
EMR-3.51.0 (New purchases are not supported) | April 23, 2024

EMR 3.50.x

Update details

Service | Change
Hudi | Upgraded to version 0.14.0.
Flume | Upgraded to version 1.11.0.
Kyuubi | Upgraded to version 1.7.3.
Impala | Upgraded to version 4.3.0.
Celeborn | Upgraded to version 0.3.2.
JindoCache | Upgraded to version 6.2.0.
Paimon | Upgraded to version 0.7-ali-1.
Kafka |
  • Upgraded to version 3.6.1.
  • Fixed a SASL security authentication vulnerability in the Kafka Connect component.
Spark | Fixed the Commons Text vulnerability.
StarRocks |
  • Upgraded StarRocks2 to version 2.5.13.
  • Upgraded StarRocks3 to version 3.1.5.
Ranger |
  • Fixed the Commons Text vulnerability.
  • Fixed the Spring Security path matching permission bypass vulnerability.
  • Fixed the Spring Security forward/include authentication bypass vulnerability.
  • Fixed the Spring Framework identity authentication bypass vulnerability under a special matching pattern.
  • Supports modifying the period for Ranger to synchronize LDAP users.

Release date

Version | Date
EMR-3.50.0 | February 19, 2024

EMR 3.49.x

Update details

Service | Change
JindoCache | Added the component. The version is 6.1.1.
JindoData | JindoData is unavailable. You can use JindoCache for data caching and DLF-Auth for authentication.
Spark | Removed jdo-related configurations from hive-site.xml.
HBase | Added a configuration item that lets you select the HBase Thrift Server version (v1 or v2).
StarRocks | Upgraded StarRocks2 to version 2.5.10.
Doris | Upgraded Doris to version 1.2.7.
Celeborn | Upgraded Celeborn to version 0.3.1.
Paimon | Upgraded Paimon to version 0.6-ali-2.
ClickHouse | Upgraded ClickHouse to version 23.8.2.7.

Release date

Version | Date
EMR-3.49.1 | November 16, 2023
EMR-3.49.0 (New purchases are not supported) | October 27, 2023

EMR 3.48.x

Update details

Service | Change
Trino |
  • Fixed an issue where the Paimon connector could not query HDFS tables.
  • Fixed an issue where worker monitoring metrics could not be read.
Presto |
  • Upgraded to version 0.283.
  • Fixed an issue where worker monitoring metrics could not be read.
ClickHouse | Granted all permissions to the default user by default.
StarRocks |
  • Renamed the previous StarRocks to StarRocks2.
  • Added StarRocks3, version 3.1.2. By default, it is created as a storage-compute coupled version. Storage-compute separated versions are not supported.
Celeborn | Upgraded to version 0.3.0.

Release date

Version | Date
EMR-3.48.2 | August 17, 2023

EMR 3.47.x

Update details

Service | Change
Hudi | Upgraded to version 0.13.1.
Paimon | Upgraded to version 0.5-ali-1.
StarRocks | Upgraded to version 2.5.8.
JindoData | Upgraded to version 4.6.11.
Trino |
  • Upgraded to version 422.
  • The Hudi connector supports querying Merge-On-Read (MOR) tables.
  • Optimized error messages for dynamic UDF loading.

Release date

Version | Date
EMR-3.47.0 | August 3, 2023

EMR 3.46.x

Update details

EMR-3.46.1

Service | Description
Spark |
  • By default, Spark History Server data is stored in OSS-HDFS.
  • Spark3 Native Engine data is stored in OSS or OSS-HDFS.
Hive | By default, Hive warehouse files are stored in OSS-HDFS.
OSS-HDFS | Added the OSS-HDFS service.
YARN | By default, OSS-HDFS is used to store data.
HBase |
  • By default, HBase data in the HFile format is stored in OSS-HDFS.
  • HBase write-ahead logging (WAL) logs are stored in OSS-HDFS.

EMR-3.46.0

Service | Change
Kyuubi | Upgraded to version 1.7.1.
Celeborn | Upgraded to version 0.2.2.
Paimon |
  • Renamed Flink-Table-Store to Paimon.
  • Upgraded to version 0.4-ali-1.
StarRocks | Upgraded to version 2.5.5.
Doris | Upgraded to version 1.2.4.
ClickHouse | Upgraded to version 22.8.17.17.
Trino | Provided a simple Event Listener by default to obtain audit logs.
Phoenix | Supports Hive on Phoenix.

Release date

Version | Date
EMR-3.46.1 | July 13, 2023
EMR-3.46.0 (New purchases are not supported) | June 1, 2023

EMR 3.45.x

Update details

EMR-3.45.1

Service | Description
ClickHouse | Upgraded to version 22.8.14.53.
Trino | Added the odps.properties connector, which allows you to query MaxCompute data.
JindoData | Upgraded to version 4.6.5.
JindoSDK | Upgraded to version 4.6.5.
Flink Table Store | Upgraded to version 0.3-ali-2.
YARN | Supports the Node Labels feature.

EMR-3.45.0

Service | Change
Iceberg | Upgraded to version 1.1.0.
Hudi |
  • Upgraded to version 0.12.2.
  • Supports the CDC feature.
Kudu | Upgraded to version 1.16.0.
ClickHouse |
  • Upgraded to version 22.3.8.39.
  • The ZooKeeper service must be selected when you install the ClickHouse service.
Celeborn |
  • Renamed RSS to Celeborn.
  • The version of Celeborn is 0.2.0.
Presto | Added the service. The kernel is community Facebook PrestoDB 0.278.3. The default HTTP port is 8889, and the HTTPS port is 7779.
DeltaLake | Upgraded to version 2.2.0.
StarRocks | Upgraded to version 2.4.3.
Doris | Upgraded to version 1.2.1.
Kafka-Manager | Upgraded to version 3.0.0.6.
Impala | The service is offline.
OpenLDAP | Upgraded to version 2.4.46.
Kyuubi | Upgraded to version 1.6.1.
Ranger | Upgraded to version 2.3.0.
HBase |
  • Supports ThriftServer2.
  • The default value of the hbase.block.data.cachecompressed parameter is changed to true.
Flink-Table-Store | Added the service, based on community version 0.3.
JindoData | Upgraded to version 4.6.4.

Release date

Version | Date
EMR-3.45.1 | April 3, 2023
EMR-3.45.0 (New purchases are not supported) | February 28, 2023

EMR 3.44.x

Update details

Service | Change
Iceberg | Upgraded to version 0.14.1.
Flink | Upgraded to Flink1.15-vvr-6.0.2, which corresponds to the community Flink 1.15 major version.
Kafka |
  • Supports LDAP user logon authentication and authorization.
  • Supports user group authorization.
Trino |
  • Renamed EMR Presto to Trino, its official community name.
  • Supports Ranger and DLF-Auth.
  • Fixed an issue where connections to worker nodes failed after LDAP was enabled with a single click.
JindoSDK | Upgraded to version 4.6.2.
JindoData | Upgraded to version 4.6.2.
HBase |
  • Supports Ranger.
  • Fixed an issue where OSS-HDFS could not be selected as the storage mode when adding a service.
YARN | ACLs are enabled by default in high-security mode.
StarRocks | Upgraded to version 2.3.4.
Doris | Upgraded to version 1.1.5.
Hudi | The console supports configuring hudi-defaults.conf.
Ranger | Supports integration with Trino, YARN, HBase, and Kafka.
DLF-Auth |
  • Upgraded to version 2.0.2.
  • Supports Trino and Impala.
OpenLDAP | Integrated with the Nslcd component.
Kudu | Kudu Tserver can no longer be installed in the Task node group.
Spark | Upgraded to version 3.3.1.
Tez | Upgraded to version 0.10.2.
Kyuubi | Upgraded to version 1.6.0.

Release date

EMR-3.44.0 was released on December 1, 2022.

EMR 3.43.x

Update details

EMR-3.43.1

Service | Change
Kerberos | Supports connecting to an external KDC on EMR.
Kafka | Supports adding a startup command configuration item to customize service startup parameters.
JindoData |
  • Upgraded to version 4.6.0.
  • Supports rewriting OSS-HDFS access paths.
Flink | Upgraded to version 1.13_vvr_4.0.15.
RSS | Upgraded to version 0.1.4.

EMR-3.43.0

Service | Change
Spark |
  • Upgraded to version 3.3.
  • Supports enabling Kerberos identity authentication.
Hudi |
  • Upgraded to version 0.12.0.
  • Supports Spark 3.3.
  • Supports using a cloud MetaStore to host metadata and enabling the acceleration feature. For more information, see Hudi MetaStore usage guide.
Flink |
  • Supports enabling Kerberos identity authentication.
  • Supports automatic connection with Data Lake Formation (DLF).
Iceberg |
  • Upgraded to version 0.14.0.
  • Supports Spark 3.3.
  • Supports enabling Kerberos identity authentication.
JindoData |
  • Upgraded to version 4.5.1.
  • Supports accessing Alibaba Cloud resources without plaintext AccessKeys.
Hadoop-Common and HDFS |
  • Supports enabling Kerberos identity authentication.
  • Fixed security vulnerability CVE-2022-25168.
Knox | Integrated with Ranger. The Ranger UI can be accessed from the Access Links And Ports tab.
HBase |
  • Upgraded to version 1.7.1.
  • Supports enabling Kerberos identity authentication.
  • Supports group-based configuration.
RSS |
  • Upgraded to version 0.1.2.
  • Supports enabling Kerberos identity authentication.
Doris |
  • Upgraded to version 1.1.2.
  • Supports enabling Kerberos identity authentication.
StarRocks |
  • Upgraded to version 2.2.6.
  • Supports enabling Kerberos identity authentication.
Kafka |
  • Upgraded to version 2.13_3.2.1.
  • Supports enabling Kerberos identity authentication.
DeltaLake |
  • Upgraded to version 2.1.0.
  • Supports Spark 3.3.
  • Supports enabling Kerberos identity authentication.
Kudu | Added the component. The version is 1.14.0.
Impala |
  • Supports creating views in DLF.
  • Supports enabling Kerberos identity authentication.
YARN, Impala, Ranger, Hive, Kyuubi, Tez, Kafka, Zookeeper, DLF-Auth, Phoenix, Sqoop, Presto | Supports enabling Kerberos identity authentication.

Release date

Version | Date
EMR-3.43.1 | November 8, 2022
EMR-3.43.0 (New purchases are not supported) | October 14, 2022

EMR 3.42.x

Update details

Service | Change
Hive | Supports one-click integration with LDAP.
Presto |
  • Upgraded to community version 389. Uses the standalone Delta Lake and Hudi connectors provided by the community.
    • This version of the Delta Lake connector does not support Time Travel or Z-Order.
    • This version of the Hudi connector does not support querying MOR tables.
  • Supports one-click integration with LDAP.
DeltaLake |
  • Integrated with DLF for automated lake table management.
  • Supports Ranger authorization.
  • Fixed an issue where statistics could not be collected for timestamp fields.
  • The optimize and vacuum commands now return metric information.
Hudi | Upgraded to version 0.11.1.
HadoopCommon | Added the component to prevent HDFS, YARN, and JindoSDK configurations from overwriting each other.
YARN | Enhanced elastic features.
Ranger |
  • Supports both Spark2 and Spark3.
  • Ranger Usersync supports one-click integration with LDAP.
Kafka | CruiseControl automatically creates related topics on startup.
HBase | Added the component. The version is 1.4.9.
Phoenix | Added the component. The version is 4.14.1.
Doris | Upgraded to version 1.1.1.
StarRocks | Upgraded to version 2.2.3.
ClickHouse | Fixed a memory overflow issue when reading large files from OSS.

Release date

EMR-3.42.0 was released on August 5, 2022.

EMR 3.40.x

Update details

Service | Change
JindoData | Added the component. The version is 4.3.0.
JindoSDK | Upgraded to version 4.3.0.
Spark | Upgraded to version 3.2.1.
Hive |
  • Fixed a bug where Tez repeatedly committed when Speculation was enabled.
  • Fixed a bug where UDFs could only be called after reloading the function.
Presto | Fixed a bug where the Presto service could not be started if it was added when the Hadoop cluster was initialized.
DeltaLake | Fixed a compatibility issue with Streaming SQL.
Hudi | Upgraded to version 0.10.1.
Iceberg | Upgraded to version 0.13.1.
YARN |
  • Added a feature to restrict ApplicationMasters (AMs) to run only on CORE group nodes.
  • Fixed an issue where the mapreduce.map.java.opts configuration was missing taihaodoctor.
Zookeeper | Optimized JVM parameter configurations.
Flink, Impala, Flume, Druid | Adapted to JindoSDK 4.3.0.
Sqoop | Upgraded the PostgreSQL version.
Zeppelin | Resolved a startup failure issue with the JDBC Interpreter.
Ranger | The Ranger 1.2.0 Spark Plugin supports Hudi.
Oozie | Upgraded Log4j to version 2.17.2.
HBase | Fixed an issue where RegionServer could not be started in HBase 1.4.9.
DLF-Auth | Upgraded to version 2.0.0.

Release date

EMR-3.40.0 was released on April 21, 2022.

EMR 3.39.x

Update details

EMR-3.39.2

Note: Only OLAP clusters and DataFlow clusters in the new EMR console support this version.

Service | Change
Flink |
  • Improved the application performance management (APM) dashboard and added new monitoring metrics, such as sourceIdleTime.
  • Supports CloudMonitor alerts.
Kafka |
  • Supports SSL and SASL configurations.
  • Modified the default values of some parameters.
ClickHouse | Modified the default values of some parameters.

EMR-3.39.1

Service | Change
SmartData, BIGBOOT | The component is offline.
RSS |
  • Upgraded the ESS service to RSS. For more information, see RSS.
  • Enhanced the features and stability of the service.
JindoSDK |
  • Upgraded the architecture to JindoData.
  • EMR integrates JindoSDK 4.0 for the first time and supports services such as OSS and OSS-HDFS.
Spark |
  • Optimized Hive on Spark.
  • Adapted to JindoSDK.
Tez | Adapted to JindoSDK.
Hive | Adapted to JindoSDK.
Presto |
  • Supports dynamic UDF loading.
  • Delta Lake tables support Time Travel queries with the for ... as of syntax.
  • Added a standalone Delta Lake Catalog, provided default Delta connector configurations, and supported Z-order Dataskip optimization based on the standalone Catalog.
  • Fixed an issue where the Hudi connector could not query Hudi MOR tables. The Hive connector does not support querying Hudi MOR tables.
  • Adapted to JindoSDK.
Delta Lake |
  • Metadata management
    • Used the built-in Spark Catalog instead of the Hive CLI API to synchronize metadata and partition information.
    • Automatically reports table statistics (dataProfiling) to the MetaStore.
  • SQL
    • Supports Time Travel syntax.
    • Supports DropPartition SQL syntax.
    • Supports ADD COLUMN operations at specified positions (FIRST and AFTER).
  • Enhanced table management capabilities
    • Supports dynamic adjustment of filesize based on table size. This feature is enabled by default.
    • Supports automatic Vacuum, which is enabled by default. Supports concurrent Vacuum.
    • Optimized the logic for automatic compaction, which is disabled by default.
    • Added Zorder syntax and accelerated the Zorder process.
Hudi | Upgraded to version 0.10.0.
HDFS | Adapted to JindoSDK.
YARN | Adapted to JindoSDK.
Flume | Adapted to JindoSDK.
Flink |
  • By default, the Flink lib directory is uploaded to the HDFS cluster, so that you can use it with the yarn.provided.lib.dirs parameter.
  • Adapted to JindoSDK.
Impala | Adapted to JindoSDK.
Ranger |
  • Fixed a startup failure issue with Spark History Server.
  • Adapted to JindoSDK.
HBase |
  • Fixed an issue with default parameters.
  • Fixed a GC log date format issue.
  • Fixed a restart issue when RS used an IP address.
Druid | Adapted to JindoSDK.
ClickHouse | Optimized the handling logic when the ClickHouse component is stopped.
Iceberg |
  • Upgraded to version 0.13.0.
  • Hid default configuration items to improve user experience.
DLF-Auth | Fixed a startup failure issue with Spark History Server.
StarRocks | Added the service to the new console. Version 2.0.1 is published.
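The Flink entry for EMR-3.39.1 mentions the yarn.provided.lib.dirs parameter. The following is a minimal sketch of how that parameter could be passed when a job is submitted in YARN application mode with the open source Flink CLI. The helper name and both HDFS paths are hypothetical placeholders; this topic does not state where EMR uploads the lib directory, so check your cluster for the actual location.

```python
from typing import List


def build_flink_submit_command(provided_lib_dir: str, job_jar: str) -> List[str]:
    """Build a `flink run-application` argument list for YARN application mode."""
    return [
        "flink",
        "run-application",
        "-t", "yarn-application",
        # Point YARN at the Flink lib directory that already exists on HDFS,
        # so the client does not have to ship it with every submission.
        f"-Dyarn.provided.lib.dirs={provided_lib_dir}",
        job_jar,
    ]


# Both paths are hypothetical placeholders, not EMR defaults.
command = build_flink_submit_command(
    "hdfs:///path/to/flink/lib",
    "hdfs:///path/to/my-job.jar",
)
print(" ".join(command))
```

Building the command as a list keeps each argument separate, which makes it safe to hand to subprocess.run without shell quoting.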

Release date

Version | Date
EMR-3.39.2 | March 25, 2022
EMR-3.39.1 (New purchases are not supported) | February 15, 2022

EMR 3.38.x

Update details

EMR-3.38.3

Fixed the Log4j security vulnerability in all related components. For more information, see Vulnerability announcement | Apache Log4j2 remote code execution vulnerability.

Service | Change
Presto |
  • Fixed an error that occurred when Presto queried Hudi tables in a high availability cluster.
  • Fixed the Log4j vulnerability in the Elasticsearch connector.
DLF Metastore |
  • Changed the default setting for Metastore logs from enabled to disabled.
  • Fixed an error caused by an excessively long URI in Metastore gettablestats.
Delta Lake | Fixed an issue with synchronizing schema changes to the Metastore.
Flink |
  • Upgraded VVR to version 4.0.11. This version supports the following features:
    • Released the commercial Flink CDC feature:
      • Supports Schema Evolution.
      • Supports Flink SQL semantics for full database synchronization.
    • Supports using Gemini Statebackend to store state on OSS.
  • Provided an enterprise edition of the Hudi Connector with built-in DLF for metadata management.
Sqoop | Fixed an issue where precision was lost for the Decimal type when importing HCatalog tables with Sqoop.

EMR-3.38.2

Service | Change
SmartData |
  • Upgraded SmartData to version 3.8.0. For more information, see Introduction to SmartData 3.8.x.
  • Supports authentication and authorization management for OSS based on Kerberos and Ranger.

EMR-3.38.1

Service | Change
SmartData | Upgraded SmartData to version 3.7.3. For more information, see Introduction to SmartData 3.7.x.
Spark |
  • Removed the invalid Log4j MetricsAppender configuration.
  • Fixed a NullPointerException issue during SparkContext startup.
Presto |
  • Fixed an issue in high availability Hadoop clusters where Presto required host configuration to query Hive tables.
  • Fixed a startup failure issue with Presto under default configurations when memory is low.
  • Fixed an issue where modifications to the worker-jvm configuration did not take effect.
  • Supports Ranger.
Impala | Fixed a no such method error that occurred when querying DLF metadata tables.
Ranger |
  • Supports Presto.
  • Fixed a permission issue with Ranger Spark when inserting data into ORC and PARQUET tables.
  • Fixed an issue where Ranger Hive role permissions did not take effect after Kerberos was enabled.
DLF-Auth |
  • Upgraded DLF-Auth to version 1.0.1.
  • Supports DLF permissions to control Presto permissions.
  • Fixed an issue with RAM user caching.

EMR-3.38.0

Service | Change
SmartData | Upgraded SmartData to version 3.7.2. For more information, see Introduction to SmartData 3.7.x.
Spark |
  • Upgraded Spark to version 2.4.8.
  • Supports both Spark 2.4.8 and Spark 3.1.2. Note: Spark3 does not support Delta or Remote Shuffle Service.
  • For the Spark 3.x series, SparkSQL performance for Distinct calculations is optimized. The optimization is triggered when an aggregate operator contains multiple count(distinct case ... when ...) expressions.
  • Fixed an array index out-of-bounds issue in Adaptive Query Execution (AQE) when statistics were missing.
  • Fixed an error that occurred with AQE and Cache in specific scenarios.
Hive | Upgraded Hive to version 2.3.9.
Presto |
  • Released as a standalone Presto cluster.
  • Upgraded Presto to community version 358. Important: This version does not support Ranger.
  • Supports connectors such as Hudi and MySQL by default, and updated the default configurations.
  • Presto clusters support elastic scaling.
  • Supports data lake analytics.
DeltaLake |
  • Unified delta-connectors for Hive 2 and Hive 3.
  • Fixed an error that occurred when querying multi-level partitioned tables with delta-connectors.
Hudi |
  • Upgraded Hudi to version 0.9.0.
  • Fixed a compatibility issue with sql.extension between DeltaLake and Hudi.
HDFS | The default parameter for NameNode reserved capacity now increases automatically. This ensures that NameNode enters safe mode promptly when disk space is low.
Flink |
  • Upgraded Flink to version 1.13-vvr-4.0.10, which corresponds to community Flink 1.13.1.
  • Added commercial Flink Connectors, such as the Hologres connector.
  • Added a corresponding Metric Reporter and integrated it with the APM dashboard for monitoring.
  • For the Kafka Connector, added a Kafka Catalog based on Schema Registry. This lets you directly read from and write to existing Kafka topics without using DDL.
Storm | The component is offline.
Zeppelin | Upgraded Zeppelin to community version 0.10.0.
Ranger | When Presto is community version 358, this version of Ranger does not support Presto access control.
Hue |
  • Fixed an issue where the YARN Job Browser could not properly display or terminate jobs in some cases.
  • The YARN Job Browser is enabled in the default configurations.
  • The Presto protocol is supported in the default configurations.
Druid | Fixed a node restart failure caused by residual PID files after a server power loss.
ClickHouse |
  • Updated the default configurations.
  • Supports cluster scale-out.
  • Supports the MetaChecker feature.
  • Supports reading data using the OSS table engine and OSS table function.
  • Supports custom ZooKeeper addresses at the table level.
Iceberg | Added the component. The version is 0.12.0-1.0.1.
Knox | Fixed an issue where the first access to a Spark task failed.
DLF-Auth | Added the component. Supports DLF permissions to control Hive and Spark permissions. The version is 1.0.0.
ESS | Upgraded ESS to version 1.2.0.
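The Spark entry for EMR-3.38.0 says the Distinct optimization is triggered when one aggregate contains multiple count(distinct case ... when ...) expressions. The following sketch shows only that query shape. It runs on Python's built-in sqlite3 module rather than Spark, so it does not exercise the optimization itself, and the table, columns, and rows are hypothetical.

```python
import sqlite3

# Hypothetical table and rows, used only to illustrate the query shape.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (user_id INTEGER, channel TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "web"), (1, "web"), (2, "web"), (2, "app"), (3, "app"), (4, "web")],
)

# A single aggregate that contains several count(distinct case when ...) expressions.
# Rows that do not match a CASE branch yield NULL, which COUNT ignores.
query = """
SELECT
  COUNT(DISTINCT CASE WHEN channel = 'web' THEN user_id END) AS web_users,
  COUNT(DISTINCT CASE WHEN channel = 'app' THEN user_id END) AS app_users
FROM orders
"""
web_users, app_users = conn.execute(query).fetchone()
print(web_users, app_users)
conn.close()
```

In Spark SQL the same SELECT statement would be the kind of query that the release note refers to.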

Release date

Version | Date
EMR-3.38.3 | December 2021
EMR-3.38.2 (New purchases are not supported) | December 2021
EMR-3.38.1 (New purchases are not supported) | November 2021
EMR-3.38.0 (New purchases are not supported) | October 2021

EMR 3.37.x

Update details

EMR-3.37.1

Service | Change
SmartData | Upgraded SmartData to version 3.7.1.
Hue | Fixed an issue where Impala could not be used in high-security clusters.
Kudu | Supports Kerberos.

EMR-3.37.0

Service | Change
SmartData | Upgraded SmartData to version 3.7.0.
Spark | Fixed a compatibility issue with Delta Lake.
Delta Lake |
  • Upgraded Delta-Connectors to support creating and querying tables using StorageHandler syntax.
  • Fixed an issue that occurred when using INSERT OVERWRITE on partitioned tables.
  • Fixed an issue where Optimize wrote virtual fields to files in G-SCD scenarios.
YARN |
  • Added appId, CPU, and memory resource usage information to the node Containers REST API.
  • Fixed an issue where ApplicationMaster (AM) logs could not be viewed on nodes released by Auto Scaling.
  • Added support for cleaning up released nodes after they are decommissioned by Auto Scaling.
  • Improved the graceful decommission logic for Auto Scaling. Nodes are now marked as offline only after the NodeManager (NM) process ends.
ZooKeeper | Upgraded to community version 3.6.3.
Flink |
  • Added the SmartData component.
  • Fixed an issue that prevented password-free access to OSS when submitting jobs to a DataFlow-Flink cluster through Secure Shell (SSH).
Impala | Fixed an issue that caused an infinite loop when listing directories after an OSS partition directory was directly deleted.
Hue | Fixed a display issue in the user interface when Hue is used with Oozie.
Kudu | Upgraded to community version 1.14.0.
ClickHouse | Updated the default configurations.

Release date

Version | Date
EMR-3.37.1 | September 2021
EMR-3.37.0 (New purchases are not supported) | August 2021

EMR 3.36.x

Update details

Service| Changes
SmartData| Upgraded SmartData to version 3.6.1. For more information, see Introduction to SmartData 3.6.x.
Hive|
  • Upgraded Hive to version 2.3.8.

  • Fixed an issue where the SHOW CREATE TABLE command returned an incorrect result when Data Lake Formation (DLF) metadata was used.

  • Optimized the default parameters of Hive to improve job performance.

  • Changed the names of configuration items on the hive-env tab of the Hive service Configuration page in the E-MapReduce console to uppercase for ease of use.

  • Improved the error message that is reported when the file system and the Hive metastore are incompatible during a write to a Hive table.

HDFS| Added support for the Zstandard (ZSTD) compression format.
Flink| Upgraded Flink to version 1.12-vvr-3.0.2. Note: Flink is removed from Hadoop clusters.
Hudi|

  • Upgraded Hudi to version 0.8.0.

  • Added support for integration with Spark SQL.

Spark|

  • Optimized the names of configuration items on the spark-defaults tab of the Spark service Configuration page in the E-MapReduce console.

  • Optimized the performance of log output.

  • Added support for the Zstandard (ZSTD) compression format.

Impala| Fixed an issue that caused a core dump error when you use Hadoop Distributed File System (HDFS).
Tez| Optimized the default parameters of Tez to improve job performance.
Knox|

  • Added support for the Kudu component.

  • Added support for the Impala component.

  • Added support for the HBase component.

Phoenix| Fixed an issue where a "Java Database Connectivity (JDBC) Driver not found" error was reported when you use Hive or Spark SQL to access Phoenix tables.
ClickHouse| Enabled application performance management (APM) monitoring and alerting.

Release date

EMR-3.36.1 was released on July 16, 2021.

EMR-3.35.x

Updates

Service| Change
SmartData| Upgraded to version 3.5.0. For version details, see Introduction to SmartData 3.5.x.
Spark|
  • Fixed an issue where Adaptive Execution did not take effect in some scenarios.

  • Fixed an issue where the behavior of statistical aggregate functions was inconsistent with that of Hive.

  • Fixed an issue where data of the char type was read incorrectly from Hive ORC tables.

HDFS| Added support for the SM4 national encryption algorithm.
Hue| Upgraded Hue to version 4.9.0.
Alluxio| Upgraded Alluxio to version 2.5.0.
Druid|

  • Upgraded Druid to version 0.20.1.

  • Enhanced security.

Livy| Upgraded Livy to version 0.7.1.

Release date

EMR-3.35.0 was released on April 21, 2021.

EMR 3.34.x

Changes

Service| Changes
SmartData| Upgraded to version 3.4.0. For more information, see Introduction to SmartData 3.4.x.
Spark|
  • Optimized some default configurations.

  • Performance optimization: Added support for Window TopK pushdown.

  • Enhanced compatibility for reading and writing CSV or JSON tables in Hive.

  • The ANALYZE statement now supports omitting all table column names.

  • Added support for enabling or disabling the Lightweight Directory Access Protocol (LDAP) feature with a single click.

  • Improved the usability of the Spark Beeline tool.

Hive|

  • Optimized some default configurations.

  • Performance optimization: Enhanced the cost-based optimizer (CBO).

  • Added support for enabling or disabling the LDAP feature with a single click.

  • Upgraded Calcite to version 1.12.0.

  • Added the hive.security.authorization.sqlstd.confwhitelist.append parameter.

Presto| Added support for enabling or disabling the LDAP feature with a single click.
YARN| Fixed an important security threat related to unauthorized access to the Hadoop web UI. The threat occurred when accessing the YARN web UI through a Secure Shell (SSH) tunnel, which required user.name=name to be explicitly specified in the URL.
Zookeeper| Upgraded to version 3.6.2.
Flink| Updated the config.sh file during initialization to fix an issue with HADOOP_CLASSPATH.
Impala|

  • Upgraded Impala to version 3.4.0.

  • Upgraded Shiro to version 1.7.0.

  • Added support for Data Lake Formation (DLF) metadata.

  • Added support for querying data in Delta format.

  • Added support for enabling or disabling the LDAP feature with a single click.

Tez| Optimized the default configurations.
HAS| Fixed an issue where the admin.keytab file could not be re-initialized after an error occurred during the HAS installation flow.
Ranger|

  • Fixed an issue caused by filter pushdown in Spark.

  • Fixed an issue where Presto could not be enabled again in Ranger after it was disabled.

  • Added support for enabling or disabling LDAP authentication with a single click.

Knox| Fixed an issue with the Knox link for Druid 0.20.0.
Hue| Added support for enabling or disabling the LDAP feature with a single click.
Hudi|

  • Added support for the SQL on Hudi feature.

  • Fixed an accuracy issue that occurred when querying partial data.

  • Added support for partition pruning when you query Copy-On-Write (COW) tables in Hudi using Spark.

  • Added support for a bucketing index mechanism to improve write performance.

Delta Lake|

  • Fixed an issue where metadata could not be synchronized to Hive Metastore from an existing Delta table.

  • Fixed an issue where the MERGE command could not parse the * character.

  • Fixed an error that occurred during the creation of table metadata when transforming data from Parquet format to a Delta table.

  • Fixed an issue where the OPTIMIZE command failed when there were no files to compact.

  • The MERGE syntax now supports using a subquery as the source.

  • Introduced a caching mechanism to improve query efficiency when you use Presto to query Delta tables.

  • Added support for querying Delta tables using Impala.

Superset|

  • Fixed an issue that prevented the admin user from logging on to the web UI.

  • Datasets are compatible with Druid clusters.

  • Spark SQL datasets are no longer supported.

Sqoop| Added support for importing files in Parquet format to Object Storage Service (OSS).
Alluxio| Upgraded to version 2.4.1.
Phoenix| Hive on Phoenix now supports backing field settings.
Pig| Removed.
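
The YARN entry in the table above mentions specifying user.name=name explicitly in the URL when the YARN web UI is reached through an SSH tunnel. The following is a minimal sketch of building such a URL. The host, the local port 8088, the /cluster path, and the user name hadoop are assumptions for illustration and do not come from these release notes.

```python
from urllib.parse import urlencode, urlunsplit


def yarn_ui_url(user: str, host: str = "localhost", port: int = 8088) -> str:
    """Build a YARN web UI URL that carries an explicit user.name parameter.

    host, port, and the /cluster path are illustrative assumptions; adjust
    them to match the local end of your SSH tunnel.
    """
    query = urlencode({"user.name": user})
    return urlunsplit(("http", f"{host}:{port}", "/cluster", query, ""))


if __name__ == "__main__":
    # "hadoop" is a placeholder user name.
    print(yarn_ui_url("hadoop"))
```

Building the query string with urlencode keeps the parameter correctly escaped if the user name contains special characters.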

Release date

EMR-3.34.0 was released on March 15, 2021.

EMR-3.33.x

Updates

Service| Changes
SmartData| Upgraded to version 3.2.0. For more information, see Introduction to SmartData 3.2.x.
Spark|
  • Upgraded to version 2.4.7.

  • Upgraded jQuery to version 3.5.1.

  • Added compatibility with Hive to automatically update table and partition sizes.

  • Added support for outputting Spark metadata and job running information to DataWorks.

Hive|

  • Upgraded to version 2.3.7.

  • HCatalog now supports Data Lake Formation.

  • Added support for outputting Hive metadata and job running information to DataWorks.

Metastore|

  • Added the Hive Statistics feature.

  • HCatalog now supports Data Lake Formation.

  • Optimized the method for obtaining STS tokens.

HDFS| Upgraded jQuery to version 3.5.1.
YARN|

  • Upgraded jQuery to version 3.5.1.

  • Adjusted the Fair Scheduler configuration.

  • Optimized Timeline Server.

Zeppelin| Upgraded to version 0.9.0.
Ranger|

  • Added audit log configuration for Hive.

  • Added audit configuration for Log4j.

OpenLDAP|

  • Added an audit feature.

  • Enabled the SSL port (10636) by default.

  • Added support for one-click startup of Presto.

Knox|

  • Fixed a Spring vulnerability.

  • Fixed an issue with viewing the Executors page in the Spark UI.

  • Fixed an issue with the Oozie job status page.

Hue| Added support for Presto.
Druid| Upgraded to version 0.20.0.
EMRHook|

  • Added a new software service.

  • hive-hook: Supports outputting Hive metadata and job running information to DataWorks.

  • spark-hook: Supports outputting Spark metadata and job running information to DataWorks.

Release date

EMR-3.33.0 was released on January 15, 2021.

EMR-3.32.x

Updates

Service| Changes
SmartData| Upgraded to version 3.1.0. For more information, see Introduction to SmartData 3.1.x.
Alluxio|
  • Supports Alluxio 2.4.0.

  • Default parameter settings scale with cluster node size.

  • Uses HDFS in the EMR cluster as the default UnderFS. This feature is ready to use out of the box.

  • Enhanced the Alluxio OSS UnderFS to support new features such as OSS multi-versioning.

  • Compatible with engines such as Hadoop, Hive, Spark, and Presto.

Hudi| Supports Hudi 0.6.0.
Spark| JindoTable supports enabling or disabling the data collection feature.
Hive|

  • Fixed a connection pool leak issue in HiveServer.

  • JindoTable supports enabling or disabling the data collection feature.

  • Optimized the performance of ADD COLUMN.

  • Fixed an issue where incorrect data was read from Hudi tables.

  • Default parameter settings scale with cluster node size.

HDFS| Supports a larger number of snapshots.
YARN| Default parameter settings scale with cluster node size.
Tez| Default parameter settings scale with cluster node size.
Sqoop| Fixed an issue with importing files in Avro format.

Release date

EMR-3.32.0 was released on November 23, 2020.

EMR 3.30.x

Updates

Service| Updates
SmartData| Upgraded to 3.0.0. For more information, see Introduction to SmartData 3.0.x.
Spark|
  • Added support for Alibaba Cloud Data Lake Formation (DLF) metadata.

  • Upgraded the HAS dependency to 2.0.1.

  • Fixed an issue with backticks in Streaming SQL.

  • Removed the Delta JAR package. Delta is now deployed separately.

  • Modified the log path to write all logs to HDFS.

Hive|

  • Added support for Alibaba Cloud DLF metadata.

  • Resolved an issue where a DUMMY file was written when reading an empty directory in a Delta table.

  • Upgraded the HAS dependency to 2.0.1.

Presto|

  • Added support for Alibaba Cloud DLF metadata.

  • Resolved an issue that limited the reading of Delta tables.

  • Fixed an issue where the JVM configuration was missing in high-security mode.

  • Upgraded the HAS dependency to 2.0.1.

HDFS|

  • Added support for hot-swappable disk mode.

  • Upgraded the HAS dependency to 2.0.1.

YARN|

  • Fixed an issue with YARN RMZKStateStore.

  • Added support for SNAPPY files output by SLS.

  • Modified the directory configuration for MapReduce Local mode to resolve a directory permission check issue.

  • Added support for hot-swappable disk mode.

  • Set the log path to write all logs to HDFS.

  • Upgraded the HAS dependency to 2.0.1.

Zookeeper|

  • Added support for attaching the service port to an internal IP address at startup.

  • Upgraded the HAS dependency to 2.0.1.

Flink-Vvp|

  • Upgraded to version 1.11-2.2.2.

  • Added support for SQL and Autopilot features.

Note: Only DataFlow clusters support Flink-Vvp. Hadoop clusters do not support Flink-Vvp at this time.

Flink|

  • Added support for writing to OSS in cache mode. This feature, combined with Flink Checkpoints and a resumable Source, achieves EXACTLY_ONCE semantics.

  • Synchronized with Flink community version 1.11.1 features. SQL now supports multiple outputs (MULTI INSERT).

  • Upgraded the HAS dependency to 2.0.1.

Impala|

  • Added support for custom configurations of catalogd.flgs, impalad.flgs, and statestored.flgs.

  • Upgraded Shiro to version 1.6.0.

  • Upgraded the HAS dependency to 2.0.1.

Tez|

  • Optimized the default memory parameters for the Application Master (AM).

  • Upgraded the HAS dependency to 2.0.1.

HAS, Storm, Zeppelin, Ranger, OpenLDAP, Oozie, Knox, Kafka, HUE, HBase, Druid| Upgraded the HAS dependency to 2.0.1.

Release date

EMR-3.30.0 was released on October 26, 2020.

EMR-3.29.x

Updates

Service| Changes
Bigboot|
  • Upgraded to version 2.7.301.

  • Jindo DistCp now supports writing data to OSS with the Archive or Infrequent Access storage class.

  • Enhanced the FUSE feature to support multiple namespaces.

  • Improved the metadata caching feature in Cache mode.

Spark|

  • Upgraded Spark to 2.4.5.2.0.

  • Added support for third-party metastores.

  • Added the datalake metastore-client.

Hive|

  • Upgraded Hive to 2.3.5.6.0.

  • Added support for third-party metastores.

  • Added the datalake metastore-client.

Presto| Upgraded to version 338.
Ranger|

  • Upgraded the software package to 1.2.0-1.5.0.

  • Added support for Presto 338.

  • Added descriptions to configuration files.

HDFS| Enabled adaptive configuration for the reserved space size of DataNodes.
Knox| Added support for Impala, later versions of Flink, and PAI.
Druid| Upgraded to version 0.18.1.
SmartData| Upgraded to version 2.7.301.

Release date

EMR-3.29.0 was released on July 29, 2020.

EMR 3.28.x

Updates

Service| Changes
Flink| Upgrades open source Flink to Ververica Platform Enterprise Edition. The platform is heavily customized based on open source Flink 1.10 and provides value-added features, such as the self-developed Gemini storage engine.
Bigboot| Upgrades to version 2.7.0.
Delta|
  • Upgrades to version 0.6.0.

  • Decouples the Delta code from the Spark code.

Spark|

  • Upgrades to version 2.4.5.

  • Supports streaming-sql scripts from DataFactory.

  • Supports Delta 0.6.0.

Hive| Supports Delta 0.6.0.
Ranger|

  • Supports custom deployments of Hadoop Distributed File System (HDFS), Hive, and Spark.

  • Supports the configuration of ranger-admin-site and ranger-ugsync-site in the console.

HDFS| Now prints DataNode exception information when an HDFS write fails due to no available DataNodes (HDFS-9023).
Hue|

  • Supports installing the Hue component on Gateway clusters.

  • Supports deploying multiple Hue instances on a single node.

DataFactory| Supports Delta 0.6.0.
Druid| Upgrades to version 0.18.0.
Knox|

  • Upgrades to version 1.1.0-1.0.7.

  • Supports the HBase UI.

New features

Service| Changes
Bigboot|
  • Releases the first version of JindoTable, which provides hotspot statistics for tables and partitions.

  • Adds support for complete storage policies in Block mode and tiered storage policies, such as Infrequent Access and Archive.

  • Adds the Jindo DistCp data migration tool.

  • Improves and fixes Jindo Fuse.

  • Improves the integration of the JFS scheme with the Hive engine and Jindo JobCommitter in Cache mode.

  • Adds a feature to set a read ratio in Block mode for reading data directly from OSS. This reduces the overhead of reading from the local cache.

  • Decouples JindoFS software modules into Bigboot (control layer), Smartdata (distributed service), and the JindoFS SDK. Each module can be independently upgraded and maintained.

Release date

EMR-3.28.0 was released on June 12, 2020.

EMR-3.27.x

Updates

Service| Change
Spark|
  • CUBE now supports date type partition fields.

  • Increased the stack depth of Spark-Submit.

Delta|

  • Enhanced Data Definition Language (DDL) syntax, including commands such as CREATE, SHOW, and DESCRIBE.

  • Delta now supports the Optimize syntax with Z-order.

Knox|

  • Adapted for the Druid User Interface (UI).

  • Multi-master deployment is supported.

Hive|

  • hcatalog tables now support the magic committer.

  • Removed some outdated default configurations.

Bigboot|

  • Upgraded to version 2.6.3.

  • Multi-master deployment is supported.

SmartData|

  • Upgraded to version 2.6.3.

  • Multi-master deployment is supported.

Ranger|

  • Ranger now supports the Solr component.

  • Ranger now supports PrestoSQL version 311.

Tez| Tez now supports setting scratchdir on OSS.
Presto| Upgraded to version 331.
Druid| Upgraded to version 0.17.1.
Superset| Upgraded to version 0.35.2.
Sqoop|

  • The MySQL Java Database Connectivity (JDBC) JAR package is upgraded to version 5.1.48.

  • The MySQL direct export mode supports setting a custom encoding using --mysql-charset.

New features

Feature| Change
Custom component deployment| Added support for custom deployment of components on master nodes. The following components are supported:
  • Hadoop

  • Spark

  • Hive

  • Zookeeper

  • Presto

Graceful shutdown for Auto Scaling| When graceful shutdown is enabled, nodes are not released immediately. They are released after tasks are completed within a specified time period.
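
The graceful shutdown behavior described above can be sketched as a small decision function. This is only an illustration: the function name, its parameters, and the assumption that a node is released once the waiting period expires even if tasks are still running are not stated in these release notes.

```python
def should_release_node(running_tasks: int, waited_seconds: float,
                        timeout_seconds: float) -> bool:
    """Decide whether a node selected for scale-in can be released.

    Hypothetical helper: the node is released as soon as it is idle, or
    (an assumption) once the configured waiting period has elapsed.
    """
    if running_tasks == 0:
        return True
    return waited_seconds >= timeout_seconds


if __name__ == "__main__":
    # A busy node that has waited 100 s of a 600 s period is kept.
    print(should_release_node(running_tasks=3, waited_seconds=100, timeout_seconds=600))
```

Checking for idleness first means an idle node never waits for the full period.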

Release dates

Version| Date
EMR-3.27.0| April 29, 2020
EMR-3.27.1 (New purchases are not supported)| May 8, 2020
EMR-3.27.2 (New purchases are not supported)| May 20, 2020

EMR-3.26.x

Updates

Service| Changes
Bigboot, SmartData|
  • Upgraded to version 2.6.3.

  • Added support for OTS metadata and Namespace HA.

Hive| HCatalog tables now support the direct committer.
YARN| Changed the default committer to JindoOssCommitter.
HDFS| Upgraded JindoFS-related configurations.
Spark| Changed the default committer to JindoOssCommitter.

Release dates

Version| Date
EMR-3.26.3 (New purchases are not supported)| April 16, 2020

EMR-3.25.x

Updates

Service| Changes
Ranger|
  • Initialized the RangerAdmin database for high-availability (HA) clusters.

  • Fixed a security issue in the RangerUserSync startup script.

Spark, Delta|

  • Added support for configuring Delta-related parameters, such as spark.sql.extensions, in the console.

  • Added support for Hive to read Delta tables without setting the input format.

  • Added support for the ALTER TABLE SET TBLPROPERTIES and UNSET TBLPROPERTIES statements.

Hive| Fixed an issue where MapReduce (MR) task execution failed in automatic local mode.
Presto|

  • Upgraded to version 310.

  • Upgraded the joda-time version to 2.10.5.

Tez|

  • Upgraded to version 0.9.2.

  • Fixed an issue where the application progress was not displayed correctly in the Tez user interface (UI).

  • Fixed an issue where the application history could not be viewed in the Tez UI.

Impala| Fixed an issue where Impala could not access LZO tables.
HDFS| Removed mongo-hadoop related JAR packages.
Zookeeper| Upgraded to version 3.5.6.
YARN| Adapted for the Tez UI. The yarn-site tab now supports adding the configuration item yarn.resourcemanager.system-metrics-publisher.enabled=true.
Bigboot, SmartData|

  • Upgraded to version 2.2.3.

  • Added support for rename operations in OSS Cache mode.

Knox| Upgraded dependency package versions.
Oozie| Upgraded dependency package versions.

New features

Ranger service: Added support for Ranger Presto operations.

Release date

EMR-3.25.0 was released on January 13, 2020.

EMR-3.24.x

Updates

Service| Changes
SmartData|
  • Optimized JindoFS usage modes. The usage of Block mode is unchanged. Cache mode now supports its original usage and is also compatible with the original OSS file system usage. It supports data and metadata caching. These features can be enabled or disabled separately through configuration and are disabled by default.

  • Optimized read and write performance for Block mode and Cache mode.

  • Optimized disk cleanup. This provides more accurate statistics and more timely cleanup for hot data cached on local disks. It strictly ensures that disk usage does not exceed the quota.

  • Improved support for Gateway clusters. Block mode and Cache mode can now be used on a Gateway.

  • Supports a deployment mode where one storage cluster is separated from multiple compute clusters.

Spark|

  • Added support for Delta-related parameters.

  • Added support for Ranger Spark plugin configuration.

  • Upgraded JindoCube to version 0.3.0.

Hive|

  • Added logic for the SQL compatibility check feature.

  • Released a combination of Hive 2.3.5 and Hadoop 2.8.5.

  • When restarting the component, the content of hiveserver2-site.xml is no longer synchronized to hive-site.xml under spark-conf.

  • Supports using the MSCK command to add incremental folders.

  • Fixed a bug that occurred when Hive reused a Tez container.

  • Supports using the MSCK command to optimize column-based folders.

Bigboot| Upgraded to 2.2.1. Fixed issues with native code support on some machine models.
Ranger|

  • Refactored the deployment method for the Spark plugin.

  • Fixed a bug where header2 in an HA cluster did not obtain the keytab.

Kudu| Fixed the startup logic.
Zookeeper| Added configuration for four-letter words. This is enabled by default.
HDFS| Added compatibility with JindoFS.
YARN|

  • Changed the default value of the yarn.scheduler.capacity.node-locality-delay configuration to -1.

  • Added compatibility with JindoFS.

HAS| Integrated with OpenLDAP as the backend.
OpenLDAP| Added compatibility with HAS.
Presto| Upgraded to version 0.228.
Kafka| Removed D1 bad disks.
Druid| Upgraded to 0.16.0.
Flume| Upgraded to 1.9.0.
Flink|

  • Upgraded to 1.9.1.

  • Supports standalone Flink clusters (released to a whitelist).
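
As an illustration of the YARN default change listed above, the following sketch renders the setting as a Hadoop-style configuration property. Only the property name yarn.scheduler.capacity.node-locality-delay and the value -1 come from these release notes. The helper name is hypothetical, and on EMR the setting is normally managed through the console rather than written by hand.

```python
import xml.etree.ElementTree as ET


def hadoop_property(name: str, value: str) -> str:
    """Render one <property> element in the Hadoop XML configuration format."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    return ET.tostring(prop, encoding="unicode")


if __name__ == "__main__":
    # Property name and value as stated in the release notes.
    print(hadoop_property("yarn.scheduler.capacity.node-locality-delay", "-1"))
```

The same helper works for any name and value pair, so it can be reused to preview other configuration items before you enter them in the console.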

New features

Service| Changes
Delta|
  • Supports SQL syntax, including ALTER, CONVERT, CREATE, CTAS, DELETE, DESC, INSERT, MERGE, OPTIMIZE, UPDATE, and VACUUM.

  • Provides a built-in, optimized OPTIMIZE command.

  • Supports the Hive connector.

  • Supports other existing open-source features.

Grafana| Added as a new component for standalone Flink clusters. Version: 6.4.2.
Prometheus| Added as a new component for standalone Flink clusters. Version: 2.13.0.
AlertManager| Added as a new component for standalone Flink clusters. Version: 0.19.0.
TensorFlow on Spark|

  • Supports running TensorFlow on Spark. This deeply integrates Spark with the deep learning framework. The integration includes optimized task scheduling and data exchange. It provides a complete workflow, from data pre-processing to deep learning training.

  • Supports streaming tasks.

Release date

EMR-3.24.0 was released on November 18, 2019.

EMR-3.23.x

Updates

Service| Changes
Druid|
  • Upgraded to 0.15.1.

  • Added the router component.

  • Upgraded fastjson.

Spark|

  • Updated Spark Thrift Server to fix a class loader issue.

  • Refactored Spark transaction code to improve stability.

  • Fixed an issue with reading and writing files in ORC format after the built-in Hive was upgraded to version 2.3.

  • Added support for the MERGE INTO syntax.

  • Added support for the SCAN and STREAM syntax.

  • The Structured Streaming Kafka sink now supports exactly-once semantics (EOS).

  • Updated Delta Lake to 0.4.0.

Hive|

  • Removed the old version of the Hive hook.

  • Added an optimization to handle data skew for multiple COUNT(DISTINCT) fields.

  • Fixed an issue where data was lost when joining tables with different bucket versions.

Flink| Upgraded to 1.8.2.
Bigboot|

  • Updated the small file tool.

  • Updated the OSS JAR package to fix a non-daemon thread issue.

Kafka|

  • Added support for the Deployment Set awareness feature.

  • Removed the fastjson dependency.

HDFS|

  • Optimized the deployment logic for the SmartData OSS JAR package.

  • Updated the SmartData OSS JAR package.

Flume| Upgraded fastjson.
TensorFlow on Spark| Added this service.
HAS| Upgraded fastjson.
Livy| Upgraded fastjson.

Release date

EMR-3.23.0 was released on September 18, 2019.

EMR-3.22.x

Updates

Component| Details
JindoFileSystem|
  • Multiple storage modes

    • Block mode: Data is stored as blocks in the backend OSS. The local Namespace service maintains metadata. Block mode provides better metadata and data performance. Block mode supports different storage policies, including WARM (local replicas, OSS replicas), COLD (OSS replicas only), HOT (multiple local replicas, OSS replicas), TEMP (local replicas only), and ALL_HDD (multiple local replicas). The default policy is WARM. You can set different storage policies for folders based on your application scenario.

    • Cache mode: This mode is compatible with existing OSS storage methods. In Cache mode, files are stored as objects in OSS. Data and metadata for each file are cached locally based on access frequency. This improves data and metadata access performance. Cache mode provides different metadata synchronization policies to meet the needs of different scenarios.

  • External client support

    • The client software development kit (SDK) lets you access the EMR JindoFS file system from outside an EMR cluster. You can use the client to access the Namespace in Block mode. However, external clients cannot use the data cache built by EMR JindoFS within the EMR cluster. This results in lower performance compared to using it within the EMR cluster.

    • Cache mode retains the original OSS storage semantics. It uses JindoFS to accelerate data caching within the EMR cluster. Therefore, you can directly access data from outside the EMR cluster using an OSS client, such as the OSS SDK or EMR OssFileSystem.

  • Ecosystem component support

    • JindoFS now supports many compute engines on EMR, such as Spark, Flink, Hive, MapReduce, Impala, and Presto.

    • For scenarios that separate computing and storage, you can also store job logs in JindoFS, such as YARN Container logs and Spark Event logs.

    • JindoFS can be used as the HFile backend storage for HBase to expand its storage capacity.

OssFileSystem|

  • Added logic to OssFileSystem to automatically detect bad disks. This fixes an issue where cache writes failed during OSS writes due to bad disks.

  • Completed the related configurations for OssFileSystem.

Bigboot|

  • Upgraded to version 2.0.0.

  • Includes several major updates, such as support for multiple Namespaces, storing local data blocks as large files, multi-mode storage, and external clients.

  • Fixed an issue where the Bigboot monitor status was incorrect during a machine restart.

  • Added a service spec for the Kudu component.

  • Added correctness checks for all service specs.

Hadoop|

  • HDFS

    • Adapted for HDFS Federation. You can now create HDFS Federation clusters using custom configurations and APIs. This avoids the need for a second format operation when creating a Federation cluster.

    • Optimized the bad disk detection logic. For local disk scenarios, you can trigger bad disk detection when a DataNode block report is triggered by dfsadmin.

  • YARN

    • Fixed an issue where the MapReduce JobHistory job list did not update when MapReduce job Container logs were stored in JindoFS or OSS.

Spark|

  • Relational Cache: Added support for Relational Cache. Relational Cache uses pre-computation to accelerate user queries. You can create a Relational Cache to pre-compute data. When a user query is executed, the Spark Optimizer automatically finds a suitable cache, rewrites the SQL execution plan, and continues the computation based on the cached data. This improves query speed. This feature is suitable for scenarios such as reports, dashboards, data synchronization, and multidimensional analysis.

    • Use Data Definition Language (DDL) to perform operations such as CACHE, UNCACHE, ALTER, and SHOW. Cached data supports all Spark data sources and data formats.

    • Supports automatic cache data updates and updates using the REFRESH command. Supports incremental updates based on partitions.

    • Supports execution plan optimization based on Relational Cache.

  • Streaming SQL

    • Standardized the parameter configuration for Stream Query Writer.

    • Optimized the schema compatibility check for Kafka data tables.

    • If a Kafka data table schema does not exist, it is automatically created in Schema Registry.

    • Optimized the log information for when a Kafka schema is incompatible.

    • Fixed an issue where column names had to be explicitly specified when writing query results to a Kafka table.

    • Removed the restriction that streaming SQL queries only support Kafka and Loghub data sources.

  • Delta: Added Delta. You can use Spark to create a Delta data source to support scenarios such as streaming data writes, transactional reads and writes, data validation, and data history. For more information, see Delta details.

    • Supports using the DataFrame API to read data from or write data to Delta.

    • Supports using the Structured Streaming API to read from or write to Delta as a source or sink.

    • Supports using the Delta API to perform operations such as update, delete, merge, vacuum, and optimize.

    • Supports using SQL to perform operations such as creating Delta-based tables, importing data to Delta, and reading from Delta tables.

  • Others

    • Added a constraint feature that supports primary keys and foreign keys.

    • Resolved JAR file conflicts, such as for servlets.

Flink| Rollback of Log4j logs.
Kafka|

  • Log rollback for Log4j.

  • Upgraded fastjson.

Zeppelin| Upgraded the dependent commons-lang3 package to version 3.7. This fixes an issue where PySpark could not write to OSS. For more information, see Spark 2.4 incompatibility with commons-lang3 in Zeppelin.
Ranger| Added support for SHOW GRANTS.
Analytics-Zoo| Fixed a NumPy installation error.
Impala| Now compatible with Apache Kudu 1.10.0.
Presto| Upgraded to version 0.221.
ZooKeeper| Upgraded to version 3.5.5.
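
The Block mode storage policies listed in the JindoFileSystem entry above can be summarized as a small lookup table. The sketch below only restates the replica placement described in these release notes. The type, dictionary, and helper names are illustrative and are not part of any JindoFS API.

```python
from typing import List, NamedTuple


class Placement(NamedTuple):
    local: bool           # keeps replicas on local disks
    multiple_local: bool  # the notes say "multiple local replicas"
    oss: bool             # keeps replicas in OSS


# Block mode storage policies as listed in the release notes.
STORAGE_POLICIES = {
    "WARM": Placement(local=True, multiple_local=False, oss=True),
    "COLD": Placement(local=False, multiple_local=False, oss=True),
    "HOT": Placement(local=True, multiple_local=True, oss=True),
    "TEMP": Placement(local=True, multiple_local=False, oss=False),
    "ALL_HDD": Placement(local=True, multiple_local=True, oss=False),
}

# The release notes name WARM as the default policy.
DEFAULT_POLICY = "WARM"


def policies_with_oss_replicas() -> List[str]:
    """Return the policy names that keep a replica in OSS, sorted by name."""
    return sorted(name for name, p in STORAGE_POLICIES.items() if p.oss)


if __name__ == "__main__":
    print(policies_with_oss_replicas())
```

A table like this makes it easy to see which policies keep data only on local disks (TEMP and ALL_HDD) when you choose a policy for a folder.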

New features

Service| Change
Kudu|
  • Added Kudu as a new component. Kudu fills a gap in the Hadoop ecosystem. It provides fast data inserts and random access similar to HBase, and lets you modify data. It also provides large-scale data analytics and query capabilities similar to Hadoop Distributed File System (HDFS) or Parquet.

    • Provides C++ and Java APIs for custom development.

    • Integrates with Impala, Spark, and Hive Metastore.

  • This version of Kudu is based on Apache Kudu 1.10.0.

OpenLDAP|

  • Added OpenLDAP as a new component to replace ApacheDS. ApacheDS has been retired.

  • Supports high availability (HA).

Release date

EMR-3.22.0 was released on July 28, 2019.

Versions earlier than EMR-3.22.x

EMR-3.1.1

  • Upgraded the operating system (OS) to CentOS 7.2.

  • Upgraded Spark to version 2.1.1.

  • Upgraded emr-core to version 1.2.6.

  • Fixed a bug related to AccessKey-free operations for OSS.

EMR-3.0.2

  • Upgraded emr-core to version 1.2.5.

  • Extended AccessKey-free support for OSS to more regions.

  • Adjusted the replacement policy for role-based AccessKeys.

  • Fixed some bugs in Hive and Hadoop.

EMR-3.0.1

  • Added support for interactive mode and unified table management. You can now store Hive metadata in an external database. This allows multiple clusters to share the same metadata.

  • Upgraded emr-core to version 1.2.4, which optimizes the read and write performance of Object Storage Service (OSS).

  • Upgraded Spark to version 2.0.2.

Note: This version is fully compatible with EMR-3.0.0.

EMR-3.0.0

Initial release.