E-MapReduce:Release notes for EMR V3.x series

Last Updated:Mar 24, 2025

This topic describes the release dates and updates of the EMR V3.x series. For more information, see Release versions.

EMR-3.52.x

Release dates

Version

Date

EMR-3.52.1

December 18, 2024

EMR-3.52.0 (New purchases are not supported)

December 4, 2024

Updates

Service

Changes

Spark

  • Configuration issues that occur during a scale-out are fixed.

  • The issue that Simple Authentication and Security Layer (SASL) connections fail in a Kerberos cluster is fixed.

Hive

Configuration issues that occur during a scale-out are fixed.

Trino

The issue that connections to the service fail after LDAP authentication is enabled for an EMR cluster is fixed.

Presto

ZooKeeper

Custom configurations are supported.

Ranger

Spark 3 Ranger is replaced with Kyuubi Ranger.

Hudi

Hudi is updated to 0.15.0.

Celeborn

Celeborn is updated to 0.5.2.

JindoCache

JindoCache is updated to 6.5.3.

StarRocks3

StarRocks 3 is updated to 3.2.11.

Kyuubi

Kyuubi is updated to 1.9.2.

StarRocks2

StarRocks 2 is updated to 2.5.22.

Impala

The service is unavailable. You can use the recommended service as an alternative or manually install the corresponding service.

You can replace Impala with Presto, Trino, ClickHouse, or StarRocks.

Kudu

Kafka

Kafka-Manager

EMR-3.51.x

This topic describes the release notes for EMR V3.51.X, including the release dates, updates, and release version information.

Release dates

Version

Date

EMR-3.51.4

December 18, 2024

EMR-3.51.3 (New purchases are not supported)

November 29, 2024

EMR-3.51.2 (New purchases are not supported)

August 29, 2024

EMR-3.51.1 (New purchases are not supported)

June 21, 2024

EMR-3.51.0 (New purchases are not supported)

April 23, 2024

Updates

EMR-3.51.4

Service

Changes

JindoCache

JindoCache is updated to 6.5.3.

StarRocks2

StarRocks 2 is updated to 2.5.22.

StarRocks3

StarRocks 3 is updated to 3.2.11.

EMR-3.51.3

Service

Description

JindoSDK

JindoSDK is updated to resolve the issue that causes deadlocks.

EMR-3.51.2

Service

Description

JindoCache

  • JindoCache is updated to 6.5.1.

  • The performance of reading data from and writing data to distributed hash tables is improved.

Spark

  • The issue that partition directories cannot be deleted is fixed.

  • The issue related to the Hive package dependency is fixed. This ensures that the connection between Spark and the Metastore client remains uninterrupted.

Trino

  • The issue that some modified configurations may be unexpectedly restored to original configurations during a scale-out is fixed.

  • Data in the OSS-HDFS service that is deployed in a high-security cluster can be queried.

  • The issue that exceptions occur on Trino after DLF-Auth is enabled is fixed.

Presto

Data in the OSS-HDFS service that is installed in a high-security cluster can be queried.

HDFS

The issue that the memory size of NameNodes and DataNodes cannot be modified is fixed.

HBase-HDFS

YARN

  • Multiple timeline events can be sent by the ResourceManager at a time, which improves the processing capability.

  • The logic issue in processing containers and resources of the ResourceManager is fixed.

ZooKeeper

  • The issue that the memory configuration of a node group cannot be modified is fixed.

  • The log configuration files can be reconstructed.

Impala

The issue that client configurations are unexpectedly modified during an auto scaling activity is fixed.

Ranger

The latest version of JindoSDK is supported, which effectively reduces the CPU load.

Knox

The following issue is fixed: The URL of Knox fails to be accessed when a cluster has only one Master Extend node group.

Kafka

The following issue is fixed: The EMR cluster in which Kafka Connect is deployed fails to be started.

StarRocks

The issue that added BE nodes are not displayed after a scale-out is fixed.

Doris

Doris is updated to 2.1.4.

Paimon

Paimon is updated to 0.9-ali-7.

EMR-HOOK

The lineage information of a MaxCompute table can be parsed.

EMR-3.51.1

Service

Changes

Spark

Master-Extend node groups are supported.

Hive

Kyuubi

Paimon

Flink that uses Ververica Runtime (VVR) is replaced with Apache Flink, and Data Lake Formation (DLF) catalogs are supported.

Knox

JDK 8 is used for packaging.

Flink

DLF configurations and dependencies that were removed in EMR V3.51.0 have been restored.

EMR-3.51.0

Service

Changes

Spark

Spark 3 is updated to 3.4.2.

Celeborn

Celeborn is updated to 0.4.0.

Doris

Doris is updated to 2.1.0.

StarRocks

  • StarRocks 2 is updated to 2.5.18.

  • StarRocks 3 is updated to 3.2.4.

DeltaLake

Delta Lake is updated to 3.0.0.

Iceberg

Iceberg is updated to 1.5.0.

ZooKeeper

ZooKeeper is updated to 3.8.4.

JindoCache

JindoCache is updated to 6.2.5.

Flink

Flink is updated to 1.17.2.

EMR-3.50.x

Release dates

Version

Date

EMR-3.50.0

February 19, 2024

Updates

Service

Changes

Hudi

Hudi is updated to 0.14.0.

Flume

Flume is updated to 1.11.0.

Kyuubi

Kyuubi is updated to 1.7.3.

Impala

Impala is updated to 4.3.0.

Celeborn

Celeborn is updated to 0.3.2.

JindoCache

JindoCache is updated to 6.2.0.

Paimon

Paimon is updated to 0.7-ali-1.

Kafka

  • Kafka is updated to 3.6.1.

  • The vulnerability in the Simple Authentication and Security Layer (SASL)-based security authentication of the Kafka Connect component is fixed.

Spark

Vulnerabilities in the Commons Text library are fixed.

StarRocks

  • StarRocks 2 is updated to 2.5.13.

  • StarRocks 3 is updated to 3.1.5.

Ranger

  • Vulnerabilities in the Commons Text library are fixed.

  • The path matching permission bypass vulnerability in the Spring Security framework is fixed.

  • The forward/include authentication bypass vulnerability in the Spring Security framework is fixed.

  • The identity authentication bypass vulnerability in a special matching mode in Spring Framework is fixed.

  • The interval at which Ranger obtains user information from the Lightweight Directory Access Protocol (LDAP) server and updates the user information can be modified.

EMR-3.49.x

This topic describes the release notes for EMR V3.49.X, including the release dates, updates, and release version information.

Release dates

Version

Date

EMR-3.49.1

November 16, 2023

EMR-3.49.0 (New purchases are not supported)

October 27, 2023

Updates

Service

Changes

JindoCache

The JindoCache service is added. The version of JindoCache is 6.1.1.

JindoData

JindoData is unavailable. You can use JindoCache for data caching and DLF-Auth for authentication.

Spark

The hive-site.xml configurations related to JDO are removed.

HBase

Configuration items are added on the Configure tab of the HBase service page, and you can select the version of HBase Thrift Server based on your business requirements. The version of HBase Thrift Server can be v1 or v2.

StarRocks

StarRocks 2 is updated to 2.5.10.

Doris

Doris is updated to 1.2.7.

Celeborn

Celeborn is updated to 0.3.1.

Paimon

Paimon is updated to 0.6-ali-2.

ClickHouse

ClickHouse is updated to 23.8.2.7.

EMR-3.48.x

This topic describes the release notes for EMR V3.48.X, including the release dates, updates, and release version information.

Release dates

Version

Date

EMR-3.48.2

August 17, 2023

Updates

Service

Changes

Trino

  • The issue that the Paimon connector cannot be used to query Hadoop Distributed File System (HDFS) tables is resolved.

  • The issue that the metrics of worker nodes cannot be read is resolved.

Presto

  • Presto is updated to 0.283.

  • The issue that the metrics of worker nodes cannot be read is resolved.

ClickHouse

By default, all permissions are granted to the default user.

StarRocks

  • The earlier StarRocks service is renamed StarRocks 2.

  • StarRocks 3 is added. StarRocks 3.1.2 is supported. By default, StarRocks 3 supports compute-storage integration. StarRocks 3 does not support compute-storage separation.

Celeborn

Celeborn is updated to 0.3.0.

EMR-3.47.x

This topic describes the release notes for EMR V3.47.X, including the release dates, updates, and release version information.

Release dates

Version

Date

EMR-3.47.0

August 3, 2023

Updates

Service

Changes

Hudi

Hudi is updated to 0.13.1.

Paimon

Paimon is updated to 0.5-ali-1.

StarRocks

StarRocks is updated to 2.5.8.

JindoData

JindoData is updated to 4.6.11.

Trino

  • Trino is updated to 422.

  • The Hudi connector can be used to query Merge on Read tables.

  • The error message for dynamic user-defined function (UDF) loading is optimized.

EMR-3.46.x

This topic describes the release notes for EMR V3.46.X, including the release dates, updates, and release version information.

Release dates

Version

Date

EMR-3.46.1

July 13, 2023

EMR-3.46.0 (New purchases are not supported)

June 1, 2023

Updates

EMR-3.46.1

Service

Description

Spark

  • By default, OSS-HDFS is used to store data of Spark History Server.

  • OSS or OSS-HDFS is used to store data of Spark3 Native Engine.

Hive

By default, OSS-HDFS is used to store data in Hive warehouse files.

OSS-HDFS

The OSS-HDFS service is added.

YARN

By default, OSS-HDFS is used to store data.

HBase

  • By default, OSS-HDFS is used to store HBase data in the HFile format.

  • OSS-HDFS is used to store write-ahead logging (WAL) logs of HBase.

EMR-3.46.0

Service

Changes

Kyuubi

Kyuubi is updated to 1.7.1.

Celeborn

Celeborn is updated to 0.2.2.

Paimon

  • Flink Table Store is renamed Paimon.

  • Paimon is updated to 0.4-ali-1.

StarRocks

StarRocks is updated to 2.5.5.

Doris

Doris is updated to 1.2.4.

ClickHouse

ClickHouse is updated to 22.8.17.17.

Trino

By default, a simple event listener is provided. This allows you to obtain audit logs.

Phoenix

Hive on Phoenix is supported.

EMR-3.45.x

This topic describes the release notes for EMR V3.45.X, including the release dates, updates, and release version information.

Release dates

Version

Date

EMR-3.45.1

April 3, 2023

EMR-3.45.0 (New purchases are not supported)

February 28, 2023

Updates

EMR-3.45.1

Service

Description

ClickHouse

ClickHouse is updated to 22.8.14.53.

Trino

The odps.properties connector is added. This allows you to query MaxCompute data.

JindoData

JindoData is updated to 4.6.5.

JindoSDK

JindoSDK is updated to 4.6.5.

Flink Table Store

Flink Table Store is updated to 0.3-ali-2.

YARN

The Node Labels feature is supported.

EMR-3.45.0

Service

Changes

Iceberg

Iceberg is updated to 1.1.0.

Hudi

  • Hudi is updated to 0.12.2.

  • The Change Data Capture (CDC) feature is supported.

Kudu

Kudu is updated to 1.16.0.

ClickHouse

  • ClickHouse is updated to 22.3.8.39.

  • The ZooKeeper service must be selected when you deploy the ClickHouse service.

Celeborn

  • Remote Shuffle Service (RSS) is renamed Celeborn.

  • The version of Celeborn is 0.2.0.

Presto

The Presto service is added. The service is based on PrestoDB 0.278.3 from the Facebook community. The default HTTP port for the service is 8889, and the default HTTPS port is 7779.

DeltaLake

Delta Lake is updated to 2.2.0.

StarRocks

StarRocks is updated to 2.4.3.

Doris

Doris is updated to 1.2.1.

Kafka-Manager

Kafka Manager is updated to 3.0.0.6.

Impala

The service is no longer used.

OpenLDAP

OpenLDAP is updated to 2.4.46.

Kyuubi

Kyuubi is updated to 1.6.1.

Ranger

Ranger is updated to 2.3.0.

HBase

  • ThriftServer2 is supported.

  • The default value of the hbase.block.data.cachecompressed parameter is changed to true.

Flink-Table-Store

The Flink Table Store service is added and is based on open source Flink Table Store 0.3.

JindoData

JindoData is updated to 4.6.4.

EMR-3.44.x

Release dates

EMR-3.44.0 December 1, 2022

Updates

Service

Changes

Iceberg

Iceberg is updated to 0.14.1.

Flink

Flink is updated to 1.15-vvr-6.0.2, which corresponds to Apache Flink 1.15.

Kafka

  • LDAP authentication of logon users is supported.

  • User group authentication is supported.

Trino

  • EMR Presto is renamed Trino.

  • Ranger and DLF-Auth are supported.

  • The following issue is fixed: After LDAP authentication is enabled with simple operations, the worker node cannot be connected.

JindoSDK

JindoSDK is updated to 4.6.2.

JindoData

JindoData is updated to 4.6.2.

HBase

  • Ranger is supported.

  • The following issue is fixed: When you add a service, the OSS-HDFS service cannot be selected for storage.

YARN

By default, the access control list (ACL) feature is enabled for EMR clusters that are deployed in security mode.

StarRocks

StarRocks is updated to 2.3.4.

Doris

Doris is updated to 1.1.5.

Hudi

The hudi-defaults.conf file can be configured in the console.

Ranger

Ranger can be integrated with Trino, YARN, HBase, and Kafka.

DLF-Auth

  • DLF-Auth is updated to 2.0.2.

  • Trino and Impala are supported.

OpenLDAP

OpenLDAP is integrated with Nslcd.

Kudu

Kudu Tserver can no longer be installed in task node groups.

Spark

Spark is updated to 3.3.1.

Tez

Tez is updated to 0.10.2.

Kyuubi

Kyuubi is updated to 1.6.0.

EMR-3.43.x

Release dates

Version

Date

EMR-3.43.1

November 8, 2022

EMR-3.43.0 (New purchases are not supported)

October 14, 2022

Updates

EMR-3.43.1

Service

Changes

Kerberos

An external key distribution center (KDC) can be connected to the Kerberos service of an EMR cluster.

Kafka

Configuration items for startup commands can be added. Users can configure the startup parameters of the Kafka service based on their business requirements.

JindoData

  • JindoData is updated to 4.6.0.

  • The access path of OSS-HDFS can be modified.

Flink

Flink is updated to 1.13_vvr_4.0.15.

RSS

RSS is updated to 0.1.4.

EMR-3.43.0

Service

Changes

Spark

  • Spark is updated to 3.3.

  • Kerberos authentication is supported.

Hudi

  • Hudi is updated to 0.12.0.

  • Spark 3.3 is supported.

  • You can use a metastore that is hosted on Alibaba Cloud to manage metadata and enable acceleration. For more information, see Use Hudi MetaStore.

Flink

  • Kerberos authentication is supported.

  • Flink can be automatically connected to Data Lake Formation (DLF).

Iceberg

  • Iceberg is updated to 0.14.0.

  • Spark 3.3 is supported.

  • Kerberos authentication is supported.

JindoData

  • JindoData is updated to 4.5.1.

  • Alibaba Cloud resources can be accessed without an AccessKey pair.

Hadoop-Common and HDFS

  • Kerberos authentication is supported.

  • The CVE-2022-25168 vulnerability is fixed.

Knox

Knox is integrated with Ranger. You can access the Ranger UI from the Access Links And Ports tab.

HBase

  • HBase is updated to 1.7.1.

  • Kerberos authentication is supported.

  • Node groups can be separately configured.

RSS

  • RSS is updated to 0.1.2.

  • Kerberos authentication is supported.

Doris

  • Doris is updated to 1.1.2.

  • Kerberos authentication is supported.

StarRocks

  • StarRocks is updated to 2.2.6.

  • Kerberos authentication is supported.

Kafka

  • Kafka is updated to 2.13_3.2.1.

  • Kerberos authentication is supported.

DeltaLake

  • Delta Lake is updated to 2.1.0.

  • Spark 3.3 is supported.

  • Kerberos authentication is supported.

Kudu

The component is released. The version is 1.14.0.

Impala

  • DLF can be used to create views.

  • Kerberos authentication is supported.

YARN, Impala, Ranger, Hive, Kyuubi, Tez, Kafka, ZooKeeper, DLF-Auth, Phoenix, Sqoop, and Presto

Kerberos authentication is supported.

EMR-3.42.x

Release dates

EMR-3.42.0 August 5, 2022

Updates

Service

Changes

Hive

LDAP authentication can be enabled with one click.

Presto

  • Presto is updated to 389.

    Delta Lake and Hudi connectors that are provided by the Presto community can be used.

    • The time travel and Z-Order features are not supported by the Delta Lake connector.

    • The Hudi connector cannot be used to query Merge on Read (MOR) tables.

  • LDAP authentication can be enabled with one click.

DeltaLake

  • Delta Lake is interconnected with Data Lake Formation (DLF) to support automatic management of tables in data lakes.

  • Ranger authentication is supported.

  • The issue that statistics about fields of the TIMESTAMP type cannot be collected is fixed.

  • Metric information can be returned by the OPTIMIZE and VACUUM commands.

Hudi

Hudi is updated to 0.11.1.

HadoopCommon

The Hadoop Common service is added. This way, the issue that the configurations of HDFS, YARN, and JindoSDK are overwritten is fixed.

YARN

The auto scaling feature is enhanced.

Ranger

  • Both Spark 2 and Spark 3 are supported.

  • LDAP authentication can be enabled for Ranger UserSync with one click.

Kafka

Related topics are automatically created when CruiseControl is used.

HBase

The HBase service is added. The service version is 1.4.9.

Phoenix

The Phoenix service is added. The service version is 4.14.1.

Doris

Doris is updated to 1.1.1.

StarRocks

StarRocks is updated to 2.2.3.

ClickHouse

The following issue is fixed: An out-of-memory (OOM) error occurs when you read large files from Object Storage Service (OSS).

EMR-3.40.x

Release dates

EMR-3.40.0 April 21, 2022

Updates

Service

Changes

JindoData

The JindoData service is added. The service version is 4.3.0.

JindoSDK

JindoSDK is updated to 4.3.0.

Spark

Spark is updated to 3.2.1.

Hive

  • The following issue is fixed: After speculative execution is enabled for Hive on Tez, both the original task and the speculative task are committed.

  • The following issue is fixed: User-defined functions (UDFs) can be called only after you reload the functions.

Presto

The following issue is fixed: After a Hadoop cluster is initialized and the Presto service is added to the cluster, the Presto service fails to start.

DeltaLake

The incompatibility issue between Delta Lake and Streaming SQL is fixed.

Hudi

Hudi is updated to 0.10.1.

Iceberg

Iceberg is updated to 0.13.1.

YARN

  • A feature that allows you to run an ApplicationMaster only on a core node is added.

  • The issue that the mapreduce.map.java.opts configuration lacks taihaodoctor is fixed.

ZooKeeper

The configuration of Java Virtual Machine (JVM) parameters is optimized.

Flink

Flink, Impala, Flume, and Druid are adapted to JindoSDK 4.3.0.

Impala

Flume

Druid

Sqoop

The PostgreSQL version is updated.

Zeppelin

The issue that the Java Database Connectivity (JDBC) interpreter fails to start is fixed.

Ranger

Hudi tables are supported by the Spark plugin of Ranger 1.2.0.

Oozie

Log4j is updated to 2.17.2.

HBase

The issue that the RegionServer of HBase 1.4.9 fails to start is fixed.

DLF-Auth

DLF-Auth is updated to 2.0.0.

EMR-3.39.x

Release dates

Version

Date

EMR-3.39.2

March 25, 2022

EMR-3.39.1 (New purchases are not supported)

February 15, 2022

Updates

EMR-3.39.2

Note

Only OLAP clusters and DataFlow clusters in the new EMR console support this version.

Service

Changes

Flink

  • The application performance management (APM) dashboard is optimized. Metrics such as sourceIdleTime are added.

  • Alerting by CloudMonitor is supported.

Kafka

  • SSL and Simple Authentication and Security Layer (SASL) configurations are supported.

  • The default values of some parameters are changed.
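
As a rough illustration of what a client connecting to a broker with both SSL and SASL enabled might need, the following Python sketch renders a Kafka client properties file. The property names are standard Apache Kafka client settings, but the chosen mechanism (PLAIN), the truststore path, and the credentials are placeholder assumptions, not values taken from EMR.

```python
def render_properties(settings):
    """Render a dict as the text of a Java-style .properties file."""
    return "\n".join(f"{key}={value}" for key, value in settings.items()) + "\n"


# Assumed example values: replace the mechanism, paths, and credentials
# with the ones that are configured for your own cluster.
client_settings = {
    "security.protocol": "SASL_SSL",
    "sasl.mechanism": "PLAIN",
    "sasl.jaas.config": (
        "org.apache.kafka.common.security.plain.PlainLoginModule required "
        'username="<user>" password="<password>";'
    ),
    "ssl.truststore.location": "/path/to/client.truststore.jks",
    "ssl.truststore.password": "<truststore-password>",
}

properties_text = render_properties(client_settings)
print(properties_text)
```

The rendered text can be saved to a file and passed to Kafka command-line tools that accept a client configuration file.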

ClickHouse

The default values of some parameters are changed.

EMR-3.39.1

Service

Changes

SmartData

The service is no longer used.

BIGBOOT

RSS

  • The ESS service is upgraded to RSS. For more information, see RSS.

  • The service functionality and stability are enhanced.

JindoSDK

  • The architecture of SmartData is upgraded to JindoData.

  • EMR integrates JindoSDK 4.0 for the first time, supporting OSS and OSS-HDFS services.

Spark

  • Hive on Spark is optimized.

  • Spark is adapted to JindoSDK.

Tez

Tez is adapted to JindoSDK.

Hive

Hive is adapted to JindoSDK.

Presto

  • User-defined functions (UDFs) can be dynamically loaded.

  • The for ... as of syntax can be used in the time travel feature to query data in a Delta Lake table.

  • An independent Delta Lake catalog is added. Presto provides default configurations for a Delta connector and supports the Optimize, Z-ordering, and Data Skipping features based on the independent Delta Lake catalog.

  • The issue that data in Merge on Read tables of Hudi cannot be queried by using a Hudi connector is fixed. You cannot use a Hive connector to query Merge on Read tables of Hudi.

  • Presto is adapted to JindoSDK.

Delta Lake

  • Metadata management

    • The built-in catalog of Spark, instead of an API operation that is called by using the Hive CLI, is used to synchronize metadata and partition information.

    • The statistics on table data are automatically reported to metastores.

  • SQL

    • The syntax of the time travel feature is supported.

    • The DROP PARTITION SQL syntax is supported.

    • The ADD COLUMN statement can be used to add columns to specified locations (FIRST and AFTER).

  • Enhanced table management capabilities

    • The file size can be dynamically adjusted based on the table size. By default, this feature is enabled.

    • The auto-vacuum feature is supported and enabled by default. Concurrent vacuum operations are supported.

    • The logic of automatic compaction is optimized. By default, the automatic compaction feature is disabled.

    • The Z-ordering syntax is added. Z-ordering-based data processing is accelerated.
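
To make the column placement option listed above more concrete, here is a minimal Python sketch that builds ALTER TABLE statements with FIRST and AFTER. It follows the open source Delta Lake ADD COLUMNS syntax, which is assumed to match the EMR build. The table and column names are hypothetical.

```python
def add_column_sql(table, column, data_type, first=False, after=None):
    """Build an ALTER TABLE ... ADD COLUMNS statement with optional placement."""
    if first and after:
        raise ValueError("choose either first=True or after=<column>, not both")
    if first:
        placement = " FIRST"
    elif after:
        placement = f" AFTER {after}"
    else:
        placement = ""  # no placement: the column is appended at the end
    return f"ALTER TABLE {table} ADD COLUMNS ({column} {data_type}{placement})"


# Hypothetical table and columns, used only for illustration.
print(add_column_sql("events", "event_id", "BIGINT", first=True))
print(add_column_sql("events", "country", "STRING", after="city"))
```

The generated statements could then be run in a Spark SQL session that has the Delta Lake extension enabled.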

Hudi

Hudi is updated to 0.10.0.

HDFS

HDFS is adapted to JindoSDK.

YARN

YARN is adapted to JindoSDK.

Flume

Flume is adapted to JindoSDK.

Flink

  • By default, the lib directory of Flink is uploaded to the HDFS cluster. This way, you can use the yarn.provided.lib.dirs parameter.

  • Flink is adapted to JindoSDK.
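
The following Python sketch shows one way the yarn.provided.lib.dirs parameter could be passed when a job is submitted, so that YARN reuses the uploaded lib directory instead of shipping the files again. The HDFS path, the JAR path, and the yarn-per-job target are assumptions for illustration. Check the actual upload location in your cluster.

```python
import shlex


def build_flink_run_command(job_jar, provided_lib_dirs, target="yarn-per-job"):
    """Return the argument list for a `flink run` submission on YARN."""
    # Flink expects multiple provided lib directories separated by semicolons.
    lib_dirs = ";".join(provided_lib_dirs)
    return [
        "flink", "run",
        "-t", target,
        f"-Dyarn.provided.lib.dirs={lib_dirs}",
        job_jar,
    ]


# Hypothetical paths: adjust them to your cluster layout.
command = build_flink_run_command(
    job_jar="/opt/jobs/example-job.jar",
    provided_lib_dirs=["hdfs:///flink/lib"],
)
print(shlex.join(command))
```

Building the command as a list keeps each argument separate, which avoids quoting problems if the list is later passed to subprocess.run.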

Impala

Impala is adapted to JindoSDK.

Ranger

  • The issue that Spark History Server fails to start is fixed.

  • Ranger is adapted to JindoSDK.

HBase

  • The issue about default parameter settings is fixed.

  • The issue about the date format of garbage collection (GC) logs is fixed.

  • The restart issue that occurs if an IP address is configured for RegionServer is fixed.

Druid

Druid is adapted to JindoSDK.

ClickHouse

The logic of processing data when the ClickHouse component stops working is optimized.

Iceberg

  • Iceberg is updated to 0.13.0.

  • Default configuration items are hidden to improve user experience.

DLF-Auth

The issue that Spark History Server fails to start is fixed.

StarRocks

The service is added in the new EMR console.

Version 2.0.1 is published.

EMR-3.38.x

Release dates

Version

Date

EMR-3.38.3

December 2021

EMR-3.38.2 (New purchases are not supported)

December 2021

EMR-3.38.1 (New purchases are not supported)

November 2021

EMR-3.38.0 (New purchases are not supported)

October 2021

Updates

EMR-3.38.3

The Log4j2 arbitrary code execution vulnerability is fixed for all related components. For more information, see Security bulletin | Apache Log4j2 arbitrary code execution vulnerability.

Service

Changes

Presto

  • The error that is returned when Presto in a high availability cluster is used to query data in a Hudi table is fixed.

  • The Log4j security vulnerability of the Elasticsearch connector is fixed.

DLF Metastore

  • By default, the logging feature for metastores is disabled. In earlier versions, the feature is enabled by default.

  • The error caused by an excessively long getTableStats URI for a metastore is fixed.

Delta Lake

The issue that schema changes fail to be synchronized to a metastore is fixed.

Flink

  • VVR is updated to 4.0.11. This version supports the following features:

    • The commercial features of Flink Change Data Capture (CDC) are released:

      • Schema Evolution is supported.

      • Flink SQL semantics for database synchronization are supported.

    • The GeminiStateBackend can be used to store state data in Object Storage Service (OSS).

  • A Hudi connector of Enterprise Edition is provided, and a built-in Data Lake Formation (DLF) catalog is used in the connector to manage metadata.

Sqoop

The issue that precision loss for the DECIMAL data type occurs when you use Sqoop to import data to HCatalog tables is fixed.

EMR-3.38.2

Service

Changes

SmartData

  • SmartData is updated to 3.8.0. For more information, see SmartData 3.8.x.

  • Authentication and authorization based on Kerberos and Ranger can be used to manage permissions on data in OSS.

EMR-3.38.1

Service

Changes

SmartData

SmartData is updated to 3.7.3. For more information, see SmartData 3.7.x.

Spark

  • Log4j Metrics Appender is removed because the configuration is invalid.

  • The null pointer exception that occurs when SparkContext is started is fixed.

Presto

  • The following issue is fixed: You must configure host parameters for the Presto service before you can use Presto in a high-availability Hadoop cluster to query data from a Hive table.

  • The following issue is fixed: Presto cannot be started by default when the memory is small.

  • The issue that modifications to the worker-jvm configuration do not take effect is fixed.

  • Ranger is supported.

Impala

The issue that a no such method error is reported when you query DLF metadata tables is fixed.

Ranger

  • Presto is supported.

  • The exceptions that occur when you use Ranger to configure the permissions of using Spark to insert data into ORC or Parquet tables are fixed.

  • The following issue is fixed: Hive role permissions configured in Ranger do not take effect after Kerberos is enabled.

DLF-Auth

  • DLF-Auth is updated to 1.0.1.

  • The permissions of using Presto to access DLF can be configured.

  • The issue that data cannot be cached for RAM users is fixed.

EMR-3.38.0

Service

Changes

SmartData

SmartData is updated to 3.7.2. For more information, see Introduction to SmartData 3.7.x.

Spark

  • Spark is updated to 2.4.8.

  • Both Spark 2.4.8 and Spark 3.1.2 are supported.

    Note

    Spark 3 does not support Delta Lake or Remote Shuffle Service.

  • In Spark 3.x, SparkSQL optimizes the performance of DISTINCT calculations. The optimization feature is triggered when multiple count(distinct case ... when ...) statements are included in an aggregation operator.

  • The array-index out of bounds error that is returned when some required statistics for Adaptive Query Execution (AQE) are missing is fixed.

  • Errors related to AQE and data caching in specific scenarios are fixed.
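
The DISTINCT optimization mentioned above applies to aggregations that contain several count(distinct case ... when ...) expressions. As a minimal sketch of what such a query looks like, the Python snippet below assembles one. The table and column names (orders, region, user_id, channel) are hypothetical.

```python
def count_distinct_case_when(column, condition, alias):
    """Build one COUNT(DISTINCT CASE WHEN ...) expression."""
    return f"COUNT(DISTINCT CASE WHEN {condition} THEN {column} END) AS {alias}"


def build_query(table, group_by, metrics):
    """Build an aggregation that has one conditional distinct count per metric."""
    expressions = [count_distinct_case_when(*metric) for metric in metrics]
    select_list = ",\n  ".join([group_by] + expressions)
    return f"SELECT\n  {select_list}\nFROM {table}\nGROUP BY {group_by}"


# Hypothetical schema, used only to show the query shape.
query = build_query(
    table="orders",
    group_by="region",
    metrics=[
        ("user_id", "channel = 'web'", "web_users"),
        ("user_id", "channel = 'app'", "app_users"),
    ],
)
print(query)
```

Both distinct counts sit in the same aggregation operator, which is the shape the release note describes.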

Hive

Hive is updated to 2.3.9.

Presto

  • An independent Presto cluster can be created.

  • Presto is updated to 358.

    Important

    This version does not support Ranger.

  • Connectors such as Hudi and MySQL connectors are supported by default, and the default configurations are updated.

  • Auto scaling is supported by Presto clusters.

  • Data lake analysis is supported.

DeltaLake

  • The same Delta Lake connectors are used in Hive 2 and Hive 3.

  • The error that is returned when you use Delta Lake connectors to query data from multi-level partitioned tables is fixed.

Hudi

  • Hudi is updated to 0.9.0.

  • The issue about the compatibility of sql.extension between Delta Lake and Hudi is fixed.

HDFS

The reserved space parameter of NameNode automatically increases to ensure that NameNode enters SafeMode in time when disk space is insufficient.

Flink

  • Flink is updated to 1.13-vvr-4.0.10, which corresponds to Apache Flink 1.13.1.

  • Commercial connectors, such as a Hologres connector, are added to Flink.

  • Metric reporters are added to report metrics on the APM dashboard.

  • A SchemaRegistry-based Kafka catalog is added to the Kafka connector. This way, you can read data from or write data to existing Kafka topics without the need to execute DDL statements.

Storm

The service is no longer used.

Zeppelin

Zeppelin is updated to community version 0.10.0.

Ranger

This version of Ranger does not support access control for Presto when community version 358 of Presto is used.

Hue

  • The issue that YARN Job Browser sometimes cannot present or terminate jobs is fixed.

  • YARN Job Browser is accessible by default.

  • The Presto protocol is supported by default.

Druid

The issue that nodes fail to restart because of residual PID files left after a server power outage is fixed.

ClickHouse

  • Some default configurations are updated.

  • Clusters can be scaled out.

  • The MetaChecker feature is supported.

  • Object Storage Service (OSS) table engines and OSS table functions can be used to read data.

  • Table-level custom ZooKeeper addresses are supported.

Iceberg

The component is released. The supported versions range from 0.12.0 to 1.0.1.

Knox

The issue that the first access to the Spark UI fails is fixed.

DLF-Auth

The component is released.

The permissions of using Hive or Spark to access DLF can be configured. The service version is 1.0.0.

ESS

ESS is updated to 1.2.0.

EMR-3.37.x

Release dates

Version

Date

EMR-3.37.1

September 2021

EMR-3.37.0 (New purchases are not supported)

August 2021

Updates

EMR-3.37.1

Service

Changes

SmartData

SmartData is updated to 3.7.1.

Hue

The issue that Impala cannot be used in a high-security cluster is fixed.

Kudu

Kerberos is supported.

EMR-3.37.0

Service

Changes

SmartData

SmartData is updated to 3.7.0.

Spark

The incompatibility issue between Spark and Delta Lake is fixed.

DeltaLake

  • The connectors for Delta Lake are updated. This way, you can use a storage handler to create tables and query data.

  • The exception that occurs when you perform an INSERT OVERWRITE operation on a partitioned table is fixed.

  • The exception that occurs when you perform an OPTIMIZE operation to write virtual fields to a file in the G-SCD scenario is fixed.

YARN

  • Information about app IDs, CPU utilization, and memory usage is added to the RESTful APIs of containers for nodes.

  • The issue that the Application Master (AM) logs of an automatically released node cannot be viewed is fixed.

  • The data of a node that is automatically released based on the decommissioning logic of auto scaling can be deleted.

  • The Graceful Decommission logic of auto scaling is optimized. The node on which NodeManager runs is marked deprecated only after the NodeManager process is complete.

ZooKeeper

ZooKeeper is updated to 3.6.3.

Flink

  • The SmartData service is added.

  • Object Storage Service (OSS) can be accessed in password-free mode when you log on to a DataFlow-Flink cluster in SSH mode and submit a job in the cluster.

Impala

The issue that the LIST operation is repeatedly performed on directly deleted OSS partition directories is fixed.

Hue

The UI display exception that occurs when you use Hue together with Oozie is fixed.

Kudu

Kudu is updated to 1.14.0.

ClickHouse

Some default configurations are updated.

EMR-3.36.x

Release dates

EMR-3.36.1 July 16, 2021

Updates

Service

Changes

SmartData

SmartData is updated to 3.6.1.

For more information, see SmartData 3.6.x.

Hive

  • Hive is updated to 2.3.8.

  • The issue that the result of the show create table command is displayed incorrectly when Data Lake Formation (DLF) metadata is used is fixed.

  • The default parameters of Hive are optimized to improve the performance of Hive jobs.

  • The names of configuration items on the Hive-env tab of the Configure page for the Hive service in the E-MapReduce console are changed to uppercase for ease of use.

  • The error message that is reported because of the incompatibility between the file system and Hive metastore when you write data to a Hive table is optimized.

HDFS

The data compression algorithm Zstandard is supported.

Flink

Flink is updated to 1.12-vvr-3.0.2.

Note

Flink is removed from Hadoop clusters.

Hudi

  • Hudi is updated to 0.8.0.

  • Hudi can be integrated with Spark SQL.

Spark

  • The names of configuration items on the Spark-defaults tab of the Configure page for the Spark service in the E-MapReduce console are optimized.

  • The performance of log collection is optimized.

  • The data compression algorithm Zstandard is supported.

Impala

The core dump error reported when you use HDFS is fixed.

Tez

The default parameters of Tez are optimized to improve the performance of Tez jobs.

Knox

  • Knox is adapted to Kudu.

  • Knox is adapted to Impala.

  • Knox is adapted to HBase.

Phoenix

The issue that no JDBC driver is found when Hive or Spark SQL is used to access Phoenix tables is fixed.

ClickHouse

The application performance management (APM) feature is available. This feature is used to support monitoring and alerting.

EMR-3.35.x

Release dates

EMR-3.35.0 April 21, 2021

Updates

Service

Changes

SmartData

SmartData is updated to 3.5.0.

For more information, see SmartData 3.5.x.

Spark

  • The issue that adaptive execution does not take effect in some scenarios is fixed.

  • The issue that statistical aggregate functions are used in different manners in Spark and Hive is fixed.

  • The issue that Spark cannot read valid data of the CHAR type from a Hive ORC table is fixed.

HDFS

The SM4 encryption algorithm is supported.

Hue

Hue is updated to 4.9.0.

Alluxio

Alluxio is updated to 2.5.0.

Druid

  • Druid is updated to 0.20.1.

  • Security is enhanced.

Livy

Livy is updated to 0.7.1.

EMR-3.34.x

Release dates

EMR-3.34.0 March 15, 2021

Updates

Service

Changes

SmartData

SmartData is updated to 3.4.0.

For more information, see SmartData 3.4.x.

Spark

  • Some default configurations are optimized.
  • Performance is optimized. Window-based top-k queries can be pushed down.
  • The capability of reading data from and writing data to Hive tables in the CSV or JSON format is enhanced.
  • All the column names of a table can be omitted in the ANALYZE statement.
  • LDAP authentication can be enabled or disabled with a click.
  • Spark Beeline is easier to use.

Hive

  • Some default configurations are optimized.

  • Performance is optimized. The cost-based optimization (CBO) feature is enhanced.

  • LDAP authentication can be enabled or disabled with a click.

  • Calcite is updated to 1.12.0.

  • The hive.security.authorization.sqlstd.confwhitelist.append parameter is added.
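The hive.security.authorization.sqlstd.confwhitelist.append parameter takes a regular expression that matches the names of additional configuration items that users are allowed to set when SQL standard-based authorization is enabled. The following Python sketch only illustrates how such a pattern selects configuration names. The pattern value and the is_whitelisted helper are hypothetical, and Hive itself evaluates the value as a Java regular expression, so this is an approximation rather than the Hive implementation.

```python
import re

# Hypothetical value for hive.security.authorization.sqlstd.confwhitelist.append.
# Alternatives are separated by "|" and literal dots are escaped.
APPEND_PATTERN = r"mapred\.job\.queue\.name|hive\.exec\.parallel.*"


def is_whitelisted(config_name: str, pattern: str = APPEND_PATTERN) -> bool:
    """Return True if the whole configuration name matches the pattern."""
    return re.fullmatch(pattern, config_name) is not None
```

With this assumed pattern, mapred.job.queue.name and any item whose name starts with hive.exec.parallel match, whereas an item such as hive.metastore.uris does not.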

Presto

LDAP authentication can be enabled or disabled with a click.

YARN

The risk of unauthorized access to the YARN web UI of a Hadoop cluster is fixed. As a side effect of this fix, you must explicitly specify user.name=name in the URL when you access the YARN web UI by using an SSH tunnel.
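The user.name requirement can be illustrated with a short Python sketch that appends the parameter to a YARN web UI URL. The with_user_name helper, the localhost address, and the hadoop user are assumptions for illustration. Port 8088 is the default ResourceManager web UI port, but the local port of your SSH tunnel may differ.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_user_name(url: str, user: str) -> str:
    """Return the URL with the user.name query parameter set to the given user."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))  # keep any existing query parameters
    query["user.name"] = user
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical tunnel endpoint and user name.
print(with_user_name("http://localhost:8088/cluster", "hadoop"))
```

Existing query parameters are preserved, and user.name is added or overwritten.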

Zookeeper

ZooKeeper is updated to 3.6.2.

Flink

The config.sh file is updated during initialization to fix the HADOOP_CLASSPATH issue.

Impala

  • Impala is updated to 3.4.0.

  • Shiro is updated to 1.7.0.

  • Metadata stored in Alibaba Cloud Data Lake Formation (DLF) is supported.

  • Data in the Delta format can be queried.

  • LDAP authentication can be enabled or disabled with a click.

Tez

The default configurations are optimized.

HAS

The issue that admin.keytab cannot be reinitialized after an error occurs during the installation of HAS is fixed.

Ranger

  • The issue caused by filter pushdown in Spark is fixed.

  • The issue that prevents Presto from being enabled after you disable Presto in Ranger is fixed.

  • LDAP authentication can be enabled or disabled with a click.

Knox

The issue about the Knox link of Druid 0.20.0 is fixed.

Hue

LDAP authentication can be enabled or disabled with a click.

Hudi

  • SQL statements can be executed to query data in Hudi tables.
  • The issue that causes the query results on some data to be inaccurate is fixed.
  • Partition pruning is supported if you query data in a Copy on Write table of Hudi by using Spark.
  • The bucket-based index mechanism is supported to improve write performance.

Delta Lake

  • The issue that metadata cannot be synchronized to a Hive metastore based on an existing Delta table is fixed.
  • The issue that the MERGE command cannot parse * is fixed.
  • The issue that causes an error to be reported when you convert data in the Parquet format into a Delta table and create table metadata is fixed.
  • The issue that causes the OPTIMIZE command to fail if no files need to be compacted is fixed.
  • The MERGE syntax can be used to specify a subquery as a source command.
  • Data can be cached if you use Presto to query data in a Delta table. This improves query efficiency.
  • Impala can be used to query data in Delta tables.

Superset

  • The issue that prevents the admin user from logging on to the web UI is fixed.

  • Datasets are compatible with Druid clusters.

  • Spark SQL datasets are no longer supported.

Sqoop

Files in the Parquet format can be imported to OSS.

Alluxio

Alluxio is updated to 2.4.1.

Phoenix

Hive on Phoenix supports field settings.

Pig

Pig is removed.

EMR-3.33.x

Release dates

EMR-3.33.0 January 15, 2021

Updates

Service

Changes

SmartData

SmartData is updated to 3.2.0.

For more information, see SmartData 3.2.x.

Spark

  • Spark is updated to 2.4.7.

  • jQuery is updated to 3.5.1.

  • Table and partition sizes are automatically updated in a manner that is compatible with Hive.

  • Spark metadata and job running information can be sent to DataWorks.

Hive

  • Hive is updated to 2.3.7.

  • Metadata from Alibaba Cloud DLF in an HCatalog table is supported.

  • Hive metadata and job running information can be sent to DataWorks.

Metastore

  • The statistics feature for Hive is added.

  • Metadata from Alibaba Cloud DLF in an HCatalog table is supported.

  • Methods to obtain an STS token are optimized.

HDFS

jQuery is updated to 3.5.1.

YARN

  • jQuery is updated to 3.5.1.

  • Configurations of Fair Scheduler are adjusted.

  • Timeline Server is optimized.

Zeppelin

Zeppelin is updated to 0.9.0.

Ranger

  • The audit log configuration for Hive is added.

  • The Log4j Audit configuration is added.

OpenLDAP

  • The audit feature is added.

  • By default, the SSL port 10636 is enabled.

  • Presto can be enabled with one click.

Knox

  • The Spring vulnerability is fixed.

  • The issue on the Executors page of the Spark UI is fixed.

  • The issue on the job status page of Oozie is fixed.

Hue

Presto is supported.

Druid

Druid is updated to 0.20.0.

EMRHook

  • The EMRHook service is added.

  • hive-hook is used to send Hive metadata and job running information to DataWorks.

  • spark-hook is used to send Spark metadata and job running information to DataWorks.

EMR-3.32.x

Release dates

EMR-3.32.0 November 23, 2020

Updates

Service

Changes

SmartData

SmartData is updated to 3.1.0.

For more information, see SmartData 3.1.x.

Alluxio

  • Alluxio 2.4.0 is supported.

  • The default parameter configurations can be adjusted based on the size of cluster nodes.

  • By default, the HDFS service in the EMR cluster is used as the underlying UnderFS. This way, Alluxio is ready for use.

  • The Alluxio OSS UnderFS is enhanced to adapt to new features such as OSS versioning.

  • Alluxio is adapted to engines such as Hadoop, Hive, Spark, and Presto.

Hudi

Hudi 0.6.0 is supported.

Spark

The data collection feature of JindoTable can be enabled or disabled.

Hive

  • The issue that causes connection pool leakage in HiveServer is fixed.

  • The data collection feature of JindoTable can be enabled or disabled.

  • The performance of ADD COLUMN is optimized.

  • The issue that causes data in Hudi tables to be read incorrectly is fixed.

  • The default parameter configurations can be adjusted based on the size of cluster nodes.

HDFS

A larger number of snapshots are supported.

YARN

The default parameter configurations can be adjusted based on the size of cluster nodes.

Tez

The default parameter configurations can be adjusted based on the size of cluster nodes.

Sqoop

The issue about importing files in the Avro format is fixed.

EMR-3.30.x

Release dates

EMR-3.30.0 October 26, 2020

Updates

Service

Changes

SmartData

SmartData is updated to 3.0.0.

For more information, see Introduction to SmartData 3.0.x.

Spark

  • Alibaba Cloud DLF (Data Lake Formation) metadata is supported.

  • HAS dependencies are updated to 2.0.1.

  • The backtick issue in Streaming SQL is fixed.

  • Delta JAR packages are removed. Delta is separately deployed.

  • The log path is modified to write all logs to HDFS.

Hive

  • Alibaba Cloud DLF (Data Lake Formation) metadata is supported.

  • The issue that a DUMMY file is written when an empty directory in a Delta table is read is fixed.

  • HAS dependencies are updated to 2.0.1.

Presto

  • Alibaba Cloud DLF (Data Lake Formation) metadata is supported.

  • The limitation on reading Delta tables is fixed.

  • The issue that JVM configurations are missing in security mode is fixed.

  • HAS dependencies are updated to 2.0.1.

HDFS

  • The hot-swappable disk mode is supported.

  • HAS dependencies are updated to 2.0.1.

YARN

  • The issue with YARN RMZKStateStore is fixed.

  • SNAPPY files output by SLS are supported.

  • The directory configuration of MapReduce Local mode is modified to resolve the directory permission check issue.

  • The hot-swappable disk mode is supported.

  • The log path is modified to write all logs to HDFS.

  • HAS dependencies are updated to 2.0.1.

Zookeeper

  • Internal IP addresses can be bound to Elastic Compute Service (ECS) instances to start service ports.

  • HAS dependencies are updated to 2.0.1.

Flink-Vvp

  • Flink-Vvp is updated to 1.11-2.2.2.

  • SQL and Autopilot features are supported.

Note

Only DataFlow clusters support Flink-Vvp. Hadoop clusters do not support Flink-Vvp.

Flink

  • The cache mode for writing data to OSS is supported. This mode, combined with the checkpoint feature of Flink and replayable sources, implements EXACTLY_ONCE semantics.

  • The features of Flink 1.11.1 in the community are synchronized. SQL supports multiple outputs (MULTI INSERT).

  • HAS dependencies are updated to 2.0.1.

Impala

  • Custom configurations for catalogd.flgs, impalad.flgs, and statestored.flgs are supported.

  • Shiro is updated to 1.6.0.

  • HAS dependencies are updated to 2.0.1.

Tez

  • The default memory parameters for AM are optimized.

  • HAS dependencies are updated to 2.0.1.

HAS

HAS dependencies are updated to 2.0.1.

Storm

Zeppelin

Ranger

OpenLDAP

Oozie

Knox

Kafka

Hue

HBase

Druid

EMR-3.29.x

Release dates

EMR-3.29.0 July 29, 2020

Updates

Service

Changes

Bigboot

  • Bigboot is updated to 2.7.301.

  • Jindo DistCp supports writing data to OSS in the Archive or Infrequent Access storage class.

  • The Fuse feature is enhanced to support multiple namespaces.

  • The metadata cache feature in Cache mode is improved.

Spark

  • Spark is updated to 2.4.5.2.0.

  • Third-party metastores are supported.

  • The datalake metastore-client is added.

Hive

  • Hive is updated to 2.3.5.6.0.

  • Third-party metastores are supported.

  • The datalake metastore-client is added.

Presto

Presto is updated to 338.

Ranger

  • The software package is updated to 1.2.0-1.5.0.

  • Presto 338 is supported.

  • Descriptions are added to configuration files.

HDFS

The reserved space size for datanodes is automatically adjusted.

Knox

Knox is adapted to Impala, high-version Flink, and PAI.

Druid

Druid is updated to 0.18.1.

SmartData

SmartData is updated to 2.7.301.

EMR-3.28.x

Release dates

EMR-3.28.0 June 12, 2020

New features

Service

Changes

Bigboot

  • The first version of JindoTable is released to provide statistics on the heat of tables or partitions.

  • Complete storage policies in Block mode are supported. Tiered storage policies are supported, including Infrequent Access and Archive.

  • The Jindo DistCp data migration tool is added.

  • Jindo Fuse is improved, and known issues of Jindo Fuse are fixed.

  • The integration of the JFS scheme in Cache mode with the Hive engine and Jindo JobCommitter is improved.

  • In Block mode, you can set a weight to directly read data from OSS. This helps reduce and share the overhead of reading data from the local cache.

  • JindoFS software modules are decoupled into Bigboot (control layer), Smartdata (distributed service), and JindoFS SDK. Each module can be independently upgraded and maintained.

Updates

Service

Changes

Flink

Open source Flink is upgraded to Ververica Platform Enterprise Edition, which is deeply customized based on open source Flink 1.10 and provides value-added features such as the self-developed Gemini storage engine.

Bigboot

Bigboot is updated to 2.7.0.

Delta

  • Delta is updated to 0.6.0.

  • Delta code is decoupled from Spark code.

Spark

  • Spark is updated to 2.4.5.

  • Spark is compatible with the streaming-sql scripts of DataFactory.

  • Delta 0.6.0 is supported.

Hive

Delta 0.6.0 is supported.

Ranger

  • Custom deployment of HDFS, Hive, and Spark is supported.

  • ranger-admin-site and ranger-ugsync-site can be configured in the console.

HDFS

The exception information of DataNodes is printed when no available DataNode is found during HDFS write operations (HDFS-9023).

Hue

  • The Hue component can be installed on gateway clusters.

  • Multiple Hue instances can be deployed on a single node.

DataFactory

Delta 0.6.0 is supported.

Druid

Druid is updated to 0.18.0.

Knox

  • Knox is updated to 1.1.0-1.0.7.

  • Knox is adapted to the HBase UI.

EMR-3.27.x

Release dates

Version

Date

EMR-3.27.0

April 29, 2020

EMR-3.27.1 (New purchases are not supported)

May 8, 2020

EMR-3.27.2 (New purchases are not supported)

May 20, 2020

New features

Feature

Changes

Custom component deployment

You can customize the deployment of components on master nodes. The following components are supported:

  • Hadoop

  • Spark

  • Hive

  • Zookeeper

  • Presto

Graceful shutdown for auto scaling

When graceful shutdown is enabled, nodes are not immediately released. Instead, they wait for tasks to complete within a specified time period before being released.
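The graceful shutdown behavior described above can be sketched as a simple release decision. This is a minimal illustration, not the EMR implementation: the Node class, the can_release function, and the assumption that a node is released once the waiting period elapses even if tasks are still running are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    running_tasks: int
    seconds_since_shutdown_requested: float


def can_release(node: Node, graceful_timeout_s: float) -> bool:
    """Release a node when its tasks are complete or the waiting period has elapsed."""
    if node.running_tasks == 0:
        return True
    # Assumption: after the timeout, the node is released regardless of running tasks.
    return node.seconds_since_shutdown_requested >= graceful_timeout_s
```

For example, with a 300-second waiting period, a node that still runs two tasks 60 seconds after the shutdown request is kept, and the same node is released once 300 seconds have passed.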

Updates

Service

Changes

Spark

  • Date type partition fields are supported in CUBE.

  • The stack depth of Spark-Submit is increased.

Delta

  • DDL-related syntax is enhanced, including CREATE, SHOW, DESCRIBE, and other related commands.

  • The Optimize syntax with ZOrder is supported.

Knox

  • Knox is adapted to the Druid UI.

  • Multiple master deployment is supported.

Hive

  • The direct committer in an HCatalog table is supported.

  • Some outdated default configurations are removed.

Bigboot

  • Bigboot is updated to 2.6.3.

  • Multiple master deployment is supported.

SmartData

  • SmartData is updated to 2.6.3.

  • Multiple master deployment is supported.

Ranger

  • The Solr component is supported.

  • PrestoSQL 311 is supported.

Tez

Setting scratchdir on OSS is supported.

Presto

Presto is updated to 331.

Druid

Druid is updated to 0.17.1.

Superset

Superset is updated to 0.35.2.

Sqoop

  • The MySQL JDBC JAR package is updated to 5.1.48.

  • MySQL direct export mode supports setting custom encoding through --mysql-charset.

EMR-3.26.x

Release dates

Version

Date

EMR-3.26.3 (New purchases are not supported)

April 16, 2020

Updates

Service

Changes

Bigboot

  • Bigboot is updated to 2.6.3.

  • OTS metadata and Namespace HA are supported.

SmartData

Hive

The direct committer in an HCatalog table is supported.

YARN

JindoOssCommitter is configured as the default committer.

HDFS

JindoFS-related configurations are updated.

Spark

JindoOssCommitter is configured as the default committer.

EMR-3.25.x

Release dates

EMR-3.25.0 January 13, 2020

New features

Ranger service: Ranger Presto operations are supported.

Updates

Service

Changes

Ranger

  • The RangerAdmin database is initialized for HA clusters.

  • The security issue in the RangerUserSync startup script is fixed.

Spark

  • You can configure Delta-related parameters such as spark.sql.extensions in the console.

  • Hive can read Delta tables without setting inputformat.

  • ALTER TABLE SET TBLPROPERTIES and UNSET TBLPROPERTIES statements are supported.

Delta

Hive

The issue that MR tasks fail in automatic LOCAL mode is fixed.

Presto

  • Presto is updated to 310.

  • joda-time is updated to 2.10.5.

Tez

  • Tez is updated to 0.9.2.

  • The issue that the application progress cannot be properly displayed in tez-ui is fixed.

  • The issue that the application history cannot be viewed in tez-ui is fixed.

Impala

The issue that Impala cannot access LZO tables is fixed.

HDFS

mongo-hadoop-related JAR packages are removed.

Zookeeper

ZooKeeper is updated to 3.5.6.

YARN

The yarn-site tab supports adding the configuration item yarn.resourcemanager.system-metrics-publisher.enabled=true to adapt to tez-ui.

Bigboot

  • Bigboot is updated to 2.2.3.

  • The rename operation is supported in OSS Cache mode.

SmartData

Knox

Dependency package versions are updated.

Oozie

Dependency package versions are updated.

EMR-3.24.x

Release dates

EMR-3.24.0 November 18, 2019

New features

Service

Changes

Delta

  • SQL syntax is supported, including ALTER, CONVERT, CREATE, CTAS, DELETE, DESC, INSERT, MERGE, OPTIMIZE, UPDATE, and VACUUM.

  • The OPTIMIZE command is built in, and its performance is improved.

  • The Hive connector is supported.

  • Other existing open source features are supported.

Grafana

Grafana 6.4.2 is added for independent Flink clusters.

Prometheus

Prometheus 2.13.0 is added for independent Flink clusters.

AlertManager

AlertManager 0.19.0 is added for independent Flink clusters.

TensorFlow on Spark

  • TensorFlow can run on top of Spark, which integrates Spark with deep learning frameworks. The integration includes optimized task scheduling and data exchange, and provides a complete workflow from data preprocessing to deep learning training.

  • Streaming-type tasks are supported.

Updates

Service

Changes

SmartData

  • The usage modes of JindoFS are optimized. Block mode remains unchanged. Cache mode supports the original usage and is also compatible with the original way of using the OSS file system. Cache mode supports data caching and metadata caching, which are controlled by separate configurations and are both disabled by default.

  • The read and write performance of Block mode and Cache mode is optimized.

  • Disk cleaning is optimized, providing more accurate statistics and timely cleaning of hot data cached on local disks, and strictly ensuring that disk usage does not exceed the quota.

  • Support for gateway clusters is improved, allowing Block mode and Cache mode to be used on gateways.

  • A deployment method where one storage cluster is separated from multiple computing clusters is supported.

Spark

  • Delta-related parameter support is added.

  • Support for Ranger spark plugin configuration is added.

  • JindoCube is upgraded to version 0.3.0.

Hive

  • The SQL compatibility check logic is added.

  • The Hive 2.3.5 + Hadoop 2.8.5 combination is released.

  • When restarting components, the content in hiveserver2-site.xml is no longer synchronized to hive-site.xml under spark-conf.

  • The MSCK command can be used to add incremental directories.

  • The bug that occurs when Hive reuses tez containers is fixed.

  • The MSCK command can be used to optimize column directories.

Bigboot

Bigboot is updated to 2.2.1, fixing native code support issues on some machine types.

Ranger

  • The Spark plugin deployment method is restructured.

  • The bug where header2 does not get the keytab in HA clusters is fixed.

Kudu

The startup logic is fixed.

Zookeeper

Four-letter command configuration is added and enabled by default.
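ZooKeeper four-letter commands are short text commands, such as ruok, that a client sends over a plain TCP connection to the ZooKeeper client port. The server replies and then closes the connection. The following Python sketch shows one way to send such a command. The four_letter_command helper is hypothetical, and the host and port that you pass in depend on your cluster (2181 is the default ZooKeeper client port).

```python
import socket


def four_letter_command(host: str, port: int, command: str, timeout: float = 5.0) -> str:
    """Send a four-letter command to a ZooKeeper server and return the reply as text."""
    if len(command) != 4:
        raise ValueError("command must be exactly four letters")
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")
```

For example, four_letter_command("localhost", 2181, "ruok") is expected to return imok from a healthy server on which the command is enabled.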

HDFS

HDFS is adapted to JindoFS.

YARN

  • The default configuration yarn.scheduler.capacity.node-locality-delay is changed to -1.

  • YARN is adapted to JindoFS.

HAS

OpenLDAP is used as the backend.

OpenLDAP

OpenLDAP is adapted to HAS.

Presto

Presto is updated to 0.228.

Kafka

D1 bad disks are removed.

Druid

Druid is updated to 0.16.0.

Flume

Flume is updated to 1.9.0.

Flink

  • Flink is updated to 1.9.1.

  • Independent Flink clusters are supported (whitelist release).

EMR-3.23.x

Release dates

EMR-3.23.0 September 18, 2019

Updates

Service

Changes

Druid

  • Druid is updated to 0.15.1.

  • The router component is added.

  • fastjson is upgraded.

Spark

  • Spark thriftserver is updated to resolve class loader issues.

  • Spark transaction-related code is refactored to improve stability.

  • The issue that ORC data cannot be properly read or written after the built-in Hive is upgraded to 2.3 is resolved.

  • The merge into syntax is supported.

  • The scan and stream syntax is supported.

  • Structured Streaming Kafka sink supports EOS.

  • Delta is updated to 0.4.0.

Hive

  • The old version of hive hook is removed.

  • Support for data skew processing optimization with multiple count distinct fields is added.

  • The issue of data loss when joining tables with different bucket versions is fixed.

Flink

Flink is updated to 1.8.2.

Bigboot

  • The small file tool is updated.

  • The OSS JAR is updated to resolve non-daemon thread issues.

Kafka

  • The Deployment Set awareness feature is added.

  • The fastjson dependency is removed.

HDFS

  • The SmartData OSS JAR package deployment logic is optimized.

  • The SmartData OSS JAR package is updated.

Flume

fastjson is upgraded.

TensorFlow on Spark

The service is added.

HAS

fastjson is upgraded.

Livy

fastjson is upgraded.

EMR-3.22.x

Release dates

EMR-3.22.0 July 28, 2019

New features

Service

Changes

Kudu

  • The new component Kudu fills a functional gap in the Hadoop ecosystem, providing HBase-like fast data insertion and random access capabilities, allowing users to modify data, while also providing HDFS or Parquet-like massive data analysis and query capabilities.

    • C++ and Java APIs are provided for secondary development.

    • Integration with Impala, Spark, and Hive Metastore is provided.

  • The Kudu version is based on Apache Kudu 1.10.0.

OpenLDAP

  • The new component replaces ApacheDS, which is deprecated.

  • High availability is supported.

Updates

Component

Details

JindoFileSystem

  • Multiple storage modes

    • Block mode: Data is stored in blocks in the backend OSS storage, with metadata information maintained by the local Namespace service. Block mode offers better performance for both metadata and data operations. Block mode supports different storage policies, including WARM (local-replica, OSS-replica), COLD (OSS-replica only), HOT (multiple local replicas, OSS-replica), TEMP (local-replica only), and ALL_HDD (multiple local replicas). The default is WARM, and users can set different storage policies for directories based on different application scenarios.

    • Cache mode: This mode is mainly compatible with existing OSS storage methods. In Cache mode, files are stored as objects in OSS, and each file is cached locally for both data and metadata based on actual access patterns, improving access performance for both data and metadata. Cache mode provides different metadata synchronization strategies to meet user needs in different scenarios.

  • External client support

    • The client SDK provides the ability to access E-MapReduce JindoFS file systems from outside E-MapReduce clusters. Through the client, you can access the Namespace in Block mode, but external clients cannot utilize the data cache built by E-MapReduce JindoFS within E-MapReduce clusters, resulting in some performance differences compared to using it within E-MapReduce clusters.

    • Cache mode retains the semantics of original OSS storage and implements data cache acceleration through JindoFS within E-MapReduce clusters. Therefore, E-MapReduce cluster external access can be achieved directly through OSS clients, such as OSS SDK or E-MapReduce's OssFileSystem.

  • Ecosystem component support

    • JindoFS currently supports many computing engines on E-MapReduce, such as Spark, Flink, Hive, MapReduce, Impala, and Presto.

    • For compute and storage separation scenarios, job logs can also be stored in JindoFS, such as YARN Container logs and Spark Event logs.

    • JindoFS can serve as the backend storage for HBase HFiles, extending HBase's storage capabilities.
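The Block mode storage policies listed above can be summarized as a small lookup table. The following Python sketch only restates the replica placement that this topic describes. The StoragePolicy class, the field names, and the helper function are hypothetical and are not part of any JindoFS API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoragePolicy:
    local_replicas: str  # "none", "single", or "multiple"
    oss_replica: bool    # whether a replica is kept in OSS


# Replica placement per policy, as described for Block mode.
POLICIES = {
    "WARM": StoragePolicy("single", True),      # default policy
    "COLD": StoragePolicy("none", True),
    "HOT": StoragePolicy("multiple", True),
    "TEMP": StoragePolicy("single", False),
    "ALL_HDD": StoragePolicy("multiple", False),
}
DEFAULT_POLICY = "WARM"


def policies_with_oss_copy() -> list:
    """Return the names of the policies that keep a replica in OSS, sorted by name."""
    return sorted(name for name, policy in POLICIES.items() if policy.oss_replica)
```

A table like this makes it easy to see that only TEMP and ALL_HDD keep no replica in OSS.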

OssFileSystem

  • OssFileSystem adds automatic bad disk detection logic, fixing the issue where cache writes fail due to bad disks during OSS writes.

  • OssFileSystem-related configurations are completed.

Bigboot

  • Bigboot is updated to version 2.0.0.

  • Major updates are included, such as support for multiple namespaces, storage of local data blocks as large files, support for multiple storage modes, and support for external clients.

  • The issue that the Bigboot monitor status is incorrect when a machine restarts is fixed.

  • Service specs are added for the Kudu component.

  • Validation of the correctness of service specs is added.

Hadoop

  • HDFS

    • HDFS Federation is supported. You can create HDFS Federation clusters by using custom configurations and APIs, which avoids the need for a second format operation when a Federation cluster is created.

    • The bad disk detection logic is optimized. For local disk scenarios, DataNode blockreport can trigger bad disk detection through dfsadmin.

  • YARN

    Fixed the issue where the MapReduce JobHistory job list does not update when Container logs are stored in JindoFS or OSS.

Spark

  • Relational Cache

    Relational Cache is supported, which accelerates user queries through pre-computation. Users can create Relational Cache to pre-compute data, and when executing user queries, Spark Optimizer automatically discovers suitable caches and rewrites SQL execution plans to continue computation based on cached data, thereby improving query speed. This is suitable for reports, dashboards, data synchronization, and multidimensional analysis scenarios.

    • Through DDL, operations such as CACHE, UNCACHE, ALTER, SHOW can be performed. Cached data supports all Spark data sources and data formats.

    • Supports automatic cache data updates and manual updates through the REFRESH command, with support for incremental updates based on partitions.

    • Supports execution plan optimization based on Relational Cache.

  • Streaming SQL

    • Standardizes Stream Query Writer parameter configurations.

    • Optimizes Kafka data table Schema compatibility checks.

    • Automatically creates Schema in SchemaRegistry when Kafka data table Schema does not exist.

    • Optimizes log information when Kafka Schema is incompatible.

    • Fixes the issue where column names must be explicitly specified when writing query results to Kafka tables.

    • Removes the limitation that streaming SQL queries only support Kafka and Loghub data input sources.

  • Delta

    Delta is added, allowing users to create Delta datasources using Spark to support streaming data writes, transactional read/write, data validation, and data rollback application scenarios. For details, see Delta details.

    • Supports using DataFrame API to read data from Delta or write data to Delta.

    • Supports using Structured Streaming API with Delta as source or sink for data reading or writing.

    • Supports using Delta API for update, delete, merge, vacuum, optimize, and other operations on data.

    • Supports using SQL to create Delta-based tables, import data to Delta, and read Delta tables.

  • Others

    • The Constraint feature is added, which supports primary keys and foreign keys.

    • Resolves jar conflict issues with servlet and other libraries.

Flink

log4j log rotation is supported.

Kafka

  • log4j log rotation is supported.

  • fastjson is upgraded.

Zeppelin

The commons-lang3 package is upgraded to version 3.7, fixing the issue where pyspark cannot write to OSS. For details, see Spark 2.4 incompatibility with commons-lang3 in Zeppelin.

Ranger

Support for Show grants is added.

Analytics-Zoo

The NumPy installation error issue is fixed.

Impala

Compatibility with Apache Kudu 1.10.0 is added.

Presto

Presto is updated to version 0.221.

ZooKeeper

ZooKeeper is updated to version 3.5.5.

EMR-3.22.x and earlier versions

EMR-3.1.1

  • The OS is upgraded to CentOS 7.2.

  • Spark is upgraded to version 2.1.1.

  • emr-core is upgraded to version 1.2.6.

  • The defect in OSS password-free operations is fixed.

EMR-3.0.2

  • emr-core is upgraded to version 1.2.5.

  • OSS password-free support is extended to more regions.

  • The role AccessKey replacement strategy is adjusted.

  • Some defects related to Hive and Hadoop are fixed.

EMR-3.0.1

  • Interactive support is added, unified table management is supported, and an external unified database is used to save Hive meta. All clusters using external Hive meta share the same meta information.

  • emr-core is upgraded to version 1.2.4, optimizing OSS read and write performance.

  • Spark is upgraded to version 2.0.2.

Note

This version is fully compatible with EMR-3.0.0.

EMR-3.0.0

This is the first release of the EMR V3.x series.