E-MapReduce: Release notes for EMR V4.X series

Last Updated: Mar 28, 2025
Important

No new versions of the E-MapReduce (EMR) V4.X series will be released.

EMR V4.10.X

Release dates

March 23, 2022 for EMR V4.10.0

Updates

Service

Description

SmartData and BIGBOOT

The services are no longer used.

JindoSDK

  • The architecture of JindoSDK is upgraded to JindoData.

  • EMR is integrated with JindoSDK for JindoData 4.0.0 for the first time. JindoData connects to Alibaba Cloud Object Storage Service (OSS) and the Alibaba Cloud OSS-HDFS service.

Spark

  • Spark is updated to 2.4.8.

  • The issue that adaptive execution does not take effect in some scenarios is fixed.

  • The issue that statistical aggregate functions are used in different manners in Spark and Hive is fixed.

  • The issue that Spark cannot read valid data of the CHAR type from a Hive ORC table is fixed.

  • The default configurations of Thrift Server are optimized.

  • In the EMR console, the parameter names on the spark-defaults subtab of the Configure tab for the Spark service are optimized.

  • Hive on Spark is optimized.

  • The array-index out of bounds error that is returned when some required statistics for Adaptive Query Execution (AQE) are missing is fixed.

  • Errors related to AQE and data caching in specific scenarios are fixed.

  • Log4j Metrics Appender is removed because the configuration is invalid.

  • The null pointer exception that occurs when SparkContext is started is fixed.

  • The data compression algorithm Zstandard is supported.
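
The release note does not say which Spark data paths use Zstandard. As a minimal sketch, assuming it refers to Spark's internal compression codec (shuffle and broadcast data), the open-source properties `spark.io.compression.codec` and `spark.io.compression.zstd.level` could be set as follows. The helper `render_spark_defaults` is hypothetical and only prints the properties in spark-defaults.conf style.

```python
# Assumption: Zstandard is selected through the standard open-source Spark
# properties below; EMR may expose additional or different settings.
ZSTD_PROPERTIES = {
    "spark.io.compression.codec": "zstd",
    "spark.io.compression.zstd.level": "1",  # 1 is the open-source default level
}


def render_spark_defaults(properties):
    """Return 'key value' lines sorted by key, as used in spark-defaults.conf."""
    return "\n".join(f"{key} {value}" for key, value in sorted(properties.items()))


if __name__ == "__main__":
    print(render_spark_defaults(ZSTD_PROPERTIES))
```

The printed lines can be compared with the values shown on the spark-defaults subtab before any change is applied.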

Hive

  • The issue about HiveServer2 memory leaks caused by user-defined functions (UDFs) is fixed.

  • The issue that the output of the SHOW CREATE TABLE statement based on Data Lake Formation (DLF) metadata is inaccurate is fixed.

  • The default parameters of Hive are optimized to improve the performance of Hive jobs.

  • In the EMR console, the parameter names on the hive-env subtab of the Configure tab for the Hive service are changed to uppercase. This facilitates the use of the parameters.

  • The issue that occurs because of the incompatibility between the file system and Hive Metastore when you write data to a Hive table is fixed.

  • In JindoFS in block storage mode, the metadata of multiple Hive tables can be optimized at the same time. By default, this feature is disabled.

Ranger

  • The warning that is logged when Spark is started in Ranger is fixed.

  • The issue that user information fails to be automatically synchronized after Ranger is connected to a Lightweight Directory Access Protocol (LDAP) server is fixed.

HDFS

  • The data compression algorithm Zstandard is supported.

  • By default, the reserved space of NameNode adaptively increases. This way, NameNode enters the Safe mode at the earliest opportunity when the disk space is insufficient.

YARN

  • Information about app IDs, CPU utilization, and memory usage is added to the RESTful APIs of containers for nodes.

  • The issue that the Application Master (AM) logs of an automatically released node cannot be viewed is fixed.

  • The issue that a cluster cannot be accessed due to historical state store data is fixed.

  • The data of a node that is automatically released based on the decommissioning logic of auto scaling can be deleted.

  • The Graceful Decommission logic of auto scaling is optimized. The node on which NodeManager runs is marked deprecated only after the NodeManager process is complete.

Knox

  • Knox is adapted to Kudu.

  • Knox is adapted to HBase.

  • The issue that the first access to the Spark UI fails is fixed.

Tez

The default parameters of Tez are optimized to improve the performance of Tez jobs.

Sqoop

The issue that precision loss for the DECIMAL data type occurs when you use Sqoop to import data to HCatalog tables is fixed.

Delta Lake

  • Metadata management

    • The built-in catalog of Spark, instead of an API operation that is called by using the Hive CLI, is used to synchronize metadata and partition information.

    • The statistics on table data are automatically reported to metastores.

  • SQL

    • The syntax of the time travel feature is supported.

    • The DROP PARTITION SQL syntax is supported.

    • The ADD COLUMN statement can be used to add columns to specified locations (FIRST and AFTER).

  • Enhanced table management capabilities

    • The file size can be dynamically adjusted based on the table size. By default, this feature is enabled.

    • The auto-vacuum feature is supported and enabled by default. Concurrent vacuum operations are supported.

    • The logic of automatic compaction is optimized. By default, the automatic compaction feature is disabled.

    • The Z-ordering syntax is added. Z-ordering-based data processing is accelerated.
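
The SQL items above might look like the statements below. This is a sketch only: the table `events` and its columns are hypothetical, and the grammar follows common open-source Delta Lake and Spark SQL conventions, so the exact syntax accepted by EMR V4.10.0 should be checked against the EMR Delta Lake documentation. The helper `add_column_sql` is also hypothetical and simply assembles the statement text.

```python
def add_column_sql(table, column, col_type, first=False, after=None):
    """Build an ALTER TABLE ... ADD COLUMNS statement with an optional position."""
    if first and after:
        raise ValueError("choose either first=True or after=<column>, not both")
    position = " FIRST" if first else (f" AFTER {after}" if after else "")
    return f"ALTER TABLE {table} ADD COLUMNS ({column} {col_type}{position})"


# Illustrative statements for a hypothetical table named `events`.
STATEMENTS = [
    # Time travel: read an earlier version or an earlier point in time.
    "SELECT * FROM events VERSION AS OF 3",
    "SELECT * FROM events TIMESTAMP AS OF '2022-03-01 00:00:00'",
    # Drop one partition.
    "ALTER TABLE events DROP PARTITION (dt = '2022-03-01')",
    # Add columns at specified locations.
    add_column_sql("events", "source", "STRING", first=True),
    add_column_sql("events", "region", "STRING", after="user_id"),
    # Z-ordering as part of file compaction.
    "OPTIMIZE events ZORDER BY (user_id)",
]

if __name__ == "__main__":
    for statement in STATEMENTS:
        print(statement + ";")
```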

Hudi

  • Hudi is updated to 0.10.0.

  • The issue about the compatibility of sql.extension between Delta Lake and Hudi is fixed.

Iceberg

The Iceberg service is added.

The version is 0.13.0.

Hue

  • The issue that garbled characters are displayed when Hue is used to query historical records is fixed.

  • The UI display exception that occurs when you use Hue together with Oozie is fixed.

  • The issue that YARN Job Browser sometimes cannot present or terminate jobs is fixed.

  • YARN Job Browser is accessible by default.

  • The Presto protocol is supported by default.

DLF-Auth

The DLF-Auth service is added.

The version is 1.0.4.

HBase

  • The time required to restart HBase in a high-security cluster is reduced.

  • The issue that Spark 3.1.1 cannot be integrated with HBase is fixed.

  • The Graceful Stop process is optimized.

ZooKeeper

ZooKeeper is updated to 3.6.3.

Presto

  • Presto is updated to 358.

  • UDFs can be dynamically loaded. For more information, see Dynamically load UDFs.

  • Data lake analysis is supported.

Impala

  • The issue that the LIST operation is repeatedly performed on directly deleted OSS partition directories is fixed.

  • The following issue is fixed: The no such method error message appears when you query data in DLF metadata tables.

Zeppelin

Zeppelin is updated to 0.10.0.

Oozie

The issue that Jetty Server of Oozie fails to start due to JAR package conflicts in high availability (HA) scenarios is fixed.

EMR V4.9.X

Release dates

April 21, 2021 for EMR V4.9.0

Updates

Service

Description

SmartData

SmartData is updated to 3.5.0.

For more information, see SmartData 3.5.X.

Spark

  • The issue that adaptive execution does not take effect in some scenarios is fixed.

  • The issue that statistical aggregate functions are used in different manners in Spark and Hive is fixed.

  • The issue that Spark cannot read valid data of the CHAR type from a Hive ORC table is fixed.

HDFS

The SM4 encryption algorithm is supported.

Hue

Hue is updated to 4.9.0.

Alluxio

Alluxio is updated to 2.5.0.

Livy

Livy is updated to 0.7.1.

EMR V4.8.X

Release dates

March 15, 2021 for EMR V4.8.0

Updates

Service

Description

SmartData

SmartData is updated to 3.4.0.

For more information, see SmartData 3.4.X.

Spark

  • Some default configurations are optimized.

  • Performance is optimized. Window-based top-k queries can be pushed down.

  • The capability of reading data from and writing data to Hive tables in the CSV or JSON format is enhanced.

  • All the column names of a table can be omitted in the ANALYZE statement.

  • LDAP authentication can be enabled or disabled with a click.

  • Spark Beeline is easier to use.
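
To illustrate what a window-based top-k query computes and why pushing the limit down can help, the sketch below keeps at most k rows per partition while scanning, instead of ranking every row and filtering afterwards. It is plain Python with hypothetical function and field names, not Spark's implementation.

```python
import heapq
from collections import defaultdict


def top_k_per_partition(rows, key, order, k):
    """Return the k rows with the largest `order` value for each `key` value.

    The result matches filtering ROW_NUMBER() OVER (PARTITION BY key
    ORDER BY order DESC) <= k, but at most k rows per partition are held.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    heaps = defaultdict(list)  # partition value -> min-heap of (order, -seq, row)
    for seq, row in enumerate(rows):
        # -seq makes every tuple unique, so the dict rows are never compared,
        # and earlier rows win ties on the order value.
        item = (row[order], -seq, row)
        heap = heaps[row[key]]
        if len(heap) < k:
            heapq.heappush(heap, item)
        elif item > heap[0]:
            heapq.heapreplace(heap, item)  # evict the current smallest row
    return {
        part: [row for _, _, row in sorted(heap, reverse=True)]
        for part, heap in heaps.items()
    }


if __name__ == "__main__":
    sample = [
        {"dept": "a", "salary": 10},
        {"dept": "a", "salary": 30},
        {"dept": "a", "salary": 20},
        {"dept": "b", "salary": 5},
    ]
    print(top_k_per_partition(sample, "dept", "salary", 2))
```

Bounding the per-partition state to k rows is the point of the pushdown: less data has to be sorted and passed to later stages.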

Hive

  • Some default configurations are optimized.

  • Performance is optimized. The cost-based optimization (CBO) feature is enhanced.

  • LDAP authentication can be enabled or disabled with a click.

YARN

The risk caused by unauthorized access from a Hadoop cluster to the YARN web UI is fixed. If you access the YARN web UI by using SSH Tunnel, you no longer need to explicitly specify user.name in the URL.
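
As a small illustration of the URL change, the sketch below builds the YARN web UI address with and without the `user.name` query parameter. The local tunnel port 8157 and the user `hadoop` are assumptions chosen for the example; 8088 is the open-source default port of the ResourceManager web UI.

```python
from urllib.parse import urlencode, urlunsplit


def yarn_ui_url(host, port=8088, path="/cluster", user=None):
    """Build a YARN web UI URL; add user.name only when a user is given."""
    query = urlencode({"user.name": user}) if user else ""
    return urlunsplit(("http", f"{host}:{port}", path, query, ""))


if __name__ == "__main__":
    # Before this release: user.name had to be specified explicitly.
    print(yarn_ui_url("localhost", 8157, user="hadoop"))
    # From this release on: the plain URL is sufficient over an SSH tunnel.
    print(yarn_ui_url("localhost", 8157))
```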

Tez

Some default configurations are optimized.

Ranger

  • The issue caused by filter pushdown in Spark is fixed.

  • The issue that prevents Presto from being enabled after you disable Presto in Ranger is fixed.

  • LDAP authentication can be enabled or disabled with a click.

Hue

LDAP authentication can be enabled or disabled with a click.

Impala

  • Impala is updated to 3.4.0.

  • Shiro is updated to 1.7.0.

  • Metadata stored in DLF is supported.

  • Data in the Delta format can be queried.

  • LDAP authentication can be enabled or disabled with a click.

  • The exception that occurs when you use the INSERT OVERWRITE statement to overwrite data stored in OSS is fixed.

Hudi

  • SQL statements can be executed to query data in Hudi tables.

  • The issue that causes the query results on some data to be inaccurate is fixed.

  • Partition pruning is supported if you query data in a Copy on Write table of Hudi by using Spark.

  • The bucket-based index mechanism is supported to improve write performance.

Delta Lake

  • The issue that prevents metadata from being synchronized to a Hive metastore based on existing Delta tables is fixed.

  • The issue that prevents the MERGE statement from parsing asterisks (*) in data is fixed.

  • The issue that causes an error to be reported when you convert data in the Parquet format into a Delta table and create table metadata is fixed.

  • The issue that causes the OPTIMIZE command to fail if no files need to be compacted is fixed.

  • A subquery can be used as the source in the MERGE statement.

  • Data can be cached if you use Presto to query data in a Delta table. This improves query efficiency.

  • Impala can be used to query data in Delta tables.

EMR Remote Shuffle Service (ESS)

  • Exceptions in the shuffle read stage, such as ClosedChannelException, IndexOutOfBoundsException, and excessive off-heap memory usage, are fixed.

  • The issue that causes NullPointerException (NPE) to be reported after metric monitoring is enabled is fixed.

HAS

The issue that prevents the admin.keytab file from being initialized again after a HAS installation error is reported is fixed.

Presto

LDAP authentication can be enabled or disabled with a click.

HBase

  • HBase is updated to 2.2.6.

  • Access control based on Ranger is no longer supported.

Sqoop

Files in the Parquet format can be imported to OSS.

Superset

  • The issue that prevents the admin user from logging on to the web UI is fixed.

  • Datasets are compatible with Druid clusters.

  • Spark SQL datasets are no longer supported.

Knox

  • Access to Presto by using Knox is supported.

  • The issue that causes the Druid web UI to be inaccessible is fixed.

  • The limitation that, in high-security mode, the Ranger web UI can be accessed over HTTP only by using Knox is removed.

EMR V4.6.X

Release dates

January 15, 2021 for EMR V4.6.0

Updates

Service

Description

SmartData

SmartData is updated to 3.2.0.

For more information, see SmartData 3.2.X.

Spark

  • Spark is updated to 2.4.7.

  • jQuery is updated to 3.5.1.

  • Spark is compatible with Hive to automatically update table and partition sizes.

  • Spark metadata and job running information can be sent to DataWorks.

Hive

  • Metadata from DLF in an HCatalog table is supported.

  • Hive metadata and job running information can be sent to DataWorks.

Metastore

  • The statistics feature for Hive is added.

  • Metadata from DLF in an HCatalog table is supported.

  • Methods to obtain a Security Token Service (STS) token are optimized.

HDFS

  • jQuery is updated to 3.5.1.

  • HDFS is updated to 3.2.1.

YARN

  • YARN is updated to 3.2.1.

  • jQuery is updated to 3.5.1.

  • Configurations of Fair Scheduler are adjusted.

  • Timeline Server is optimized.

Zeppelin

Zeppelin is updated to 0.9.0.

OpenLDAP

  • The audit feature is added.

  • By default, the Secure Sockets Layer (SSL) port 10636 is enabled.

  • OpenLDAP can be enabled with one click.
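
A quick way to confirm that the SSL port answers is a TLS handshake probe. The sketch below is an assumption-laden example: the host name is a placeholder, and certificate verification is turned off only because cluster certificates are often self-signed, which makes the probe unsuitable for anything beyond a reachability check.

```python
import socket
import ssl

OPENLDAP_SSL_PORT = 10636  # SSL port named in the release note


def ldaps_url(host, port=OPENLDAP_SSL_PORT):
    """Return the ldaps:// URL for the given host and port."""
    return f"ldaps://{host}:{port}"


def tls_port_reachable(host, port=OPENLDAP_SSL_PORT, timeout=3.0):
    """Return True if a TLS handshake with host:port succeeds."""
    context = ssl.create_default_context()
    # Assumption: self-signed certificate, so verification is relaxed
    # for this probe only. Do not reuse this context for real traffic.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=host) as tls:
                return tls.version() is not None
    except OSError:  # refused, timed out, unresolved host, or TLS failure
        return False


if __name__ == "__main__":
    host = "ldap.example.internal"  # placeholder host name
    print(ldaps_url(host), tls_port_reachable(host))
```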

Hue

Presto is supported.

EMR-HOOK

  • The EMRHook service is added.

  • hive-hook is used to send Hive metadata and job running information to DataWorks.

  • spark-hook is used to send Spark metadata and job running information to DataWorks.

EMR V4.5.X

EMR V4.5.1

Release dates

December 13, 2020

Updates

  • The issue that occurs when you query partitioned tables by using Hive or Presto is fixed.

  • EMR V4.5.1 is available only in the China (Hangzhou), China (Shanghai), and China (Beijing) regions.

EMR V4.5.0

Release dates

December 7, 2020

New features

Service

Description

ESS

ESS 1.0.0 is supported.

For more information, see ESS.

Hudi

Hudi 0.6.0 is supported.

Delta Lake

Delta Lake 0.6.1 is supported.

Updates

Service

Description

Ranger

  • Ranger is updated to 2.1.0.

  • Ownership-related permissions are supported.

Presto

  • Presto is updated to 338.

  • Metadata stored in Alibaba Cloud DLF is supported.

Zeppelin

Zeppelin is updated to 0.8.2.

SmartData

SmartData is updated to 3.1.0.

For more information, see SmartData 3.1.X.

Bigboot

Bigboot is updated to 3.1.0.

Hive

  • Metadata stored in Alibaba Cloud DLF is supported.

  • Ownership-related permissions of Ranger are supported.

Spark

Metadata stored in Alibaba Cloud DLF is supported.

DLF Metastore

  • The issue that Presto cannot be started in a high-security cluster is fixed.

  • Hive 3 and metadata caching are supported.

  • The issue that occurs when you query data by using Hive or Presto is fixed.

Impala

Custom configuration of parameters on the catalogd.flgs, impalad.flgs, and statestored.flgs subtabs is supported in the EMR console.

Tez

Vulnerabilities related to autoDeploy on the web UI of Tez are fixed.

OpenLDAP

A rule that determines whether port 10389 is in the waiting state is added.

Hue

Security vulnerabilities of the MySQL backend are fixed.

Kerberos

  • Apache Kerby is updated to 2.0.1.

  • The issue that the kadmin principal of an external Kerberos cluster cannot be customized is fixed.

Sqoop

  • File formats such as Parquet, Avro, and ORC are supported.

  • Metadata stored in Alibaba Cloud DLF is supported.

EMR V4.4.X

Release dates

September 15, 2020 for EMR V4.4.1

Updates

Service

Description

YARN

  • The hadoop/tools/lib directory is deleted from the value of the yarn.application.classpath parameter.

  • Default parameter settings for MapReduce jobs are optimized.
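
The effect of the classpath change can be sketched as removing one entry from a comma-separated value. The starting value below is hypothetical (the actual EMR default is not given in this note), and `drop_classpath_entries` is an illustrative helper, not part of YARN.

```python
def drop_classpath_entries(classpath, fragment):
    """Remove every comma-separated entry that contains `fragment`."""
    entries = [entry.strip() for entry in classpath.split(",")]
    return ",".join(entry for entry in entries if entry and fragment not in entry)


# Hypothetical previous value of yarn.application.classpath.
OLD_VALUE = (
    "$HADOOP_CONF_DIR,"
    "$HADOOP_COMMON_HOME/share/hadoop/common/*,"
    "$HADOOP_YARN_HOME/share/hadoop/yarn/*,"
    "$HADOOP_HOME/share/hadoop/tools/lib/*"
)

if __name__ == "__main__":
    print(drop_classpath_entries(OLD_VALUE, "hadoop/tools/lib"))
```

Jobs that relied on JAR files from that directory being on the classpath implicitly may need to ship those JAR files themselves after the change.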

Hive and Tez

Default parameter settings are optimized.

Ranger

  • Impala-based access control is supported.

  • The jackson-databind version is updated.

Impala

  • Integration with Ranger is supported.

  • Shiro is updated to 1.6.0.

SmartData and Bigboot

SmartData and Bigboot are updated to 2.7.301.

Knox

  • The web UI of Tez can be independently viewed. Knox is compatible with Tez on the web UI of YARN.

  • Shiro is updated to 1.6.0.

EMR Doctor

The following issue is fixed: Job information is not collected if a time-based configuration file is empty.

Ganglia

The detection feature of the service RPC port for Hadoop Distributed File System (HDFS) is enabled.

Oozie

  • The issue that the web UI cannot be accessed is fixed.

  • The jackson-databind version is updated.

ZooKeeper

Internal IP addresses can be bound to Elastic Compute Service (ECS) instances to start service ports.

Superset

The startup script is repaired.

Livy

The versions of jackson-databind and Fastjson are updated.

Zeppelin

The versions of jackson-databind and Shiro are updated.

HAS

The versions of jackson-databind and Fastjson are updated.

Flume

The Fastjson version is updated.

EMR V4.3.X

Release dates

May 20, 2020 for EMR V4.3.0

Updates

Service

Description

Ranger

  • Custom deployment of HDFS, Hive, and Spark plug-ins is supported. You can enable plug-ins on required service nodes.

  • The RangerUserSync and RangerAdmin components can be configured in the EMR console.

Presto

The Kudu client is updated.

Spark

  • Spark is updated to 2.4.5.

  • The associated Delta Lake is updated to 0.6.0.

  • The issue that PySpark cannot properly run after Ranger Hive is enabled is fixed.

HDFS

  • The issue that the HDFS_NAMENODE_OPTS parameter does not take effect is fixed.

  • Custom deployment is supported.

YARN

Custom deployment is supported.

Hive

Custom deployment is supported.

Knox

Information on the web UI of HDFS NameNode in Hadoop 3.X can be viewed.

Zeppelin

The issue that a zepping.keytab file fails to be generated is fixed.

Kafka

Kafka is updated to 2.4.1.

Kudu

Kudu is updated to 1.11.1.

Impala

Issues related to HAProxy are fixed.

Livy

Issues related to xmllint are fixed.

Hue

  • Hue can be deployed on gateway clusters.

  • Multiple Hue instances can be enabled on a single node.