
E-MapReduce: Release notes for EMR V5.2.X

Last Updated: Apr 27, 2023

E-MapReduce (EMR) V5.2.1 is the first stable version of the EMR V5.X series. This topic describes the release date, updates, and component versions of EMR V5.2.X.

Release date

July 16, 2021 for EMR V5.2.1

Updates

Service

Description

SmartData

SmartData is updated to 3.6.1. For more information, see SmartData 3.6.X.

Hive

  • Fixed an issue where the SHOW CREATE TABLE command returned inaccurate output when Data Lake Formation (DLF) was used to manage metadata.

  • The default parameters of Hive are optimized to improve the performance of Hive jobs.

  • In the EMR console, the parameter names on the hive-env tab of the Configure tab for the Hive service are changed to uppercase. This makes the parameters easier to use.

  • Fixed a HiveServer2 memory leak caused by user-defined functions (UDFs).

  • When you write data to a Hive table, the error message reported for an incompatibility between the file system and the Hive metastore is now clearer.

HDFS

The Zstandard (zstd) compression algorithm is supported.

Delta Lake

  • Delta Lake is updated to 0.8.0.

  • Spark 3 is supported.

Flink

Flink is updated to 1.12-vvr-3.0.2.

Hudi

  • Hudi is updated to 0.8.0.

  • Hudi can be integrated with Spark SQL.

Spark

Important

In EMR V5.2.1, Spark (3.1.1) and Kudu (1.11.1) are incompatible with each other.

  • Delta Lake and Hudi are supported.

  • Remote Shuffle Service is supported.

  • Livy is supported.

  • In the EMR console, the parameter names on the spark-defaults tab of the Configure tab for the Spark service are optimized.

  • The cost-based optimization (CBO), dynamic partition pruning, and Z-order features are optimized. These features perform 50% better than in open source Spark 3.

  • Log Service, DataHub, and Message Queue for Apache RocketMQ can be used as data sources.
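Z-ordering is a general data-layout technique: rows are sorted by a key that interleaves the bits of several columns, so rows with similar values in all of those columns end up in the same files, and min/max file statistics can skip more files at query time. The following Python sketch only illustrates that idea; it is not EMR's Spark implementation, and the helper names (interleave_bits, files_to_read) and the 16 x 16 sample table are hypothetical.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Return the Z-order (Morton) value of (x, y).

    Bit i of x is placed at position 2*i and bit i of y at position 2*i + 1,
    so points that are close in both dimensions get close Z-values.
    """
    limit = 1 << bits
    if not (0 <= x < limit and 0 <= y < limit):
        raise ValueError("x and y must be non-negative and fit in `bits` bits")
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def files_to_read(rows, rows_per_file, max_y):
    """Split sorted rows into fixed-size 'files' and count the files that
    min/max pruning cannot skip for the filter `y <= max_y`."""
    count = 0
    for start in range(0, len(rows), rows_per_file):
        chunk = rows[start:start + rows_per_file]
        if min(y for _, y in chunk) <= max_y:
            count += 1
    return count


# A hypothetical 16 x 16 table of (x, y) rows, written as 16 files of 16 rows.
grid = [(x, y) for x in range(16) for y in range(16)]
linear = sorted(grid)  # sorted by x, then y
zordered = sorted(grid, key=lambda p: interleave_bits(p[0], p[1], bits=4))

print(files_to_read(linear, 16, max_y=3))    # 16: every file contains y = 0
print(files_to_read(zordered, 16, max_y=3))  # 4: only files with y in 0..3
```

With a plain sort on x, a filter on y cannot skip any file; with the Z-order sort, each file covers a 4 x 4 block of (x, y) values, so the same filter reads a quarter of the files.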

Tez

The default parameters of Tez are optimized to improve the performance of Tez jobs.

Ranger

  • Fixed an issue where warnings appeared in the logs when Spark was started in Ranger.

  • Fixed an issue where user information was not automatically synchronized after Ranger was connected to a Lightweight Directory Access Protocol (LDAP) server.

Knox

  • Knox supports Kudu.

  • Knox supports HBase.

Kafka

  • Cruise Control can be used to balance the load of Kafka clusters.

  • Disks for Kafka clusters are hot-swappable. You can replace a damaged disk without stopping the Kafka broker.

  • The default values of some parameters are changed.
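Cruise Control balances a Kafka cluster by computing partition reassignments against a set of goals, such as replica distribution, disk, network, and CPU usage, and rack awareness. The following Python sketch illustrates only the basic idea of rebalancing with a single, count-based goal. It is not Cruise Control's algorithm or API; the function, broker, and topic names are hypothetical, and each partition is assumed to have one replica.

```python
from collections import Counter


def rebalance(assignment: dict[str, str], brokers: list[str]) -> dict[str, str]:
    """Return a new partition -> broker map whose per-broker partition counts
    differ by at most one, moving one partition at a time from the busiest
    broker to the least busy one (a greedy, count-only heuristic)."""
    result = dict(assignment)  # do not mutate the caller's map
    while True:
        # Count partitions per broker; brokers with no partitions count as 0.
        load = Counter({broker: 0 for broker in brokers})
        load.update(result.values())
        busiest = max(brokers, key=lambda b: load[b])
        idlest = min(brokers, key=lambda b: load[b])
        if load[busiest] - load[idlest] <= 1:
            return result
        # Move the first (by name) partition hosted on the busiest broker.
        partition = next(p for p, b in sorted(result.items()) if b == busiest)
        result[partition] = idlest


# Hypothetical topic "orders": four partitions on b1, one on b2, none on b3.
before = {"orders-0": "b1", "orders-1": "b1", "orders-2": "b1",
          "orders-3": "b1", "orders-4": "b2"}
after = rebalance(before, ["b1", "b2", "b3"])
print(Counter(after.values()))  # b1 and b2 host two partitions each, b3 hosts one
```

A real balancer must also weigh the cost of each move, because reassigning a partition copies its data between brokers; this sketch ignores that cost.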

Phoenix

Fixed an issue where no JDBC driver was found when Hive or Spark SQL was used to access Phoenix tables.

EMR Remote Shuffle Service (ESS)

Spark 3 is supported.

Release version information

Hadoop clusters

Service        Version
HDFS           3.2.1
YARN           3.2.1
Hive           3.1.2
Spark          3.1.1
Knox           1.1.0
Tez            0.9.2
Ganglia        3.7.2
Sqoop          1.4.7
SmartData      3.6.1
Bigboot        3.6.1
Hudi           0.8.0
OpenLDAP       2.4.44
Hue            4.9.0
HBase          2.3.4
ZooKeeper      3.6.2
Presto         338
Impala         3.4.0
Zeppelin       0.9.0
Flume          1.9.0
Livy           0.7.1
Superset       0.36.0
Ranger         2.1.0
Storm          1.2.2
ESS            1.0.0
Alluxio        2.5.0
Kudu           1.11.1
Oozie          5.1.0

Kafka clusters

Service        Version
ZooKeeper      3.6.2
Ganglia        3.7.2
Kafka          2.4.1
Kafka Manager  1.3.3.16
OpenLDAP       2.4.44
Knox           1.1.0
Ranger         2.1.0

Shuffle Service clusters

Service        Version
ESS            1.0.0