E-MapReduce (EMR) V5.2.0 is the first stable version of the EMR V5.X series. This topic provides the release notes for EMR V5.2.X, including the release date, updates, and release version information.

Release date

July 16, 2021 for EMR V5.2.1

Updates

Service Description
SmartData
  • SmartData is updated to 3.6.1. For more information, see SmartData 3.6.X overview.

Hive
  • The issue that the SHOW CREATE TABLE statement returns inaccurate output when Data Lake Formation (DLF) metadata is used is fixed.
  • The default parameters of Hive are optimized to improve the performance of Hive jobs.
  • In the EMR console, the parameter names on the hive-env tab of the Configure tab for the Hive service are changed to uppercase. This facilitates the use of the parameters.
  • The issue that user-defined functions (UDFs) cause a memory leak in HiveServer2 is fixed.
  • The error that is reported when you write data to a Hive table because the file system and Hive metastore are incompatible is fixed.
HDFS
  • The Zstandard data compression algorithm is supported.
Delta Lake
  • Delta Lake is updated to 0.8.0.
  • Spark 3 is supported.
Flink
  • Flink is updated to 1.12-vvr-3.0.2.
Hudi
  • Hudi is updated to 0.8.0.
  • Hudi can be integrated with Spark SQL.
Spark
Notice: In EMR V5.2.1, Spark 3.1.1 and Kudu 1.11.1 are incompatible with each other.
  • Delta Lake and Hudi are supported.
  • Remote Shuffle Service is supported.
  • Livy is supported.
  • In the EMR console, the parameter names on the spark-defaults tab of the Configure tab for the Spark service are optimized.
  • The cost-based optimization (CBO), dynamic partition pruning, and Z-order features are optimized. The performance of these features is 50% higher than in Spark 3.
  • Log Service, DataHub, and Message Queue for Apache RocketMQ can be used as data sources.
Tez
  • The default parameters of Tez are optimized to improve the performance of Tez jobs.
Ranger
  • The issue that warning messages are written to logs when Spark is started with Ranger is fixed.
  • The issue that user information fails to be automatically synchronized after Ranger is connected to a Lightweight Directory Access Protocol (LDAP) server is fixed.
Knox
  • Knox is adapted to Kudu.
  • Knox is adapted to HBase.
Kafka
  • The Cruise Control component can be used to provide the balance feature for Kafka clusters.
  • Disks for Kafka clusters are hot-swappable. You can replace a damaged disk without stopping the Kafka broker of your cluster.
  • The default values of some parameters are changed.
Phoenix
  • The issue that no JDBC driver is found when Hive or Spark SQL is used to access Phoenix tables is fixed.
EMR Remote Shuffle Service (ESS)
  • Spark 3 is supported.
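
The Delta Lake and Spark entries above state that Delta Lake works with Spark 3 in this release. As a rough illustration, the following sketch renders the session settings that open source Delta Lake (0.7.0 and later) documents for Spark 3 as `--conf` flags. It assumes that the open source settings apply unchanged; an EMR cluster may already configure them for you. The helper name `spark_sql_command` and the constant `DELTA_SPARK_CONF` are hypothetical and used only for this example.

```python
import shlex

# Session settings documented by open source Delta Lake for Spark 3.
# Assumption: these apply as-is on EMR; the cluster may preconfigure them.
DELTA_SPARK_CONF = {
    "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
    "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
}


def spark_sql_command(conf, binary="spark-sql"):
    """Render a shell command line that passes each setting as a --conf flag."""
    args = [binary]
    # Sort the keys so that the rendered command is stable across runs.
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    # Quote every argument so that the result is safe to paste into a shell.
    return " ".join(shlex.quote(arg) for arg in args)


if __name__ == "__main__":
    print(spark_sql_command(DELTA_SPARK_CONF))
```

The same dictionary can be passed to any launcher that accepts Spark properties, for example `spark-submit` instead of `spark-sql`, by changing the `binary` argument.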

Release version information

Service Version
HDFS 3.2.1
YARN 3.2.1
Hive 3.1.2
Spark 3.1.1
Knox 1.1.0
Tez 0.9.2
Ganglia 3.7.2
Sqoop 1.4.7
SmartData 3.6.1
Bigboot 3.6.1
Hudi 0.8.0
OpenLDAP 2.4.44
Hue 4.9.0
HBase 2.3.4
ZooKeeper 3.6.2
Presto 338
Impala 3.4.0
Zeppelin 0.9.0
Flume 1.9.0
Livy 0.7.1
Superset 0.36.0
Ranger 2.1.0
Storm 1.2.2
ESS 1.0.0
Alluxio 2.5.0
Kudu 1.11.1
Oozie 5.1.0
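
The notice in the Updates section states that Spark 3.1.1 and Kudu 1.11.1 are incompatible in EMR V5.2.1. A minimal sketch of how a pre-deployment check might flag that combination is shown below. The versions are copied from the table above; the names `RELEASE_VERSIONS`, `INCOMPATIBLE_PAIRS`, and `find_conflicts` are hypothetical and are not part of any EMR API.

```python
# A subset of the versions listed in the release version table.
RELEASE_VERSIONS = {
    "Spark": "3.1.1",
    "Kudu": "1.11.1",
    "Hive": "3.1.2",
    "HBase": "2.3.4",
}

# Service pairs that the release notes flag as incompatible in EMR V5.2.1.
INCOMPATIBLE_PAIRS = [frozenset({"Spark", "Kudu"})]


def find_conflicts(selected_services):
    """Return each flagged pair whose services are both selected."""
    selected = set(selected_services)
    return [tuple(sorted(pair)) for pair in INCOMPATIBLE_PAIRS if pair <= selected]


if __name__ == "__main__":
    for first, second in find_conflicts(["Spark", "Kudu", "Hive"]):
        print(
            f"{first} {RELEASE_VERSIONS[first]} and "
            f"{second} {RELEASE_VERSIONS[second]} are incompatible"
        )
```

Keeping the flagged pairs in a list separate from the version table means that additional notices can be recorded without changing the check itself.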