E-MapReduce: Release notes for EMR V5.5.X

Last Updated: Apr 27, 2023

This topic provides the release notes for E-MapReduce (EMR) V5.5.X, including the release dates, updates, and release version information.

Release dates

Version       Date
EMR V5.5.1    March 25, 2022
EMR V5.5.0    February 15, 2022

Updates

EMR V5.5.1

Note

Only OLAP clusters support this version. You can create an OLAP cluster in the new EMR console.

Service

Description

ClickHouse

The default values of some parameters are changed.

StarRocks

StarRocks is updated to 2.1.1.

EMR V5.5.0

Service

Description

SmartData

The service is no longer used.

Bigboot

The service is no longer used.

RSS

  • EMR Remote Shuffle Service (ESS) is upgraded to Remote Shuffle Service (RSS).

  • The service functionality and stability are enhanced.

JindoSDK

  • The architecture of SmartData is upgraded to JindoData.

  • EMR is integrated with JindoSDK for JindoData 4.0.0 for the first time. JindoData connects to Alibaba Cloud Object Storage Service (OSS) and the Alibaba Cloud OSS-HDFS service.

Spark

  • An IF expression can be used in the COUNT DISTINCT function, and the CASE WHEN syntax for the COUNT DISTINCT function is optimized.

    To use this feature, set spark.sql.optimizer.rewriteConditionalDistinctAggregates to true.

  • Fallback from Shuffle Hash Join to Sort Merge Join is supported.

    To use this feature, set spark.sql.join.preferSortMergeJoin to false and set spark.sql.join.enableShuffledHashJoinFallback to true.

  • Automatic merging of small files in non-dynamic partitions is supported.

    To use this feature, set spark.sql.adaptive.merge.output.small.files.enabled to true.

  • Concurrency can be automatically adjusted for queries that use the GROUPING SETS clause or the DISTINCT function.

    To use this feature, set spark.sql.execution.optimizeExpand to true.

  • Hive on Spark is optimized.

  • The syntax of the time travel feature is supported.

  • Spark is adapted to JindoSDK.
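The Spark properties listed above can be collected in one place and rendered either as spark-submit flags or as Spark SQL SET statements. The following Python sketch shows one way to do this. The property names and values come from this topic. The feature labels (such as merge_small_files) and the helper function names are hypothetical and exist only for illustration.

```python
# Property names and values are taken from the release notes.
# The dictionary keys (feature labels) are hypothetical names for this sketch.
EMR_SPARK_FEATURE_CONFS = {
    "conditional_distinct_rewrite": {
        "spark.sql.optimizer.rewriteConditionalDistinctAggregates": "true",
    },
    "shuffled_hash_join_fallback": {
        "spark.sql.join.preferSortMergeJoin": "false",
        "spark.sql.join.enableShuffledHashJoinFallback": "true",
    },
    "merge_small_files": {
        "spark.sql.adaptive.merge.output.small.files.enabled": "true",
    },
    "optimize_expand": {
        "spark.sql.execution.optimizeExpand": "true",
    },
}


def confs_for(features):
    """Merge the properties required by the selected feature labels."""
    merged = {}
    for name in features:
        merged.update(EMR_SPARK_FEATURE_CONFS[name])
    return merged


def to_spark_submit_args(confs):
    """Render properties as a list of spark-submit '--conf key=value' arguments."""
    args = []
    for key in sorted(confs):
        args += ["--conf", f"{key}={confs[key]}"]
    return args


def to_sql_set_statements(confs):
    """Render properties as Spark SQL SET statements."""
    return [f"SET {key}={confs[key]};" for key in sorted(confs)]


if __name__ == "__main__":
    selected = confs_for(["shuffled_hash_join_fallback", "merge_small_files"])
    print(" ".join(to_spark_submit_args(selected)))
    print("\n".join(to_sql_set_statements(selected)))
```

The arguments can be appended to a spark-submit command line, and the SET statements can be run at the start of a Spark SQL session. Whether a property can be changed at session level in EMR is not stated in this topic, so treat the SET form as an assumption to verify.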

Tez

Tez is adapted to JindoSDK.

Hive

  • A batch deletion issue that occurs on Hive Jindo is fixed.

  • The out of memory (OOM) issue that occurs on HiveServer2 is fixed.

  • Hive on Spark is optimized.

  • Hive is adapted to JindoSDK.

Presto

  • Presto is updated to 358.

  • By default, the MySQL, Iceberg, Hudi, Phoenix, Kudu, and Delta connector types are added. The default configurations of these connector types are updated.

  • Data lake analysis is supported.

  • User-defined functions (UDFs) can be dynamically loaded.

  • Presto is adapted to JindoSDK.

Delta Lake

  • Version update

    • Delta Lake is updated to 1.1.0 and is compatible with Spark 3.2.0.

    • All features that are released for commercial use are migrated to Delta Lake 1.1.0.

  • Metadata management

    • The feature of synchronizing metadata changes to metastores is optimized.

    • The statistics on table data are automatically reported to metastores.

  • SQL

    • The syntax of the time travel feature is supported.

    • The DROP PARTITION SQL syntax is supported.

    • SQL statements can be executed to overwrite data in dynamic partitions.

    • The ADD COLUMN statement can be used to add columns to specified locations (FIRST and AFTER).

  • Enhanced table management capabilities

    • The file size can be dynamically adjusted based on the table size. By default, this feature is enabled.

    • The auto-vacuum feature is supported and enabled by default. Concurrent vacuum operations are supported.

    • The logic of automatic compaction is optimized. By default, the automatic compaction feature is disabled.

    • The Z-ordering syntax is added. Z-ordering-based data processing is accelerated.
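As an illustration of the SQL features listed above, the following Python sketch assembles example statements as strings. It assumes that Delta Lake in EMR accepts the forms commonly used with Spark SQL and Delta Lake: VERSION AS OF, ALTER TABLE ... DROP PARTITION, INSERT OVERWRITE with a dynamic partition, ADD COLUMNS with FIRST or AFTER, and OPTIMIZE ... ZORDER BY. This topic does not spell out the grammar, so verify each statement against the EMR Delta Lake documentation before use. The table name, column names, and function name are hypothetical.

```python
def delta_sql_examples(table="sales", partition_col="dt"):
    """Return example statements keyed by the feature they illustrate.

    All identifiers are hypothetical, and the syntax is assumed rather than
    confirmed by the release notes.
    """
    return {
        # Query an earlier version of the table (time travel).
        "time_travel": f"SELECT * FROM {table} VERSION AS OF 3",
        # Drop a single partition.
        "drop_partition": (
            f"ALTER TABLE {table} DROP PARTITION ({partition_col} = '2022-02-15')"
        ),
        # Overwrite only the partitions produced by the SELECT.
        "dynamic_partition_overwrite": (
            f"INSERT OVERWRITE TABLE {table} PARTITION ({partition_col}) "
            f"SELECT id, amount, {partition_col} FROM staging_{table}"
        ),
        # Add a column at a specified position.
        "add_column_first": f"ALTER TABLE {table} ADD COLUMNS (batch_id STRING FIRST)",
        "add_column_after": f"ALTER TABLE {table} ADD COLUMNS (region STRING AFTER id)",
        # Cluster data files by a column using Z-ordering.
        "zorder": f"OPTIMIZE {table} ZORDER BY (id)",
    }


if __name__ == "__main__":
    for feature, statement in delta_sql_examples().items():
        print(f"-- {feature}\n{statement};")
```

Each string can be passed to a Spark SQL session once the table and columns are replaced with real ones.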

Hudi

  • Hudi is updated to 0.10.0.

  • Spark 3.2.0 is supported.

  • JindoFS in block storage mode is supported.

HDFS

HDFS is adapted to JindoSDK.

YARN

  • YARN is adapted to RSS memory configurations.

  • YARN is adapted to JindoSDK.

Flume

Flume is adapted to JindoSDK.

Impala

Impala is adapted to JindoSDK.

Ranger

  • Spark 3.2.0 is supported.

  • Presto 358 is supported.

HBase

  • An issue with default parameter settings is fixed.

  • An issue with the date format of garbage collection (GC) logs is fixed.

ClickHouse

  • The Hadoop Distributed File System (HDFS) and OSS disk types are added to support the separation of hot and cold data. For more information, see Separate hot and cold data by using HDFS and Separate hot and cold data by using OSS.

  • When you use an engine of the Replicated*MergeTree type, zero-copy replication is supported on the following disk types: Amazon S3, OSS, and HDFS.

  • The logic of processing data when the ClickHouse component stops working is optimized.

Iceberg

  • Iceberg is updated to 0.13.0.

  • Presto 358 is supported.

DLF-Auth

  • Spark 3.2.0 is supported.

  • Presto 358 is supported.

Release version information

Note

To view the information about an online analytical processing (OLAP) cluster, you must log on to the new EMR console.

Hadoop clusters

Service       Version
HDFS          3.2.1
YARN          3.2.1
Hive          3.1.2
Spark         3.2.0
Knox          1.1.0
Tez           0.9.2
Ganglia       3.7.2
Sqoop         1.4.7
DLF-Auth      1.0.4
Iceberg       0.13.0
Hudi          0.10.0
Delta Lake    1.1.0
OpenLDAP      2.4.44
Hue           4.9.0
JindoSDK      4.0.0
HBase         2.3.4
ZooKeeper     3.6.3
Presto        358
Impala        3.4.0
Zeppelin      0.10.2
Flume         1.9.0
Livy          0.7.1
Superset      0.36.0
Ranger        2.1.0
RSS           1.0.0
Alluxio       2.5.0
Kudu          1.14.0
Oozie         5.2.1

ClickHouse clusters

Service       Version
ZooKeeper     3.6.3
Ganglia       3.7.2
ClickHouse    21.3.13.9

Shuffle Service clusters

Service       Version
RSS           1.0.0

OLAP clusters

Service       Version
ClickHouse    21.3.13.9.2.9
StarRocks     2.1.1
ZooKeeper     3.6.3