E-MapReduce: Release notes for EMR V3.38.X

Last Updated: Apr 27, 2023

This topic provides the release notes for E-MapReduce (EMR) V3.38.X, including release dates, updates, and release version information.

Release dates

Version | Date
EMR V3.38.3 | December 2021
EMR V3.38.2 | December 2021
EMR V3.38.1 | November 2021
EMR V3.38.0 | October 2021

Updates

EMR V3.38.3

The Log4j security vulnerability in all related components is fixed. For more information, see Vulnerability announcement | RCE vulnerability in Apache Log4j 2.

Service

Description

Presto

  • The error that is reported when Presto in a high-availability cluster queries data in a Hudi table is fixed.

  • The Log4j security vulnerability in the Elasticsearch connector is fixed.

DLF Metastore

  • By default, the logging feature for metastores is disabled. In earlier versions, the feature was enabled by default.

  • The error caused by an excessively long getTableStats URI for a metastore is fixed.

Delta Lake

The issue that schema changes fail to be synchronized to a metastore is fixed.

Flink

  • VVR is updated to 4.0.11. This version supports the following features:

    • The commercial features of Flink Change Data Capture (CDC) are released:

      • Schema evolution is supported.

      • Flink SQL semantics can be used to synchronize an entire database at a time.

    • GeminiStateBackend can be used to store state data in Object Storage Service (OSS).

  • A Hudi connector of Enterprise Edition is provided, and a built-in Data Lake Formation (DLF) catalog is used in the connector to manage metadata.

Sqoop

The following issue is fixed: Precision loss for the DECIMAL data type occurs when you use Sqoop to import data to HCatalog tables.

EMR V3.38.2

Service

Description

SmartData

  • SmartData is updated to 3.8.0. For more information, see SmartData 3.8.X overview.

  • Authentication and authorization based on Kerberos and Ranger can be used to manage permissions on data in OSS.

EMR V3.38.1

Service

Description

SmartData

SmartData is updated to 3.7.3. For more information, see SmartData 3.7.X overview.

Spark

  • Log4j Metrics Appender is removed because the configuration is invalid.

  • The null pointer exception that occurs when SparkContext is started is fixed.

Presto

  • The following issue is fixed: You must configure host parameters for the Presto service before you can use Presto in a high-availability Hadoop cluster to query data from a Hive table.

  • The following issue is fixed: Presto cannot be started by default when the memory is small.

  • The issue that modifications to worker-jvm do not take effect is fixed.

  • Ranger is supported.

Impala

The following issue is fixed: The "no such method" error message appears when you query data in DLF metadata tables.

Ranger

  • Presto is supported.

  • The exceptions that occur when Ranger is used to configure permissions for Spark to insert data into ORC or Parquet tables are fixed.

  • The following issue is fixed: Hive role permissions configured in Ranger do not take effect after Kerberos is enabled.

DLF-Auth

  • DLF-Auth is updated to 1.0.1.

  • Permissions for Presto to access DLF can be configured.

  • The issue that data cannot be cached for RAM users is fixed.

EMR V3.38.0

Service

Description

SmartData

SmartData is updated to 3.7.2. For more information, see SmartData 3.7.X overview.

Spark

  • Spark is updated to 2.4.8.

  • Both Spark 2.4.8 and Spark 3.1.2 are supported.

    Note

    Delta and Remote Shuffle Service are not supported in Spark 3.

  • In Spark 3.x, the performance of DISTINCT computation in Spark SQL is optimized. The optimization is triggered if an aggregation operator contains multiple count(distinct case ... when ...) expressions.

  • The array index out-of-bounds error that is returned when some required statistics for Adaptive Query Execution (AQE) are missing is fixed.

  • Errors related to AQE and data caching in specific scenarios are fixed.
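The query shape that the Distinct optimization above refers to can be sketched as follows. This is only an illustration of the SQL pattern, not of the Spark optimization itself: it uses Python's built-in sqlite3 module instead of Spark, and the table name events and its columns user_id and channel are hypothetical.

```python
import sqlite3

# Hypothetical table and columns, used only to illustrate the query shape.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, channel TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, "web"), (1, "web"), (2, "web"), (2, "app"), (3, "app")],
)

# A single aggregation that contains several
# count(distinct case when ...) expressions.
query = """
SELECT
  COUNT(DISTINCT CASE WHEN channel = 'web' THEN user_id END) AS web_users,
  COUNT(DISTINCT CASE WHEN channel = 'app' THEN user_id END) AS app_users
FROM events
"""
web_users, app_users = conn.execute(query).fetchone()
print(web_users, app_users)
```

Each CASE expression yields NULL for rows that do not match, and COUNT(DISTINCT ...) ignores NULL values, so every column counts the distinct users of one channel.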

Hive

Hive is updated to 2.3.9.

Presto

  • An independent Presto cluster can be created.

  • Presto is updated to 358.

    Important

    This version does not support Ranger.

  • Connectors such as Hudi and MySQL connectors are supported by default, and the default configurations are updated.

  • Auto scaling is supported by Presto clusters.

  • Data lake analysis is supported.

Delta Lake

  • The same Delta Lake connectors are used in Hive 2 and Hive 3.

  • The error that is returned when you use Delta Lake connectors to query data from multi-level partitioned tables is fixed.

Hudi

  • Hudi is updated to 0.9.0.

  • The issue about the compatibility of sql.extension between Delta Lake and Hudi is fixed.

HDFS

By default, the reserved space of NameNode adaptively increases. This way, NameNode enters safe mode in a timely manner when the disk space is insufficient.

Flink

  • Flink is updated to 1.13-vvr-4.0.10, which corresponds to Apache Flink 1.13.1.

  • Commercial connectors, such as a Hologres connector, are added to Flink.

  • Metric reporters are added to report metrics on the APM dashboard.

  • A SchemaRegistry-based Kafka catalog is added to the Kafka connector. This way, you can read data from or write data to existing Kafka topics without the need to execute DDL statements.

Storm

The service is no longer used.

Zeppelin

Zeppelin is updated to 0.10.0.

Ranger

If Presto 358 is used, Ranger cannot be used to configure permissions related to Presto.

Hue

  • The issue that YARN Job Browser sometimes cannot present or terminate jobs is fixed.

  • YARN Job Browser is accessible by default.

  • The Presto protocol is supported by default.

Druid

The following issue is fixed: After a server is unexpectedly shut down, the related node fails to restart because a PID file is not deleted.

ClickHouse

  • Some default configurations are updated.

  • Clusters can be scaled out.

  • The MetaChecker feature is supported.

  • OSS table engines and OSS table functions can be used to read data.

  • Table-level custom ZooKeeper addresses are supported.

Iceberg

The service is added. The service version is 0.12.0.

Knox

The issue that the first access to the Spark UI fails is fixed.

DLF-Auth

The service is added.

Permissions for Hive or Spark to access DLF can be configured. The service version is 1.0.0.

ESS

ESS is updated to 1.2.0.

Release version information

Hadoop clusters

Service | Version
HDFS | 2.8.5
YARN | 2.8.5
Hive | 2.3.9
Spark | 2.4.8
Knox | 1.1.0
Tez | 0.9.2
Ganglia | 3.7.2
Sqoop | 1.4.7
SmartData | EMR V3.38.0: 3.7.2; EMR V3.38.1: 3.7.3; EMR V3.38.2: 3.8.0
Bigboot |
Iceberg | 0.12.0
DLF-Auth | 1.0.0
Hudi | 0.9.0
Delta Lake | 0.6.1
OpenLDAP | 2.4.44
Hue | 4.9.0
Spark | 3.1.2
HBase | 1.4.9
ZooKeeper | 3.6.3
Presto | 358
Impala | 3.4.0
Zeppelin | 0.10.0
Flume | 1.9.0
Livy | 0.7.1
Superset | 0.36.0
Ranger | 1.2.0
Phoenix | 4.14.1
ESS | 1.2.0
Alluxio | 2.5.0
Kudu | 1.14.0
Oozie | 5.2.1
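For services whose version differs between minor releases, such as SmartData in Hadoop clusters, a small lookup can make the mapping explicit. The following is a minimal sketch: the version numbers are copied from the Hadoop clusters list above, while the function names and the fallback to the nearest earlier listed release are assumptions made for illustration, not documented EMR behavior.

```python
# SmartData versions per EMR release, copied from the Hadoop clusters list.
SMARTDATA_BY_RELEASE = {
    "3.38.0": "3.7.2",
    "3.38.1": "3.7.3",
    "3.38.2": "3.8.0",
}


def release_key(release):
    """Turn '3.38.1' into (3, 38, 1) so releases compare numerically."""
    return tuple(int(part) for part in release.split("."))


def smartdata_version(release, table=SMARTDATA_BY_RELEASE):
    """Return the SmartData version listed for an EMR release.

    Assumption: if the release itself is not listed, the nearest earlier
    listed release is used. Returns None if no earlier release is listed.
    """
    candidates = [r for r in table if release_key(r) <= release_key(release)]
    if not candidates:
        return None
    return table[max(candidates, key=release_key)]


print(smartdata_version("3.38.1"))
print(smartdata_version("3.38.3"))
```

The first call returns the version listed for EMR V3.38.1. The second call relies on the fallback assumption, because EMR V3.38.3 has no SmartData entry of its own in the Hadoop clusters list.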

Druid clusters

Service | Version
HDFS | 2.8.5
Druid | 0.20.1
ZooKeeper | 3.6.3
Knox | 1.1.0
Ganglia | 3.7.2
SmartData | 3.7.2
Bigboot |
OpenLDAP | 2.4.44
YARN | 2.8.5
Superset | 0.36.0

Dataflow clusters

Cluster type | Service | Version
Flink | HDFS | 2.8.5
Flink | YARN | 2.8.5
Flink | ZooKeeper | 3.6.3
Flink | Knox | 1.1.0
Flink | Flink | EMR V3.38.0: 1.13-vvr-4.0.10; EMR V3.38.3: 1.13-vvr-4.0.11
Flink | SmartData | EMR V3.38.0: 3.7.2; EMR V3.38.3: 3.8.0
Flink | Bigboot |
Flink | OpenLDAP | 2.4.44
Kafka | ZooKeeper | 3.6.3
Kafka | Ganglia | 3.7.2
Kafka | Kafka | 1.1.1
Kafka | Kafka Manager | 1.3.3.16
Kafka | OpenLDAP | 2.4.44
Kafka | Knox | 1.1.0
Kafka | Ranger | 1.2.0

ClickHouse clusters

Service | Version
ZooKeeper | 3.6.3
Ganglia | 3.7.2
ClickHouse | 20.8.12.2

Presto clusters

Service | Version
Knox | 1.1.0
Presto | 358
Ganglia | 3.7.2
SmartData | EMR V3.38.0: 3.7.2; EMR V3.38.1: 3.7.3; EMR V3.38.2: 3.8.0
Bigboot |
Hudi | 0.9.0
Delta Lake | 0.6.1
OpenLDAP | 2.4.44
Hue | 4.9.0
Alluxio | 2.5.0