This topic describes the release notes for E-MapReduce (EMR) V3.38.X, including the release dates, updates, and release version information.

Release dates

Version Date
EMR V3.38.3 December 2021
EMR V3.38.2 December 2021
EMR V3.38.1 November 2021
EMR V3.38.0 October 2021

Updates

  • EMR V3.38.3

    The Log4j security vulnerability is fixed in all affected components. For more information, see Vulnerability announcement | RCE vulnerability in Apache Log4j 2.

    Service Description
    Presto
    • The error that is reported when Presto in a high-availability cluster is used to query data in a Hudi table is fixed.
    • The Log4j security vulnerability of the Elasticsearch connector is fixed.
    DLF Metastore
    • By default, the logging feature for metastores is disabled. In earlier versions, this feature was enabled by default.
    • The error caused by an excessively long getTableStats URI for a metastore is fixed.
    Delta Lake
    • The issue that schema changes fail to be synchronized to a metastore is fixed.
    Flink
    • VVR (Ververica Runtime) is updated to 4.0.11. This version provides the following features:
      • The commercial features of Flink Change Data Capture (CDC) are released:
        • Schema evolution is supported.
        • Flink SQL semantics can be used to synchronize an entire database at a time.
      • GeminiStateBackend can be used to store state data in Object Storage Service (OSS).
    • An Enterprise Edition Hudi connector is provided. The connector uses a built-in Data Lake Formation (DLF) catalog to manage metadata.
    Sqoop
    • The following issue is fixed: Precision loss occurs for the DECIMAL data type when you use Sqoop to import data to HCatalog tables.
  • EMR V3.38.2
    Service Description
    SmartData
    • SmartData is updated to 3.8.0. For more information, see SmartData 3.8.X overview.
    • Authentication and authorization based on Kerberos and Ranger can be used to manage permissions on data in OSS.
  • EMR V3.38.1
    Service Description
    SmartData
    • SmartData is updated to 3.7.3. For more information, see SmartData 3.7.X overview.
    Spark
    • Log4j Metrics Appender is removed because the configuration is invalid.
    • The null pointer exception that occurs when SparkContext is started is fixed.
    Presto
    • The following issue is fixed: You must configure host parameters for the Presto service before you can use Presto in a high-availability Hadoop cluster to query data from a Hive table.
    • The following issue is fixed: With the default configuration, Presto cannot be started when the available memory is small.
    • The issue that modifications to worker-jvm do not take effect is fixed.
    • Ranger is supported.
    Impala
    • The following issue is fixed: A "no such method" error is reported when you query data in DLF metadata tables.
    Ranger
    • Presto is supported.
    • The exceptions that occur when Ranger is used to configure permissions for inserting data into ORC or Parquet tables by using Spark are fixed.
    • The following issue is fixed: Hive role permissions configured in Ranger do not take effect after Kerberos is enabled.
    DLF-Auth
    • DLF-Auth is updated to 1.0.1.
    • Permissions for accessing DLF from Presto can be configured.
    • The issue that data cannot be cached for RAM users is fixed.
  • EMR V3.38.0
    Service Description
    SmartData
    • SmartData is updated to 3.7.2. For more information, see SmartData 3.7.X overview.
    Spark
    • Spark is updated to 2.4.8.
    • Both Spark 2.4.8 and Spark 3.1.2 are supported.
      Note Delta and Remote Shuffle Service are not supported in Spark 3.
    • In Spark 3.x, the performance of DISTINCT computations in Spark SQL is optimized. The optimization is triggered when an aggregation operator contains multiple count(distinct case ... when ...) expressions.
    • The array index out-of-bounds error that is returned when some statistics required by Adaptive Query Execution (AQE) are missing is fixed.
    • Errors related to AQE and data caching in specific scenarios are fixed.
    Hive
    • Hive is updated to 2.3.9.
    Presto
    • An independent Presto cluster can be created.
    • Presto is updated to 358.
      Notice This version does not support Ranger.
    • Connectors such as Hudi and MySQL connectors are supported by default, and the default configurations are updated.
    • Auto scaling is supported by Presto clusters.
      Note To use the auto scaling feature, .
    • Data lake analysis is supported.
    Delta Lake
    • The same Delta Lake connectors are used in Hive 2 and Hive 3.
    • The error that is returned when you use Delta Lake connectors to query data from multi-level partitioned tables is fixed.
    Hudi
    • Hudi is updated to 0.9.0.
    • The compatibility issue between Delta Lake and Hudi in sql.extension is fixed.
    HDFS
    • By default, the reserved space of NameNode increases adaptively, so that NameNode enters safe mode in a timely manner when disk space is insufficient.
    Flink
    • Flink is updated to 1.13-vvr-4.0.10, which corresponds to Apache Flink 1.13.1.
    • Commercial connectors, such as a Hologres connector, are added to Flink.
    • Metric reporters are added to report metrics on the APM dashboard.
    • A SchemaRegistry-based Kafka catalog is added to the Kafka connector. This way, you can read data from or write data to existing Kafka topics without the need to execute DDL statements.
    Storm
    • The service is no longer provided.
    Zeppelin
    • Zeppelin is updated to 0.10.0.
    Ranger
    • If Presto 358 is used, Ranger cannot be used to configure permissions related to Presto.
    Hue
    • The issue that YARN Job Browser sometimes cannot present or terminate jobs is fixed.
    • YARN Job Browser is accessible by default.
    • The Presto protocol is supported by default.
    Druid
    • The following issue is fixed: After a server is unexpectedly shut down, the related node fails to restart because a PID file is not deleted.
    ClickHouse
    • Some default configurations are updated.
    • Clusters can be scaled out.
    • The MetaChecker feature is supported.
    • OSS table engines and OSS table functions can be used to read data.
    • Table-level custom ZooKeeper addresses are supported.
    Iceberg
    • The service is added.
    • The supported version is 0.12.0.

    Knox
    • The issue that the first access to the Spark UI fails is fixed.
    DLF-Auth
    • The service is added.
    • Permissions for accessing DLF from Hive or Spark can be configured. The supported version is 1.0.0.

    ESS
    • ESS is updated to 1.2.0.
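
The Spark 3.x optimization in EMR V3.38.0 targets aggregations that contain several count(distinct case ... when ...) expressions. As a rough illustration of that query shape, the sketch below runs such an aggregation on Python's built-in sqlite3 module so that it is self-contained. The orders table and its columns are hypothetical, and SQLite does not exercise Spark's optimizer; in a Spark 3.x cluster, a SELECT statement of the same shape could be submitted through spark.sql().

```python
import sqlite3

# Hypothetical table used only for illustration: one row per order event.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (user_id INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "paid"), (1, "paid"), (2, "paid"), (2, "refunded"), (3, "cancelled")],
)

# One aggregation with multiple COUNT(DISTINCT CASE WHEN ...) expressions.
# A CASE without ELSE yields NULL for non-matching rows, and COUNT(DISTINCT ...)
# ignores NULLs, so each column counts distinct users with that status.
QUERY = """
SELECT
  COUNT(DISTINCT CASE WHEN status = 'paid' THEN user_id END)      AS paid_users,
  COUNT(DISTINCT CASE WHEN status = 'refunded' THEN user_id END)  AS refunded_users,
  COUNT(DISTINCT CASE WHEN status = 'cancelled' THEN user_id END) AS cancelled_users
FROM orders
"""

paid, refunded, cancelled = conn.execute(QUERY).fetchone()
print(paid, refunded, cancelled)
```

With the sample rows above, user 1 and user 2 both have paid orders, so paid_users is 2 even though user 1 has two paid rows.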

Release version information

Hadoop cluster

Service Version
HDFS 2.8.5
YARN 2.8.5
Hive 2.3.9
Spark 2.4.8
Knox 1.1.0
Tez 0.9.2
Ganglia 3.7.2
Sqoop 1.4.7
SmartData
EMR V3.38.0: 3.7.2
EMR V3.38.1: 3.7.3
EMR V3.38.2: 3.8.0
Bigboot
Iceberg 0.12.0
DLF-Auth 1.0.0
Hudi 0.9.0
Delta Lake 0.6.1
OpenLDAP 2.4.44
Hue 4.9.0
Spark 3.1.2
HBase 1.4.9
ZooKeeper 3.6.3
Presto 358
Impala 3.4.0
Zeppelin 0.10.0
Flume 1.9.0
Livy 0.7.1
Superset 0.36.0
Ranger 1.2.0
Phoenix 4.14.1
ESS 1.2.0
Alluxio 2.5.0
Kudu 1.14.0
Oozie 5.2.1

Druid cluster

Service Version
HDFS 2.8.5
Druid 0.20.1
ZooKeeper 3.6.3
Knox 1.1.0
Ganglia 3.7.2
SmartData 3.7.2
Bigboot
OpenLDAP 2.4.44
YARN 2.8.5
Superset 0.36.0

Flink cluster

Service Version
HDFS 2.8.5
YARN 2.8.5
ZooKeeper 3.6.3
Knox 1.1.0
Flink
EMR V3.38.0: 1.13-vvr-4.0.10
EMR V3.38.3: 1.13-vvr-4.0.11
SmartData
EMR V3.38.0: 3.7.2
EMR V3.38.3: 3.8.0
Bigboot
OpenLDAP 2.4.44

Kafka cluster

Service Version
ZooKeeper 3.6.3
Ganglia 3.7.2
Kafka 1.1.1
Kafka-Manager 1.3.3.16
OpenLDAP 2.4.44
Knox 1.1.0
Ranger 1.2.0

ClickHouse cluster

Service Version
ZooKeeper 3.6.3
Ganglia 3.7.2
ClickHouse 20.8.12.2

Presto cluster

Service Version
Knox 1.1.0
Presto 358
Ganglia 3.7.2
SmartData
EMR V3.38.0: 3.7.2
EMR V3.38.1: 3.7.3
EMR V3.38.2: 3.8.0
Bigboot
Hudi 0.9.0
Delta Lake 0.6.1
OpenLDAP 2.4.44
Hue 4.9.0
Alluxio 2.5.0