This topic provides the release notes for E-MapReduce (EMR) V4.10.X, including the release date and updates.

Release date

March 23, 2022 for EMR V4.10.0


Service Description
SmartData The service is no longer used.
  • The architecture of SmartData is upgraded to JindoData.
  • EMR is integrated with JindoSDK for JindoData 4.0.0 for the first time. JindoData connects to Alibaba Cloud Object Storage Service (OSS) and the Alibaba Cloud OSS-HDFS service.
Spark
  • Spark is updated to 2.4.8.
  • The issue that adaptive execution does not take effect in some scenarios is fixed.
  • The issue that statistical aggregate functions behave differently in Spark and Hive is fixed.
  • The issue that Spark cannot read valid data of the CHAR type from a Hive ORC table is fixed.
  • The default configurations of Thrift Server are optimized.
  • In the EMR console, the parameter names on the spark-defaults tab of the Configure tab for the Spark service are optimized.
  • Hive on Spark is optimized.
  • The array index out-of-bounds error that occurs when some statistics required by Adaptive Query Execution (AQE) are missing is fixed.
  • Errors related to AQE and data caching in specific scenarios are fixed.
  • Log4j Metrics Appender is removed because the configuration is invalid.
  • The null pointer exception that occurs when SparkContext is started is fixed.
Hive
  • The data compression algorithm Zstandard is supported.
  • The issue that user-defined functions (UDFs) cause memory leaks in HiveServer2 is fixed.
  • The issue that the output of the SHOW CREATE TABLE command is inaccurate when Data Lake Formation (DLF) metadata is used is fixed.
  • The default parameters of Hive are optimized to improve the performance of Hive jobs.
  • In the EMR console, the parameter names on the hive-env tab of the Configure tab for the Hive service are changed to uppercase. This facilitates the use of the parameters.
  • The error message that is reported when you write data to a Hive table and the file system is incompatible with the Hive metastore is optimized.
  • In JindoFS in block storage mode, the metadata of multiple Hive tables can be optimized at the same time. By default, this feature is disabled.
Ranger
  • The warning that appears in logs when Spark is started with Ranger is fixed.
  • The issue that user information fails to be automatically synchronized after Ranger is connected to a Lightweight Directory Access Protocol (LDAP) server is fixed.
HDFS
  • The data compression algorithm Zstandard is supported.
  • By default, the reserved space of NameNode adaptively increases. This way, NameNode enters the Safe mode at the earliest opportunity when the disk space is insufficient.
YARN
  • Information about app IDs, CPU utilization, and memory usage is added to the RESTful APIs of containers for nodes.
  • The issue that the Application Master (AM) logs of an automatically released node cannot be viewed is fixed.
  • The issue that a cluster cannot be accessed due to historical state store data is fixed.
  • The data of a node that is automatically released based on the decommissioning logic of auto scaling can be deleted.
  • The Graceful Decommission logic of auto scaling is optimized. The node on which NodeManager runs is marked as decommissioned only after the NodeManager process is complete.
Knox
  • Knox is adapted to Kudu.
  • Knox is adapted to HBase.
  • The issue that the first access to the Spark UI fails is fixed.
Tez The default parameters of Tez are optimized to improve the performance of Tez jobs.
Sqoop The issue that precision loss for the DECIMAL data type occurs when you use Sqoop to import data to HCatalog tables is fixed.
Delta Lake
  • Metadata management
    • Metadata and partition information is synchronized by using the built-in catalog of Spark instead of an API operation that is called by using the Hive CLI.
    • The statistics on table data are automatically reported to metastores.
  • SQL
    • The syntax of the time travel feature is supported.
    • The DROP PARTITION SQL syntax is supported.
    • The ADD COLUMN statement can be used to add columns to specified locations (FIRST and AFTER).
  • Enhanced table management capabilities
    • The file size can be dynamically adjusted based on the table size. By default, this feature is enabled.
    • The auto-vacuum feature is supported and enabled by default. Concurrent vacuum operations are supported.
    • The logic of automatic compaction is optimized. By default, the automatic compaction feature is disabled.
    • The Z-ordering syntax is added. Z-ordering-based data processing is accelerated.
Hudi
  • Hudi is updated to 0.10.0.
  • The compatibility issue between the sql.extension settings of Delta Lake and Hudi is fixed.
Iceberg The service is added. The supported version is 0.13.0.

Hue
  • The issue that garbled characters are displayed when Hue is used to query historical records is fixed.
  • The UI display exception that occurs when you use Hue together with Oozie is fixed.
  • The issue that YARN Job Browser sometimes cannot display or terminate jobs is fixed.
  • YARN Job Browser is accessible by default.
  • The Presto protocol is supported by default.
DLF-Auth The service is added. The supported version is 1.0.4.

HBase
  • The time required to restart HBase in a high-security cluster is reduced.
  • The issue that Spark 3.1.1 cannot be integrated with HBase is fixed.
  • The Graceful Stop process is optimized.
ZooKeeper ZooKeeper is updated to 3.6.3.
Presto
  • Presto is updated to 358.
  • User-defined functions (UDFs) can be dynamically loaded. For more information, see Dynamically load UDFs.
  • Data lake analysis is supported.
  • The issue that the LIST operation is repeatedly performed on directly deleted OSS partition directories is fixed.
  • The issue that a "no such method" error message appears when you query data in DLF metadata tables is fixed.
Zeppelin Zeppelin is updated to 0.10.0.
Oozie The issue that Jetty Server of Oozie fails to start due to JAR package conflicts in high availability (HA) scenarios is fixed.
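
The Delta Lake update that allows ADD COLUMN to place columns at specified locations (FIRST and AFTER) can be illustrated with a small statement builder. This is only a sketch: it assumes that EMR accepts the same statement shape as open source Delta Lake (ALTER TABLE ... ADD COLUMNS with an optional FIRST or AFTER clause), which these release notes do not confirm. The helper add_column_sql and the table and column names are hypothetical.

```python
from typing import Optional


def add_column_sql(table: str, column: str, data_type: str,
                   after: Optional[str] = None, first: bool = False) -> str:
    """Build an ALTER TABLE ... ADD COLUMNS statement with an optional position.

    Assumption: the statement shape mirrors open source Delta Lake SQL.
    The exact syntax accepted by EMR V4.10.0 may differ.
    """
    if first and after is not None:
        raise ValueError("choose either first=True or after=<column>, not both")
    position = ""
    if first:
        position = " FIRST"
    elif after is not None:
        position = f" AFTER {after}"
    return f"ALTER TABLE {table} ADD COLUMNS ({column} {data_type}{position})"


# Hypothetical table and column names, for illustration only.
print(add_column_sql("events", "event_id", "BIGINT", first=True))
print(add_column_sql("events", "region", "STRING", after="event_id"))
```

The generated strings could be passed to a Spark SQL session on a cluster. Verify the syntax against the Delta Lake documentation for your EMR version before you rely on it.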

Release version information

Service Version
HDFS 3.2.1
YARN 3.2.1
Hive 3.1.2
Spark 2.4.8
Knox 1.1.0
Tez 0.9.2
Ganglia 3.7.2
Sqoop 1.4.7
DLF-Auth 1.0.4
Iceberg 0.13.0
Hudi 0.10.0
Delta Lake 0.6.1
OpenLDAP 2.4.44
Hue 4.9.0
JindoSDK 4.0.0
HBase 2.3.4
ZooKeeper 3.6.3
Presto 358
Impala 3.4.0
Zeppelin 0.10.2
Flume 1.9.0
Livy 0.7.1
Superset 0.36.0
Ranger 2.1.0
RSS 1.0.0
Alluxio 2.5.0
Kudu 1.14.0
Oozie 5.2.1
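
If you need the version information above in a script, for example to check a job's dependencies against an EMR V4.10.0 cluster, the table can be read into a dictionary. The following is a minimal sketch. The function name parse_versions is hypothetical, and only a few rows of the table are reproduced in the sample input.

```python
def parse_versions(text: str) -> dict:
    """Parse 'Service Version' lines into a {service: version} dictionary."""
    versions = {}
    for line in text.strip().splitlines():
        line = line.strip()
        # Skip blank lines and the table header.
        if not line or line == "Service Version":
            continue
        # The version is the last whitespace-separated token, so
        # multi-word service names such as "Delta Lake" stay intact.
        service, version = line.rsplit(None, 1)
        versions[service] = version
    return versions


# A few rows copied from the table above.
RELEASE_TABLE = """
Service Version
HDFS 3.2.1
Spark 2.4.8
Delta Lake 0.6.1
ZooKeeper 3.6.3
"""

versions = parse_versions(RELEASE_TABLE)
print(versions["Delta Lake"])
```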