E-MapReduce: Release notes

Last Updated: Sep 11, 2024

This topic describes the release notes for E-MapReduce (EMR) and provides links to the relevant references.

For more information about the release notes, see Overview.

2024

August 2024

Feature

Description

Release date

References

Support of the monitoring and diagnostics feature

The monitoring and diagnostics feature supports intelligent O&M of clusters. It is built on a large model and incorporates the Alibaba Cloud EMR team's knowledge of the open source big data field and EMR observability, together with the diagnostic experience of technical experts. The feature enhances the observability of EMR: it provides real-time health diagnostics so that you can identify issues in abnormal clusters and troubleshoot them based on the suggestions in the diagnostic result, which helps reduce O&M costs. It also provides global optimization suggestions in daily cluster reports to help you improve the running efficiency of clusters.

2024-08-20

Initiate health diagnostics

Optimization of the cluster cloning capability

The cluster cloning feature is optimized. Service configurations that were modified, node groups that were added, and auto scaling rules that were configured during cluster creation or cluster use can now be cloned to a new cluster. This helps you quickly create a cluster with the same configurations as an existing cluster.

2024-08-20

Clone a cluster

Association of more security groups with a node group

A maximum of four security groups can be associated with a node group. This helps you implement access control on ECS instances in a cluster in a flexible manner.

2024-08-20

Manage node groups

June 2024

Feature

Description

Release date

References

Support for enabling the auto-renewal feature during scale-out

If you turn on Auto-renewal when you scale out an EMR cluster, the added nodes are automatically renewed, so you do not need to enable renewal for them in a separate operation. You can modify the renewal duration or disable the auto-renewal feature on the Auto-renewal page.

2024-06-19

Scale out an EMR cluster

Switchover of the billing method from pay-as-you-go to subscription at the node group level

The billing method of core, task, or gateway node groups in a subscription cluster can be changed from pay-as-you-go to subscription. This helps you manage the billing method of resources in a flexible manner.

2024-06-19

Switch from pay-as-you-go to subscription

Creation and custom deployment of Master-Extend node groups

Master-Extend node groups can be created for an EMR cluster. You can deploy components of Spark, Hive, and Kyuubi in a Master-Extend node group based on your business requirements. The system automatically synchronizes the configurations of related components to nodes that require the components. This helps reduce the load on the master node group of an EMR cluster.

2024-06-19

Manage node groups

March 2024

Feature

Description

Release date

References

Creation and management of OSS-HDFS buckets in the EMR console

OSS-HDFS buckets can be created when you create a cluster in the EMR console. You can view the storage overview and object list of the buckets on the Services tab of the cluster in the EMR console. You no longer need to perform these operations in the Object Storage Service (OSS) console. This simplifies the process of using buckets and prevents misoperations that may cause the Hadoop Distributed File System (HDFS) service to become unavailable.

2024-03-14

Create a cluster

Creation of gateway node groups

Gateway nodes are provided to reduce the load on the master node. They can serve as task submission machines. This way, you can submit tasks on gateway nodes with simple operations. Gateway nodes also help implement automatic synchronization of configurations that are related to clusters and task submission environments. This helps you deploy and configure a task submission environment with ease.

2024-03-14

Manage node groups

Management of health check items

Health check items can be managed. EMR checks the health status of the nodes and services of EMR clusters based on preset health check items, which helps you handle exceptions and risks at the earliest opportunity. You can use the feature to view what is checked for the nodes and services of a cluster and to modify check items.

2024-03-14

Manage health check items

Diversification of health check items for services and components

The health check items of YARN, HDFS, Hive, Kafka, and ZooKeeper are expanded to improve the accuracy of health checks for services and components.

2024-03-14

View the health status of services and components

2023

October 2023

Feature

Description

Release date

References

Recommendation of auto scaling rules

The feature of recommending auto scaling rules is optimized. You can view the overview information about cluster resources on the Auto Scaling tab in the EMR console. The auto scaling feature helps you analyze the resource utilization of clusters and provides recommended auto scaling rules for the clusters that meet specific conditions. You can enable auto scaling based on the overview information to improve the elasticity of cluster resources.

2023-10-24

View the overview information about cluster resources

Alert rule management

The alert rule management feature is provided. This feature is implemented based on CloudMonitor. You can create and view alert rules for clusters in the EMR console. If resource metrics meet specific alert conditions, alerts are triggered and CloudMonitor sends alert notifications. This way, you can identify and handle the exceptions of monitored clusters at the earliest opportunity.

2023-10-24

Manage alert rules

Display of node health status

Node health status is displayed so that you can check whether a node runs as expected. You can view the health status of nodes on the Nodes tab and identify abnormal nodes at the earliest opportunity.

2023-10-24

View the health status of nodes

Configuration of disk performance levels (PLs)

PLs can be configured for disks. When you create a cluster or add a node group, you can specify different PLs for enhanced SSDs (ESSDs) to meet different cluster performance requirements.

2023-10-24

Create a cluster

August 2023

Feature

Description

Release date

References

Cluster template

The cluster template feature is a persistent EMR instance configuration feature that can be used to create an EMR cluster with a few clicks.

2023-08-29

Viewing of overview information about cluster resources

You can view the overview information about cluster resources on the Auto Scaling tab in the EMR console. The auto scaling feature helps you analyze the resource utilization of clusters and provides auto scaling rules for the clusters that meet specific conditions. You can enable auto scaling based on the overview information to improve the elasticity of cluster resources.

2023-08-29

View the overview information about cluster resources

Viewing of configuration items

If a configuration item is modified at the node group or node level, you can view its settings on the Configure tab by selecting Node Group Configuration or Independent Node Configuration from the Default Cluster Configuration drop-down list.

2023-08-29

Manage configuration items

July 2023

Feature

Description

Release date

References

Auto scaling management

EMR provides a dedicated management module that allows you to manage the auto scaling feature in an efficient manner. You can use the module to manage auto scaling rules and view the elastic resource usage and cost allocation of your cluster. This way, you can evaluate the cost savings brought by auto scaling and optimize the resource utilization of your cluster.

2023-07-12

Automatic supplementation

The automatic supplementation feature of EMR is optimized. The feature can replace abnormal nodes in a cluster.

Information prompt and event notification capabilities are provided to help you learn how automatic supplementation is performed.

Note

From 18:00 (UTC+8) on July 10, 2023, Automatic Compensation is turned on for new pay-as-you-go task node groups by default.

2023-07-12

Manage automatic supplementation

Service configuration

Service configuration is optimized. The To Be Delivered prompt and Not Effective Yet prompt are added. This provides guidance on operations that users can perform after configurations are modified to ensure that configuration modifications take effect.

2023-07-12

Manage configuration items

Stateless clusters

Stateless clusters are supported. EMR provides a default data lake architecture, which does not depend on Hadoop Distributed File System (HDFS). If you do not need to use services that depend on core nodes, you can remove the core node group to build a completely stateless cluster. This helps further reduce the O&M costs of your cluster.

2023-07-12

Create a cluster

Association of YARN partitions with queues

You can associate YARN partitions with queues and allocate capacity in the EMR console, without the need to configure complex settings.

2023-07-12

Modify resource queues

Per-second billing for pay-as-you-go resources

Per-second billing is supported for pay-as-you-go resources. The finer billing granularity helps effectively reduce resource costs.

2023-07-12
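
As a rough illustration of why a finer billing granularity can lower cost, the following sketch compares per-second billing with a coarser scheme that rounds usage up to whole hours. The hourly rounding baseline, the function names, and the price are assumptions made for illustration only; they are not taken from EMR pricing.

```python
import math


def hourly_rounded_cost(runtime_seconds: int, price_per_hour: float) -> float:
    """Cost when usage is rounded up to whole hours (assumed baseline)."""
    billed_hours = math.ceil(runtime_seconds / 3600)
    return billed_hours * price_per_hour


def per_second_cost(runtime_seconds: int, price_per_hour: float) -> float:
    """Cost when every second of usage is billed at 1/3600 of the hourly price."""
    return runtime_seconds * price_per_hour / 3600


# Hypothetical task node that runs for 90 minutes at a made-up hourly price.
runtime = 90 * 60
price = 3.6

coarse = hourly_rounded_cost(runtime, price)  # billed as 2 full hours
fine = per_second_cost(runtime, price)        # billed as 5,400 seconds
savings = coarse - fine
print(f"hourly-rounded: {coarse:.2f}, per-second: {fine:.2f}, saved: {savings:.2f}")
```

The shorter or more irregular the runtime of a node, the larger the gap between the two results, which is why short-lived scaling nodes tend to benefit most from fine-grained billing.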

June 2023

Feature

Description

Release date

References

Version update

  • EMR V5.X series: EMR V5.12.0 is released.

  • EMR V3.X series: EMR V3.46.0 is released.

2023-06-01

Paimon

Paimon is added. Apache Paimon is a data lake platform that allows you to process data in streaming and batch modes. Apache Paimon supports high-throughput data writing and low-latency data queries.

2023-06-01

Presto

Presto is added. Presto (also known as PrestoDB) is a flexible and scalable distributed SQL query engine.

2023-06-07

April 2023

Feature

Description

Release date

References

Version update

  • EMR V5.X series: EMR V5.11.1 is released.

  • EMR V3.X series: EMR V3.45.1 is released.

2023-04-03

New capability in data lakehouse scenarios

Hologres and MaxCompute tables can be accessed by using the Spark and Trino compute engines.

2023-04-03

New capability in data lakehouse scenarios: EMR supports Hologres and MaxCompute data sources

Access to Hologres by using Spark

Spark can be used to read data from Hologres tables.

2023-04-03

Use Spark to access Hologres

Node configuration upgrade

The ECS instance configurations of a node group can be upgraded.

2023-04-03

Upgrade node configurations

Management of YARN partitions in the EMR console

EMR allows you to manage YARN partitions in the console in a visualized manner. You can establish mappings between multiple node groups and partitions at a time.

2023-04-13

Manage YARN partitions in the EMR console

March 2023

Feature

Description

Release date

References

Flink Table Store

Flink Table Store is added. Flink Table Store is a unified data lake storage that allows you to process data in streaming and batch modes. You can use Flink Table Store to write data at high throughput and query data at low latency.

2023-03-03

Export and import of service configurations

Service configurations can be exported in the XML or JSON format. This way, you can back up, migrate, or restore the service configurations of an EMR cluster.

2023-03-02

Export and import service configurations
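
To make the backup and restore idea concrete, the following sketch converts a service configuration between XML and JSON by using only the Python standard library. It assumes the common Hadoop-style layout (`<configuration>` with `<property>`, `<name>`, and `<value>` elements); the actual layout of files exported from the EMR console may differ, and the helper names and sample values are hypothetical.

```python
import json
import xml.etree.ElementTree as ET


def hadoop_xml_to_dict(xml_text: str) -> dict:
    """Parse Hadoop-style <property> entries into a name -> value mapping."""
    root = ET.fromstring(xml_text)
    config = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is None:
            continue  # skip malformed entries that have no name
        config[name.strip()] = (prop.findtext("value") or "").strip()
    return config


def dict_to_hadoop_xml(config: dict) -> str:
    """Serialize a name -> value mapping back to Hadoop-style XML."""
    root = ET.Element("configuration")
    for name, value in sorted(config.items()):
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


# Sample input with made-up values, standing in for an exported XML file.
SAMPLE_XML = """<configuration>
  <property><name>yarn.nodemanager.resource.memory-mb</name><value>8192</value></property>
  <property><name>dfs.replication</name><value>2</value></property>
</configuration>"""

backup = hadoop_xml_to_dict(SAMPLE_XML)
backup_json = json.dumps(backup, indent=2, sort_keys=True)  # JSON form of the backup

# Round trip: JSON -> dict -> XML -> dict should preserve every setting.
restored = hadoop_xml_to_dict(dict_to_hadoop_xml(json.loads(backup_json)))
print(backup_json)
```

Keeping the JSON form under version control makes it easy to compare two backups before you import one of them into a cluster.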

February 2023

Feature

Description

Release date

References

Version update

  • EMR V5.X series: EMR V5.11.0 is released.

  • EMR V3.X series: EMR V3.45.0 is released.

2023-02-28

2022

December 2022

Feature

Description

Release date

References

Version update

  • EMR V5.X series: EMR V5.10.0 is released.

  • EMR V3.X series: EMR V3.44.0 is released.

2022-12-01

Node label of YARN

The node label feature of YARN is supported. This feature allows you to manage nodes on which NodeManagers are deployed in a cluster based on different partitions.

2022-12-14

Node labels

November 2022

Feature

Description

Release date

References

Version update

  • EMR V5.X series: EMR V5.9.1 is released.

  • EMR V3.X series: EMR V3.43.1 is released.

2022-11-08

Log management

The log management feature is supported. This feature allows you to query the logs that are generated for the open source components in the EMR console.

2022-11-29

Manage logs

October 2022

Feature

Description

Release date

References

Version update

  • EMR V5.X series: EMR V5.9.0 is released.

  • EMR V3.X series: EMR V3.43.0 is released.

2022-10-14

HBase Shell

HBase Shell can be used to connect to HBase that is deployed in an EMR cluster.

2022-10-21

Use HBase Shell

DataServing cluster

DataServing clusters based on Apache HBase are provided.

2022-10-28

DataServing cluster

September 2022

Feature

Description

Release date

References

Automatic supplementation

Automatic supplementation is supported. After you enable this feature for an EMR cluster, the abnormal ECS instances in the EMR cluster can be automatically replaced when EMR identifies that the ECS instances cannot run the engine services as expected.

2022-09-07

Manage automatic supplementation

Cluster cloning

The cluster cloning feature provided by EMR can be used to create a cluster based on an existing cluster.

2022-09-09

Clone a cluster

August 2022

Feature

Description

Release date

References

Version update

  • EMR V5.X series: EMR V5.8.0 is released.

  • EMR V3.X series: EMR V3.42.0 is released.

2022-08-05

Deployment set

Deployment sets provided by Alibaba Cloud ECS can be used to manage the distribution of ECS instances. Deployment sets can help improve the disaster recovery capability and availability of ECS instances.

2022-08-05

Add nodes to the deployment set

Gateway deployment by using EMR-CLI

The EMR-CLI tool provided by EMR can be used to deploy a gateway on an ECS instance.

2022-08-05

Use EMR-CLI to deploy a gateway

July 2022

Feature

Description

Release date

References

EMR Doctor

EMR Doctor is provided. It is an intelligent O&M system developed by the Alibaba Cloud EMR team for open source big data clusters.

2022-07-25

Overview

June 2022

Feature

Description

Release date

References

DataLake cluster

DataLake clusters are supported. A DataLake cluster is a big data computing cluster that allows you to analyze data in a flexible, reliable, and efficient manner. You can create a DataLake cluster only in the new EMR console.

2022-06-01

DataLake cluster

Association of a Spark cluster with a Shuffle Service cluster

Remote Shuffle Service (RSS) is an extension provided by Alibaba Cloud EMR to improve the stability and performance of Spark Shuffle. You can associate a Spark cluster that is created on the EMR on ACK page with a Shuffle Service cluster.

2022-06-09

Associate a Spark cluster with a Shuffle Service cluster

May 2022

Feature

Description

Release date

References

Memory management

Memory resources can be managed. The topic in the References column describes the memory usage categories and memory configuration parameters that are related to a backend (BE) in StarRocks. The topic also describes how to view memory usage.

2022-05-10

Manage memory resources