This topic describes the release notes for E-MapReduce (EMR) and provides links to the relevant references.
For more information about the release notes, see Overview.
2023
July 2023
Feature | Description | Release date | References |
Auto scaling management | EMR provides a dedicated management module that allows you to manage the auto scaling feature in an efficient manner. You can use the module to manage auto scaling rules and view the elastic resource usage and cost allocation of your cluster. This way, you can evaluate the cost savings brought by auto scaling and optimize the resource utilization of your cluster. | 2023-07-12 | |
Automatic supplementation | The automatic supplementation feature of EMR is optimized. The feature can replace abnormal nodes in a cluster. Prompts and event notifications are provided to help you track how automatic supplementation is performed. Note: From 18:00 (UTC+8) on July 10, 2023, automatic supplementation is enabled by default for new pay-as-you-go task node groups. | 2023-07-12 | |
Service configuration | Service configuration is optimized. The To Be Delivered prompt and Not Effective Yet prompt are added. These prompts indicate the operations that you must perform after you modify configurations so that the modifications take effect. | 2023-07-12 | |
Data Science clusters on top of ACK | You can deploy Data Science clusters on top of Container Service for Kubernetes (ACK). This way, you can rely on ACK for service deployment and container application management, which reduces the O&M costs of underlying cluster resources and allows you to focus on big data and AI tasks. Data Science clusters support the CPU+GPU heterogeneous computing framework and a variety of model training frameworks, such as PyTorch and DeepSpeed. This helps meet your requirements for high computing performance. | 2023-07-12 | |
Stateless clusters | Stateless clusters are supported. EMR provides a default data lake architecture, which does not depend on Hadoop Distributed File System (HDFS). If you do not need to use services that depend on core nodes, you can remove the core node group to build a completely stateless cluster. This helps further reduce the O&M costs of your cluster. | 2023-07-12 | |
Per-second billing for pay-as-you-go resources | Per-second billing is supported for pay-as-you-go resources. The finer billing granularity helps effectively reduce resource costs. | 2023-07-12 |
June 2023
Feature | Description | Release date | References |
Version update | | 2023-06-01 | |
Paimon | Paimon is added. Apache Paimon is a data lake platform that allows you to process data in streaming and batch modes. Apache Paimon supports high-throughput data writing and low-latency data queries. | 2023-06-01 | |
Presto | Presto is added. Presto (also known as PrestoDB) is a flexible and scalable distributed SQL query engine. | 2023-06-07 |
May 2023
Feature | Description | Release date | References |
Running of Spark jobs on ARM-based nodes | Spark jobs in Spark clusters that are created on the EMR on ACK page can be run on elastic container instances that use the ARM architecture. | 2023-05-05 |
April 2023
Feature | Description | Release date | References |
Version update | | 2023-04-03 | |
New capability in data lakehouse scenarios | Hologres and MaxCompute tables can be accessed by using the Spark and Trino compute engines. | 2023-04-03 | New capability in data lakehouse scenarios: EMR supports Hologres and MaxCompute data sources |
Access to Hologres by using Spark | Spark can be used to read data from Hologres tables. | 2023-04-03 | |
Node configuration upgrade | The ECS instance configurations of a node group can be upgraded. | 2023-04-03 | |
Management of YARN partitions in the EMR console | EMR allows you to manage YARN partitions in the console in a visualized manner. You can establish mappings between multiple node groups and partitions at a time. | 2023-04-13 |
March 2023
Feature | Description | Release date | References |
Flink Table Store | Flink Table Store is added. Flink Table Store is a unified data lake storage that allows you to process data in streaming and batch modes. You can use Flink Table Store to write data at high throughput and query data at low latency. | 2023-03-03 | |
Export and import of service configurations | Service configurations can be exported in the XML or JSON format. This way, you can back up, migrate, or restore the service configurations of an EMR cluster. | 2023-03-02 |
February 2023
Feature | Description | Release date | References |
Version update | | 2023-02-28 |
2022
December 2022
Feature | Description | Release date | References |
Version update | | 2022-12-01 | |
Node label of YARN | The node label feature of YARN is supported. This feature allows you to divide the nodes on which NodeManagers are deployed into partitions and manage the nodes by partition. | 2022-12-14 |
November 2022
Feature | Description | Release date | References |
Version update | | 2022-11-08 | |
Log management | The log management feature is supported. This feature allows you to use the EMR console to query the logs that are generated by open source components. | 2022-11-29 |
October 2022
Feature | Description | Release date | References |
Version update | | 2022-10-14 | |
HBase Shell | HBase Shell can be used to connect to HBase that is deployed in an EMR cluster. | 2022-10-21 | |
DataServing cluster | DataServing clusters based on Apache HBase are provided. | 2022-10-28 |
September 2022
Feature | Description | Release date | References |
Automatic supplementation | Automatic supplementation is supported. After you enable this feature for an EMR cluster, the abnormal ECS instances in the EMR cluster can be automatically replaced when EMR identifies that the ECS instances cannot run the engine services as expected. | 2022-09-07 | |
Cluster cloning | The cluster cloning feature provided by EMR can be used to create a cluster based on an existing cluster. | 2022-09-09 |
August 2022
Feature | Description | Release date | References |
Version update | | 2022-08-05 | |
Deployment set | Deployment sets provided by Alibaba Cloud ECS can be used to manage the distribution of ECS instances. Deployment sets can help improve the disaster recovery capability and availability of ECS instances. | 2022-08-05 | |
Gateway deployment by using EMR-CLI | The EMR-CLI tool provided by EMR can be used to deploy a gateway on an ECS instance. | 2022-08-05 |
July 2022
Feature | Description | Release date | References |
EMR Doctor | EMR Doctor is provided. It is an intelligent O&M system developed by the Alibaba Cloud EMR team for open source big data clusters. | 2022-07-25 | |
Elastic scheduling of Flink jobs by using Elastic Container Instance | Flink jobs can be elastically scheduled by using Elastic Container Instance. This way, you can create pods without being limited by the computing capabilities of ACK clusters. This helps reduce computing costs. You can refer to the topic in the References column to use Elastic Container Instance to elastically schedule Flink jobs of an EMR cluster that is created on the EMR on ACK page. | 2022-07-18 | Use Elastic Container Instance to elastically schedule Flink jobs |
June 2022
Feature | Description | Release date | References |
DataLake cluster | DataLake clusters are supported. A DataLake cluster is a big data computing cluster that allows you to analyze data in a flexible, reliable, and efficient manner. You can create a DataLake cluster only in the new EMR console. | 2022-06-01 | |
Association of a Spark cluster with a Shuffle Service cluster | Remote Shuffle Service (RSS) is an extension provided by Alibaba Cloud EMR to improve the stability and performance of Spark Shuffle. You can associate a Spark cluster that is created on the EMR on ACK page with a Shuffle Service cluster. | 2022-06-09 |
May 2022
Feature | Description | Release date | References |
Memory management | Memory resources can be managed. The topic in the References column describes the memory usage categories and memory configuration parameters that are related to a backend (BE) in StarRocks. The topic also describes how to view memory usage. | 2022-05-10 |