E-MapReduce: Release notes

Last Updated: Sep 04, 2023

This topic describes the release notes for E-MapReduce (EMR) and provides links to the relevant references.

For more information about the release notes, see Overview.

2023

July 2023

Feature

Description

Release date

References

Auto scaling management

EMR provides a dedicated module for managing auto scaling. You can use the module to manage auto scaling rules and to view the elastic resource usage and cost allocation of your cluster. This helps you evaluate the cost savings from auto scaling and optimize the resource utilization of your cluster.

2023-07-12

Automatic supplementation

The automatic supplementation feature of EMR is optimized. The feature can replace abnormal nodes in a cluster.

Information prompt and event notification capabilities are provided to help you learn how automatic supplementation is performed.

Note

From 18:00 (UTC+8) on July 10, 2023, automatic supplementation is enabled by default for new pay-as-you-go task node groups.

2023-07-12

Manage automatic supplementation

Service configuration

Service configuration is optimized. The To Be Delivered and Not Effective Yet prompts are added. After you modify a configuration, these prompts indicate which operations to perform so that the modification takes effect.

2023-07-12

Manage configuration items

Data Science clusters on top of ACK

You can deploy Data Science clusters on top of Container Service for Kubernetes (ACK). ACK handles service deployment and container application management, which reduces the O&M costs of the underlying cluster resources and lets you focus on big data and AI tasks. Data Science clusters support the CPU+GPU heterogeneous computing framework and a variety of model training frameworks, such as PyTorch and DeepSpeed, to meet requirements for high computing performance.

2023-07-12

Create a Data Science cluster

Stateless clusters

Stateless clusters are supported. EMR provides a default data lake architecture, which does not depend on Hadoop Distributed File System (HDFS). If you do not need to use services that depend on core nodes, you can remove the core node group to build a completely stateless cluster. This helps further reduce the O&M costs of your cluster.

2023-07-12

Create a cluster

Per-second billing for pay-as-you-go resources

Per-second billing is supported for pay-as-you-go resources. The finer billing granularity helps reduce resource costs because you pay only for the seconds that a resource is actually used.

2023-07-12
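The effect of a finer billing granularity can be illustrated with a small calculation. The following is a minimal sketch, not EMR's billing logic: the hourly price, the usage duration, and the coarser scheme that rounds usage up to whole hours are all assumptions made for comparison, because this topic does not state the previous granularity or any prices.

```python
import math


def rounded_hourly_cost(seconds_used: int, price_per_hour: float) -> float:
    """Cost under a hypothetical scheme that rounds usage up to whole hours."""
    return math.ceil(seconds_used / 3600) * price_per_hour


def per_second_cost(seconds_used: int, price_per_hour: float) -> float:
    """Cost when usage is metered by the second."""
    return seconds_used * price_per_hour / 3600


# Hypothetical node: 1.20 currency units per hour, used for 65 minutes.
seconds_used = 65 * 60
price_per_hour = 1.20

coarse = rounded_hourly_cost(seconds_used, price_per_hour)
fine = per_second_cost(seconds_used, price_per_hour)

print(f"rounded up to hours: {coarse:.2f}")
print(f"per-second metering: {fine:.2f}")
```

Under these assumptions, the 65-minute run is charged as two full hours in the coarse scheme but only for 3,900 seconds with per-second metering. The gap is largest for short-lived or frequently scaled nodes.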

June 2023

Feature

Description

Release date

References

Version update

  • EMR V5.X series: EMR V5.12.0 is released.

  • EMR V3.X series: EMR V3.46.0 is released.

2023-06-01

Paimon

Paimon is added. Apache Paimon is a data lake platform that allows you to process data in streaming and batch modes. Apache Paimon supports high-throughput data writing and low-latency data queries.

2023-06-01

Presto

Presto is added. Presto (that is, PrestoDB) is a flexible and scalable distributed SQL query engine.

2023-06-07

May 2023

Feature

Description

Release date

References

Running of Spark jobs on ARM-based nodes

Spark jobs in Spark clusters that are created on the EMR on ACK page can be run on elastic container instances that use the ARM architecture.

2023-05-05

Run Spark jobs on ARM-based nodes

April 2023

Feature

Description

Release date

References

Version update

  • EMR V5.X series: EMR V5.11.1 is released.

  • EMR V3.X series: EMR V3.45.1 is released.

2023-04-03

New capability in data lakehouse scenarios

Hologres and MaxCompute tables can be accessed by using the Spark and Trino compute engines.

2023-04-03

New capability in data lakehouse scenarios: EMR supports Hologres and MaxCompute data sources

Access to Hologres by using Spark

Spark can be used to read data from Hologres tables.

2023-04-03

Use Spark to access Hologres

Node configuration upgrade

The ECS instance configurations of a node group can be upgraded.

2023-04-03

Upgrade node configurations

Management of YARN partitions in the EMR console

EMR allows you to manage YARN partitions visually in the console. You can map multiple node groups to partitions at a time.

2023-04-13

Manage YARN partitions in the EMR console

March 2023

Feature

Description

Release date

References

Flink Table Store

Flink Table Store is added. Flink Table Store is unified storage for data lakes that allows you to process data in streaming and batch modes. You can use Flink Table Store to write data at high throughput and query data at low latency.

2023-03-03

Export and import of service configurations

Service configurations can be exported in the XML or JSON format. This way, you can back up, migrate, or restore the service configurations of an EMR cluster.

2023-03-02

Export and import service configurations
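One use of an exported backup is to check what changed before you restore or migrate a configuration. The sketch below is illustrative only: this topic does not describe the layout of the exported file, so the code assumes a Hadoop-style XML document (`<configuration>` containing `<property>` elements with `<name>` and `<value>`), and the sample property values and helper names are hypothetical.

```python
import json
import xml.etree.ElementTree as ET


def parse_config_xml(xml_text: str) -> dict:
    """Read name/value pairs from an (assumed) Hadoop-style XML configuration."""
    root = ET.fromstring(xml_text)
    return {
        prop.findtext("name"): prop.findtext("value", default="")
        for prop in root.iter("property")
    }


def diff_configs(old: dict, new: dict) -> dict:
    """Return {key: (old_value, new_value)} for every key whose value differs."""
    keys = sorted(set(old) | set(new))
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}


# Hypothetical exported backup and current configuration.
SAMPLE_BACKUP = """<configuration>
  <property><name>yarn.nodemanager.resource.memory-mb</name><value>8192</value></property>
  <property><name>dfs.replication</name><value>2</value></property>
</configuration>"""

SAMPLE_CURRENT = """<configuration>
  <property><name>yarn.nodemanager.resource.memory-mb</name><value>16384</value></property>
  <property><name>dfs.replication</name><value>2</value></property>
</configuration>"""

backup = parse_config_xml(SAMPLE_BACKUP)
current = parse_config_xml(SAMPLE_CURRENT)
changes = diff_configs(backup, current)

# Emit the differences as JSON so that they can be reviewed or archived.
print(json.dumps(changes, indent=2))
```

If the export is in the JSON format instead, `json.loads` could take the place of the XML parsing step, provided that the file maps configuration names to values. That layout is also an assumption.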

February 2023

Feature

Description

Release date

References

Version update

  • EMR V5.X series: EMR V5.11.0 is released.

  • EMR V3.X series: EMR V3.45.0 is released.

2023-02-28

2022

December 2022

Feature

Description

Release date

References

Version update

  • EMR V5.X series: EMR V5.10.0 is released.

  • EMR V3.X series: EMR V3.44.0 is released.

2022-12-01

Node label of YARN

The node label feature of YARN is supported. This feature allows you to divide the nodes on which NodeManagers are deployed into partitions and manage the nodes by partition.

2022-12-14

Node labels

November 2022

Feature

Description

Release date

References

Version update

  • EMR V5.X series: EMR V5.9.1 is released.

  • EMR V3.X series: EMR V3.43.1 is released.

2022-11-08

Log management

The log management feature is supported. This feature allows you to use the EMR console to query the logs that are generated by open source components.

2022-11-29

Manage logs

October 2022

Feature

Description

Release date

References

Version update

  • EMR V5.X series: EMR V5.9.0 is released.

  • EMR V3.X series: EMR V3.43.0 is released.

2022-10-14

HBase Shell

HBase Shell can be used to connect to an HBase service that is deployed in an EMR cluster.

2022-10-21

Use HBase Shell

DataServing cluster

DataServing clusters based on Apache HBase are provided.

2022-10-28

DataServing cluster

September 2022

Feature

Description

Release date

References

Automatic supplementation

Automatic supplementation is supported. After you enable this feature for an EMR cluster, the abnormal ECS instances in the EMR cluster can be automatically replaced when EMR identifies that the ECS instances cannot run the engine services as expected.

2022-09-07

Manage automatic supplementation

Cluster cloning

The cluster cloning feature provided by EMR can be used to create a cluster based on an existing cluster.

2022-09-09

Clone a cluster

August 2022

Feature

Description

Release date

References

Version update

  • EMR V5.X series: EMR V5.8.0 is released.

  • EMR V3.X series: EMR V3.42.0 is released.

2022-08-05

Deployment set

Deployment sets provided by Alibaba Cloud ECS can be used to manage the distribution of ECS instances. Deployment sets can help improve the disaster recovery capability and availability of ECS instances.

2022-08-05

Add nodes to the deployment set

Gateway deployment by using EMR-CLI

The EMR-CLI tool provided by EMR can be used to deploy a gateway on an ECS instance.

2022-08-05

Use EMR-CLI to deploy a gateway

July 2022

Feature

Description

Release date

References

EMR Doctor

EMR Doctor is provided. It is an intelligent O&M system developed by the Alibaba Cloud EMR team for open source big data clusters.

2022-07-25

Overview

Elastic scheduling of Flink jobs by using Elastic Container Instance

Flink jobs can be elastically scheduled by using Elastic Container Instance. This way, you can create pods without being limited by the computing capabilities of ACK clusters, which helps reduce computing costs. The topic in the References column describes how to use Elastic Container Instance to elastically schedule the Flink jobs of an EMR cluster that is created on the EMR on ACK page.

2022-07-18

Use Elastic Container Instance to elastically schedule Flink jobs

June 2022

Feature

Description

Release date

References

DataLake cluster

DataLake clusters are supported. A DataLake cluster is a big data computing cluster that allows you to analyze data in a flexible, reliable, and efficient manner. You can create a DataLake cluster only in the new EMR console.

2022-06-01

DataLake cluster

Association of a Spark cluster with a Shuffle Service cluster

Remote Shuffle Service (RSS) is an extension provided by Alibaba Cloud EMR to improve the stability and performance of Spark Shuffle. You can associate a Spark cluster that is created on the EMR on ACK page with a Shuffle Service cluster.

2022-06-09

Associate a Spark cluster with a Shuffle Service cluster

May 2022

Feature

Description

Release date

References

Memory management

Memory resources can be managed. The topic in the References column describes the memory usage categories and memory configuration parameters that are related to a backend (BE) in StarRocks. The topic also describes how to view memory usage.

2022-05-10

Manage memory resources