E-MapReduce: Features

Last Updated: Dec 03, 2025

EMR on ECS

Category

Feature

Description

References

Cluster management

Create a cluster

You can build and run open source big data frameworks, such as Hadoop, Spark, Hive, and Presto, for large-scale data processing and analysis.

Create a cluster

Release a cluster

If you no longer require an E-MapReduce (EMR) cluster, release it promptly to avoid unnecessary costs.

Release a cluster

View cluster information

You can view basic information about clusters that belong to your Alibaba Cloud account and the details of a cluster.

Query clusters

Log on to a cluster

After you connect to the master node of your cluster over SSH, you can run Linux commands to manage and interact with the cluster.

Log on to a cluster

View cluster types

Alibaba Cloud EMR provides several cluster types, such as DataLake, online analytical processing (OLAP), Dataflow, and DataServing clusters, to supply flexible and efficient computing resources for big data processing and analysis.

Node management

Manage node groups

Node groups are key resources used to manage nodes in an EMR cluster. In most cases, node groups consist of Elastic Compute Service (ECS) instances of the same instance type.

Scale out a node group

You can add core nodes or task nodes to scale out an EMR cluster that has insufficient computing or storage resources.

Scale in a node group

When large amounts of computing resources remain idle in an EMR cluster, you can reduce the number of task nodes in the cluster.

Expand a disk

If the disk space of a cluster is insufficient, you can expand the data disks of the cluster.

Upgrade node configurations

If the vCPUs or memory of ECS instances in a node group cannot meet your business requirements, you can upgrade the instance configurations of the node group.

View the health status of nodes

You can check whether a node runs as expected based on its health status. The health status is derived from the results of multiple health check items.

Service management

Add services

After you create a cluster, you can add services that are not yet deployed in the cluster.

Restart a service

After you modify the configuration items of a service, you must restart the service for the modifications to take effect. If a service is faulty or abnormal, you can try to restore it by restarting it.

Manage configuration items

In the EMR console, you can view and modify the configuration items of services that are deployed in a cluster. You can also add configuration items to the services.

Roll back the configurations of a service

After you perform operations on the configuration items of a service in the EMR console, you can roll back the configurations of the service.

Customize software configurations

You can use the Custom Software Configuration feature provided by EMR to modify existing configurations or add configuration items when you create a cluster.

Export and import service configurations

EMR allows you to export service configurations in the XML or JSON format. This way, you can back up, migrate, or restore the service configurations of an EMR cluster. You can import service configurations that are exported in the JSON format to a new cluster as the preset configurations of the cluster.

Access the web UIs of open source components

You can access the web UIs of open source components that are deployed in an EMR cluster on the Access Links and Ports tab of the cluster in the EMR console.

View information about cluster services

You can view information about the services that are deployed in an EMR cluster. For example, you can view the status, components, and configuration items of services such as HDFS and YARN.

View the health status of services

You can check whether a service runs as expected based on its health status. The health status is derived from the results of multiple health check items.

Component management

Perform operations on components

A series of big data services can be deployed in an EMR cluster. This helps you process, analyze, and store a large amount of data. EMR provides operation guides and advanced practices on components.

View the deployment information of service components

You can view the deployment information of service components on each node of an EMR cluster.

View the health status of components

You can check whether a component runs as expected based on its health status. The health status is derived from the results of multiple health check items.

User management

Add a user

You can add an existing RAM user as an EMR user. This way, you can use the RAM user to manage EMR clusters or other cloud service resources.

Remove a user

You can remove an existing user from an EMR cluster in the EMR console.

Reset the password of a user

You can reset the password of a user.

Download authentication credentials

Authentication credentials can be downloaded only in high-security clusters. In such a cluster, you can download the keytab file of a user account.

Auto scaling

Add auto scaling rules

If your business workloads fluctuate, you can enable auto scaling for your EMR cluster and configure auto scaling rules. This way, EMR automatically increases or decreases the number of task nodes to handle business workloads during peak or off-peak hours. This ensures efficient task processing, maximizes resource utilization, and reduces operating costs.

View auto scaling activities

You can view the changes to nodes in a cluster and the execution records of auto scaling activities.

View the overview information about cluster resources

The auto scaling feature of EMR helps you analyze changes to cluster resources and provides recommended auto scaling rules.

View auto scaling cost analysis results in a visualized manner

You can view the resource usage and cost allocation from multiple dimensions. This way, you can evaluate the cost savings brought by auto scaling and optimize the resource usage of your cluster.

Bootstrap action management

Manage bootstrap actions

You can use bootstrap actions to install third-party software and modify the runtime environment of your cluster.

Manually run scripts

After you create a cluster, you can use the manual script execution feature to run a specific script on multiple nodes in the cluster at the same time.

Operation records

View operation records

You can view the operation records of clusters.

-

Monitoring and alerting

Monitor clusters

You can view the details of the metrics for each service of your cluster.

-

Manage alert rules

EMR allows you to create alert rules to monitor the usage of service resources in EMR clusters. If resource metrics meet specific alert conditions, alerts are triggered and CloudMonitor sends alert notifications. This way, you can identify and handle exceptions in monitored clusters promptly.

Manage logs

You can use the log management feature together with Simple Log Service to query, in the EMR console, the logs that are generated by open source components.

Health check

Enable real-time check and analysis

The real-time check feature of EMR Doctor checks the status of a cluster every 5 minutes. You can view the status of the cluster, related issues, and the causes of the issues, and then troubleshoot based on that information. This helps ensure the stability of the cluster.

View daily cluster reports and analysis results in the reports

You can use the health check feature of an EMR cluster to obtain the health status of the cluster and resolve issues in the cluster based on suggestions. This helps ensure that the cluster remains in a healthy state.

Gateway

Create a gateway cluster

You can use a gateway cluster to balance loads and isolate clusters for data security. You can also use the gateway cluster to submit jobs to an EMR cluster.

Configure a gateway cluster

EMR provides the EMR-CLI tool that you can use to deploy a gateway on an Alibaba Cloud ECS instance.
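
As an illustration of the auto scaling rules described in the table above, the following Python sketch shows one way a rule could map a metric reading to a new task node count. It is a minimal sketch under stated assumptions, not EMR's implementation: the `ScalingRule` class, its fields, the `desired_task_nodes` function, and the metric name `yarn_memory_used_pct` are all hypothetical and do not mirror the EMR console or API.

```python
from dataclasses import dataclass


@dataclass
class ScalingRule:
    # All fields are hypothetical; they do not mirror the EMR console or API.
    metric: str        # name of the monitored metric
    threshold: float   # value the metric is compared against
    comparison: str    # "gt" (greater than) or "lt" (less than)
    adjustment: int    # task nodes to add (positive) or remove (negative)


def desired_task_nodes(current, metrics, rules, min_nodes, max_nodes):
    """Apply the first rule whose condition holds, then clamp the
    result to the range [min_nodes, max_nodes]. If no rule is
    triggered, the current node count is returned unchanged."""
    for rule in rules:
        value = metrics.get(rule.metric)
        if value is None:
            continue  # no reading for this metric, skip the rule
        if rule.comparison == "gt":
            triggered = value > rule.threshold
        else:
            triggered = value < rule.threshold
        if triggered:
            return max(min_nodes, min(max_nodes, current + rule.adjustment))
    return current


# Hypothetical rules: scale out under load, scale in when idle.
rules = [
    ScalingRule("yarn_memory_used_pct", 80.0, "gt", 2),
    ScalingRule("yarn_memory_used_pct", 30.0, "lt", -1),
]

peak = desired_task_nodes(4, {"yarn_memory_used_pct": 92.0}, rules, 2, 10)
idle = desired_task_nodes(4, {"yarn_memory_used_pct": 10.0}, rules, 2, 10)
print(peak, idle)
```

With these sample values, the peak reading adds two task nodes and the idle reading removes one, and the clamp keeps the result between the minimum and maximum node counts.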

EMR on ACK

Category

Feature

Description

References

Cluster management

Create a cluster

You can deploy open source big data services on top of Container Service for Kubernetes (ACK). You can use ACK to deploy services and manage containerized applications. This reduces the O&M costs of underlying cluster resources and helps you focus on big data jobs.

Release a cluster

If you no longer require a cluster, you can release the cluster to delete the namespace that corresponds to the cluster and all software services that are deployed in the namespace. Releasing the cluster does not release any physical resources.

Release a cluster

View cluster information

You can view the list of clusters and the details of a cluster within your Alibaba Cloud account.

View cluster types

You can create different types of clusters on the EMR on ACK page.

Create a cluster

Service management

Restart a service

After you modify the configuration items of a service, you must restart the service for the modifications to take effect.

Restart a service

Access the web UI of components

After you add a RAM user as an EMR user of an EMR cluster, you can use the RAM user to access the web UI of a service that is deployed in the EMR cluster.

-

Manage configuration items

You can modify and add configuration items for an EMR cluster.

Manage configuration items

Job management

View information about a job

You can view the information about jobs in an EMR cluster.

View jobs

EMR Workbench

Category

Feature

Description

References

EMR Workflow

Manage workspaces

You can perform all configurations and run tasks and workflows in a specific workspace. The administrator of a workspace can add users to the workspace as members and assign specific roles to the members. This way, members with different roles can collaborate with each other.

-

Manage resource groups for scheduling

Resource groups for scheduling are used to schedule and run tasks. If the default resource group for scheduling cannot meet your requirements, you can purchase a resource group for scheduling based on your business requirements.

-

Manage projects

You can edit tasks and schedule workflows in a specific project.

Create a project

Manage workflows

A workflow is a directed acyclic graph (DAG) that you can create by dragging tasks and associating the tasks with each other.

Manage a workflow

Manage workflow instances

A workflow instance is an instantiation of a workflow. A workflow instance is generated when a workflow is manually run or is scheduled to run.

Manage workflow instances

Manage tasks

After you save a workflow, you can perform operations on existing tasks.

Manage tasks

Manage task instances

After you save a workflow, you can perform operations on existing task instances.

Manage tasks

Manage manual tasks

You can create a manual task that is independent of a workflow. You must manually run this type of task.

-

Manage manual task instances

A manual task instance is an instantiation of a manual task. A new instance is generated each time the manual task is triggered to run.

-

Manage resources

If you want to use a third-party JAR package or a custom script during scheduling, you can upload the required files in Resource Center.

Resource management

Manage data sources

You can configure different data sources to meet different data storage and access requirements.

Data source management

Use Security Center

Security Center allows you to manage users, alert instances, alert groups, and audit logs. This helps you implement fine-grained permission management and security monitoring of operations.

-
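
The table above describes a workflow as a directed acyclic graph (DAG) of tasks. The following Python sketch shows what that implies for execution order: a task can run only after every task it depends on has finished. It uses the standard library `graphlib` module (Python 3.9 or later) and is not EMR Workflow's scheduler. The task names and the `run_order` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each task maps to the set of tasks it depends on.
workflow = {
    "extract": set(),
    "clean": {"extract"},
    "aggregate": {"clean"},
    "report": {"aggregate", "clean"},
}


def run_order(dag):
    """Return one valid execution order for the tasks in dag.

    Raises graphlib.CycleError if the dependencies contain a cycle,
    which means the graph is not a valid DAG."""
    return list(TopologicalSorter(dag).static_order())


order = run_order(workflow)
print(order)
```

If two tasks depended on each other, `run_order` would raise `graphlib.CycleError`, which is why a workflow must be acyclic to be schedulable.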