EMR AI Assistant Overview for Intelligent O&M

EMR AI Assistant (EMR Agent) is an intelligent O&M tool for Alibaba Cloud E-MapReduce (EMR) that lets you manage and maintain clusters through natural language. Instead of navigating scattered console menus or learning complex troubleshooting flows, you describe what you need, and the assistant answers the question, diagnoses the fault, or prepares the operation for your confirmation.

Applicability

Status: Public preview
Supported deployment types: EMR on ECS and EMR Serverless StarRocks
Supported regions: China (Hangzhou), China (Shanghai), China (Beijing), China (Shenzhen), and China (Chengdu)

Capabilities

EMR AI Assistant supports three capabilities:

Capability	What it does
AI chat	Answers questions about EMR product features, configurations, and best practices in real time.
Fault diagnosis	Diagnoses faults in open source components, analyzes error messages and monitoring metrics, and provides a diagnostic report with remediation steps.
Operation invocation	Invokes O&M operations — such as scaling out a cluster or modifying component configurations — through an operation card that you review and confirm before execution.

Use cases

Managing large-scale EMR clusters often presents the following challenges:

Scattered and hard-to-find operation entry points
Complex troubleshooting flows for component faults
Tedious configuration of access policies
High learning curve for new users

EMR AI Assistant uses natural language understanding to turn these O&M actions into conversations. The goal is to shorten response times, lower the risk of human error, and improve O&M efficiency and the user experience.

Prerequisites

Before you begin, ensure that the following requirement is met:

RAM users only: The root account has granted the RAM user the following permissions.

Basic session permissions (required)

These permissions are required for AI chat, session history, and feedback. Add the following policy to the RAM user:

{
  "Version": "1",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "emr-agent:ListSession",
        "emr-agent:ListMessage",
        "emr-agent:DeleteSession",
        "emr-agent:FeedbackOnMessage",
        "emr-agent:ChatCompletion",
        "emr-agent:GetPresignedUrl",
        "emr-agent:UpdateCard"
      ],
      "Resource": "*"
    }
  ]
}

Alternatively, attach the AliyunEMRFullAccess or AliyunEMRDevelopAccess system policy.
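If you maintain your own custom policy instead of a system policy, a quick offline check can show whether it covers the seven actions listed above. The following is a minimal sketch, not an Alibaba Cloud tool: `missing_actions` is a hypothetical helper, and it assumes simple `*` wildcard matching on action names while ignoring Deny statements, Resource scoping, and Condition blocks.

```python
from fnmatch import fnmatchcase

# The seven actions from the basic session policy in this document.
REQUIRED_ACTIONS = {
    "emr-agent:ListSession",
    "emr-agent:ListMessage",
    "emr-agent:DeleteSession",
    "emr-agent:FeedbackOnMessage",
    "emr-agent:ChatCompletion",
    "emr-agent:GetPresignedUrl",
    "emr-agent:UpdateCard",
}


def _as_list(value):
    """Accept either a single action string or a list of action strings."""
    return [value] if isinstance(value, str) else list(value)


def missing_actions(policy: dict) -> set:
    """Return the required actions that no Allow statement in `policy` covers.

    Simplification (assumption): Deny statements, Resource, and Condition
    are not evaluated; only Allow/Action patterns are compared.
    """
    granted_patterns = []
    for statement in policy.get("Statement", []):
        if statement.get("Effect") == "Allow":
            granted_patterns.extend(_as_list(statement.get("Action", [])))
    return {
        action
        for action in REQUIRED_ACTIONS
        if not any(fnmatchcase(action, pattern) for pattern in granted_patterns)
    }
```

An empty result means every basic session action is matched by at least one Allow statement. A non-empty result lists the actions to add before the RAM user opens the assistant.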

O&M operation permissions (optional)

To perform cluster operations such as scaling out or modifying configurations, you also need the corresponding cluster operation permissions. See Grant permissions to a RAM user.

Get started

Important

Results from EMR AI Assistant are for reference only and cannot be used as final technical support conclusions or business decisions. Alibaba Cloud is not legally responsible for any direct or indirect loss resulting from actions taken based on EMR AI Assistant output. AI-generated information is not guaranteed to be accurate — verify as needed before acting on it.

Log on to the EMR on ECS console.
In the sidebar, click the EMR AI Assistant icon to open the session interface.
In the input box, type your question or requirement in natural language. The assistant automatically detects your intent and returns a response.

Root account users can directly use the EMR AI Assistant feature without any additional permission setup.

AI chat

Ask any question about EMR in natural language. The assistant provides answers about product features, configurations, and operational procedures.

Example prompts:

Topic	Example prompt
Feature guidance	"How do I enable Auto Scaling for my cluster?"
Configuration	"How do I modify service configurations for cluster `<cluster-name>`?"
Resource lookup	"List all EMR clusters under my account."
Health check	"Are the services on cluster `<cluster-name>` running normally?"

For example, if you ask "How do I enable Auto Scaling?", the assistant returns step-by-step instructions for configuring Auto Scaling.

Fault diagnosis

When a component has an abnormal status or an O&M operation fails, describe the issue to the assistant. It uses intelligent diagnostic tools to analyze the fault, identifies possible causes, and provides a diagnostic report with remediation steps.

For more accurate results, include the cluster name, the time the issue occurred, and any error messages you observed.

Example prompts:

Component	Issue type	Example prompt
ZooKeeper	Abnormal status	"The ZooKeeper status is abnormal. What should I do?"
Node	Memory usage	"Node memory usage is too high. How do I fix it?"
Cluster scale-out	Operation failure	"Scale-out failed for cluster `<cluster-name>`. What went wrong?"
YARN	Job failure	"My YARN job failed. The cluster is `<cluster-name>` and the error occurred at `<time>`."
HDFS	Storage issues	"HDFS disk usage is critically high on cluster `<cluster-name>`."
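Teams that report faults often may want every prompt to carry the details recommended above: the cluster name, the time of the issue, and the error message. The following sketch only assembles a text prompt to paste into the input box. `build_diagnosis_prompt` and its parameters are hypothetical names for illustration, and nothing here calls an EMR API.

```python
from datetime import datetime
from typing import Optional


def build_diagnosis_prompt(
    component: str,
    symptom: str,
    cluster_name: Optional[str] = None,
    occurred_at: Optional[datetime] = None,
    error_message: Optional[str] = None,
) -> str:
    """Compose a fault-diagnosis prompt, adding each detail that is known."""
    parts = [f"{component} issue: {symptom}."]
    if cluster_name:
        parts.append(f"The cluster is {cluster_name}.")
    if occurred_at:
        # Minute precision is an arbitrary choice for this sketch.
        parts.append(f"The issue occurred at {occurred_at:%Y-%m-%d %H:%M}.")
    if error_message:
        parts.append(f'Error message: "{error_message.strip()}"')
    return " ".join(parts)
```

Optional fields are left out when they are unknown, so the prompt never contains empty placeholders.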

Operation invocation

Ask the assistant to perform an O&M operation. It creates an operation card with the required parameters for your review. After you confirm the details, the operation runs.

How it works:

Describe the operation you need (for example, "Scale out `<cluster-name>` by adding two Task nodes").
The assistant generates an operation card with the proposed parameters.
Review the parameters on the card.
Confirm to start the operation.
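The review-then-confirm flow in the steps above can be pictured as a simple gate: nothing runs until the card is confirmed. The following is a conceptual sketch only. `OperationCard`, `run_operation`, and the `executor` callback are hypothetical names and do not reflect how the assistant is implemented or any EMR API.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class OperationCard:
    """Hypothetical stand-in for an operation card shown in the console."""

    operation: str
    parameters: Dict[str, Any]
    confirmed: bool = False

    def confirm(self) -> None:
        """Record that the user reviewed the parameters and approved them."""
        self.confirmed = True


def run_operation(card: OperationCard, executor: Callable[[str, Dict[str, Any]], Any]) -> Any:
    """Run the operation only if the card has been confirmed."""
    if not card.confirmed:
        raise PermissionError("Operation card has not been confirmed")
    return executor(card.operation, card.parameters)
```

In this model the proposed parameters stay visible on the card, and execution is refused until `confirm()` has been called, which mirrors the review step in the console.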

Supported operations:

Modify service component configurations
Scale out a cluster
Modify cluster bootstrap action configurations
View resources: clusters, cluster templates, service components, node groups, bootstrap scripts, and scaling rules
View operation history

Example prompts:

Operation	Example prompt
Scale out	"Scale out `<cluster-name>` by adding two Task nodes."
Modify config	"Change the memory allocation for the YARN NodeManager on cluster `<cluster-name>`."
View resources	"Show me the node groups for cluster `<cluster-name>`."
View history	"What operations were performed on cluster `<cluster-name>` in the past week?"

Feedback

Rate responses using the feedback icons. Your feedback helps improve the accuracy and relevance of future responses.