E-MapReduce:AIOps: Get started with EMR AI Assistant

Last updated: Mar 26, 2026

EMR AI Assistant (EMR Agent) is an intelligent O&M tool for Alibaba Cloud E-MapReduce (EMR) that lets you manage and maintain clusters through natural language. Instead of navigating scattered console menus or learning complex troubleshooting flows, you describe what you need and the assistant handles the rest.

Applicability

  • Status: Public preview

  • Supported deployment types: EMR on ECS and EMR Serverless StarRocks

  • Supported regions: China (Hangzhou), China (Shanghai), China (Beijing), China (Shenzhen), and China (Chengdu)

Capabilities

EMR AI Assistant supports three capabilities:

| Capability | What it does |
| --- | --- |
| AI chat | Answers questions about EMR product features, configurations, and best practices in real time. |
| Fault diagnosis | Diagnoses faults in open source components, analyzes error messages and monitoring metrics, and provides a diagnostic report with remediation steps. |
| Operation invocation | Invokes O&M operations, such as scaling out a cluster or modifying component configurations, through an operation card that you review and confirm before execution. |

Use cases

Managing large-scale EMR clusters often presents the following challenges:

  • Scattered and hard-to-find operation entry points

  • Complex troubleshooting flows for component faults

  • Tedious configuration of access policies

  • High learning curve for new users

EMR AI Assistant uses natural language understanding to turn complex O&M actions into simple conversations. This shortens the time it takes to respond to issues, reduces the risk of human error, and makes day-to-day O&M more efficient.

Prerequisites

If you use a RAM user, an Alibaba Cloud account (root account) must grant the RAM user the following permissions before you begin.

Basic session permissions (required)

These permissions are required for AI chat, session history, and feedback. Create a custom policy with the following content and attach it to the RAM user:

```json
{
  "Version": "1",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "emr-agent:ListSession",
        "emr-agent:ListMessage",
        "emr-agent:DeleteSession",
        "emr-agent:FeedbackOnMessage",
        "emr-agent:ChatCompletion",
        "emr-agent:GetPresignedUrl",
        "emr-agent:UpdateCard"
      ],
      "Resource": "*"
    }
  ]
}
```

Alternatively, attach the AliyunEMRFullAccess or AliyunEMRDevelopAccess system policy.
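If you manage RAM users with scripts, you may prefer to generate the basic session policy rather than copy it by hand. The following Python sketch builds the same policy document shown above and checks whether an existing policy document grants all seven actions. The helper names `build_session_policy` and `missing_actions` are hypothetical, and the check is deliberately simplified: it considers only `Allow` statements, exact action names, and the `*` and `emr-agent:*` wildcards, and it ignores `Deny` statements, conditions, and resource scoping.

```python
import json

# The seven actions from the basic session policy shown above.
REQUIRED_ACTIONS = [
    "emr-agent:ListSession",
    "emr-agent:ListMessage",
    "emr-agent:DeleteSession",
    "emr-agent:FeedbackOnMessage",
    "emr-agent:ChatCompletion",
    "emr-agent:GetPresignedUrl",
    "emr-agent:UpdateCard",
]


def build_session_policy() -> dict:
    """Hypothetical helper: return the basic session policy as a dict."""
    return {
        "Version": "1",
        "Statement": [
            {"Effect": "Allow", "Action": list(REQUIRED_ACTIONS), "Resource": "*"}
        ],
    }


def missing_actions(policy: dict) -> list:
    """Hypothetical helper: list required actions that no Allow statement grants.

    Simplified check: Deny statements, conditions, and Resource values are
    ignored; only exact matches and the "*" / "emr-agent:*" wildcards count.
    """
    granted = set()
    for statement in policy.get("Statement", []):
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):  # Action may be a single string or a list
            actions = [actions]
        granted.update(actions)
    if "*" in granted or "emr-agent:*" in granted:
        return []
    return [action for action in REQUIRED_ACTIONS if action not in granted]


if __name__ == "__main__":
    policy = build_session_policy()
    # Print the policy document so it can be used when creating the custom policy.
    print(json.dumps(policy, indent=2))
    print("Missing actions:", missing_actions(policy))
```

The printed JSON can serve as the policy document when you create the custom policy in the RAM console or through the RAM `CreatePolicy` operation. Running `missing_actions` against a policy you already have is a quick way to spot which of the seven actions it lacks.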

O&M operation permissions (optional)

To perform cluster operations such as scaling out or modifying configurations, you also need the corresponding cluster operation permissions. See Grant permissions to a RAM user.

Get started

Important

Results from EMR AI Assistant are for reference only and cannot be used as final technical support conclusions or business decisions. Alibaba Cloud is not legally responsible for any direct or indirect loss resulting from actions taken based on EMR AI Assistant output. AI-generated information is not guaranteed to be accurate, so verify it before you act on it.

  1. Log on to the EMR on ECS console.

  2. In the sidebar, click the EMR AI Assistant icon to open the session interface.

  3. In the input box, type your question or requirement in natural language. The assistant automatically detects your intent and returns a response.

Root account users can directly use the EMR AI Assistant feature without any additional permission setup.

AI chat

Ask any question about EMR in natural language. The assistant provides answers about product features, configurations, and operational procedures.

Example prompts:

| Topic | Example prompt |
| --- | --- |
| Feature guidance | "How do I enable Auto Scaling for my cluster?" |
| Configuration | "How do I modify service configurations for cluster <cluster-name>?" |
| Resource lookup | "List all EMR clusters under my account." |
| Health check | "Are the services on cluster <cluster-name> running normally?" |

For example, if you ask "How do I enable Auto Scaling?", the assistant returns step-by-step instructions for configuring Auto Scaling.

Fault diagnosis

When a component has an abnormal status or an O&M operation fails, describe the issue to the assistant. It analyzes the fault with its diagnostic tools, identifies possible causes, and provides a diagnostic report with remediation steps.

For more accurate results, include the cluster name, the time the issue occurred, and any error messages you observed.
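One way to make sure a diagnosis request always carries these three details is to assemble it from a small template. The following Python sketch is illustrative only: `DiagnosisRequest` and `to_prompt` are hypothetical names, and because the assistant accepts free-form text, the exact wording of the generated prompt is not significant.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DiagnosisRequest:
    """Hypothetical structure holding the details that help narrow down a fault."""
    symptom: str                         # what you observed, e.g. "My YARN job failed"
    cluster_name: str                    # the affected cluster
    occurred_at: str                     # when the issue started
    error_message: Optional[str] = None  # error text copied from logs or the console

    def to_prompt(self) -> str:
        """Compose one natural-language request from the fields."""
        parts = [
            f"{self.symptom} on cluster {self.cluster_name}.",
            f"The issue occurred at {self.occurred_at}.",
        ]
        if self.error_message:
            parts.append(f"Error message: {self.error_message}")
        parts.append("What is the likely cause and how do I fix it?")
        return " ".join(parts)


request = DiagnosisRequest(
    symptom="My YARN job failed",
    cluster_name="<cluster-name>",
    occurred_at="<time>",
)
print(request.to_prompt())
# My YARN job failed on cluster <cluster-name>. The issue occurred at <time>. What is the likely cause and how do I fix it?
```

Keeping the error message as a separate optional field makes it easy to paste the exact log line, which is usually the most useful input for diagnosis.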

Example prompts:

| Component or operation | Issue type | Example prompt |
| --- | --- | --- |
| ZooKeeper | Abnormal status | "The ZooKeeper status is abnormal. What should I do?" |
| Node | Memory usage | "Node memory usage is too high. How do I fix it?" |
| Cluster scale-out | Operation failure | "Scale-out failed for cluster <cluster-name>. What went wrong?" |
| YARN | Job failure | "My YARN job failed. The cluster is <cluster-name> and the error occurred at <time>." |
| HDFS | Storage issues | "HDFS disk usage is critically high on cluster <cluster-name>." |

Operation invocation

Ask the assistant to perform an O&M operation. It creates an operation card with the required parameters for your review. After you confirm the details, the operation runs.

How it works:

  1. Describe the operation you need (for example, "Scale out <cluster-name> by adding two Task nodes").

  2. The assistant generates an operation card with the proposed parameters.

  3. Review the parameters on the card.

  4. Confirm to start the operation.
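Step 3 is where you catch parameters the assistant may have inferred incorrectly. The following Python sketch illustrates the review-then-confirm pattern for the scale-out example: it compares the values you asked for with the values proposed on a card and refuses to confirm when they differ. It is not the EMR implementation or API; `OperationCard`, `review`, and `confirm` are hypothetical, and the card fields `cluster_name`, `node_group`, and `add_count` are assumptions chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class OperationCard:
    """Hypothetical model of a proposed O&M operation awaiting confirmation."""
    operation: str
    parameters: dict = field(default_factory=dict)
    confirmed: bool = False


def review(card: OperationCard, expected: dict) -> dict:
    """Return {name: (expected, proposed)} for each requested parameter that differs.

    Only the parameters you specified are compared; check any others on the card manually.
    """
    mismatches = {}
    for name, want in expected.items():
        got = card.parameters.get(name)
        if got != want:
            mismatches[name] = (want, got)
    return mismatches


def confirm(card: OperationCard, expected: dict) -> bool:
    """Mark the card confirmed only if the review finds no mismatches."""
    if review(card, expected):
        return False
    card.confirmed = True
    return True


# Request: "Scale out <cluster-name> by adding two Task nodes."
expected = {"cluster_name": "<cluster-name>", "node_group": "Task", "add_count": 2}
card = OperationCard(
    operation="ScaleOut",
    parameters={"cluster_name": "<cluster-name>", "node_group": "Task", "add_count": 3},
)
print(review(card, expected))   # {'add_count': (2, 3)}
print(confirm(card, expected))  # False
```

The point of the pattern is that nothing runs until a person has compared the proposal with the original request; in the console, that comparison is the review you perform on the operation card.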

Supported operations:

  • Modify service component configurations

  • Scale out a cluster

  • Modify cluster bootstrap action configurations

  • View resources: clusters, cluster templates, service components, node groups, bootstrap scripts, and scaling rules

  • View operation history

Example prompts:

| Operation | Example prompt |
| --- | --- |
| Scale out | "Scale out <cluster-name> by adding two Task nodes." |
| Modify config | "Change the memory allocation for the YARN NodeManager on cluster <cluster-name>." |
| View resources | "Show me the node groups for cluster <cluster-name>." |
| View history | "What operations were performed on cluster <cluster-name> in the past week?" |
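The example prompts in this topic use angle-bracket placeholders such as `<cluster-name>` and `<time>`. If you keep a set of prompts for recurring checks, a small helper can substitute your values and catch any placeholder you forgot to fill before you paste the prompt into the assistant. The following Python sketch assumes placeholders are lowercase words joined by hyphens; `fill_prompt` is a hypothetical helper, not part of EMR, and `my-emr-cluster` is a made-up cluster name.

```python
import re

# Matches placeholders such as <cluster-name> or <time> (assumed naming style).
PLACEHOLDER = re.compile(r"<([a-z]+(?:-[a-z]+)*)>")


def fill_prompt(template: str, **values: str) -> str:
    """Hypothetical helper: replace each <placeholder> with a keyword value.

    Keyword names use underscores (<cluster-name> -> cluster_name).
    Raises KeyError if a placeholder in the template has no value.
    """
    def substitute(match: re.Match) -> str:
        key = match.group(1).replace("-", "_")
        if key not in values:
            raise KeyError(f"no value for placeholder <{match.group(1)}>")
        return values[key]

    return PLACEHOLDER.sub(substitute, template)


prompt = fill_prompt(
    "What operations were performed on cluster <cluster-name> in the past week?",
    cluster_name="my-emr-cluster",
)
print(prompt)
# What operations were performed on cluster my-emr-cluster in the past week?
```

Raising an error on an unfilled placeholder, rather than leaving it in the text, avoids sending the assistant a prompt that still contains `<cluster-name>`.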

Feedback

Rate each response using the feedback icons. Your feedback helps improve the accuracy and relevance of future responses.