EMR Doctor is an open source big data cluster manager that provides an end-to-end intelligent diagnostic and optimization service. It runs daily health reports and real-time scans to detect issues before they affect your workloads. For Hadoop clusters running EMR V3.x earlier than V3.41.0, V4.x, or V5.x earlier than V5.6.0, you must request activation from the Alibaba Cloud EMR Doctor team.
EMR Doctor is enabled by default for DataLake clusters, DataServing clusters, and custom clusters. This topic applies only to the Hadoop cluster versions listed above.
Prerequisites
Before you begin, make sure that:
Your Hadoop cluster runs one of the following versions:
EMR V3.x earlier than V3.41.0
EMR V4.x (all minor versions)
EMR V5.x earlier than V5.6.0
You can join a DingTalk group to contact the EMR Doctor team (group ID: 44846846)
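The version ranges above can be checked in a script before you contact the team. The following is a minimal sketch, not an official tool. The function names are hypothetical, and it assumes the version string contains a three-part number such as `EMR-3.40.0` or `V5.5.0`. It applies only to Hadoop clusters, because other cluster types have EMR Doctor enabled by default.

```python
import re


def parse_emr_version(version: str) -> tuple:
    """Extract (major, minor, patch) from a string such as 'EMR-3.40.0'.

    Assumption: the version always contains three dot-separated numbers.
    """
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized EMR version: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def needs_activation_request(version: str) -> bool:
    """Return True if a Hadoop cluster on this version must request activation."""
    parsed = parse_emr_version(version)
    major = parsed[0]
    if major == 3:
        return parsed < (3, 41, 0)   # V3.x earlier than V3.41.0
    if major == 4:
        return True                  # all V4.x minor versions
    if major == 5:
        return parsed < (5, 6, 0)    # V5.x earlier than V5.6.0
    return False                     # other major versions are not covered by this topic


if __name__ == "__main__":
    for v in ("EMR-3.40.0", "EMR-3.41.0", "EMR-4.9.0", "EMR-5.6.0"):
        print(v, needs_activation_request(v))
```

Tuple comparison keeps the boundary checks short: `(3, 40, 0) < (3, 41, 0)` compares element by element, so no string comparison of version numbers is needed.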
How EMR Doctor works
EMR Doctor collects metrics and events from your cluster's computing tasks, then scores each task in the backend. This gives you two types of visibility:
Daily health reports: EMR Doctor analyzes the storage and compute engines, performs cluster-wide health checks, and generates a daily report on the health status of the cluster.
Real-time cluster reports: EMR Doctor continuously scans computing tasks and services, surfaces issues as they occur, and notifies operations and maintenance (O&M) personnel so that they can troubleshoot.
Activation impacts
EMR Doctor does not affect existing tasks in your cluster. It collects only metrics and events from tasks, then analyzes and scores them in the backend. Installation is transparent to running and queued tasks.
Configuration changes applied during installation
When EMR Doctor is installed, it automatically appends the following settings to your cluster's service configuration files:
| Service | Configuration file | Parameters added |
|---|---|---|
| Hive | hive-env.sh | Environment variables |
| YARN | mapred-site.xml | yarn.app.mapreduce.am.command-opts, mapreduce.map.java.opts, mapreduce.reduce.java.opts → -noverify -javaagent:/usr/lib/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=mr |
| YARN | yarn-env.sh | Environment variables |
| Spark | spark-defaults.conf | spark.driver.extraJavaOptions, spark.executor.extraJavaOptions, spark.yarn.am.extraJavaOptions → -noverify -javaagent:/usr/lib/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=spark |
| Tez | tez-site.xml | tez.task.launch.cmd-opts, tez.am.launch.cmd-opts → -noverify -javaagent:/usr/lib/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=mr,config=tez |
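After installation, you may want to confirm that the agent option from the table was appended. The following is a minimal sketch for the Spark row only. It assumes you have the contents of `spark-defaults.conf` as text and that each property is written as a key followed by whitespace and the value. The function names are hypothetical and not part of EMR Doctor.

```python
# Agent path and Spark parameter names are taken from the table above.
AGENT_JAR = "/usr/lib/taihaodoctor-current/emr-agent/btrace-agent.jar"
SPARK_KEYS = (
    "spark.driver.extraJavaOptions",
    "spark.executor.extraJavaOptions",
    "spark.yarn.am.extraJavaOptions",
)


def parse_spark_defaults(text: str) -> dict:
    """Parse 'key<whitespace>value' lines, skipping blanks and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # split on the first run of whitespace
        props[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return props


def missing_agent_keys(text: str) -> list:
    """Return the Spark parameters that do not carry the EMR Doctor agent option."""
    props = parse_spark_defaults(text)
    expected = f"-javaagent:{AGENT_JAR}=libs=spark"
    return [key for key in SPARK_KEYS if expected not in props.get(key, "")]
```

An empty list from `missing_agent_keys` means all three Spark parameters contain the agent option. The same idea applies to `mapred-site.xml` and `tez-site.xml`, but those files are XML and need an XML parser instead of line splitting.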
Pending configurations
During installation, EMR may deliver service configurations that were saved but not yet applied to your cluster. Before activation, check whether any pending configurations exist for Hive, Spark, YARN, or Tez, and assess the impact of applying them.
Service restarts after installation
After installation, restart the following services:
App Timeline Server
HiveServer2
Spark Thrift Server
Schedule the restarts during off-peak hours or a maintenance window. EMR Doctor works without these restarts, but data for some jobs, such as Hive on MapReduce jobs, may not be collected until the services are restarted.
The Alibaba Cloud EMR Doctor team will notify you when installation is complete and confirm which services need to be restarted.
Request activation
The Alibaba Cloud EMR Doctor team handles installation end-to-end to protect cluster stability.
Join the DingTalk group with ID 44846846. An engineer from the EMR Doctor team will contact you.
The engineer reviews your cluster status and schedules an installation window.
The EMR Doctor team installs EMR Doctor at the agreed time.
After installation, log on to the EMR console to view the reports that are generated based on EMR Doctor analysis.
Contact us
The Alibaba Cloud EMR team provides installation support to help you enable this feature. If you need assistance, search for group ID 44846846 in DingTalk and join the group. An engineer will be assigned to work out a plan for your cluster.