EMR Doctor is pre-configured during installation. The default settings cover metadata collection for your cluster and its tasks. In most cases, you do not need to change them.
This topic describes which configuration items you can modify and how to modify them.
Only modify the configuration items listed in this topic. Modifying other configuration items can cause severe issues, including task failures.
Cluster version definitions:
| Cluster version | Cluster types |
|---|---|
| New-version clusters | DataLake clusters, DataServing clusters, and clusters in the custom cluster scenario |
| Old-version clusters | Hadoop clusters and gateway clusters in the old EMR console |
Access the TAIHAODOCTOR Configure tab
EMR Doctor does not appear directly in the EMR console. Access it by modifying the URL of an existing service's Configure tab.
- In the EMR console, find your cluster and click Services in the Actions column.
- On the Services tab, click Configure in the HDFS section.
- In the browser address bar, replace the service name in the URL with TAIHAODOCTOR and press Enter.
  The Configure tab of the TAIHAODOCTOR service appears.
- Modify the configuration items, save your changes, and apply them.
  For detailed steps, see Modify configuration items.
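
The URL edit in the third step can be sketched in a few lines of Python. This is only an illustration: the URLs below are hypothetical placeholders, because the actual shape of the EMR console URL is not documented in this topic, and the function name `to_doctor_url` is made up for this example. The sketch assumes the service name appears as a whole path segment, either in the URL path or in the fragment after `#`.

```python
from urllib.parse import urlsplit, urlunsplit


def _swap_segment(text: str, current_service: str) -> str:
    """Replace current_service with TAIHAODOCTOR where it is a whole '/'-separated segment."""
    return "/".join(
        "TAIHAODOCTOR" if segment == current_service else segment
        for segment in text.split("/")
    )


def to_doctor_url(url: str, current_service: str = "HDFS") -> str:
    """Return the URL with the service name swapped for TAIHAODOCTOR.

    Hypothetical helper: it only assumes the service name is a full segment
    of the path or of the fragment (single-page consoles often route via '#').
    """
    parts = urlsplit(url)
    return urlunsplit(
        parts._replace(
            path=_swap_segment(parts.path, current_service),
            fragment=_swap_segment(parts.fragment, current_service),
        )
    )


# Placeholder URL, not a real console address.
print(to_doctor_url("https://console.example.com/cluster/c-123/service/HDFS/config"))
```

Matching whole segments rather than doing a plain string replacement avoids touching other parts of the URL that happen to contain the same letters.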
Configure storage metadata collection
Modify these configuration items on the Configure tab of the TAIHAODOCTOR service. See Access the TAIHAODOCTOR Configure tab for navigation steps.
| Configuration item | Default value | Unit | Description |
|---|---|---|---|
| collect.storage.enable | false | N/A | Controls whether EMR Doctor collects storage metadata. To enable storage metadata collection, go to Monitoring and Diagnostics > Daily Cluster Reports and turn on Collect Information About Storage Resources. |
| collect.storage.intermediate.path | /mnt/disk1/log/doctor/derby/ | N/A | Path for intermediate data generated during storage metadata collection. The volume of intermediate data is proportional to the size of the FSImage file. |
| collect.storage.max.depth | 6 | Levels | Advanced. Maximum number of directory levels traversed during storage metadata collection, counting from root directories (including those starting with /). Do not set this value too high, because a larger value increases analysis time and intermediate storage volume. |
| collect.storage.top.size | 100 | Directories | Advanced. Number of largest-size directories returned at each level during storage metadata collection. Do not set this value too high, because a larger value increases analysis time and intermediate storage volume. |
| collect.oss.bucket | No default value | N/A | Name of the Object Storage Service (OSS) bucket to analyze. Required when analyzing OSS data. For details, see Enable and configure the storage analysis feature. |
| collect.oss.manifest.dir | No default value | N/A | Directory where generated OSS inventory lists are stored. Required when analyzing OSS data. For details, see Enable and configure the storage analysis feature. |
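
To get a feel for why raising collect.storage.max.depth or collect.storage.top.size increases the amount of data handled, the following sketch computes a rough upper bound on the number of directories returned. It assumes that at most collect.storage.top.size directories are kept for each traversed level, which is one reading of the descriptions above and not a documented formula. The function name `max_directories_reported` is hypothetical.

```python
def max_directories_reported(max_depth: int = 6, top_size: int = 100) -> int:
    """Rough upper bound: at most top_size directories for each of max_depth levels.

    Assumption (not confirmed by the product documentation): one top-N list
    is produced per directory level.
    """
    if max_depth < 0 or top_size < 0:
        raise ValueError("max_depth and top_size must be non-negative")
    return max_depth * top_size


# Defaults from the table: 6 levels x 100 directories.
print(max_directories_reported())         # 600
# Raising both values grows the bound multiplicatively.
print(max_directories_reported(10, 500))  # 5000
```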
Configure scheduler collection
Modify these configuration items on the Configure tab of the TAIHAODOCTOR service. See Access the TAIHAODOCTOR Configure tab for navigation steps.
| Configuration item | Default value | Unit | Description |
|---|---|---|---|
| collect.job.interval | 120 | Seconds | Interval at which EMR Doctor collects the status of YARN-based scheduling tasks. |
| collect.jobs.intermediate.path | /mnt/disk1/log/doctor/jobs/ | N/A | Path for intermediate data generated when collecting YARN scheduling task status. |
Configure general information
Modify these configuration items on the Configure tab of the TAIHAODOCTOR service. See Access the TAIHAODOCTOR Configure tab for navigation steps.
| Configuration item | Default value | Unit | Description |
|---|---|---|---|
| collect.metrics.interval | 15 | Seconds | Interval at which EMR Doctor collects engine task metrics. A value that is too high produces inaccurate diagnostic suggestions, and a value that is too low puts excessive collection pressure on the cluster. |
| collect.rate.limit | 5000 | Records/second | Maximum number of records collected per second per process. Records beyond this limit are discarded to protect process stability. |
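
The discard behavior described for collect.rate.limit can be pictured as a fixed one-second window counter. The sketch below is only an illustration of that idea. How EMR Doctor actually enforces the limit is not described in this topic, and the class name `PerSecondLimiter` is hypothetical.

```python
class PerSecondLimiter:
    """Accept at most `limit` records in each one-second window; reject the rest."""

    def __init__(self, limit: int = 5000):
        self.limit = limit
        self._window = None  # integer second of the current window
        self._count = 0      # records accepted in the current window

    def accept(self, now: float) -> bool:
        """Return True if a record arriving at time `now` (in seconds) is kept."""
        window = int(now)
        if window != self._window:
            # A new second has started, so reset the counter.
            self._window = window
            self._count = 0
        if self._count >= self.limit:
            return False  # over the limit: the record is discarded
        self._count += 1
        return True


# Small limit so the effect is visible: the fourth record in second 0 is dropped.
limiter = PerSecondLimiter(limit=3)
print([limiter.accept(t) for t in (0.1, 0.2, 0.3, 0.4, 1.0)])
# [True, True, True, False, True]
```

Dropping the excess records instead of queueing them keeps memory use bounded, which matches the stated goal of protecting process stability.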
Configure MapReduce task collection
Modify these configuration items on the Configure tab of the YARN service. For details, see Modify configuration items.
In each default value, ${user_config} represents your cluster's existing settings. The content after ${user_config} is the EMR Doctor agent configuration appended to your cluster settings.
| Configuration item | Cluster version | Default value | Description |
|---|---|---|---|
| yarn.app.mapreduce.am.command-opts | New-version | ${user_config} -javaagent:/opt/apps/TAIHAODOCTOR/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=mr | Collects AppMaster metadata for MapReduce tasks. |
| yarn.app.mapreduce.am.command-opts | Old-version | ${user_config} -javaagent:/usr/lib/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=mr | Collects AppMaster metadata for MapReduce tasks. |
| mapreduce.map.java.opts | New-version | ${user_config} -javaagent:/opt/apps/TAIHAODOCTOR/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=mr | Collects metadata for map tasks. |
| mapreduce.map.java.opts | Old-version | ${user_config} -javaagent:/usr/lib/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=mr | Collects metadata for map tasks. |
| mapreduce.reduce.java.opts | New-version | ${user_config} -javaagent:/opt/apps/TAIHAODOCTOR/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=mr | Collects metadata for reduce tasks. |
| mapreduce.reduce.java.opts | Old-version | ${user_config} -javaagent:/usr/lib/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=mr | Collects metadata for reduce tasks. |
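
If you change one of these values, the result should still end with the agent option shown in the table. The following sketch shows how the default value is composed from your own JVM options and the agent path for each cluster version. The helper name `with_mr_agent` and the `"new"`/`"old"` keys are hypothetical and used only for this example. The agent paths are copied from the table above.

```python
# Agent JAR locations taken from the table above, keyed by cluster version.
AGENT_JAR = {
    "new": "/opt/apps/TAIHAODOCTOR/taihaodoctor-current/emr-agent/btrace-agent.jar",
    "old": "/usr/lib/taihaodoctor-current/emr-agent/btrace-agent.jar",
}


def with_mr_agent(user_config: str, cluster_version: str) -> str:
    """Append the MapReduce agent option to user_config unless it is already present."""
    agent = f"-javaagent:{AGENT_JAR[cluster_version]}=libs=mr"
    if agent in user_config.split():
        return user_config  # already configured, so leave the value unchanged
    return f"{user_config} {agent}".strip()


# "-Xmx1024m" stands in for ${user_config}, that is, your existing settings.
print(with_mr_agent("-Xmx1024m", "new"))
```

Checking for an existing agent token before appending keeps the value from accumulating duplicate `-javaagent` entries when the helper runs more than once.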
Configure Tez task collection
Modify these configuration items on the Configure tab of the Tez service. For details, see Modify configuration items.
In each default value, ${user_config} represents your cluster's existing settings. The content after ${user_config} is the EMR Doctor agent configuration.
| Configuration item | Cluster version | Default value | Description |
|---|---|---|---|
| tez.am.launch.cmd-opts | New-version | ${user_config} -javaagent:/opt/apps/TAIHAODOCTOR/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=mr,config=tez | Collects AppMaster metadata for Tez tasks. |
| tez.am.launch.cmd-opts | Old-version | ${user_config} -javaagent:/usr/lib/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=mr,config=tez | Collects AppMaster metadata for Tez tasks. |
| tez.task.launch.cmd-opts | New-version | ${user_config} -javaagent:/opt/apps/TAIHAODOCTOR/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=mr,config=tez | Collects metadata for Tez tasks. |
| tez.task.launch.cmd-opts | Old-version | ${user_config} -javaagent:/usr/lib/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=mr,config=tez | Collects metadata for Tez tasks. |
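
The agent option follows the standard JVM form `-javaagent:<jar path>=<options>`, where the options here are comma-separated `key=value` pairs such as `libs=mr,config=tez`. To inspect what a given value passes to the agent, you could parse it as in the sketch below. The function name `agent_args` is hypothetical, and the sketch assumes that the JAR path contains no `=` character, which holds for the paths in the table.

```python
from typing import Dict


def agent_args(opts: str) -> Dict[str, str]:
    """Extract the key=value options passed to btrace-agent.jar from a JVM options string."""
    for token in opts.split():
        if token.startswith("-javaagent:") and "btrace-agent.jar" in token:
            # Everything after the first '=' is the agent's own option string.
            _, _, arg_string = token.partition("=")
            return dict(
                pair.split("=", 1) for pair in arg_string.split(",") if "=" in pair
            )
    return {}  # no EMR Doctor agent option found


# "-Xmx2g" stands in for ${user_config}.
tez_opts = (
    "-Xmx2g -javaagent:/usr/lib/taihaodoctor-current/emr-agent/"
    "btrace-agent.jar=libs=mr,config=tez"
)
print(agent_args(tez_opts))  # {'libs': 'mr', 'config': 'tez'}
```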
Configure Spark task collection
Modify these configuration items on the Configure tab of the Spark service. For details, see Modify configuration items.
In each default value, ${user_config} represents your cluster's existing settings. The content after ${user_config} is the EMR Doctor agent configuration.
| Configuration item | Cluster version | Default value | Description |
|---|---|---|---|
| spark.driver.extraJavaOptions | New-version | ${user_config} -noverify -javaagent:/opt/apps/TAIHAODOCTOR/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=spark | Collects Spark driver metadata. |
| spark.driver.extraJavaOptions | Old-version | ${user_config} -noverify -javaagent:/usr/lib/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=spark | Collects Spark driver metadata. |
| spark.executor.extraJavaOptions | New-version | ${user_config} -noverify -javaagent:/opt/apps/TAIHAODOCTOR/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=spark | Collects Spark executor metadata. |
| spark.executor.extraJavaOptions | Old-version | ${user_config} -noverify -javaagent:/usr/lib/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=spark | Collects Spark executor metadata. |
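
In Spark, a property passed with `--conf` or set on `SparkConf` takes precedence over the value in `spark-defaults.conf`. A job that sets `spark.driver.extraJavaOptions` or `spark.executor.extraJavaOptions` itself may therefore end up without the agent part of the default value. Whether EMR adds the agent back in that case is not covered in this topic. The following sketch checks whether an options string still carries both parts of the default. The function name `has_doctor_agent` is hypothetical.

```python
def has_doctor_agent(java_options: str) -> bool:
    """Return True if the options contain -noverify and the Spark agent option."""
    tokens = java_options.split()
    has_agent = any(
        token.startswith("-javaagent:")
        and token.endswith("btrace-agent.jar=libs=spark")
        for token in tokens
    )
    return has_agent and "-noverify" in tokens


# "-Dfoo=bar" stands in for ${user_config}.
default_value = (
    "-Dfoo=bar -noverify "
    "-javaagent:/opt/apps/TAIHAODOCTOR/taihaodoctor-current/emr-agent/"
    "btrace-agent.jar=libs=spark"
)
print(has_doctor_agent(default_value))  # True
print(has_doctor_agent("-Dfoo=bar"))    # False: the agent part is missing
```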
Configure YARN Timeline Server
Modify this configuration item on the Configure tab of the YARN service. For details, see Modify configuration items.
In the default value, ${user_config} represents your cluster's existing settings. The content after ${user_config} is the EMR Doctor agent configuration.
| Configuration item | Cluster version | Default value | Description |
|---|---|---|---|
| YARN_TIMELINESERVER_OPTS | New-version | ${user_config} -javaagent:/opt/apps/TAIHAODOCTOR/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=mr,config=history | Marks the end of a collection task for the YARN Timeline Server. |
| YARN_TIMELINESERVER_OPTS | Old-version | ${user_config} -javaagent:/usr/lib/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=mr,config=history | Marks the end of a collection task for the YARN Timeline Server. |