E-MapReduce: Configuration details

Last Updated: Mar 26, 2026

EMR Doctor is pre-configured during installation. The default settings cover metadata collection for your cluster and its tasks. In most cases, you do not need to change them.

This topic describes which configuration items you can modify and how to modify them.

Important

Only modify the configuration items listed in this topic. Modifying other configuration items can cause severe issues, including task failures.

Cluster version definitions:

Cluster version | Cluster types
New-version clusters | DataLake clusters, DataServing clusters, and clusters in the custom cluster scenario
Old-version clusters | Hadoop clusters and gateway clusters in the old EMR console

Access the TAIHAODOCTOR Configure tab

EMR Doctor does not appear directly in the EMR console. Access it by modifying the URL of an existing service's Configure tab.

  1. In the EMR console, find your cluster and click Services in the Actions column.

  2. On the Services tab, click Configure in the HDFS section.

  3. In the browser address bar, replace the service name in the URL with TAIHAODOCTOR and press Enter.

    The Configure tab of the TAIHAODOCTOR service appears.

  4. Modify the configuration items, save your changes, and apply them.

For detailed steps, see Modify configuration items.

Configure storage metadata collection

Modify these configuration items on the Configure tab of the TAIHAODOCTOR service. See Access the TAIHAODOCTOR Configure tab for navigation steps.

Configuration item | Default value | Unit | Description
collect.storage.enable | false | - | Controls whether EMR Doctor collects storage metadata. To enable storage metadata collection, go to Monitoring and Diagnostics > Daily Cluster Reports and turn on Collect Information About Storage Resources.
collect.storage.intermediate.path | /mnt/disk1/log/doctor/derby/ | - | Path for intermediate data generated during storage metadata collection. The volume of intermediate data is proportional to the size of the FSImage file.
collect.storage.max.depth | 6 | Levels | Advanced. Maximum number of directory levels traversed during storage metadata collection, counting from the root directories (including those that start with /). Do not set this value too high; a higher value increases analysis time and the volume of intermediate data.
collect.storage.top.size | 100 | Directories | Advanced. Number of largest directories returned at each level during storage metadata collection. Do not set this value too high; a higher value increases analysis time and the volume of intermediate data.
collect.oss.bucket | No default value | - | Name of the Object Storage Service (OSS) bucket to analyze. Required when analyzing OSS data. For details, see Enable and configure the storage analysis feature.
collect.oss.manifest.dir | No default value | - | Directory where generated OSS inventory lists are stored. Required when analyzing OSS data. For details, see Enable and configure the storage analysis feature.
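To make the interplay between collect.storage.max.depth and collect.storage.top.size more concrete, here is a minimal Python sketch. It is not EMR Doctor's implementation. It assumes that a directory's level is the number of path components below / (so /user is level 1 and /user/hive is level 2), that each directory's size is already known, and that the top-N selection is applied independently at each level. The function names level_of and top_dirs_per_level are hypothetical.

```python
from collections import defaultdict


def level_of(path: str) -> int:
    """Directory level, assuming /a is level 1 and /a/b is level 2."""
    return len([part for part in path.split("/") if part])


def top_dirs_per_level(
    dir_sizes: dict[str, int], max_depth: int = 6, top_size: int = 100
) -> dict[int, list[tuple[str, int]]]:
    """Keep the top_size largest directories at each level from 1 to max_depth."""
    by_level = defaultdict(list)
    for path, size in dir_sizes.items():
        level = level_of(path)
        if 1 <= level <= max_depth:  # skip directories deeper than max_depth
            by_level[level].append((path, size))
    return {
        level: sorted(entries, key=lambda e: e[1], reverse=True)[:top_size]
        for level, entries in sorted(by_level.items())
    }


# Hypothetical directory sizes in bytes.
sizes = {
    "/user": 900,
    "/tmp": 300,
    "/apps": 50,
    "/user/hive": 700,
    "/user/spark": 200,
    "/user/hive/warehouse/db1/tbl/dt=2024/part": 10,  # level 7
}
print(top_dirs_per_level(sizes, max_depth=6, top_size=2))
# {1: [('/user', 900), ('/tmp', 300)], 2: [('/user/hive', 700), ('/user/spark', 200)]}
```

Under these assumptions, the defaults cap the result at 6 × 100 = 600 directories, and raising either value raises that cap.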

Configure scheduler collection

Modify these configuration items on the Configure tab of the TAIHAODOCTOR service. See Access the TAIHAODOCTOR Configure tab for navigation steps.

Configuration item | Default value | Unit | Description
collect.job.interval | 120 | Seconds | Interval at which EMR Doctor collects the status of YARN-based scheduling tasks.
collect.jobs.intermediate.path | /mnt/disk1/log/doctor/jobs/ | - | Path for intermediate data generated when collecting YARN scheduling task status.
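If collection is periodic, collect.job.interval trades freshness for load: a status change that happens just after one collection is only seen at the next one. The Python sketch below works through that arithmetic for the 120-second default. It assumes strictly periodic collection and is not taken from EMR Doctor; the function name polling_profile is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def polling_profile(interval_s: int) -> dict[str, float]:
    """Collections per day and worst-case detection delay for a fixed interval."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    return {
        "polls_per_day": SECONDS_PER_DAY / interval_s,
        # A change right after one collection waits up to one full interval.
        "worst_case_delay_s": float(interval_s),
    }


print(polling_profile(120))  # {'polls_per_day': 720.0, 'worst_case_delay_s': 120.0}
print(polling_profile(60))   # {'polls_per_day': 1440.0, 'worst_case_delay_s': 60.0}
```

Halving the interval doubles the number of collections per day while halving the worst-case delay.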

Configure general information

Modify these configuration items on the Configure tab of the TAIHAODOCTOR service. See Access the TAIHAODOCTOR Configure tab for navigation steps.

Configuration item | Default value | Unit | Description
collect.metrics.interval | 15 | Seconds | Interval at which EMR Doctor collects engine task metrics. A value that is too high can produce inaccurate diagnostic suggestions; a value that is too low puts excessive collection pressure on the cluster.
collect.rate.limit | 5000 | Records/second | Maximum number of records each process collects per second. Records beyond this limit are discarded to protect process stability.
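collect.rate.limit caps each process at a fixed number of records per second and discards the excess. The Python sketch below models that drop-on-overflow behavior with a simple fixed one-second window. It illustrates the documented semantics only and is not EMR Doctor's limiter; the class name FixedWindowLimiter is hypothetical, and time is passed in explicitly so the result is deterministic.

```python
class FixedWindowLimiter:
    """Accept at most `limit` records per one-second window; drop the rest."""

    def __init__(self, limit: int = 5000) -> None:
        self.limit = limit
        self.window_start = None  # integer second of the current window
        self.count_in_window = 0
        self.accepted = 0
        self.dropped = 0

    def offer(self, now_s: float) -> bool:
        """Return True if a record arriving at now_s is kept, False if discarded."""
        window = int(now_s)
        if window != self.window_start:
            # A new one-second window starts: reset the per-window counter.
            self.window_start = window
            self.count_in_window = 0
        if self.count_in_window < self.limit:
            self.count_in_window += 1
            self.accepted += 1
            return True
        self.dropped += 1
        return False


limiter = FixedWindowLimiter(limit=5000)
# 6,000 records arrive during second 0, then 100 records during second 1.
for i in range(6000):
    limiter.offer(i / 6000)
for i in range(100):
    limiter.offer(1.0 + i / 100)
print(limiter.accepted, limiter.dropped)  # 5100 1000
```

A burst above the limit loses records for the rest of that second, but the counter resets in the next window, so collection resumes without manual intervention.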

Configure MapReduce task collection

Modify these configuration items on the Configure tab of the YARN service. For details, see Modify configuration items.

In each default value, ${user_config} represents your cluster's existing settings. The content after ${user_config} is the EMR Doctor agent configuration appended to your cluster settings.

Configuration item | Cluster version | Default value | Description
yarn.app.mapreduce.am.command-opts | New-version | ${user_config} -javaagent:/opt/apps/TAIHAODOCTOR/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=mr | Collects AppMaster metadata for MapReduce tasks.
yarn.app.mapreduce.am.command-opts | Old-version | ${user_config} -javaagent:/usr/lib/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=mr | Collects AppMaster metadata for MapReduce tasks.
mapreduce.map.java.opts | New-version | ${user_config} -javaagent:/opt/apps/TAIHAODOCTOR/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=mr | Collects metadata for map tasks.
mapreduce.map.java.opts | Old-version | ${user_config} -javaagent:/usr/lib/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=mr | Collects metadata for map tasks.
mapreduce.reduce.java.opts | New-version | ${user_config} -javaagent:/opt/apps/TAIHAODOCTOR/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=mr | Collects metadata for reduce tasks.
mapreduce.reduce.java.opts | Old-version | ${user_config} -javaagent:/usr/lib/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=mr | Collects metadata for reduce tasks.
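The ${user_config} convention means the effective value is your own JVM options followed by the agent option. The Python sketch below shows that composition for the three MapReduce properties, choosing the agent path by cluster version as listed in the table. The function name with_mr_agent and the duplicate check (skipping the append when the agent is already present) are assumptions for illustration; EMR Doctor manages these values itself.

```python
AGENT_JAR = {
    "new": "/opt/apps/TAIHAODOCTOR/taihaodoctor-current/emr-agent/btrace-agent.jar",
    "old": "/usr/lib/taihaodoctor-current/emr-agent/btrace-agent.jar",
}


def with_mr_agent(user_config: str, cluster_version: str) -> str:
    """Append the EMR Doctor MapReduce agent option to existing JVM options.

    cluster_version is "new" or "old", as defined in the cluster version table.
    """
    agent_opt = f"-javaagent:{AGENT_JAR[cluster_version]}=libs=mr"
    tokens = user_config.split()
    if agent_opt in tokens:
        return user_config  # already present; do not load the agent twice
    return " ".join(tokens + [agent_opt])


print(with_mr_agent("-Xmx1024m", "new"))
# -Xmx1024m -javaagent:/opt/apps/TAIHAODOCTOR/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=mr
```

Splitting on whitespace ignores shell-style quoting, which is acceptable for this sketch but not for options that contain quoted spaces.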

Configure Tez task collection

Modify these configuration items on the Configure tab of the Tez service. For details, see Modify configuration items.

In each default value, ${user_config} represents your cluster's existing settings. The content after ${user_config} is the EMR Doctor agent configuration.

Configuration item | Cluster version | Default value | Description
tez.am.launch.cmd-opts | New-version | ${user_config} -javaagent:/opt/apps/TAIHAODOCTOR/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=mr,config=tez | Collects AppMaster metadata for Tez tasks.
tez.am.launch.cmd-opts | Old-version | ${user_config} -javaagent:/usr/lib/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=mr,config=tez | Collects AppMaster metadata for Tez tasks.
tez.task.launch.cmd-opts | New-version | ${user_config} -javaagent:/opt/apps/TAIHAODOCTOR/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=mr,config=tez | Collects metadata for Tez tasks.
tez.task.launch.cmd-opts | Old-version | ${user_config} -javaagent:/usr/lib/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=mr,config=tez | Collects metadata for Tez tasks.
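The Tez values differ from the MapReduce ones only in the agent arguments after the JAR path: libs=mr,config=tez instead of libs=mr. To inspect an existing value, for example to confirm that the agent arguments survived a manual edit, a small parser such as the hypothetical parse_javaagent below splits each -javaagent token into the JAR path and its arguments. The JVM hands the text after = to the agent as one opaque string, so reading it as comma-separated key=value pairs is an assumption based on the values in these tables.

```python
def parse_javaagent(opts: str) -> list[tuple[str, dict[str, str]]]:
    """Return (jar_path, args) for every -javaagent token in a JVM options string.

    Assumes agent arguments are comma-separated key=value pairs, as in the
    EMR Doctor values above.
    """
    prefix = "-javaagent:"
    agents = []
    for token in opts.split():
        if not token.startswith(prefix):
            continue
        # Everything before the first "=" is the JAR path; the rest are arguments.
        jar, _, raw_args = token[len(prefix):].partition("=")
        args = {}
        for pair in filter(None, raw_args.split(",")):
            key, _, value = pair.partition("=")
            args[key] = value
        agents.append((jar, args))
    return agents


value = ("-Xmx2048m -javaagent:/opt/apps/TAIHAODOCTOR/taihaodoctor-current/"
         "emr-agent/btrace-agent.jar=libs=mr,config=tez")
print(parse_javaagent(value))
# [('/opt/apps/TAIHAODOCTOR/taihaodoctor-current/emr-agent/btrace-agent.jar', {'libs': 'mr', 'config': 'tez'})]
```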

Configure Spark task collection

Modify these configuration items on the Configure tab of the Spark service. For details, see Modify configuration items.

In each default value, ${user_config} represents your cluster's existing settings. The content after ${user_config} is the EMR Doctor agent configuration.

Configuration item | Cluster version | Default value | Description
spark.driver.extraJavaOptions | New-version | ${user_config} -noverify -javaagent:/opt/apps/TAIHAODOCTOR/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=spark | Collects Spark driver metadata.
spark.driver.extraJavaOptions | Old-version | ${user_config} -noverify -javaagent:/usr/lib/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=spark | Collects Spark driver metadata.
spark.executor.extraJavaOptions | New-version | ${user_config} -noverify -javaagent:/opt/apps/TAIHAODOCTOR/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=spark | Collects Spark executor metadata.
spark.executor.extraJavaOptions | Old-version | ${user_config} -noverify -javaagent:/usr/lib/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=spark | Collects Spark executor metadata.
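In Spark, a property passed with --conf at submission time takes precedence over the value in spark-defaults.conf and replaces it rather than being merged with it. Assuming the values on the Spark Configure tab are delivered as cluster-wide defaults in that file, a job that sets its own spark.executor.extraJavaOptions drops the EMR Doctor agent unless its value repeats the agent options. The Python sketch below checks an options string for the two tokens shown in the Spark rows; the function name missing_doctor_opts is hypothetical and the check is a plain token match.

```python
def missing_doctor_opts(java_opts: str) -> list[str]:
    """Return the EMR Doctor Spark options absent from a JVM options string.

    Looks for -noverify and for a -javaagent token that points at
    btrace-agent.jar with libs=spark, under either cluster version's path.
    """
    tokens = java_opts.split()
    missing = []
    if "-noverify" not in tokens:
        missing.append("-noverify")
    has_agent = any(
        t.startswith("-javaagent:")
        and t.endswith("/emr-agent/btrace-agent.jar=libs=spark")
        for t in tokens
    )
    if not has_agent:
        missing.append("-javaagent (btrace-agent.jar=libs=spark)")
    return missing


# A per-job override that replaced the cluster default.
print(missing_doctor_opts("-XX:+UseG1GC"))
# ['-noverify', '-javaagent (btrace-agent.jar=libs=spark)']

# The same override with the EMR Doctor options kept (new-version path).
print(missing_doctor_opts(
    "-XX:+UseG1GC -noverify "
    "-javaagent:/opt/apps/TAIHAODOCTOR/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=spark"
))
# []
```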

Configure YARN Timeline Server

Modify this configuration item on the Configure tab of the YARN service. For details, see Modify configuration items.

In the default value, ${user_config} represents your cluster's existing settings. The content after ${user_config} is the EMR Doctor agent configuration.

Configuration item | Cluster version | Default value | Description
YARN_TIMELINESERVER_OPTS | New-version | ${user_config} -javaagent:/opt/apps/TAIHAODOCTOR/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=mr,config=history | Marks the end of a collection task for the YARN Timeline Server.
YARN_TIMELINESERVER_OPTS | Old-version | ${user_config} -javaagent:/usr/lib/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=mr,config=history | Marks the end of a collection task for the YARN Timeline Server.
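All agent values in this topic follow one pattern: an optional -noverify, then -javaagent: with a version-specific JAR path, then libs=<engine> and, for some properties, config=<role>. The Python sketch below encodes the tables as data and regenerates the text that follows ${user_config} for a given property and cluster version, which can help when checking a cluster by hand. The mapping is transcribed from the tables in this topic; the names AGENT_SPECS and agent_suffix are hypothetical.

```python
JAR_PATHS = {
    "new": "/opt/apps/TAIHAODOCTOR/taihaodoctor-current/emr-agent/btrace-agent.jar",
    "old": "/usr/lib/taihaodoctor-current/emr-agent/btrace-agent.jar",
}

# property -> (agent arguments, whether -noverify precedes the agent)
AGENT_SPECS = {
    "yarn.app.mapreduce.am.command-opts": ("libs=mr", False),
    "mapreduce.map.java.opts": ("libs=mr", False),
    "mapreduce.reduce.java.opts": ("libs=mr", False),
    "tez.am.launch.cmd-opts": ("libs=mr,config=tez", False),
    "tez.task.launch.cmd-opts": ("libs=mr,config=tez", False),
    "spark.driver.extraJavaOptions": ("libs=spark", True),
    "spark.executor.extraJavaOptions": ("libs=spark", True),
    "YARN_TIMELINESERVER_OPTS": ("libs=mr,config=history", False),
}


def agent_suffix(prop: str, cluster_version: str) -> str:
    """Return the text that follows ${user_config} for prop ("new" or "old" cluster)."""
    agent_args, noverify = AGENT_SPECS[prop]
    agent = f"-javaagent:{JAR_PATHS[cluster_version]}={agent_args}"
    return f"-noverify {agent}" if noverify else agent


print(agent_suffix("YARN_TIMELINESERVER_OPTS", "old"))
# -javaagent:/usr/lib/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=mr,config=history
```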