The basic configurations of E-MapReduce (EMR) Doctor are set during installation. They control metadata collection for your EMR cluster and for the tasks that run in the cluster. In most cases, you do not need to modify or redeploy them. This topic describes the basic configurations of EMR Doctor and how to modify them.

Important
  • The following sections describe the EMR Doctor configurations that can be modified. Do not modify the other configurations. Otherwise, severe issues such as task failures may occur.
  • In this topic, new-version clusters refer to DataLake clusters, DataServing clusters, and clusters in the custom cluster scenario. Old-version clusters refer to Hadoop clusters and gateway clusters in the old EMR console.

Configure storage metadata collection

Configuration item | Default value | Description
collect.storage.enable | false | Specifies whether to collect storage metadata. By default, EMR Doctor does not collect storage metadata. To enable this configuration item, turn on Collect Information About Storage Resources on the Health Check tab.
collect.storage.intermediate.path | /mnt/disk1/log/doctor/derby/ | The path that stores the intermediate data generated when storage metadata is collected. The volume of intermediate data is proportional to the size of the FSImage file.
collect.storage.max.depth | 6 | An advanced configuration item. The maximum number of directory levels that are traversed when storage metadata is collected, including the directories that start with a forward slash (/). Note: We recommend that you do not specify an excessively large value for this parameter. An excessively large value may lengthen the analysis and increase the volume of intermediate storage data.
collect.storage.top.size | 100 | An advanced configuration item. The number of largest directories that can be obtained at each level when storage metadata is collected. Note: We recommend that you do not specify an excessively large value for this parameter. An excessively large value may lengthen the analysis and increase the volume of intermediate storage data.
collect.oss.bucket | No default value | The name of the bucket whose objects you want to analyze. This configuration item is required when you analyze data stored in Object Storage Service (OSS). For more information, see Enable and configure the storage analysis feature.
collect.oss.manifest.dir | No default value | The directory in which the generated inventory lists are stored. This configuration item is required when you analyze data stored in OSS. For more information, see Enable and configure the storage analysis feature.
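
To make the effect of collect.storage.max.depth and collect.storage.top.size concrete, the following Python sketch keeps only the largest directories at each level of a tree. It is an illustration under stated assumptions, not how EMR Doctor processes the FSImage file: the function name, the input format (a mapping from path to size in bytes), and the choice to count / as level 1 are all hypothetical.

```python
import heapq
from collections import defaultdict


def top_dirs_per_level(dir_sizes, max_depth=6, top_size=100):
    """Keep the `top_size` largest directories at each level, down to `max_depth`.

    dir_sizes maps an absolute directory path to its total size in bytes.
    """
    by_level = defaultdict(list)
    for path, size in dir_sizes.items():
        # Assumption: "/" is level 1, "/a" is level 2, "/a/b" is level 3, ...
        level = 1 if path == "/" else path.rstrip("/").count("/") + 1
        if level <= max_depth:
            by_level[level].append((size, path))
    # For each level, return the paths ordered from largest to smallest.
    return {
        level: [path for _size, path in heapq.nlargest(top_size, entries)]
        for level, entries in by_level.items()
    }


sizes = {
    "/": 100, "/a": 60, "/b": 30, "/c": 10,
    "/a/x": 50, "/a/y": 10, "/a/x/deep": 5,
}
result = top_dirs_per_level(sizes, max_depth=3, top_size=2)
```

Under this reading, the number of directories kept is bounded by max_depth × top_size (600 with the default values), which is why larger values increase both the analysis time and the volume of intermediate data.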

You can view or modify the preceding configuration items on the Configure tab of the TAIHAODOCTOR service page in the EMR console. Procedure:
  1. Go to the Configure tab of a service.
    Note EMR Doctor is not displayed in the EMR console. To access EMR Doctor, you must modify the link for the Configure tab of the current service. In this topic, the HDFS service is used.
    1. In the EMR console, find the desired cluster and click Services in the Actions column.
    2. On the Services tab, click Configure in the HDFS section.
  2. In the address bar of the browser, replace the service name in the link (HDFS in this example) with TAIHAODOCTOR and press Enter.
    The Configure tab of the TAIHAODOCTOR service appears.
  3. On the Configure tab of the TAIHAODOCTOR service, you can modify the preceding configuration items based on your business requirements, save the configurations, and make the configurations take effect.

    For more information about how to modify configuration items, see Modify configuration items.
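
The link change in step 2 can be sketched in Python as follows. The URL in the example is hypothetical, because the real console link format is not shown in this topic. The only assumption is that the service name appears as a separate path segment of the link.

```python
def to_doctor_url(configure_url, current_service="HDFS"):
    """Swap the service-name path segment of the link for TAIHAODOCTOR."""
    segments = configure_url.split("/")
    if current_service not in segments:
        raise ValueError(f"{current_service!r} is not a path segment of the URL")
    return "/".join(
        "TAIHAODOCTOR" if segment == current_service else segment
        for segment in segments
    )


# Hypothetical link; copy the real one from the address bar of your browser.
hdfs_url = "https://emr.example.com/cluster/c-0000/service/HDFS/config"
doctor_url = to_doctor_url(hdfs_url)
```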

Configure scheduler collection

You can view or modify the following configuration items on the Configure tab of the TAIHAODOCTOR service in the EMR console. For more information about how to modify the following configuration items, see Configure storage metadata collection.

Configuration item | Default value | Description
collect.job.interval | 120 | The interval at which the status of the YARN-based scheduling tasks is collected. Default value: 120. Unit: seconds.
collect.jobs.intermediate.path | /mnt/disk1/log/doctor/jobs/ | The path that stores the intermediate data generated when the status of YARN-based scheduling tasks is collected.

Configure general information

You can view or modify the following configuration items on the Configure tab of the TAIHAODOCTOR service in the EMR console. For more information about how to modify the following configuration items, see Configure storage metadata collection.

Configuration item | Default value | Description
collect.metrics.interval | 15 | The interval at which the metrics of an engine task are collected. Default value: 15. Unit: seconds. Note: To keep tasks stable, we recommend that you do not specify an excessively large or small value for this parameter. An excessively large value may result in inaccurate suggestions for tasks. An excessively small value may result in excessive collection pressure.
collect.rate.limit | 5000 | The maximum number of records collected per second for each process. Excess data records are discarded to prevent process stability from being affected.
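
The discard behavior described for collect.rate.limit can be pictured with a fixed one-second window counter. The following Python sketch is an assumption for illustration only: this topic does not describe how EMR Doctor implements the limit, and the class and method names are hypothetical.

```python
class PerSecondLimiter:
    """Accept at most `limit` records per one-second window and discard the rest."""

    def __init__(self, limit=5000):
        self.limit = limit
        self.window = None  # integer second of the current window
        self.count = 0      # records accepted in the current window

    def accept(self, now):
        """Return True if a record that arrives at time `now` (in seconds) is kept."""
        second = int(now)
        if second != self.window:
            # A new second has started, so reset the counter.
            self.window, self.count = second, 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False  # over the limit for this second: the record is discarded


limiter = PerSecondLimiter(limit=3)
same_second = [limiter.accept(10.0 + i * 0.1) for i in range(5)]
next_second = limiter.accept(11.0)
```

With a limit of 3, the fourth and fifth records in the same second are dropped, and collection resumes when the next second starts.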

Configure MapReduce task collection

You can view or modify the following configuration items on the Configure tab of the YARN service in the EMR console.

For more information about how to modify configuration items, see Modify configuration items.

Configuration item | Default value | Description
yarn.app.mapreduce.am.command-opts
  • New-version clusters

    ${user_config} -javaagent:/opt/apps/TAIHAODOCTOR/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=mr

  • Old-version clusters

    ${user_config} -javaagent:/usr/lib/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=mr

Collects the AppMaster metadata for a MapReduce task.

${user_config} indicates the configurations of your cluster. The content after ${user_config} indicates the configurations of the EMR Doctor service.

mapreduce.map.java.opts
  • New-version clusters

    ${user_config} -javaagent:/opt/apps/TAIHAODOCTOR/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=mr

  • Old-version clusters

    ${user_config} -javaagent:/usr/lib/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=mr

Collects the metadata of a map task.

${user_config} indicates the configurations of your cluster. The content after ${user_config} indicates the configurations of the EMR Doctor service.

mapreduce.reduce.java.opts
  • New-version clusters

    ${user_config} -javaagent:/opt/apps/TAIHAODOCTOR/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=mr

  • Old-version clusters

    ${user_config} -javaagent:/usr/lib/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=mr

Collects the metadata of a reduce task.

${user_config} indicates the configurations of your cluster. The content after ${user_config} indicates the configurations of the EMR Doctor service.
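
The way a default value combines your own options with the EMR Doctor agent can be sketched in Python. The jar paths and the libs=mr argument are taken from the preceding default values. The function name and the sample user option -Xmx1024m are hypothetical.

```python
AGENT_JAR = {
    # Paths as listed in this topic for each cluster type.
    "new": "/opt/apps/TAIHAODOCTOR/taihaodoctor-current/emr-agent/btrace-agent.jar",
    "old": "/usr/lib/taihaodoctor-current/emr-agent/btrace-agent.jar",
}


def with_doctor_agent(user_config, cluster="new", agent_args="libs=mr"):
    """Append the EMR Doctor -javaagent option to the user's own JVM options."""
    agent = f"-javaagent:{AGENT_JAR[cluster]}={agent_args}"
    # strip() removes the leading space when user_config is empty.
    return f"{user_config} {agent}".strip()


new_opts = with_doctor_agent("-Xmx1024m")
old_opts = with_doctor_agent("", cluster="old")
```

Keeping your own options in front and leaving the -javaagent part unchanged matches the ${user_config} convention described above.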

Configure Tez task collection

You can view or modify the following configuration items on the Configure tab of the Tez service in the EMR console.

For more information about how to modify configuration items, see Modify configuration items.

Configuration item | Default value | Description
tez.am.launch.cmd-opts
  • New-version clusters

    ${user_config} -javaagent:/opt/apps/TAIHAODOCTOR/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=mr,config=tez

  • Old-version clusters

    ${user_config} -javaagent:/usr/lib/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=mr,config=tez

Collects the AppMaster metadata for a Tez task.
tez.task.launch.cmd-opts
  • New-version clusters

    ${user_config} -javaagent:/opt/apps/TAIHAODOCTOR/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=mr,config=tez

  • Old-version clusters

    ${user_config} -javaagent:/usr/lib/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=mr,config=tez

Collects the metadata of a Tez task.
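
The agent option follows the standard JVM form -javaagent:<jar path>=<options>. Here the options part is a comma-separated list such as libs=mr,config=tez. The following Python sketch splits such an option into its parts so that the values can be inspected. The function name is hypothetical, and treating the options as key=value pairs is an assumption based on how they appear in the preceding default values.

```python
def parse_agent_option(option):
    """Split '-javaagent:<jar>=<k=v,...>' into the jar path and a dict of arguments."""
    prefix = "-javaagent:"
    if not option.startswith(prefix):
        raise ValueError("not a -javaagent option")
    # The jar path ends at the first "="; everything after it is passed to the agent.
    jar, _, raw_args = option[len(prefix):].partition("=")
    args = dict(pair.split("=", 1) for pair in raw_args.split(",") if pair)
    return jar, args


jar, args = parse_agent_option(
    "-javaagent:/opt/apps/TAIHAODOCTOR/taihaodoctor-current/emr-agent/"
    "btrace-agent.jar=libs=mr,config=tez"
)
```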

Configure Spark task collection

You can view or modify the following configuration items on the Configure tab of the Spark service in the EMR console.

For more information about how to modify configuration items, see Modify configuration items.

Configuration item | Default value | Description
spark.driver.extraJavaOptions
  • New-version clusters

    ${user_config} -noverify -javaagent:/opt/apps/TAIHAODOCTOR/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=spark

  • Old-version clusters

    ${user_config} -noverify -javaagent:/usr/lib/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=spark

Collects the metadata of a Spark driver.

${user_config} indicates the configurations of your cluster. The content after ${user_config} indicates the configurations of the EMR Doctor service.

spark.executor.extraJavaOptions
  • New-version clusters

    ${user_config} -noverify -javaagent:/opt/apps/TAIHAODOCTOR/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=spark

  • Old-version clusters

    ${user_config} -noverify -javaagent:/usr/lib/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=spark

Collects the metadata of a Spark executor.

${user_config} indicates the configurations of your cluster. The content after ${user_config} indicates the configurations of the EMR Doctor service.
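
Because these items mix your own options with the EMR Doctor part, it can help to confirm that an edited value still contains both -noverify and the agent option. The following Python sketch performs that check on a single string. The function name and the sample option -Xmx2g are hypothetical, and the check is only an illustration, not a tool provided by EMR.

```python
import shlex


def check_spark_opts(java_opts):
    """Report whether the EMR Doctor parts are still present in a Spark options string."""
    tokens = shlex.split(java_opts)
    return {
        "noverify": "-noverify" in tokens,
        "doctor_agent": any(
            token.startswith("-javaagent:") and "btrace-agent.jar=libs=spark" in token
            for token in tokens
        ),
    }


intact = check_spark_opts(
    "-Xmx2g -noverify -javaagent:/opt/apps/TAIHAODOCTOR/taihaodoctor-current/"
    "emr-agent/btrace-agent.jar=libs=spark"
)
stripped = check_spark_opts("-Xmx2g")
```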

Configure YARN Timeline Server

You can view or modify the following configuration items on the Configure tab of the YARN service in the EMR console.

For more information about how to modify configuration items, see Modify configuration items.

Configuration item | Default value | Description
YARN_TIMELINESERVER_OPTS
  • New-version clusters

    ${user_config} -javaagent:/opt/apps/TAIHAODOCTOR/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=mr,config=history

  • Old-version clusters

    ${user_config} -javaagent:/usr/lib/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=mr,config=history

Marks the end of a collection task.

${user_config} indicates the configurations of your cluster. The content after ${user_config} indicates the configurations of the EMR Doctor service.