Configuration - E-MapReduce - Alibaba Cloud Documentation Center

The basic configurations of E-MapReduce (EMR) Doctor are set during installation. They control metadata collection for your EMR cluster and for the tasks that run in the cluster. In most cases, you do not need to modify or deliver the basic configurations. This topic describes the basic configurations of EMR Doctor and how to modify them.

Important

The following sections describe the EMR Doctor configurations that can be modified. Do not modify the other configurations. Otherwise, severe issues such as task failures may occur.
In this topic, new-version clusters refer to DataLake clusters, DataServing clusters, and clusters in the custom cluster scenario. Old-version clusters refer to Hadoop clusters and gateway clusters in the old EMR console.

Configure storage metadata collection


Configuration item	Default value	Description
collect.storage.enable	false	Specifies whether to collect storage metadata. By default, EMR Doctor does not collect storage metadata. On the Health Check tab, you can turn on Collect Information About Storage Resources to enable this configuration item.
collect.storage.intermediate.path	/mnt/disk1/log/doctor/derby/	The path that stores the intermediate data generated when storage metadata is collected. The volume of intermediate data is proportional to the size of the FSImage file.
collect.storage.max.depth	6	An advanced configuration item. The maximum number of directory levels that are traversed when storage metadata is collected, including the directories that start with a forward slash (/). Note: We recommend that you do not specify an excessively large value for this parameter. An excessively large value may result in a long analysis time and a large volume of intermediate data.
collect.storage.top.size	100	An advanced configuration item. The number of largest directories that are obtained at each level when storage metadata is collected. Note: We recommend that you do not specify an excessively large value for this parameter. An excessively large value may result in a long analysis time and a large volume of intermediate data.
collect.oss.bucket	No default value	The name of the bucket whose objects you want to analyze. This configuration item is required when you analyze data stored in Object Storage Service (OSS). For more information, see Enable and configure the storage analysis feature.
collect.oss.manifest.dir	No default value	The directory in which the generated inventory lists are stored. This configuration item is required when you analyze data stored in OSS. For more information, see Enable and configure the storage analysis feature.
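
To illustrate how collect.storage.max.depth and collect.storage.top.size bound the amount of data that is kept, the following Python sketch applies both limits to a small in-memory map of directory sizes. It is an illustration only and not the EMR Doctor implementation: the function names, the input format, and the convention that `/` counts as level 1 are assumptions made for this example.

```python
from collections import defaultdict

def depth_of(path):
    # Assumed convention: "/" is level 1, "/a" is level 2, "/a/b" is level 3.
    parts = [p for p in path.split("/") if p]
    return len(parts) + 1

def top_dirs_per_level(dir_sizes, max_depth=6, top_size=100):
    """Keep at most top_size largest directories per level, up to max_depth levels."""
    by_level = defaultdict(list)
    for path, size in dir_sizes.items():
        level = depth_of(path)
        if level <= max_depth:  # deeper directories are not traversed
            by_level[level].append((path, size))
    result = {}
    for level, entries in by_level.items():
        entries.sort(key=lambda e: e[1], reverse=True)  # largest first
        result[level] = entries[:top_size]
    return result

# Hypothetical directory sizes (arbitrary units).
sample = {"/": 100, "/user": 60, "/tmp": 40, "/user/hive": 50, "/user/spark": 10}
report = top_dirs_per_level(sample, max_depth=2, top_size=1)
```

With `max_depth=2` and `top_size=1`, only `/` and the largest second-level directory are retained. Larger values of either limit keep more entries, which matches the recommendation to avoid excessively large values.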

You can view or modify the preceding configuration items on the Configure tab of the TAIHAODOCTOR service page in the EMR console. Procedure:

1. Go to the Configure tab of a service. In this topic, the HDFS service is used.
   Note: EMR Doctor is not displayed in the EMR console. To access EMR Doctor, you must modify the link for the Configure tab of another service.
   a. In the EMR console, find the desired cluster and click Services in the Actions column.
   b. On the Services tab, click Configure in the HDFS section.
2. In the address bar of the browser, replace the service name in the link with TAIHAODOCTOR and press Enter.
   The Configure tab of the TAIHAODOCTOR service appears.
3. On the Configure tab of the TAIHAODOCTOR service, modify the preceding configuration items based on your business requirements, save the configurations, and make the configurations take effect.
   For more information about how to modify configuration items, see Modify configuration items.
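
The link change in the procedure above can be sketched in Python as follows. This is a minimal sketch that assumes the service name appears as its own `/`-separated segment of the link. The URL in the example is a made-up placeholder, not the real EMR console address, and `swap_service_in_url` is a hypothetical helper name.

```python
def swap_service_in_url(url, current_service, target_service="TAIHAODOCTOR"):
    """Replace the service-name segment of a console link with another service name."""
    segments = url.split("/")
    if current_service not in segments:
        # The assumption about the link layout does not hold for this URL.
        raise ValueError(f"{current_service!r} is not a segment of {url!r}")
    return "/".join(
        target_service if segment == current_service else segment
        for segment in segments
    )

# Placeholder link; copy the real one from the address bar of your browser.
hdfs_link = "https://console.example.com/#/cluster/c-demo/service/HDFS/config"
doctor_link = swap_service_in_url(hdfs_link, "HDFS")
```

Only segments that exactly equal the service name are replaced, so other parts of the link, such as the cluster ID, are left unchanged.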

Configure scheduler collection

You can view or modify the following configuration items on the Configure tab of the TAIHAODOCTOR service in the EMR console. For more information about how to modify the following configuration items, see Configure storage metadata collection.


Configuration item	Default value	Description
collect.job.interval	120	The interval at which the status of the YARN-based scheduling tasks is collected. Default value: 120. Unit: seconds.
collect.jobs.intermediate.path	/mnt/disk1/log/doctor/jobs/	The path that stores the intermediate data generated when the status of YARN-based scheduling tasks is collected.

Configure general information


Configuration item	Default value	Description
collect.metrics.interval	15	The interval at which the metrics of an engine task are collected. Default value: 15. Unit: seconds. Note: To prevent the stability of tasks from being affected, we recommend that you do not specify an excessively large or small value for this parameter. An excessively large value may result in inaccurate suggestions for tasks. An excessively small value may result in excessive collection pressure.
collect.rate.limit	5000	The maximum number of records collected per second for each process. Excess data records are discarded to prevent process stability from being affected.
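
The behavior described for collect.rate.limit, in which records beyond a per-second cap are discarded, can be pictured with the following Python sketch. It assumes a simple fixed one-second window. The class name `PerSecondLimiter` and its fields are hypothetical, and the actual mechanism used by EMR Doctor may differ.

```python
class PerSecondLimiter:
    """Accept at most `limit` records in each one-second window; drop the rest."""

    def __init__(self, limit=5000):
        self.limit = limit
        self.window = None   # the whole second currently being counted
        self.count = 0       # records accepted in the current window
        self.dropped = 0     # records discarded so far

    def allow(self, now):
        second = int(now)
        if second != self.window:
            # A new second has started, so the counter is reset.
            self.window = second
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        self.dropped += 1
        return False

# Five records arrive within the same second with a cap of 3.
limiter = PerSecondLimiter(limit=3)
accepted = [limiter.allow(10.0 + i * 0.1) for i in range(5)]
next_second = limiter.allow(11.0)
```

In this example, the first three records are accepted, the last two are dropped, and collection resumes when the next second begins.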

Configure MapReduce task collection

You can view or modify the following configuration items on the Configure tab of the YARN service in the EMR console.

For more information about how to modify configuration items, see Modify configuration items.


Configuration item	Default value	Description
yarn.app.mapreduce.am.command-opts	New-version clusters: `${user_config} -javaagent:/opt/apps/TAIHAODOCTOR/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=mr` Old-version clusters: `${user_config} -javaagent:/usr/lib/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=mr`	Collects the AppMaster metadata of a MapReduce task. `${user_config}` indicates the configurations of your cluster. The content after `${user_config}` indicates the configurations of the EMR Doctor service.
mapreduce.map.java.opts	New-version clusters: `${user_config} -javaagent:/opt/apps/TAIHAODOCTOR/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=mr` Old-version clusters: `${user_config} -javaagent:/usr/lib/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=mr`	Collects the metadata of a map task. `${user_config}` indicates the configurations of your cluster. The content after `${user_config}` indicates the configurations of the EMR Doctor service.
mapreduce.reduce.java.opts	New-version clusters: `${user_config} -javaagent:/opt/apps/TAIHAODOCTOR/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=mr` Old-version clusters: `${user_config} -javaagent:/usr/lib/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=mr`	Collects the metadata of a reduce task. `${user_config}` indicates the configurations of your cluster. The content after `${user_config}` indicates the configurations of the EMR Doctor service.
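
The default values in the preceding table follow one pattern: your own JVM options, followed by the EMR Doctor agent option. The following Python sketch assembles such a value. The agent paths and the `libs=mr` argument are taken from the table. The function name `build_opts` and the sample option `-Xmx2048m` are assumptions used only for illustration.

```python
# Agent JAR locations as listed in the table for each cluster generation.
AGENT_JAR = {
    "new": "/opt/apps/TAIHAODOCTOR/taihaodoctor-current/emr-agent/btrace-agent.jar",
    "old": "/usr/lib/taihaodoctor-current/emr-agent/btrace-agent.jar",
}

def build_opts(user_config, cluster="new", agent_args="libs=mr"):
    """Return '<user options> -javaagent:<jar>=<agent args>'."""
    agent = f"-javaagent:{AGENT_JAR[cluster]}={agent_args}"
    # strip() avoids a leading space when no user options are set.
    return f"{user_config} {agent}".strip()

# "-Xmx2048m" stands in for whatever options your cluster already uses.
value = build_opts("-Xmx2048m", cluster="new")
```

When you change your own options, keep the part that starts with `-javaagent:` unchanged so that metadata collection continues to work.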

Configure Tez task collection

You can view or modify the following configuration items on the Configure tab of the Tez service in the EMR console.

For more information about how to modify configuration items, see Modify configuration items.


Configuration item	Default value	Description
tez.am.launch.cmd-opts	New-version clusters: `${user_config} -javaagent:/opt/apps/TAIHAODOCTOR/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=mr,config=tez` Old-version clusters: `${user_config} -javaagent:/usr/lib/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=mr,config=tez`	Collects the AppMaster metadata of a Tez task.
tez.task.launch.cmd-opts	New-version clusters: `${user_config} -javaagent:/opt/apps/TAIHAODOCTOR/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=mr,config=tez` Old-version clusters: `${user_config} -javaagent:/usr/lib/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=mr,config=tez`	Collects the metadata of a Tez task.

Configure Spark task collection

You can view or modify the following configuration items on the Configure tab of the Spark service in the EMR console.

For more information about how to modify configuration items, see Modify configuration items.


Configuration item	Default value	Description
spark.driver.extraJavaOptions	New-version clusters: `${user_config} -noverify -javaagent:/opt/apps/TAIHAODOCTOR/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=spark` Old-version clusters: `${user_config} -noverify -javaagent:/usr/lib/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=spark`	Collects the metadata of a Spark driver. `${user_config}` indicates the configurations of your cluster. The content after `${user_config}` indicates the configurations of the EMR Doctor service.
spark.executor.extraJavaOptions	New-version clusters: `${user_config} -noverify -javaagent:/opt/apps/TAIHAODOCTOR/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=spark` Old-version clusters: `${user_config} -noverify -javaagent:/usr/lib/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=spark`	Collects the metadata of a Spark executor. `${user_config}` indicates the configurations of your cluster. The content after `${user_config}` indicates the configurations of the EMR Doctor service.
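
After you edit one of these option strings, you may want to confirm that the EMR Doctor agent part is still present. The following Python sketch performs such a check on a string that you paste in. `has_doctor_agent` is a hypothetical helper and not part of EMR or Spark, and `-XX:+UseG1GC` stands in for any option of your own.

```python
import shlex

def has_doctor_agent(opts, libs):
    """Return True if opts contains a btrace-agent -javaagent flag with libs=<libs>."""
    for token in shlex.split(opts):
        if token.startswith("-javaagent:") and "btrace-agent.jar=" in token:
            # Agent arguments follow the JAR path as a comma-separated list.
            args = token.split("btrace-agent.jar=", 1)[1].split(",")
            if f"libs={libs}" in args:
                return True
    return False

edited = (
    "-XX:+UseG1GC -noverify "
    "-javaagent:/opt/apps/TAIHAODOCTOR/taihaodoctor-current/emr-agent/"
    "btrace-agent.jar=libs=spark"
)
still_collecting = has_doctor_agent(edited, "spark")
agent_removed = has_doctor_agent("-XX:+UseG1GC", "spark")
```

The same check applies to the MapReduce and Tez values by passing `"mr"` as the `libs` argument, because those values use `libs=mr`.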

Configure YARN Timeline Server

You can view or modify the following configuration items on the Configure tab of the YARN service in the EMR console.

For more information about how to modify configuration items, see Modify configuration items.


Configuration item	Default value	Description
YARN_TIMELINESERVER_OPTS	New-version clusters: `${user_config} -javaagent:/opt/apps/TAIHAODOCTOR/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=mr,config=history` Old-version clusters: `${user_config} -javaagent:/usr/lib/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=mr,config=history`	The end flag of the collection task. `${user_config}` indicates the configurations of your cluster. The content after `${user_config}` indicates the configurations of the EMR Doctor service.