When you create an E-MapReduce (EMR) cluster, the EMR Doctor environment is automatically installed, and the job collection feature is enabled by default for health status evaluation. However, some client settings can overwrite the job collection configurations and render them invalid. This topic describes how to append the EMR Doctor settings for different engine types to your client configurations to ensure that EMR Doctor can collect jobs as expected.

EMR Doctor settings

In most cases, you do not need to configure EMR Doctor parameters because the client settings are automatically configured when an EMR cluster is created. However, if you modify or configure any of the parameters in the following list for a job, your values overwrite the default settings of the cluster on which the job runs. In this case, you must manually append the EMR Doctor settings in the following list to the parameters that you modified or configured.

EMR Doctor uses Java agents to collect job metrics. The following list describes the Java agent settings that you must append for each engine.
Important: In this topic, new-version clusters refer to DataLake clusters, DataServing clusters, and clusters in the custom cluster scenario. Old-version clusters refer to Hadoop clusters and gateway clusters in the old EMR console.

MapReduce

Parameters: yarn.app.mapreduce.am.command-opts, mapreduce.map.java.opts, and mapreduce.reduce.java.opts

Appended EMR Doctor setting:

  • New-version clusters

    -javaagent:/opt/apps/TAIHAODOCTOR/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=mr

  • Old-version clusters

    -javaagent:/usr/lib/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=mr

Tez

Parameters: tez.task.launch.cmd-opts and tez.am.launch.cmd-opts

Appended EMR Doctor setting:

  • New-version clusters

    -javaagent:/opt/apps/TAIHAODOCTOR/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=mr,config=tez

  • Old-version clusters

    -javaagent:/usr/lib/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=mr,config=tez

Spark

Parameters: spark.driver.extraJavaOptions, spark.executor.extraJavaOptions, and spark.yarn.am.extraJavaOptions

Appended EMR Doctor setting:

  • New-version clusters

    -noverify -javaagent:/opt/apps/TAIHAODOCTOR/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=spark

  • Old-version clusters

    -noverify -javaagent:/usr/lib/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=spark
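
For example, the following command appends the MapReduce settings for a new-version cluster when a job is submitted. This is a minimal sketch: the wordcount example job, the JAR name, and the -Xmx1024m values are hypothetical placeholders for your own job and existing JVM options.

    # Append the EMR Doctor Java agent after the JVM options that the job already uses.
    # The JAR name, the wordcount job, and -Xmx1024m are placeholders.
    # yarn.app.mapreduce.am.command-opts can be appended in the same way.
    hadoop jar hadoop-mapreduce-examples.jar wordcount \
      -Dmapreduce.map.java.opts="-Xmx1024m -javaagent:/opt/apps/TAIHAODOCTOR/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=mr" \
      -Dmapreduce.reduce.java.opts="-Xmx1024m -javaagent:/opt/apps/TAIHAODOCTOR/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=mr" \
      /tmp/input /tmp/output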

Use the job collection feature of EMR Doctor for EMR nodes in DataWorks

If you schedule EMR nodes in DataWorks and configure any of the preceding engine parameters for those nodes, you must append the corresponding EMR Doctor settings to the parameters.

For example, if you configure the spark.driver.extraJavaOptions parameter for an EMR Spark node and you want to use the job collection feature of EMR Doctor, you must append the related EMR Doctor setting to the end of the parameter value.
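
The following sketch shows what the appended value might look like in the Spark parameters of the node, assuming a new-version cluster; the existing -XX:+PrintGCDetails value is a hypothetical placeholder:

    spark.driver.extraJavaOptions=-XX:+PrintGCDetails -noverify -javaagent:/opt/apps/TAIHAODOCTOR/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=spark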

Use the job collection feature of EMR Doctor in DolphinScheduler

We recommend that you deploy DolphinScheduler in an EMR gateway environment because the scheduling system that runs in the gateway environment already contains the EMR Doctor software packages and related configurations.

If DolphinScheduler is deployed in the gateway environment, you can add the EMR Doctor settings that are described in the preceding list to the parameters that you configure in the Optional Parameter section when you define a workflow. This way, EMR Doctor can collect the jobs in the workflow when the workflow runs and analyze them later.
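
For example, for a Spark task on a new-version cluster, the entries that you add in the Optional Parameter section might look like the following sketch. The exact key-value format is an assumption and depends on the task type that you use:

    spark.driver.extraJavaOptions=-noverify -javaagent:/opt/apps/TAIHAODOCTOR/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=spark
    spark.executor.extraJavaOptions=-noverify -javaagent:/opt/apps/TAIHAODOCTOR/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=spark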

Use the job collection feature of EMR Doctor when you develop data by using Data Platform in the old EMR console

For example, if you configure the spark.driver.extraJavaOptions parameter in Data Platform in the old EMR console and you want to use the job collection feature of EMR Doctor, you must append the -noverify -javaagent:/usr/lib/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=spark setting that is described in the preceding list.
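
The following sketch shows the parameter before and after the setting is appended; the existing -XX:+UseG1GC value is a hypothetical placeholder:

    # Before the job collection setting is appended (hypothetical existing value)
    spark.driver.extraJavaOptions=-XX:+UseG1GC

    # After the job collection setting is appended
    spark.driver.extraJavaOptions=-XX:+UseG1GC -noverify -javaagent:/usr/lib/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=spark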