The basic configurations of E-MapReduce (EMR) Doctor are set during installation. They control metadata collection for your EMR cluster and for the tasks that run in the cluster. In most cases, you do not need to modify or deliver the basic configurations. This topic describes the basic configurations of EMR Doctor and how to modify them.
- The following sections describe the EMR Doctor configurations that can be modified. Do not modify the other configurations. Otherwise, severe issues such as task failures may occur.
- In this topic, new-version clusters refer to DataLake clusters, DataServing clusters, and clusters in the custom cluster scenario. Old-version clusters refer to Hadoop clusters and gateway clusters in the old EMR console.
Configure storage metadata collection
Configuration item | Default value | Description |
---|---|---|
collect.storage.enable | false | Specifies whether to collect storage metadata. By default, EMR Doctor does not collect storage metadata. On the Health Check tab, you can turn on Collect Information About Storage Resources to enable this configuration item. |
collect.storage.intermediate.path | /mnt/disk1/log/doctor/derby/ | The path that stores the intermediate data generated when storage metadata is collected. The volume of intermediate data is proportional to the size of the FSImage file. |
collect.storage.max.depth | 6 | An advanced configuration item. The maximum number of directory levels that are traversed when storage metadata is collected, including the directories that start with a forward slash (/). Note: We recommend that you do not specify an excessively large value for this parameter. An excessively large value may result in a long analysis time and a large volume of intermediate data. |
collect.storage.top.size | 100 | An advanced configuration item. The number of largest directories that are obtained at each level when storage metadata is collected. Note: We recommend that you do not specify an excessively large value for this parameter. An excessively large value may result in a long analysis time and a large volume of intermediate data. |
collect.oss.bucket | No default value | The name of the bucket whose objects you want to analyze. This configuration item is required when you analyze data stored in Object Storage Service (OSS). For more information, see Enable and configure the storage analysis feature. |
collect.oss.manifest.dir | No default value | The directory in which the generated inventory lists are stored. This configuration item is required when you analyze data stored in OSS. For more information, see Enable and configure the storage analysis feature. |
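
The interaction between collect.storage.max.depth and collect.storage.top.size can be illustrated with a minimal Python sketch. This is not the EMR Doctor implementation. The function name `top_directories`, the input mapping of file paths to sizes, and the choice to count the root directory (/) as level 1 are all assumptions made for illustration.

```python
from collections import defaultdict
import heapq


def top_directories(file_sizes, max_depth=6, top_size=100):
    """Sum file sizes per ancestor directory and keep the largest per level.

    file_sizes: hypothetical mapping of absolute file path -> size in bytes.
    Assumption: the root directory "/" is level 1, "/a" is level 2, and so on.
    """
    totals = defaultdict(int)  # (level, directory) -> total bytes
    for path, size in file_sizes.items():
        parts = [p for p in path.split("/") if p]
        dirs = parts[:-1]  # drop the file name, keep its directories
        ancestors = ["/"] + ["/" + "/".join(dirs[:i]) for i in range(1, len(dirs) + 1)]
        # Directories deeper than max_depth are not traversed.
        for level, directory in enumerate(ancestors[:max_depth], start=1):
            totals[(level, directory)] += size

    per_level = defaultdict(list)
    for (level, directory), total in totals.items():
        per_level[level].append((total, directory))

    # Keep only the top_size largest directories at each level.
    return {
        level: [(d, t) for t, d in heapq.nlargest(top_size, entries)]
        for level, entries in sorted(per_level.items())
    }


sizes = {"/a/x.dat": 10, "/a/b/y.dat": 30, "/c/z.dat": 5}
report = top_directories(sizes, max_depth=2, top_size=1)
```

In this sketch, the amount of retained data grows with both parameters: each additional level adds more directories to aggregate, and a larger top size keeps more of them per level. This is consistent with the recommendation not to set either value excessively high.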
- Go to the Configure tab of a service. In this topic, the HDFS service is used. Note: EMR Doctor is not displayed in the EMR console. To access EMR Doctor, you must modify the link for the Configure tab of the current service.
  - In the EMR console, find the desired cluster and click Services in the Actions column.
  - On the Services tab, click Configure in the HDFS section.
  - Replace the service name in the link in the address bar of the browser with TAIHAODOCTOR and press Enter. The Configure tab of the TAIHAODOCTOR service appears.
- On the Configure tab of the TAIHAODOCTOR service, you can modify the preceding configuration items based on your business requirements, save the configurations, and make the configurations take effect.
For more information about how to modify configuration items, see Modify configuration items.
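
The link substitution in the procedure above can be sketched in Python as follows. The URL shown is a made-up placeholder on example.com, not the real EMR console link format, and the helper name `doctor_config_url` is hypothetical. The sketch only assumes that the service name appears exactly once in the link.

```python
def doctor_config_url(current_url, current_service="HDFS", target_service="TAIHAODOCTOR"):
    """Return the link with the service name swapped for TAIHAODOCTOR.

    Assumption: the service name occurs exactly once in the link. If it does
    not, the function refuses to guess which occurrence to replace.
    """
    occurrences = current_url.count(current_service)
    if occurrences != 1:
        raise ValueError(
            f"expected exactly one occurrence of {current_service!r}, found {occurrences}"
        )
    return current_url.replace(current_service, target_service)


# Placeholder link for illustration only; copy the real one from the address bar.
hdfs_url = "https://emr.example.com/clusters/c-demo/services/HDFS/config"
doctor_url = doctor_config_url(hdfs_url)
```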
Configure scheduler collection
You can view or modify the following configuration items on the Configure tab of the TAIHAODOCTOR service in the EMR console. For more information about how to modify the following configuration items, see Configure storage metadata collection.
Configuration item | Default value | Description |
---|---|---|
collect.job.interval | 120 | The interval at which the status of the YARN-based scheduling tasks is collected. Default value: 120. Unit: seconds. |
collect.jobs.intermediate.path | /mnt/disk1/log/doctor/jobs/ | The path that stores the intermediate data generated when the status of YARN-based scheduling tasks is collected. |
Configure general information
You can view or modify the following configuration items on the Configure tab of the TAIHAODOCTOR service in the EMR console. For more information about how to modify the following configuration items, see Configure storage metadata collection.
Configuration item | Default value | Description |
---|---|---|
collect.metrics.interval | 15 | The interval at which the metrics of an engine task are collected. Default value: 15. Unit: seconds. Note: To prevent the stability of tasks from being affected, we recommend that you do not specify an excessively large or small value for this parameter. An excessively large value may result in inaccurate suggestions for tasks. An excessively small value may result in excessive collection pressure. |
collect.rate.limit | 5000 | The maximum number of records collected per second for each process. Excess data records are discarded to prevent process stability from being affected. |
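
The behavior described for collect.rate.limit, where records beyond a per-second cap are discarded, can be sketched with a simple fixed-window counter. This is an illustration only. How EMR Doctor enforces the limit internally is not documented here, and the class name `PerSecondLimiter` and its methods are hypothetical.

```python
class PerSecondLimiter:
    """Keep at most `limit` records per one-second window and drop the rest."""

    def __init__(self, limit=5000):
        self.limit = limit
        self.window = None  # the whole second currently being counted
        self.count = 0      # records kept in the current window
        self.dropped = 0    # records discarded so far

    def allow(self, now):
        """Return True if a record arriving at time `now` (in seconds) is kept."""
        window = int(now)
        if window != self.window:
            # A new second has started, so reset the counter.
            self.window = window
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        self.dropped += 1
        return False


limiter = PerSecondLimiter(limit=2)
decisions = [limiter.allow(t) for t in (0.1, 0.2, 0.3, 1.0)]
```

With a limit of 2, the third record in the first second is dropped and collection resumes when the next second begins.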
Configure MapReduce task collection
You can view or modify the following configuration items on the Configure tab of the YARN service in the EMR console.
For more information about how to modify configuration items, see Modify configuration items.
Configuration item | Default value | Description |
---|---|---|
yarn.app.mapreduce.am.command-opts | | Collects the AppMaster metadata for a MapReduce task. |
mapreduce.map.java.opts | | Collects the metadata of a map task. |
mapreduce.reduce.java.opts | | Collects the metadata of a reduce task. |
Configure Tez task collection
You can view or modify the following configuration items on the Configure tab of the Tez service in the EMR console.
For more information about how to modify configuration items, see Modify configuration items.
Configuration item | Default value | Description |
---|---|---|
tez.am.launch.cmd-opts | | Collects the AppMaster metadata for a Tez task. |
tez.task.launch.cmd-opts | | Collects the metadata of a Tez task. |
Configure Spark task collection
You can view or modify the following configuration items on the Configure tab of the Spark service in the EMR console.
For more information about how to modify configuration items, see Modify configuration items.
Configuration item | Default value | Description |
---|---|---|
spark.driver.extraJavaOptions | | Collects the metadata of a Spark driver. |
spark.executor.extraJavaOptions | | Collects the metadata of a Spark executor. |
Configure YARN Timeline Server
You can view or modify the following configuration items on the Configure tab of the YARN service in the EMR console.
For more information about how to modify configuration items, see Modify configuration items.
Configuration item | Default value | Description |
---|---|---|
YARN_TIMELINESERVER_OPTS | | The end flag of the collection task. |