E-MapReduce: Activate EMR Doctor (Hadoop clusters)

Last Updated: Mar 29, 2024

By default, E-MapReduce (EMR) Doctor is provided for DataLake clusters, DataServing clusters, and custom clusters. To use EMR Doctor in an EMR Hadoop cluster of V3.X earlier than V3.41.0, V4.X, or V5.X earlier than V5.6.0, you must submit a request to activate it. This topic describes how to activate EMR Doctor.

Overview

EMR Doctor is a cluster manager for open source big data clusters that provides end-to-end intelligent diagnostics and optimization. It helps you efficiently perform O&M on big data clusters and their services and continuously optimize the resource utilization of the clusters. This keeps the clusters healthy and stable and allows them to provide better computing services for upper-layer business.

EMR Doctor provides the following features:

  • Daily cluster health reports: EMR Doctor analyzes storage and compute engines, performs overall health checks on clusters, and presents the health status of the clusters in daily reports.

  • Real-time cluster reports: EMR Doctor scans the computing tasks and services of clusters in real time, identifies issues, and reports the issues to O&M personnel for troubleshooting.

Note

EMR Doctor is included in EMR V3.41.0 and later minor versions and in EMR V5.6.0 and later minor versions. To use EMR Doctor in clusters of other versions, follow the steps in Activation procedure.
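The version rule in this note, together with the cluster types that include EMR Doctor by default, reduces to a simple version comparison. The following Python sketch is illustrative only: the function names and the cluster type labels ("DataLake", "DataServing", "Custom", "Hadoop") are hypothetical, and it assumes version strings in the form used in this topic, such as V3.41.0.

```python
# Hypothetical labels for the cluster types that include EMR Doctor by default.
DEFAULT_CLUSTER_TYPES = {"DataLake", "DataServing", "Custom"}


def parse_version(version: str) -> tuple:
    """Convert a version string such as 'V3.41.0' into a tuple such as (3, 41, 0)."""
    nums = [int(part) for part in version.strip().lstrip("Vv").split(".")]
    nums += [0] * (3 - len(nums))  # pad short versions such as 'V4.9' to three parts
    return tuple(nums[:3])


def needs_activation_request(cluster_type: str, version: str) -> bool:
    """Return True if EMR Doctor must be activated on request for this cluster."""
    if cluster_type in DEFAULT_CLUSTER_TYPES:
        return False  # EMR Doctor is provided by default for these cluster types
    if cluster_type != "Hadoop":
        raise ValueError(f"cluster type {cluster_type!r} is not covered by this topic")
    v = parse_version(version)
    if v[0] == 3:
        return v < (3, 41, 0)  # included from V3.41.0 onward
    if v[0] == 4:
        return True  # per this topic, all V4.X Hadoop clusters require activation
    if v[0] == 5:
        return v < (5, 6, 0)  # included from V5.6.0 onward
    raise ValueError(f"version {version!r} is not covered by this topic")
```

For example, `needs_activation_request("Hadoop", "V3.40.0")` returns True, so a request is required, while `needs_activation_request("Hadoop", "V5.6.0")` returns False.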

Activation impacts

EMR Doctor does not affect existing tasks in your clusters. EMR Doctor collects only the required data, such as task metrics and events, analyzes the metrics and events, and then scores the tasks in the backend based on the analysis results.

The installation of EMR Doctor is transparent to you and does not affect running tasks or tasks that are waiting to run.

During the installation of EMR Doctor, EMR also delivers any service configurations that have been saved but not yet delivered to the clusters. Before you install EMR Doctor, we recommend that you check whether any configurations of Hive, Spark, YARN, or Tez are saved but not delivered, and evaluate the impact of delivering them to the clusters.
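One way to perform this check is to compare, for each affected service, the configuration values that are saved in EMR with the values that are actually deployed on the cluster. The following Python sketch assumes that you have already collected both sets of values into dictionaries (how you export them is not covered here); the function name, the `Config` alias, and the sample values are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Services whose saved configurations may be delivered when EMR Doctor is installed.
AFFECTED_SERVICES = ("Hive", "Spark", "YARN", "Tez")

Config = Dict[str, Dict[str, str]]  # {service: {configuration item: value}}


def undelivered_changes(
    saved: Config, deployed: Config
) -> Dict[str, Dict[str, Tuple[Optional[str], Optional[str]]]]:
    """For each affected service, list items whose saved value differs from the deployed one.

    Each difference is reported as (deployed value, saved value); None means the item is absent.
    """
    report = {}
    for service in AFFECTED_SERVICES:
        saved_items = saved.get(service, {})
        deployed_items = deployed.get(service, {})
        diffs = {}
        for key in sorted(set(saved_items) | set(deployed_items)):
            if saved_items.get(key) != deployed_items.get(key):
                diffs[key] = (deployed_items.get(key), saved_items.get(key))
        if diffs:
            report[service] = diffs
    return report
```

For example, if the saved value of spark.executor.memory is 4g but 2g is deployed, the report contains `{"Spark": {"spark.executor.memory": ("2g", "4g")}}`, which is a change that would reach the cluster during installation. An empty report means that no saved changes are pending for these services.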

After EMR Doctor is installed, EMR automatically appends the following content to the configuration files of each service.

Hive

  • hive-env.sh: Environment variables are appended.

YARN

  • mapred-site.xml: -noverify -javaagent:/usr/lib/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=mr is appended to the yarn.app.mapreduce.am.command-opts, mapreduce.map.java.opts, and mapreduce.reduce.java.opts configuration items.

  • yarn-env.sh: Environment variables are appended.

Spark

  • spark-defaults.conf: -noverify -javaagent:/usr/lib/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=spark is appended to the spark.driver.extraJavaOptions, spark.executor.extraJavaOptions, and spark.yarn.am.extraJavaOptions configuration items.

Tez

  • tez-site.xml: -noverify -javaagent:/usr/lib/taihaodoctor-current/emr-agent/btrace-agent.jar=libs=mr,config=tez is appended to the tez.task.launch.cmd-opts and tez.am.launch.cmd-opts configuration items.
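After installation, you may want to confirm that the agent options in the configuration changes above are present. The following Python sketch checks the contents of mapred-site.xml, spark-defaults.conf, and tez-site.xml. It rests on several assumptions: you pass in the file contents yourself (no file locations on cluster nodes are assumed), spark-defaults.conf uses whitespace-separated key and value pairs, the agent option appears as one contiguous string, and the function names are hypothetical. The hive-env.sh and yarn-env.sh entries are not checked because this topic does not list the environment variables that are appended.

```python
import xml.etree.ElementTree as ET

AGENT = "-noverify -javaagent:/usr/lib/taihaodoctor-current/emr-agent/btrace-agent.jar"

# Expected configuration items and appended value per file, as listed in this topic.
EXPECTED = {
    "mapred-site.xml": (
        ["yarn.app.mapreduce.am.command-opts", "mapreduce.map.java.opts",
         "mapreduce.reduce.java.opts"],
        AGENT + "=libs=mr",
    ),
    "spark-defaults.conf": (
        ["spark.driver.extraJavaOptions", "spark.executor.extraJavaOptions",
         "spark.yarn.am.extraJavaOptions"],
        AGENT + "=libs=spark",
    ),
    "tez-site.xml": (
        ["tez.task.launch.cmd-opts", "tez.am.launch.cmd-opts"],
        AGENT + "=libs=mr,config=tez",
    ),
}


def parse_site_xml(text: str) -> dict:
    """Parse a Hadoop-style *-site.xml document into {name: value}."""
    props = {}
    for prop in ET.fromstring(text).iter("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def parse_spark_defaults(text: str) -> dict:
    """Parse spark-defaults.conf, assuming 'key value' lines and '#' comment lines."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # split on the first run of whitespace
        props[parts[0]] = parts[1].strip() if len(parts) == 2 else ""
    return props


def missing_agent_options(filename: str, text: str) -> list:
    """Return the configuration items in `filename` that lack the expected agent option."""
    keys, expected = EXPECTED[filename]
    props = parse_spark_defaults(text) if filename.endswith(".conf") else parse_site_xml(text)
    return [key for key in keys if expected not in props.get(key, "")]
```

An empty list means that every listed configuration item in the file contains the agent option; any returned item names are the ones to inspect.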

After EMR Doctor is installed, the Alibaba Cloud EMR Doctor team notifies you that the installation is complete. Then restart the App Timeline Server, HiveServer2, and Spark Thrift Server services so that EMR Doctor can collect complete job data. For answers to frequently asked questions about EMR Doctor, see FAQ about cluster management.

Important

We recommend that you restart the preceding services during off-peak hours or in a maintenance window. EMR Doctor still works if you do not restart the services, but data for some jobs, such as Hive on MapReduce jobs, cannot be collected.

Activation procedure

The Alibaba Cloud EMR Doctor team provides comprehensive support to install EMR Doctor. This ensures the stability of your clusters and existing tasks in the clusters during the installation process.

  1. Join the DingTalk group whose ID is 44846846. The Alibaba Cloud EMR Doctor team assigns engineers to contact you.

  2. The engineers check the status of your cluster and schedule a time to activate EMR Doctor for your cluster.

  3. The EMR Doctor team installs EMR Doctor on your cluster at the scheduled time.

  4. After EMR Doctor is installed, you can log on to the EMR console to view the reports that are generated based on EMR Doctor analysis.
