CloudMonitor is a service that monitors Internet applications and Alibaba Cloud resources. It collects metric data from Alibaba Cloud resources, checks network availability, and lets you configure alerts on specific metrics. This topic describes how to use CloudMonitor to monitor the status of core services in an E-MapReduce (EMR) cluster and to send alert notifications when alert rules are triggered. Supported notification methods are phone calls, text messages, emails, and DingTalk chatbots.

View monitoring data

  1. Log on to the CloudMonitor console.
  2. In the left-side navigation pane, click Cloud products.
  3. On the Cloud products page, click E-MapReduce.
  4. On the Clusters tab of the E-MapReduce page, find your cluster and click the cluster ID or Monitoring Charts in the Actions column.
  5. Click a time range above the monitoring charts, or click the Custom icon to specify a custom time range.
    The charts display the monitoring data of each metric for the selected time range.

Create an alert rule

  1. Log on to the CloudMonitor console.
  2. In the left-side navigation pane, click Cloud products.
  3. On the Cloud products page, click E-MapReduce.
  4. On the E-MapReduce page, click Create Alert Rule in the upper-right corner.
    1. In the Related Resource section, specify Resource Range.
      • If you select All Resources, the alert rule applies to all clusters in your account. CloudMonitor sends an alert notification whenever any of these clusters meets the alert condition.
      • If you select Cluster, you must specify a cluster. CloudMonitor sends an alert notification only when the specified cluster meets the alert condition.
    2. In the Set Alarm Rules section, configure an alert rule.

      Configure the parameters. For the metrics of core services, see Metrics of core services.

      For example, to trigger an alert when the HTTP port of DataNodes has been unreachable (metric value 0) for more than 5 minutes, select the DataNodeHttpPortOpen metric and specify the other parameters accordingly.

      Note If you select All from the role drop-down list, the alert rule applies to all nodes in the cluster, including nodes that are added to the cluster in the future.
    3. In the Notification Method section, configure Notification Contact and Notification Methods.
      Set Notification Contact to one or more contact groups. If you do not have a suitable contact group, click Quickly create a contact group to create one.

      You can set Notification Methods to a combination of two or more of the following methods: phone calls, text messages, emails, and DingTalk chatbots. To use phone calls, you must purchase a phone alert resource plan.

    4. Click Confirm.
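This topic does not describe how CloudMonitor evaluates alert rules internally, but the two choices made in the steps above, the resource range and a sustained condition such as "unreachable for more than 5 minutes", can be illustrated with a minimal Python sketch. It assumes one sample per node per minute and counts each consecutive reading at the bad value as one minute of downtime. The `Sample` type, the `nodes_in_alert` function, and the cluster and node names are hypothetical and are not part of any CloudMonitor API.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Optional


# Hypothetical metric reading; not a CloudMonitor data structure.
@dataclass
class Sample:
    cluster_id: str
    node: str
    minute: int     # sample time in whole minutes; one sample per minute is assumed
    value: float    # e.g. 1 or 0 for a port metric such as DataNodeHttpPortOpen


def nodes_in_alert(
    samples: Iterable[Sample],
    cluster_id: Optional[str] = None,
    bad_value: float = 0,
    duration_minutes: int = 5,
) -> set[tuple[str, str]]:
    """Return the (cluster_id, node) pairs whose most recent readings have
    equalled bad_value for more than duration_minutes consecutive minutes.

    cluster_id=None mimics the All Resources range; a cluster ID mimics
    the Cluster range.
    """
    # Group readings by node, keeping only clusters within the resource range.
    series: dict[tuple[str, str], list[Sample]] = {}
    for s in samples:
        if cluster_id is None or s.cluster_id == cluster_id:
            series.setdefault((s.cluster_id, s.node), []).append(s)

    alerting: set[tuple[str, str]] = set()
    for key, readings in series.items():
        readings.sort(key=lambda r: r.minute)
        # Walk backwards from the newest reading and count the unbroken
        # run of bad readings; a missing minute also ends the run.
        streak = 0
        expected_minute = None
        for r in reversed(readings):
            if r.value != bad_value:
                break
            if expected_minute is not None and r.minute != expected_minute:
                break
            streak += 1
            expected_minute = r.minute - 1
        if streak > duration_minutes:
            alerting.add(key)
    return alerting


if __name__ == "__main__":
    readings = (
        [Sample("c-1", "dn-1", m, 0) for m in range(6)]                     # down 6 min
        + [Sample("c-1", "dn-2", m, 1 if m < 3 else 0) for m in range(6)]   # down 3 min
        + [Sample("c-2", "dn-1", m, 0) for m in range(6)]                   # down 6 min
    )
    print(sorted(nodes_in_alert(readings)))                    # [('c-1', 'dn-1'), ('c-2', 'dn-1')]
    print(sorted(nodes_in_alert(readings, cluster_id="c-1")))  # [('c-1', 'dn-1')]
```

With the All Resources range, both nodes that stayed at 0 for six minutes are reported regardless of cluster; narrowing the range to one cluster drops the node in `c-2`, and the node that was down for only three minutes never qualifies.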

Metrics of core services

Service | Metric | Description
HDFS | NameNodeHttpPortOpen | The status of HTTP port 50070 of NameNode.
HDFS | DataNodeHttpPortOpen | The status of HTTP port 50075 of DataNodes.
HDFS | DataNodeIpcPortOpen | The status of IPC port 50020 of DataNodes.
HDFS | TotalDFSUsedPercent | The percentage of the total DFS capacity of the cluster that is used.
HDFS | MaxDFSUsedPercent | The highest DFS capacity usage percentage among all DataNodes.
HDFS | DataNodeDfsUsedPercent | The DFS capacity usage percentage of a single DataNode.
HDFS | NumDeadDataNode | The number of dead DataNodes. This metric is obtained through the Java Management Extensions (JMX) interface of NameNode. If the heartbeat processes of the DataNodes or the standby NameNode stop, monitoring of this metric is suspended.
YARN | ResourceManagerWebappPortOpen | The status of web port 8088 of ResourceManager.
YARN | NodeManagerHttpPortOpen | The status of HTTP port 8042 of NodeManagers.
YARN | NumUnhealthyNMs | The number of unhealthy NodeManagers.
Hive | HiveServer2PortOpen | The status of port 10000 of HiveServer2.
Hive | MetastorePortOpen | The status of port 9083 of Hive Metastore.
ZooKeeper | ZKClientPortOpen | The status of client port 2181 of ZooKeeper.
ZooKeeper | ZkOutstandingRequests | The number of queued requests. This value increases when the ZooKeeper server receives more requests than it can process.
HBase | HMasterHttpPortOpen | The status of HTTP port 16010 of HMaster.
HBase | HMasterIpcPortOpen | The status of IPC port 16000 of HMaster.
HBase | HRegionServerHttpPortOpen | The status of HTTP port 16030 of HRegionServer.
HBase | HRegionServerIpcPortOpen | The status of IPC port 16020 of HRegionServer.
Note For all the port metrics in this table, 1 indicates that the port is enabled, and 0 indicates that the port is disabled.
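When a port alert fires, it can help to confirm from a machine inside the cluster network whether the port really refuses connections. The following minimal Python sketch does this with a plain TCP connection attempt and reports 1 or 0, following the convention of the port metrics. It assumes that a successful TCP connect is a reasonable stand-in for the metric; how CloudMonitor itself collects these values is not described in this topic. `DEFAULT_PORTS`, `port_open`, and `check_ports` are hypothetical names, and the port numbers are the defaults from the table above, which differ if your cluster configuration changes them.

```python
from __future__ import annotations

import socket

# Default ports from the Metrics of core services table. These are an
# assumption of this sketch: adjust them if your cluster uses other ports.
DEFAULT_PORTS = {
    "NameNodeHttpPortOpen": 50070,
    "DataNodeHttpPortOpen": 50075,
    "DataNodeIpcPortOpen": 50020,
    "ResourceManagerWebappPortOpen": 8088,
    "NodeManagerHttpPortOpen": 8042,
    "HiveServer2PortOpen": 10000,
    "MetastorePortOpen": 9083,
    "ZKClientPortOpen": 2181,
    "HMasterHttpPortOpen": 16010,
    "HMasterIpcPortOpen": 16000,
    "HRegionServerHttpPortOpen": 16030,
    "HRegionServerIpcPortOpen": 16020,
}


def port_open(host: str, port: int, timeout: float = 3.0) -> int:
    """Return 1 if a TCP connection to host:port succeeds, else 0."""
    try:
        # The connection is closed immediately; only reachability matters.
        with socket.create_connection((host, port), timeout=timeout):
            return 1
    except OSError:  # refused, unreachable, name resolution failure, or timeout
        return 0


def check_ports(host: str, metrics: list[str], timeout: float = 3.0) -> dict[str, int]:
    """Probe the default port behind each named port metric on one host."""
    return {m: port_open(host, DEFAULT_PORTS[m], timeout) for m in metrics}
```

For example, `check_ports("192.168.0.10", ["DataNodeHttpPortOpen", "DataNodeIpcPortOpen"])`, where `192.168.0.10` stands for the address of a core node, returns `{"DataNodeHttpPortOpen": 1, "DataNodeIpcPortOpen": 1}` if both DataNode ports accept connections.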
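The three HDFS capacity metrics answer different questions, which matters when you choose one for an alert rule. The following minimal Python sketch assumes that TotalDFSUsedPercent is the used DFS space of all DataNodes divided by their combined capacity, that DataNodeDfsUsedPercent is the same ratio for a single DataNode, and that MaxDFSUsedPercent is the highest per-DataNode value. These definitions are inferred from the metric descriptions, not confirmed by this topic, and the `DataNodeUsage` type and function names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DataNodeUsage:
    """Hypothetical per-DataNode capacity snapshot (sizes in GiB)."""
    name: str
    dfs_used_gib: float
    dfs_capacity_gib: float


def datanode_dfs_used_percent(dn: DataNodeUsage) -> float:
    # Assumed meaning of DataNodeDfsUsedPercent: used / capacity on one DataNode.
    return 100.0 * dn.dfs_used_gib / dn.dfs_capacity_gib


def total_dfs_used_percent(nodes: list[DataNodeUsage]) -> float:
    # Assumed meaning of TotalDFSUsedPercent: cluster-wide used / capacity.
    used = sum(n.dfs_used_gib for n in nodes)
    capacity = sum(n.dfs_capacity_gib for n in nodes)
    return 100.0 * used / capacity


def max_dfs_used_percent(nodes: list[DataNodeUsage]) -> float:
    # Assumed meaning of MaxDFSUsedPercent: the fullest DataNode's percentage.
    return max(datanode_dfs_used_percent(n) for n in nodes)


if __name__ == "__main__":
    cluster = [
        DataNodeUsage("dn-1", 900, 1000),
        DataNodeUsage("dn-2", 200, 1000),
        DataNodeUsage("dn-3", 100, 1000),
    ]
    print(total_dfs_used_percent(cluster))  # 40.0
    print(max_dfs_used_percent(cluster))    # 90.0
```

With these sample numbers the cluster is only 40% full overall while dn-1 is at 90%, so a rule on TotalDFSUsedPercent with an 80% threshold would stay silent, whereas a rule on MaxDFSUsedPercent or DataNodeDfsUsedPercent with the same threshold would fire.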