CloudMonitor is a service that monitors Internet applications and Alibaba Cloud resources. It collects metric data for Alibaba Cloud resources, detects network availability, and lets you set alerts on specific metrics. This topic describes how to use CloudMonitor to monitor the status of core services in an E-MapReduce (EMR) cluster and send alert notifications. Notification methods include phone calls, text messages, emails, and DingTalk chatbots.
View monitoring data
Create an alert rule
Follow these steps to create an alert rule:
- Log on to the CloudMonitor console.
- In the left-side navigation pane, click Cloud Service Monitoring and then E-MapReduce.
- On the Clusters tab of the E-MapReduce Monitoring List page, find the target cluster and click Alarm Rules in the Actions column.
- Click Create Alarm Rule in the upper-right corner or click here in the message that prompts you to create an alert rule.
Metrics of core services
Service | Metric | Description |
---|---|---|
HDFS | NameNodeHttpPortOpen | The status of HTTP port 50070 of NameNode. |
HDFS | DataNodeHttpPortOpen | The status of HTTP port 50075 of DataNodes. |
HDFS | DataNodeIpcPortOpen | The status of IPC port 50020 of DataNodes. |
HDFS | TotalDFSUsedPercent | The total HDFS capacity usage of the cluster. |
HDFS | MaxDFSUsedPercent | The maximum HDFS capacity usage of all DataNodes. |
HDFS | DataNodeDfsUsedPercent | The HDFS capacity usage of a single DataNode. |
HDFS | NumDeadDataNode | The number of dead DataNodes. Note: This metric is obtained by using Java Management Extensions (JMX) of NameNode. If the heartbeat processes of DataNodes or the standby NameNode are stopped, the monitoring on this metric is suspended. |
YARN | ResourceManagerWebappPortOpen | The status of web port 8088 of ResourceManager. |
YARN | NodeManagerHttpPortOpen | The status of HTTP port 8042 of NodeManager. |
YARN | NumUnhealthyNMs | The number of unhealthy NodeManagers. |
Hive | HiveServer2PortOpen | The status of port 10000 of HiveServer. |
Hive | MetastorePortOpen | The status of port 9083 of Hive MetaStore. |
ZooKeeper | ZKClientPortOpen | The status of client port 2181 of ZooKeeper. |
ZooKeeper | ZkOutstandingRequests | The number of queued requests. If the ZooKeeper server receives more requests than it can process, this metric goes up. |
HBase | HMasterHttpPortOpen | The status of HTTP port 16010 of HMaster. |
HBase | HMasterIpcPortOpen | The status of IPC port 16000 of HMaster. |
HBase | HRegionServerHttpPortOpen | The status of HTTP port 16030 of HRegionServer. |
HBase | HRegionServerIpcPortOpen | The status of IPC port 16020 of HRegionServer. |

Note: For all the port metrics in this table, 1 indicates that the port is open, and 0 indicates that the port is closed.
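
To make the 1/0 convention of the port metrics concrete, the following Python sketch reproduces the same kind of check by hand with a plain TCP connection attempt. It is only an illustration: how CloudMonitor actually probes the ports is not described in this topic, and the function names `port_open` and `probe`, the `PORT_METRICS` mapping, and the 2-second timeout are assumptions made for this example. The port numbers are taken from the table above.

```python
import socket
from typing import Dict, Optional

# Illustrative subset of the port metrics from the table above (metric name -> port).
PORT_METRICS: Dict[str, int] = {
    "NameNodeHttpPortOpen": 50070,
    "ResourceManagerWebappPortOpen": 8088,
    "HiveServer2PortOpen": 10000,
    "ZKClientPortOpen": 2181,
    "HMasterHttpPortOpen": 16010,
}


def port_open(host: str, port: int, timeout: float = 2.0) -> int:
    """Return 1 if a TCP connection to host:port succeeds, otherwise 0."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return 1
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return 0


def probe(host: str, metrics: Optional[Dict[str, int]] = None) -> Dict[str, int]:
    """Check every port in `metrics` on `host` and return metric name -> 1 or 0."""
    if metrics is None:
        metrics = PORT_METRICS
    return {name: port_open(host, port) for name, port in metrics.items()}
```

For example, calling `probe("emr-header-1")` from a machine that can reach the master node (the hostname is a placeholder) returns a dictionary such as `{"NameNodeHttpPortOpen": 1, ...}`, where a value of 0 corresponds to the condition that an alert rule on that metric would typically watch for.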