CloudMonitor is a monitoring and alerting service provided by Alibaba Cloud. You can create threshold-triggered alert rules in the CloudMonitor console to monitor the usage of E-MapReduce (EMR) resources. If the value of a metric exceeds the threshold that is specified in a rule, CloudMonitor automatically sends an alert notification. This way, you can receive the notification and handle the related exceptions at the earliest opportunity.
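The threshold-triggered behavior can be illustrated with a minimal sketch. It is not CloudMonitor code or an Alibaba Cloud API. The class `ThresholdRule` and its fields `threshold`, `consecutive_cycles`, and `mute_cycles` are hypothetical names chosen to mirror the Rule Description and Mute for settings of an alert rule. The sketch also assumes that the mute period is measured in whole check cycles and that the mute ends when the metric drops below the threshold.

```python
from dataclasses import dataclass, field


@dataclass
class ThresholdRule:
    """Hypothetical stand-in for an alert rule; not a CloudMonitor object."""
    threshold: float          # for example, 90.0 for 90% CPU utilization
    consecutive_cycles: int   # breaches in a row required before alerting
    mute_cycles: int          # assumed: cycles between repeated notifications
    _streak: int = field(default=0, init=False)
    _muted_for: int = field(default=0, init=False)

    def observe(self, value: float) -> bool:
        """Feed one per-cycle value; return True if a notification would be sent."""
        if self._muted_for > 0:
            self._muted_for -= 1
        if value < self.threshold:
            # The metric recovered: reset the streak and end the mute (assumption).
            self._streak = 0
            self._muted_for = 0
            return False
        self._streak += 1
        if self._streak >= self.consecutive_cycles and self._muted_for == 0:
            self._muted_for = self.mute_cycles
            return True
        return False


def simulate(rule: ThresholdRule, samples: list[float]) -> list[bool]:
    """Return, for each sample, whether a notification would be sent."""
    return [rule.observe(v) for v in samples]


if __name__ == "__main__":
    rule = ThresholdRule(threshold=90.0, consecutive_cycles=3, mute_cycles=2)
    print(simulate(rule, [95, 92, 91, 96, 97, 98, 50, 93]))
```

In this sketch, the first notification is sent on the third consecutive breach, and a repeat is sent every `mute_cycles` cycles while the value stays at or above the threshold.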
Before you begin, make sure that an EMR cluster is created. For more information, see Create a cluster.
- Log on to the CloudMonitor console.
- In the left-side navigation pane, choose .
- On the Threshold Value Alert tab, click Create Alert Rule.
- On the Create Alert Rule page, set the parameters for an alert rule.
|Parameter||Description|
|Product||Select E-MapReduce from the drop-down list.|
|Resource Range||The resources to which the alert rule is applied. Valid values: All Resources (the alert rule is applied to all the EMR clusters of the current Alibaba Cloud account) and Cluster (the alert rule is applied only to a specific cluster).|
|Region||All the Alibaba Cloud regions supported by EMR are listed. Select the region that you want to associate with the alert rule from the drop-down list. Note: This parameter appears if you select Cluster for Resource Range.|
|Cluster||All existing clusters of the current Alibaba Cloud account are listed. Select the cluster that you want to associate with the alert rule from the drop-down list. Note: This parameter appears if you select Cluster for Resource Range.|
|Alert Rule||The name of the alert rule.|
|Rule Description||The content of the alert rule. This parameter defines the condition that triggers an alert. For example, if you specify a condition in which the average CPU utilization every 5 minutes is greater than or equal to 90% for three consecutive cycles, CloudMonitor checks the metric every 5 minutes for three consecutive cycles. Note: For more information, see Metrics.|
|Mute for||The period during which an alert is muted. This parameter specifies the interval at which an alert notification is sent to the specified contacts again if the alert is not cleared.|
|Effective Period||The period during which the alert rule is effective. The system monitors the metric and generates an alert only if the alert rule is in effect.|
|Notification Contact||The contact groups to which alert notifications are sent. You can select an existing contact group or create a contact group. For more information about how to create a contact group, see Create an alert contact or alert group. Valid value: Email + DingTalk (Info).|
|Auto Scaling||If you select Auto Scaling, a specific scaling rule is triggered when an alert is generated. You must set the Region, ESS Group, and ESS Rule parameters.|
|Log Service||If you select Log Service, the alert information is written to Log Service when an alert is generated. You must set the Region, Project, and Logstore parameters. For more information about how to create a project and a Logstore, see Quick start.|
|Email Remark||The additional information that you want to include in the alert notification email.|
|HTTP CallBack||The URL that can be accessed from the Internet. CloudMonitor uses a POST request to send alert information to this URL. Only HTTP requests are supported.|
- Click Confirm.
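If you set the HTTP CallBack parameter, the URL must point to an endpoint that accepts POST requests over HTTP. The following is a minimal sketch of such an endpoint built on the Python standard library. The handler name `AlertCallbackHandler`, the `received` list, and the helper `start_callback_server` are hypothetical. The format of the alert payload is not described here, so the sketch only stores the raw request body instead of parsing it. It binds to 127.0.0.1 for local testing, whereas a real endpoint must be reachable from the Internet.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # raw request bodies, kept in memory for demonstration only


class AlertCallbackHandler(BaseHTTPRequestHandler):
    """Accepts POST requests and records the body without interpreting it."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length).decode("utf-8", errors="replace")
        received.append(body)  # payload fields are not assumed; store as-is
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, format, *args):
        pass  # suppress the default per-request log line


def start_callback_server(host="127.0.0.1", port=0):
    """Start the server on a background thread; port=0 picks a free port."""
    server = HTTPServer((host, port), AlertCallbackHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    srv = start_callback_server(port=8080)
    print("Listening on port", srv.server_address[1])
    threading.Event().wait()  # keep the main thread alive
```

Before you enter the URL in the console, you can send a test POST request to the endpoint, for example with `curl -X POST`, to confirm that it responds.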
|Service||APM metric name||Description|
|HDFS||NameNodeIpcPortOpen||The availability of the IPC port of the NameNode.|
|HDFS||TotalDFSUsedPercent||The total HDFS capacity usage of a cluster.|
|HDFS||DataNodeDfsUsedPercent||The HDFS capacity usage of a DataNode.|
|HDFS||DataNodeIpcPortOpen||The availability of the IPC port of a DataNode.|
|HDFS||JournalNodeRpcPortOpen||The availability of the RPC port of a JournalNode.|
|HDFS||ZKFCPortOpen||The availability of the ZKFailoverController (ZKFC) port.|
|HDFS||dfs.FSNamesystem.MissingBlocks||The number of missing blocks.|
|HDFS||dfs.datanode.VolumeFailures||The number of damaged disks detected by HDFS.|
|YARN||ResourceManagerPortOpen||The availability of the service port of the ResourceManager.|
|YARN||JobHistoryPortOpen||The availability of the service port of Job History.|
|YARN||yarn.ClusterMetrics.NumUnhealthyNM||The number of unhealthy NodeManagers.|
|YARN||ProxyServerPortOpen||The availability of the WebAppProxy port.|
|YARN||TimelineServerPortOpen||The availability of the service port of Timeline Server.|
|Hive||MetastorePortOpen||The availability of the Hive Metastore port.|
|Hive||HiveServer2PortOpen||The availability of the service port of HiveServer2.|
|Hive||ThriftServerPortOpen||The availability of the service port of Thrift Server.|
|HBase||HMasterIpcPortOpen||The availability of the IPC port of HMaster.|
|HBase||HRegionServerIpcPortOpen||The availability of the IPC port of HRegionServer.|
|ZooKeeper||ZKClientPortOpen||The availability of the listening port of the ZooKeeper client.|
|Hue||HuePortOpen||The availability of the Hue port.|
|Storm||StormNimbusThriftPortOpen||The availability of the Thrift port of Storm Nimbus.|
|Host||proc_total||The total number of processes.|
|Host||part_max_used||The maximum usage of a disk partition.|
|Host||disk_free_percent_mnt_disk1||The percentage of disk space occupied by the /mnt/disk1 directory.|
|Host||disk_free_percent_rootfs||The percentage of disk space occupied by the root file system.|
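Many of the metrics in this table report whether a service port is available. How EMR collects these metrics is not described here. As a rough manual counterpart, the following sketch checks whether a TCP port accepts connections. The function name `is_port_open` and its parameters are hypothetical, and a successful connection shows only that something is listening on the port, not that the service is healthy.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

You can run a check of this kind from a machine that can reach the cluster nodes to cross-check an alert on one of the port metrics. Replace the host and port with the values of the service that you want to inspect.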