You can check whether a node is running as expected based on its health status. The health status is derived from the results of multiple health check items. This topic describes how to view the health status of a node and its health check items.
Prerequisites
An E-MapReduce (EMR) cluster is created. For more information, see Create a cluster.
Limits
This topic is applicable only to DataLake, Dataflow, online analytical processing (OLAP), DataServing, and custom clusters.
View the latest health status of nodes
Go to the Nodes tab.
Log on to the EMR console. In the left-side navigation pane, click EMR on ECS.
In the top navigation bar, select the region where your cluster resides and select a resource group based on your business requirements.
On the EMR on ECS page, find the desired cluster and click Nodes in the Actions column.
On the Nodes tab, view the health status of nodes in each node group.
Green number in the Health Status column: indicates the number of nodes in the Good state in the current node group.
Yellow number in the Health Status column: indicates the number of nodes in the Warning state in the current node group.
Red number in the Health Status column: indicates the number of nodes in the Abnormal state in the current node group.
Gray number in the Health Status column: indicates the number of nodes in the Unknown state and nodes in the Stateless state in the current node group.
On the Nodes tab, click the icon on the left of the name of a node group. In the node list that appears, you can view the health status of each node in the Health Status column.
A node can be in one of the following states: Good, Warning, Abnormal, Unknown, or Stateless. Each state is indicated by a different icon.
Health status | Description
Good | The node is running as expected.
Warning | The node is running as expected, but the health check items of the node indicate potential risks. Pay attention to these risks.
Abnormal | The node is unavailable. The health check items of the node indicate serious issues. Troubleshoot the issues as soon as possible.
Stateless | No health check is performed on the node after an installation process or a manual stop. You can ignore nodes in this state.
Unknown | The results of the health check items of the node cannot be obtained. If your business is not affected, you can ignore nodes in this state.
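The color-coded numbers in the Health Status column can be thought of as a count of node states per node group. The following Python sketch is illustrative only: the function name summarize_node_group and its input format are assumptions made for this example and are not part of the EMR console or any EMR API. It follows the rule described above that Unknown and Stateless nodes share the gray number.

```python
# Color used for each health state in the Health Status column.
# Unknown and Stateless nodes are both counted under gray.
STATE_TO_COLOR = {
    "Good": "green",
    "Warning": "yellow",
    "Abnormal": "red",
    "Unknown": "gray",
    "Stateless": "gray",
}


def summarize_node_group(node_states):
    """Count the nodes of one node group per color.

    node_states is a hypothetical list of state names, one entry per node.
    """
    counts = {"green": 0, "yellow": 0, "red": 0, "gray": 0}
    for state in node_states:
        counts[STATE_TO_COLOR[state]] += 1
    return counts
```

For a node group with two Good nodes, one Warning node, one Unknown node, and one Stateless node, this sketch reports 2 green, 1 yellow, 0 red, and 2 gray.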
View health check items of a node
On the Nodes tab, find the desired node group and click the icon on the left of the name of the node group.
Find the desired node and click View Check Items to the right of the health status in the Health Status column.
In the panel that appears, view the latest results of health check items and the health check history of the current node.
The following table describes the health check items. In the Threshold column, u represents the measured value of a check item.
Name | Description | Threshold | Unit
host_memory_utilization_check | Checks the average memory usage in the past 3 minutes. | Good: 0 ≤ u < 85; Warning: 85 ≤ u < 95; Abnormal: 95 ≤ u < 100 | Percentage
host_cpu_utilization_check | Checks the average CPU utilization in the past 3 minutes. | Good: 0 ≤ u < 85; Warning: 85 ≤ u < 95; Abnormal: 95 ≤ u < 100 | Percentage
host_cpu_load5_check | Checks the average CPU load in the past 5 minutes. | Good: u < Number of vCPU cores × 1.5; Warning: u ≥ Number of vCPU cores × 1.5 | -
host_network_transmission_check | Checks the packet loss rate or error packet rate during network transmission in the past 3 minutes. | Good: u < 1; Abnormal: u ≥ 1 | Percentage
host_disk_space_check | Checks the disk usage. | Good: 0 ≤ u < 90; Warning: 90 ≤ u < 95; Abnormal: 95 ≤ u < 100 | Percentage
host_system_environment_check | Checks important configuration items of the system environment, such as the /etc/hostname and /etc/resolv.conf files, the Java version, and the Python version. | No threshold is specified. The current node is considered abnormal if one of the configuration items is detected as abnormal. | -
host_application_environment_check | Checks the configuration items for the execution environment of each application that is installed on the current node, such as the installation package version, symbolic links, and log directory. | No threshold is specified. The current node is considered abnormal if one of the configuration items is detected as abnormal. | -
host_user_permission_check | Checks the permissions of important users, such as the hadoop user and the hdfs user. | No threshold is specified. The current node is considered abnormal if the permissions of one of the important users are detected as abnormal. | -
host_fault_compensation_check | Checks whether fault compensation occurs. | No threshold is specified. The current node is considered abnormal if fault compensation occurs. | -
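The numeric thresholds in the table above can be expressed as a small classification function. The following Python sketch is for illustration only and is not the code that EMR runs: the function name classify_check and the parameter vcpu_cores are assumptions made for this example. The table lists the Abnormal range as 95 ≤ u < 100; as a further assumption, this sketch also treats u = 100 as Abnormal. Check items without a numeric threshold are not covered.

```python
# (warning_threshold, abnormal_threshold) in percent, taken from the table.
PERCENT_THRESHOLDS = {
    "host_memory_utilization_check": (85, 95),
    "host_cpu_utilization_check": (85, 95),
    "host_disk_space_check": (90, 95),
}


def classify_check(name, u, vcpu_cores=None):
    """Map the value u of a numeric check item to Good, Warning, or Abnormal."""
    if name in PERCENT_THRESHOLDS:
        warning, abnormal = PERCENT_THRESHOLDS[name]
        if u >= abnormal:
            # Assumption: values at or above the Abnormal bound,
            # including 100, are Abnormal.
            return "Abnormal"
        if u >= warning:
            return "Warning"
        return "Good"
    if name == "host_network_transmission_check":
        # This item has no Warning range: below 1 percent is Good.
        return "Abnormal" if u >= 1 else "Good"
    if name == "host_cpu_load5_check":
        if vcpu_cores is None:
            raise ValueError("vcpu_cores is required for host_cpu_load5_check")
        # This item has no Abnormal range: the limit is 1.5 times the vCPU cores.
        return "Warning" if u >= vcpu_cores * 1.5 else "Good"
    raise ValueError(f"no numeric threshold is defined for {name}")
```

For example, on a node with 4 vCPU cores, a 5-minute load of 6 is classified as Warning because 6 ≥ 4 × 1.5, and a disk usage of 89 percent is classified as Good.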