
E-MapReduce: View node health status

Last Updated: Mar 26, 2026

Node health status shows whether a node is running as expected. Each node's status is derived from the results of multiple health check items. For a node in the Warning or Abnormal state, you can inspect its individual check items to diagnose and resolve the issue.
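The documentation does not state how check-item results are combined into a node's status. The following minimal Python sketch assumes a worst-of rule, where the node takes the most severe state among its check items. The `State` enum and the `node_status` function are hypothetical names used only for illustration, and the sketch does not model the Unknown and Stateless states.

```python
from enum import IntEnum
from typing import Mapping, Optional


class State(IntEnum):
    """Hypothetical severity ordering: a higher value is more severe."""
    GOOD = 0
    WARNING = 1
    ABNORMAL = 2


def node_status(check_results: Mapping[str, State]) -> Optional[State]:
    """Return the most severe state among a node's check items (assumed worst-of rule).

    Returns None when there are no results; this sketch does not try to
    distinguish the console's Unknown and Stateless states.
    """
    if not check_results:
        return None
    return max(check_results.values())


if __name__ == "__main__":
    results = {
        "host_cpu_usage": State.GOOD,
        "host_disk_space_usage": State.WARNING,
    }
    print(node_status(results).name)
```

Under this assumption, a single Abnormal item is enough to make the whole node Abnormal, which is why the per-item view is the place to start troubleshooting.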

Prerequisites

Before you begin, ensure that you have:

Limitations

This feature applies only to DataLake, Dataflow, online analytical processing (OLAP), DataServing, and custom clusters.

View the health status of nodes

  1. Log on to the EMR console. In the left-side navigation pane, click EMR on ECS.

  2. In the top navigation bar, select the region where your cluster resides and select a resource group.

  3. On the EMR on ECS page, find your cluster and click Nodes in the Actions column.

  4. On the Nodes tab, check the Health Status column for each node group. The column shows colored numbers that indicate how many nodes are in each state:

    Color | State | Meaning | Recommended action
    --- | --- | --- | ---
    Green | Good | The node is running as expected. | No action needed.
    Yellow | Warning | The node is running, but hidden risks are detected. | Click View Check Items to review the check items and monitor the node.
    Red | Abnormal | The node is unavailable due to serious issues. | Click View Check Items and troubleshoot immediately.
    Gray | Unknown | Health check results cannot be retrieved. | No action needed if your workloads are unaffected.
    Gray | Stateless | No health check has run since the node was installed or manually stopped. | No action needed.
  5. To see the health status of individual nodes in a group, click the expand icon to the left of the node group name. Each node's status appears in the Health Status column.
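To illustrate what the colored numbers in the Health Status column summarize, the following Python sketch counts the nodes of one node group per state. The state-to-color mapping comes from the table in step 4. The node names and the `summarize_group` helper are hypothetical and are not part of any EMR API.

```python
from collections import Counter

# State-to-color mapping from the Health Status table; Unknown and Stateless are both gray.
STATE_COLORS = {
    "Good": "Green",
    "Warning": "Yellow",
    "Abnormal": "Red",
    "Unknown": "Gray",
    "Stateless": "Gray",
}


def summarize_group(node_states: dict[str, str]) -> Counter:
    """Count how many nodes in one node group are in each health state."""
    unrecognized = set(node_states.values()) - set(STATE_COLORS)
    if unrecognized:
        raise ValueError(f"unrecognized states: {sorted(unrecognized)}")
    return Counter(node_states.values())


if __name__ == "__main__":
    # Hypothetical node names and states for a single node group.
    group = {"core-1": "Good", "core-2": "Good", "core-3": "Warning"}
    for state, count in summarize_group(group).items():
        print(f"{STATE_COLORS[state]} {state}: {count}")
```

Because Unknown and Stateless share the gray color, a per-node view (step 5) is needed to tell them apart.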

View health check items of a node

When a node is in the Warning or Abnormal state, inspect its check items to identify what triggered the status.

  1. On the Nodes tab, click the expand icon to the left of the node group name.

  2. Find the node, then click View Check Items to the right of its health status.

  3. In the panel that appears, review the latest check item results and the health check history for that node.
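When reviewing the history, it helps to keep only the latest result of each check item and discard items whose latest result is Good. What remains are the items that currently explain a Warning or Abnormal status. The following Python sketch assumes a hypothetical history record with a timestamp, an item name, and a state. It does not reflect the console's actual data format.

```python
from datetime import datetime
from typing import NamedTuple


class CheckRecord(NamedTuple):
    """One entry of a node's health check history (hypothetical layout)."""
    time: datetime
    item: str   # for example "host_disk_space_usage"
    state: str  # "Good", "Warning", or "Abnormal"


def triggering_items(history: list[CheckRecord]) -> dict[str, CheckRecord]:
    """Return the latest record of each check item whose latest state is not Good."""
    latest: dict[str, CheckRecord] = {}
    for record in sorted(history, key=lambda r: r.time):
        latest[record.item] = record  # later records overwrite earlier ones
    return {item: rec for item, rec in latest.items() if rec.state != "Good"}


if __name__ == "__main__":
    history = [
        CheckRecord(datetime(2026, 3, 1, 10, 0), "host_disk_space_usage", "Warning"),
        CheckRecord(datetime(2026, 3, 1, 11, 0), "host_disk_space_usage", "Good"),
        CheckRecord(datetime(2026, 3, 1, 11, 0), "host_cpu_usage", "Abnormal"),
    ]
    for item, rec in triggering_items(history).items():
        print(f"{item}: {rec.state} at {rec.time:%Y-%m-%d %H:%M}")
```

In this example the disk item recovered at the later check, so only the CPU item is reported.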

Health check items reference

In the following tables, u denotes the measured value of a check item. Items without thresholds are binary pass/fail checks.

Items that require immediate action when Abnormal:

Name | Description
--- | ---
status_alive | Whether the node status is normal
host_disk_fault | Whether a disk exception exists at the underlying layer
host_system_fault | Whether a system exception exists at the underlying layer
host_system_env | Availability of important configuration files, Java, and Python
host_service_env | Availability of storage directories and package files that cluster services depend on
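For host_system_env and host_service_env, a rough local equivalent on the node is to confirm that required commands are on the PATH and required files or directories exist. The following Python sketch is a hypothetical approximation only. This page does not list which files and directories EMR actually inspects, so the commands and paths you pass to the hypothetical `check_env` helper are your own assumptions.

```python
import os
import shutil


def check_env(commands: list[str], paths: list[str]) -> dict[str, bool]:
    """Report whether each command is on PATH and each file or directory exists.

    A local approximation only; the real host_system_env and
    host_service_env checks are run by EMR and may test other things.
    """
    results = {f"cmd:{c}": shutil.which(c) is not None for c in commands}
    results.update({f"path:{p}": os.path.exists(p) for p in paths})
    return results


def failed(results: dict[str, bool]) -> list[str]:
    """Return the names of the checks that did not pass."""
    return [name for name, ok in results.items() if not ok]


if __name__ == "__main__":
    # Example inputs only; substitute the commands and paths your services need.
    results = check_env(["java", "python3"], ["/etc/hosts"])
    print(failed(results) or "all checks passed")
```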

Items with Warning and Abnormal thresholds:

Name | Description | Warning | Abnormal | Unit
--- | --- | --- | --- | ---
host_cpu_usage | CPU usage | 95 ≤ u < 99 | u ≥ 99 | %
host_mem_usage | Memory usage | 95 ≤ u < 99 | u ≥ 99 | %
host_disk_space_usage | Disk space usage | 90 ≤ u < 99 | u ≥ 99 | %
host_disk_inode_usage | Disk index node (inode) usage | 90 ≤ u < 99 | u ≥ 99 | %
host_disk_io_latency | Average disk read/write latency | 400 ≤ u < 800 | u ≥ 800 | ms
host_fd_usage | File descriptor usage | 95 ≤ u < 99 | u ≥ 99 | %
host_network_transmit_drop_rate | Outbound packet loss rate | 1.0 ≤ u < 2.5 | u ≥ 2.5 | %
host_network_receive_error_rate | Inbound packet error rate | 0.1 ≤ u < 0.5 | u ≥ 0.5 | %
host_network_transmit_error_rate | Outbound packet error rate | 0.1 ≤ u < 0.5 | u ≥ 0.5 | %
host_network_receive_drop_rate | Inbound packet loss rate | 1.0 ≤ u < 2.5 | u ≥ 2.5 | %
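The Warning and Abnormal ranges above follow one rule. A measured value u is Abnormal when it reaches the Abnormal bound, Warning when it reaches the Warning bound but not the Abnormal bound, and Good otherwise. The following Python sketch encodes that rule, with threshold values copied from the table. The inbound packet loss item is keyed as `host_network_receive_drop_rate` on the assumption that it mirrors `host_network_transmit_drop_rate`. The helpers that measure disk space and inode usage use Python's standard library and only approximate what EMR measures.

```python
import os
import shutil

# (Warning bound, Abnormal bound) per check item, copied from the threshold table.
THRESHOLDS: dict[str, tuple[float, float]] = {
    "host_cpu_usage": (95, 99),
    "host_mem_usage": (95, 99),
    "host_disk_space_usage": (90, 99),
    "host_disk_inode_usage": (90, 99),
    "host_disk_io_latency": (400, 800),  # ms
    "host_fd_usage": (95, 99),
    "host_network_transmit_drop_rate": (1.0, 2.5),
    "host_network_receive_drop_rate": (1.0, 2.5),  # assumed name, see lead-in
    "host_network_receive_error_rate": (0.1, 0.5),
    "host_network_transmit_error_rate": (0.1, 0.5),
}


def classify(item: str, u: float) -> str:
    """Map a measured value u of a check item to Good, Warning, or Abnormal."""
    warning, abnormal = THRESHOLDS[item]
    if u >= abnormal:
        return "Abnormal"
    if u >= warning:
        return "Warning"
    return "Good"


def disk_space_usage_percent(path: str = "/") -> float:
    """Approximate space usage (%) of the file system that contains path."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100


def disk_inode_usage_percent(path: str = "/") -> float:
    """Approximate inode usage (%) of the file system that contains path (Unix only)."""
    st = os.statvfs(path)
    if st.f_files == 0:  # some file systems do not report inode counts
        return 0.0
    return (st.f_files - st.f_ffree) / st.f_files * 100


if __name__ == "__main__":
    for item, value in [
        ("host_disk_space_usage", disk_space_usage_percent("/")),
        ("host_disk_inode_usage", disk_inode_usage_percent("/")),
    ]:
        print(f"{item}: {value:.1f}% -> {classify(item, value)}")
```

Both comparisons use `>=`, so a value that lands exactly on a bound falls into the more severe state. This matches the 95 ≤ u < 99 and u ≥ 99 notation in the table.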