This topic describes the monitoring features that are available for the YARN service.

Prerequisites

A Hadoop or Flink cluster is created.

Enter the YARN Monitoring page

  1. Log on to the Alibaba Cloud E-MapReduce console.
  2. Click the Monitor tab.
  3. In the left-side navigation pane, click Cluster Monitoring.
  4. On the Cluster Status page, click Details in the Action column that corresponds to the Hadoop cluster.
  5. In the left-side navigation pane, choose Service Monitoring > YARN to enter the YARN Monitoring page.

YARN Monitoring page

The YARN Monitoring page displays basic metric charts, recent alerts and exceptions, queue resource usage, and information about processes such as ResourceManager, NodeManager, and JobHistory. You can play back historical data for all process information and queue resource usage.

  • Basic metric charts. This section includes Alerts (Today), VCores, Memory, NodeManager Statistics, Pending Resources, Jobs, Containers, and ResourceManager Active/Standby Status by default.
  • Alerts and Details. This section displays critical exception events related to the YARN service on the current day.
  • ResourceManager. This section displays the latest status of the ResourceManager process.
    Parameter: Description
    Instance Name: The name of the ResourceManager process. Click the name to view monitoring details.
    Status: The state of the ResourceManager process. In an HA cluster, the process can be in the Active or Standby state. In a non-HA cluster, the process is always in the Active state.
    Port Status: The port status of the ResourceManager process. Green indicates that the port is normal. Red indicates that the port is abnormal.
    Process CPU Usage: The CPU utilization of the ResourceManager process.
    Memory: The memory usage of the ResourceManager process. The memory usage items include Heap Committed, Heap Init, Heap Max, Heap Used, NonHeap Committed, NonHeap Init, and NonHeap Used.
    JVM Garbage Collection Statistics: The garbage collection statistics of the ResourceManager Java process, displayed in jstat -gcutil format.
      • S0: the capacity usage of survivor space 0 (%)
      • S1: the capacity usage of survivor space 1 (%)
      • O: the capacity usage of the old generation space (%)
      • E: the capacity usage of the Eden space (%)
      • M: the capacity usage of the metaspace (%)
      • CCS: the capacity usage of the Compressed Class Space (%)
      • YGCT: the time consumed by garbage collection in the young generation space
      • FGCT: the time consumed by full garbage collections
      • GCT: the total time consumed by garbage collection
      • YGC: the number of times garbage collection is performed in the young generation space
      • FGC: the number of full garbage collections
    RPC Request Queue Length: The length of the RPC call queue on the ResourceManager RPC port. You can use the length to determine whether RPC requests are backing up.
    RPC Request Processing Time: The time taken to process RPC requests.
    RPC Request Queue Time: The time that RPC requests wait in the queue.
  • NodeManager. This section displays the latest status of all NodeManager processes.
    Parameter: Description
    Instance Name: The name of a NodeManager process. Click the name to view monitoring details.
    Status: The status of a NodeManager process. The value can be Lost, Running, or Unhealthy.
    Rack: The rack on which the NodeManager process is deployed.
    Node Address: The IP address of a NodeManager process.
    Node HTTP Address: The HTTP address of a NodeManager process.
    Last Health Status Update: The time of the last heartbeat.
    Health Report: The health report. If a NodeManager process is abnormal, the details about the exceptions are recorded in this report.
    Containers: The number of containers on a NodeManager process.
    Memory In Use: The memory usage of a NodeManager process.
    Memory Available: The memory space available on a NodeManager process.
    vCores In Use: The number of vCores used on a NodeManager process.
    vCores Available: The number of vCores available on a NodeManager process.
  • JobHistory. This section displays the latest status of all JobHistory processes.
    Parameter: Description
    Instance Name: The name of a JobHistory process. Click the name to view monitoring details.
    Port Status: The port status of a JobHistory process. Green indicates that the port is normal. Red indicates that the port is abnormal.
    Process CPU Usage: The CPU utilization of a JobHistory process.
    Heap Memory: The heap memory usage of a JobHistory process. The heap memory usage items include Heap Used, Heap Committed, Heap Max, and Heap Init.
    Non-Heap Memory: The non-heap memory usage of a JobHistory process. The non-heap memory usage items include NonHeap Used, NonHeap Committed, and NonHeap Init.
    JVM Garbage Collection Statistics: The garbage collection statistics of the JobHistory Java process, displayed in jstat -gcutil format.
      • S0: the capacity usage of survivor space 0 (%)
      • S1: the capacity usage of survivor space 1 (%)
      • O: the capacity usage of the old generation space (%)
      • E: the capacity usage of the Eden space (%)
      • M: the capacity usage of the metaspace (%)
      • CCS: the capacity usage of the Compressed Class Space (%)
      • YGCT: the time consumed by garbage collection in the young generation space
      • FGCT: the time consumed by full garbage collections
      • GCT: the total time consumed by garbage collection
      • YGC: the number of times garbage collection is performed in the young generation space
      • FGC: the number of full garbage collections
  • Queue. This section displays the detailed resource usage of all queues in the YARN scheduler. Click a queue to view detailed resource usage.
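
The JVM Garbage Collection Statistics columns use the jstat -gcutil layout, so the same figures can be cross-checked on a cluster node. The following is a minimal sketch, assuming you have captured the text output of jstat -gcutil for the process; the sample values and the parse_gcutil helper are made up for illustration and are not part of EMR.

```python
# Made-up sample in the column layout printed by `jstat -gcutil <pid>`.
SAMPLE = """\
  S0     S1     E      O      M     CCS    YGC     YGCT    FGC    FGCT     GCT
  0.00  45.12  63.40  21.75  95.30  90.10     12    0.350     2    0.480    0.830
"""


def parse_gcutil(output):
    """Map each column name in the header row to the value in the first data row."""
    rows = [line.split() for line in output.splitlines() if line.strip()]
    header, values = rows[0], rows[1]
    stats = {}
    for name, value in zip(header, values):
        try:
            stats[name] = float(value)
        except ValueError:
            # jstat prints "-" for columns that do not apply; skip them.
            continue
    return stats


stats = parse_gcutil(SAMPLE)
# Average young-generation pause in seconds (jstat reports GC times in seconds).
young_avg = stats["YGCT"] / stats["YGC"] if stats["YGC"] else 0.0
print(f"old generation {stats['O']}% used, average young GC {young_avg:.3f}s")
```

Reading the header row instead of hard-coding column positions keeps the sketch tolerant of JDK versions that print additional columns.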

Monitoring details page for ResourceManager

On the YARN Monitoring page, you can click the name in the ResourceManager section to enter the monitoring details page for the ResourceManager process.

  • ResourceManager Process JVM Indicators. This section includes ResourceManager Process Memory Usage, ResourceManager Process Garbage Collection Time, ResourceManager Process Garbage Collections, Heap Memory, and Non-Heap Memory.
  • ResourceManager Process File Descriptor. This section displays the maximum number of file descriptors that the ResourceManager process can use and the number of file descriptors that are in use.
  • ResourceManager RPC Indicators. This section includes Queue Length, Bytes Received, Bytes Sent, Open Connections, Average RPC Request Queue Time, and Average RPC Request Processing Time.
  • ResourceManager Process History.
    Parameter: Description
    Date: The time when an operation is performed.
    Start/Restart/Stop: The operation type, which can be start, stop, or restart.
    Auto Resume: Indicates whether the operation was triggered automatically by the keepalive mechanism of EMR. The EMR agent automatically resumes components that exit abnormally to ensure service availability.
    Started By: The Linux user who performed the operation. This parameter is left empty for a process in the Stop state.
    PID: The ID of the process generated in an operation. This parameter is left empty for a process in the Stop state.
    PPID: The ID of the parent of the process generated in an operation. This parameter is left empty for a process in the Stop state.
    Startup Parameters: The detailed startup parameters of the process generated in an operation. This parameter is left empty for a process in the Stop state.
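
The RPC indicators listed above have close counterparts in the RPC metrics that Hadoop daemons publish through their /jmx servlet. The following is a minimal sketch, assuming you have already fetched that JSON from the ResourceManager web port; whether the EMR console reads the same source is an assumption, the port 8032 is the Hadoop default for the ResourceManager client RPC port and may differ in your cluster, and the sample values and the rpc_summary helper are made up for illustration.

```python
import json

# Made-up sample shaped like the JSON returned by a Hadoop daemon's /jmx servlet.
SAMPLE_JMX = json.loads("""
{"beans": [
  {"name": "Hadoop:service=ResourceManager,name=RpcActivityForPort8032",
   "CallQueueLength": 3, "NumOpenConnections": 17,
   "ReceivedBytes": 1048576, "SentBytes": 524288,
   "RpcQueueTimeAvgTime": 0.8, "RpcProcessingTimeAvgTime": 2.5}
]}
""")

# Hadoop RPC metric names that correspond to the indicators on the details page.
FIELDS = (
    "CallQueueLength",
    "ReceivedBytes",
    "SentBytes",
    "NumOpenConnections",
    "RpcQueueTimeAvgTime",
    "RpcProcessingTimeAvgTime",
)


def rpc_summary(jmx, port):
    """Return the RPC metrics for one RPC port, or None if no bean matches."""
    suffix = f"name=RpcActivityForPort{port}"
    for bean in jmx.get("beans", []):
        if bean.get("name", "").endswith(suffix):
            return {field: bean.get(field) for field in FIELDS}
    return None


summary = rpc_summary(SAMPLE_JMX, 8032)
print(summary)
```

A queue length that stays above zero together with a rising average queue time suggests that requests arrive faster than the handlers can process them.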

Monitoring details page for NodeManager

On the YARN Monitoring page, you can click a name in the NodeManager section to enter the monitoring details page for that NodeManager process.

  • NodeManager Process JVM Indicators. This section includes NodeManager Process Memory Usage, NodeManager Process Garbage Collection Time, NodeManager Process Garbage Collections, Heap Memory, Non-Heap Memory, and NodeManager Container.
  • NodeManager Process History.
    Parameter: Description
    Date: The time when an operation is performed.
    Start/Restart/Stop: The operation type, which can be start, stop, or restart.
    Auto Resume: Indicates whether the operation was triggered automatically by the keepalive mechanism of EMR. The EMR agent automatically resumes components that exit abnormally to ensure service availability.
    Started By: The Linux user who performed the operation. This parameter is left empty for a process in the Stop state.
    PID: The ID of the process generated in an operation. This parameter is left empty for a process in the Stop state.
    PPID: The ID of the parent of the process generated in an operation. This parameter is left empty for a process in the Stop state.
    Startup Parameters: The detailed startup parameters of the process generated in an operation. This parameter is left empty for a process in the Stop state.
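
The per-node values shown in the NodeManager section of the YARN Monitoring page (status, containers, memory, and vCores) are also exposed by the ResourceManager REST API at /ws/v1/cluster/nodes. The following is a minimal sketch, assuming you have already retrieved that JSON; the sample nodes and the summarize_nodes helper are made up for illustration, and the sketch does not claim that the console uses this API.

```python
from collections import Counter

# Made-up sample shaped like the response of GET /ws/v1/cluster/nodes.
SAMPLE_NODES = {"nodes": {"node": [
    {"nodeHostName": "worker-1", "state": "RUNNING", "rack": "/default-rack",
     "numContainers": 4, "usedMemoryMB": 6144, "availMemoryMB": 2048,
     "usedVirtualCores": 4, "availableVirtualCores": 4},
    {"nodeHostName": "worker-2", "state": "RUNNING", "rack": "/default-rack",
     "numContainers": 2, "usedMemoryMB": 2048, "availMemoryMB": 6144,
     "usedVirtualCores": 2, "availableVirtualCores": 6},
    {"nodeHostName": "worker-3", "state": "UNHEALTHY", "rack": "/default-rack",
     "numContainers": 0, "usedMemoryMB": 0, "availMemoryMB": 0,
     "usedVirtualCores": 0, "availableVirtualCores": 0},
]}}


def summarize_nodes(payload):
    """Count nodes by state and total the resources of the RUNNING nodes."""
    # "nodes" can be null when the cluster has no registered nodes.
    nodes = (payload.get("nodes") or {}).get("node", [])
    running = [node for node in nodes if node["state"] == "RUNNING"]
    return {
        "states": dict(Counter(node["state"] for node in nodes)),
        "containers": sum(node["numContainers"] for node in running),
        "used_memory_mb": sum(node["usedMemoryMB"] for node in running),
        "avail_memory_mb": sum(node["availMemoryMB"] for node in running),
    }


summary = summarize_nodes(SAMPLE_NODES)
print(summary)
```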

Monitoring details page for JobHistory

On the YARN Monitoring page, you can click the name in the JobHistory section to enter the monitoring details page for the JobHistory process.

  • JobHistory Process JVM Indicators. This section includes JobHistory Process Memory Usage, JobHistory Process Garbage Collection Time, JobHistory Process Garbage Collections, Heap Memory, and Non-Heap Memory.
  • JobHistory Process File Descriptor. This section displays the maximum number of file descriptors that the JobHistory process can use and the number of file descriptors that are in use.
  • JobHistory Process History.
    Parameter: Description
    Date: The time when an operation is performed.
    Start/Restart/Stop: The operation type, which can be start, stop, or restart.
    Auto Resume: Indicates whether the operation was triggered automatically by the keepalive mechanism of EMR. The EMR agent automatically resumes components that exit abnormally to ensure service availability.
    Started By: The Linux user who performed the operation. This parameter is left empty for a process in the Stop state.
    PID: The ID of the process generated in an operation. This parameter is left empty for a process in the Stop state.
    PPID: The ID of the parent of the process generated in an operation. This parameter is left empty for a process in the Stop state.
    Startup Parameters: The detailed startup parameters of the process generated in an operation. This parameter is left empty for a process in the Stop state.
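
The two file descriptor figures (the maximum that the process can use and the number in use) can be cross-checked on a Linux node by using the PID from the process history. The following is a minimal sketch, assuming the standard /proc layout; the sample limits text and both helper functions are made up for illustration, and this is not necessarily how the EMR agent collects the values.

```python
import os

# Made-up excerpt in the layout of /proc/<pid>/limits on Linux.
SAMPLE_LIMITS = """\
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max open files            65535                65535                files
"""


def max_open_files(limits_text):
    """Return the (soft, hard) open-file limits, or None if the row is missing."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            soft, hard = line.split()[3:5]
            # A limit can be the word "unlimited"; report that as None.
            return tuple(int(v) if v.isdigit() else None for v in (soft, hard))
    return None


def open_fd_count(pid):
    """Count the file descriptors currently open in a process (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


print(max_open_files(SAMPLE_LIMITS))
if os.path.isdir("/proc/self/fd"):
    # Demonstrated on this Python process instead of a real JobHistory PID.
    print(open_fd_count(os.getpid()))
```

Reading another user's /proc/<pid>/fd directory usually requires running as that user or as root.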