This topic provides an overview of the monitoring features for the HDFS service.

Prerequisites

A Hadoop cluster is created.

Enter the HDFS Monitoring page

  1. Log on to the Alibaba Cloud E-MapReduce console.
  2. Click the Monitor tab.
  3. In the left-side navigation pane, click Cluster Monitoring.
  4. On the Cluster Status page, click Details in the Action column that corresponds to the Hadoop cluster.
  5. In the left-side navigation pane, choose Service Monitoring > HDFS to enter the HDFS Monitoring page.

HDFS Monitoring page

The HDFS Monitoring page displays basic metric charts, recent alerts and exceptions, overview information, and information about HDFS processes such as NameNode and DataNode.

HDFS overview
  • Basic metric charts. By default, this section includes Alerts (Today), HDFS Capacity, Blocks, Total Files, Average Block Checksum Time, Average Block Reporting Time, NameNode Active/Standby Status, and Whether Security Mode Triggered.
  • Alerts and Details. This section displays critical exception events related to the HDFS service on the current day.
  • Overview section.
  • NameNode. This section displays the status of the NameNode process.
    Parameter | Description
    Instance Name | The name of the NameNode process. Click the name to view monitoring details.
    Status | The NameNode process in an HA cluster can be in the Active or Standby state. In a non-HA cluster, the NameNode process is always in the Active state.
    Whether Security Mode Triggered | Indicates whether the NameNode is in safe mode. Valid values: Yes and No.
    Port Status | The port status of the NameNode process. Green indicates that the port is normal. Red indicates that the port is abnormal.
    Process CPU Usage | The CPU utilization of the NameNode process.
    Memory | The memory usage of the NameNode process. The memory usage items include Heap Committed, Heap Init, Heap Max, Heap Used, NonHeap Committed, NonHeap Init, and NonHeap Used.
    JVM Garbage Collection Statistics | The garbage collection statistics of the NameNode Java process, displayed in jstat -gcutil format.
    • O: the capacity usage of the old generation space (%)
    • E: the capacity usage of the Eden space (%)
    • M: the capacity usage of the metaspace (%)
    • CCS: the capacity usage of the Compressed Class Space (%)
    • YGCT: the time consumed by young generation garbage collection
    • FGCT: the time consumed by full garbage collection
    • GCT: the total time consumed by garbage collection
    • YGC: the number of young generation garbage collections
    • FGC: the number of full garbage collections
  • DataNode. This section displays the status of all DataNode processes.
    Parameter | Description
    Node | The name of the DataNode process.
    From Last Heartbeat to Current | The time elapsed since the last heartbeat was received. Unit: seconds.
    Status | The status of the DataNode process. The value can be Running, Decommissioning, Decommissioned, Switching to Maintenance, or Maintaining.
    HDFS Storage Capacity | The HDFS capacity configured for the DataNode.
    HDFS Storage Space In Use | The HDFS capacity that is in use on the DataNode.
    Non-HDFS Storage Space Usage | The capacity on the DataNode that is occupied by non-HDFS data.
    Remaining HDFS Storage Space | The remaining HDFS capacity of the DataNode.
    Blocks | The number of blocks on the DataNode.
    Block Pool Usage | The usage of the block pool on the DataNode.
    Corrupt Volumes | The number of failed volumes on the DataNode.
    Version | The HDFS version that is deployed.
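
The DataNode columns above can be combined into a simple health check. The following Python sketch is illustrative only: the DataNodeRow fields, the sample rows, and both thresholds are hypothetical and are not part of the EMR console or any EMR API. It assumes you have copied the values from the table into your own records.

```python
from dataclasses import dataclass


@dataclass
class DataNodeRow:
    """One row of the DataNode table (field names are hypothetical)."""
    node: str
    seconds_since_heartbeat: int
    status: str
    capacity_bytes: int
    used_bytes: int
    failed_volumes: int


def triage(row, max_heartbeat_age=30, max_used_ratio=0.85):
    """Return warnings for one row; both thresholds are illustrative assumptions."""
    warnings = []
    if row.status != "Running":
        warnings.append(f"status is {row.status}")
    if row.seconds_since_heartbeat > max_heartbeat_age:
        warnings.append(f"no heartbeat for {row.seconds_since_heartbeat}s")
    if row.failed_volumes > 0:
        warnings.append(f"{row.failed_volumes} failed volume(s)")
    if row.capacity_bytes > 0:
        used_ratio = row.used_bytes / row.capacity_bytes
        if used_ratio > max_used_ratio:
            warnings.append(f"HDFS usage at {used_ratio:.0%}")
    return warnings


# Hypothetical rows, not taken from a real cluster.
rows = [
    DataNodeRow("emr-worker-1", 2, "Running", 1000, 400, 0),
    DataNodeRow("emr-worker-2", 95, "Running", 1000, 900, 1),
]
for row in rows:
    print(row.node, triage(row) or "ok")
```

Adjust the thresholds to match the heartbeat interval and capacity policy of your own cluster.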

Monitoring details page for NameNode

On the HDFS Monitoring page, you can click the name in the NameNode section to access the monitoring details page for the NameNode process.

  • NameNode Process JVM Indicators. This section displays JVM garbage collection statistics of different memory partitions for the NameNode process.
    • NameNode Process Memory Usage (You can select a time range and time granularity for the chart you want to view.)
      Parameter | Description
      S0 | The capacity usage of survivor space 0 (%).
      S1 | The capacity usage of survivor space 1 (%).
      E | The capacity usage of the Eden space (%).
      O | The capacity usage of the old generation space (%).
      M | The capacity usage of the metaspace (%).
      CCS | The capacity usage of the Compressed Class Space (%).
    • NameNode Process Garbage Collection Time
      Parameter | Description
      Young Garbage Collection Time | The time consumed by young generation garbage collection.
      Full Garbage Collection Time | The time consumed by full garbage collection.
      Total Garbage Collection Time | The total time consumed by garbage collection.
    • NameNode Process Garbage Collections
      Parameter | Description
      Young Garbage Collections | The number of young generation garbage collections.
      Full Garbage Collections | The number of full garbage collections.
    • Heap Memory, including the following items: Maximum Heap Memory, Initialized Heap Memory, Committed Heap Memory, and Heap Memory In Use
    • Non-Heap Memory, including the following items: Initialized Non-Heap Memory, Committed Non-Heap Memory, and Non-Heap Memory In Use
  • NameNode Process File Descriptor. This section displays the maximum number of file descriptors that the NameNode process can use and the number of file descriptors that are in use.
  • NameNode RPC Indicators:
    • Queue Length: the length of the RPC call queue on the NameNode RPC port. You can use the length to determine whether RPC requests are piling up.
    • Bytes Received: the total data volume received on the NameNode RPC port.
    • Bytes Sent: the total data volume sent on the NameNode RPC port.
    • Open Connections: the number of connections established on the NameNode RPC port.
    • Average RPC Request Queue Time: the average queue time of RPC requests.
    • Average RPC Request Processing Time: the average time for processing RPC requests.
  • NameNode Process History.
    Parameter | Description
    Date | The time when the operation was performed.
    Start/Restart/Stop | The operation type, which can be start, stop, or restart.
    Auto Resume | Whether the process was automatically restarted by the keepalive mechanism of EMR. The EMR agent automatically restarts components that exit abnormally to ensure service availability.
    Started By | The Linux user who performed the operation. This parameter is left empty for a process in the Stop state.
    PID | The ID of the process started by the operation. This parameter is left empty for a process in the Stop state.
    PPID | The ID of the parent process of the process started by the operation. This parameter is left empty for a process in the Stop state.
    Startup Parameters | The detailed startup parameters of the process started by the operation. This parameter is left empty for a process in the Stop state.

    This section lists all the records of starting or stopping processes in the EMR console and the processes automatically resumed by the EMR agent after an abnormal exit.
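
The JVM metrics on this page use the column names of jstat -gcutil (S0, S1, E, O, M, CCS, YGC, YGCT, FGC, FGCT, GCT). If you run jstat -gcutil against the NameNode process yourself, the output can be mapped to the same names. The following Python sketch shows one way to do that. The parse_gcutil helper and the sample values are hypothetical, and the column set can differ between JDK versions, so the parser reads the names from the header line instead of hard-coding them.

```python
def parse_gcutil(text):
    """Parse `jstat -gcutil` output into a list of {column: value} dicts."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        row = {}
        for name, raw in zip(header, line.split()):
            # jstat prints "-" when a value is not available.
            row[name] = None if raw == "-" else float(raw)
        rows.append(row)
    return rows


# Hypothetical sample output, not taken from a real cluster.
SAMPLE = """
  S0     S1     E      O      M     CCS    YGC     YGCT    FGC    FGCT     GCT
  0.00  45.12  63.40  21.75  97.10  93.50     12    0.250     2    0.120    0.370
"""

stats = parse_gcutil(SAMPLE)[0]
print("old generation usage (%):", stats["O"])
print("full collections:", int(stats["FGC"]))
print("total GC time:", stats["GCT"])
```

In this sample, GCT equals the sum of YGCT and FGCT, which matches the column descriptions above for a JDK that reports only young and full collections.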

Monitoring details page for DataNode

On the HDFS Monitoring page, you can click a name in the DataNode section to enter the monitoring details page for the required DataNode process.

  • DataNode Process JVM Indicators. This section includes DataNode Process Memory Usage, DataNode Process Garbage Collection Time, DataNode Process Garbage Collections, Heap Memory, Non-Heap Memory, Bytes Read/Written, Block Operation Count, Operation Average Time (1), and Operation Average Time (2).
  • DataNode Process History.
    Parameter | Description
    Date | The time when the operation was performed.
    Start/Restart/Stop | The operation type, which can be start, stop, or restart.
    Auto Resume | Whether the process was automatically restarted by the keepalive mechanism of EMR. The EMR agent automatically restarts components that exit abnormally to ensure service availability.
    Started By | The Linux user who performed the operation. This parameter is left empty for a process in the Stop state.
    PID | The ID of the process started by the operation. This parameter is left empty for a process in the Stop state.
    PPID | The ID of the parent process of the process started by the operation. This parameter is left empty for a process in the Stop state.
    Startup Parameters | The detailed startup parameters of the process started by the operation. This parameter is left empty for a process in the Stop state.