Use CloudMonitor, eadm, iproute2, and the diagnose tool to monitor eRDMA traffic, locate faults, and evaluate network performance.
Prerequisites
eRDMA is installed and deployed on the target ECS instance. See Enable eRDMA for an ECS instance.
Monitor eRDMA with CloudMonitor
CloudMonitor tracks eRDMA status and supports custom alerts. See Custom monitoring.
View the monitoring metrics supported by eRDMA
Log on to the CloudMonitor console.
In the metric list search box, enter eri to view the monitoring metrics for eRDMA.

Diagnose eRDMA with the eadm tool
eadm is a user-space tool deployed with the eRDMA driver. It provides real-time monitoring and diagnostics to help locate faults. Key features include:
Traffic monitoring and assisted diagnostics: Real-time traffic statistics for the entire device.
Query and set configurations: Configure the delay ACK feature and the CC algorithm.
The following are common eadm commands. For other commands, run eadm -h.
This tool is for diagnostics and debugging only. It may change in the future, and its availability is not guaranteed in all scenarios.
View help for the eadm command.
eadm -h
Monitor real-time traffic of an eRDMA device
Requires driver version 0.2.34 or later.
eadm stat -d <ibdev_name> -l
<ibdev_name>: the eRDMA device name. Run ibv_devinfo to find it. If only one eRDMA device exists, you can omit -d <ibdev_name>.
Get eRDMA device statistics, such as CM and verbs message counts and traffic counts.
eadm stat -d <ibdev_name>
<ibdev_name>: the eRDMA device name. Run ibv_devinfo to find it. If only one eRDMA device exists, you can omit -d <ibdev_name>.
Get the current eRDMA driver version.
eadm ver
Other commands such as info, dump, and conf have usage constraints. Do not run them unless necessary.
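The version requirement for real-time traffic monitoring can be checked before eadm stat is called with -l. The following is a minimal Python sketch, not part of eadm. It assumes that the text printed by eadm ver contains a dotted x.y.z version somewhere in it (the exact output format is not documented here), and the device name erdma_0 in the usage note is hypothetical.

```python
import re

# Minimum driver version for `eadm stat -l`, as stated in this section.
MIN_LIVE_STAT_VERSION = (0, 2, 34)


def parse_version(text):
    """Return the first dotted x.y.z version found in text as a tuple of ints.

    Assumption: the output of `eadm ver` contains such a version string.
    """
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("no x.y.z version found in: %r" % (text,))
    return tuple(int(part) for part in match.groups())


def supports_live_stat(version_text):
    """Compare numerically, so that 0.2.9 sorts below 0.2.34."""
    return parse_version(version_text) >= MIN_LIVE_STAT_VERSION


def eadm_stat_argv(ibdev_name=None, live=False):
    """Build the argument list for `eadm stat`.

    -d is omitted when no device name is given, which this section
    allows when only one eRDMA device exists.
    """
    argv = ["eadm", "stat"]
    if ibdev_name:
        argv += ["-d", ibdev_name]
    if live:
        argv.append("-l")
    return argv
```

For example, eadm_stat_argv("erdma_0", live=True) returns an argument list that can be passed to subprocess.run. Comparing versions as integer tuples rather than strings avoids ranking 0.2.9 above 0.2.34.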
Monitor and diagnose eRDMA with iproute2
iproute2 is a Linux networking toolkit that provides utilities such as ip and ss for managing network interfaces, routing tables, and traffic control. Use its built-in rdma command to monitor and diagnose the RDMA subsystem.
Most Linux distributions, such as Alibaba Cloud Linux 3 and Ubuntu 20.04 or later, include iproute2 by default. See your operating system documentation for details.
Query eRDMA device status.
rdma link
Query eRDMA resource usage, such as the number of CQs, QPs, and MRs.
Note: Queue Pair (QP), Completion Queue (CQ), Memory Region (MR), and verbs opcode are core RDMA concepts. Work requests are posted to a QP, their completions are reported through a CQ, and an MR is a registered memory buffer that RDMA operations read from and write to.
rdma res
Query eRDMA performance statistics, such as connection counts, connection status, and packet counts.
rdma -p stat
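To track resource usage over time, the summary printed by rdma res can be turned into numbers. The sketch below assumes the common one-line-per-device summary layout, an index, the device name, then alternating resource names and counts (for example `0: erdma_0: pd 2 cq 3 qp 4 cm_id 0 mr 5`). The device name and counts in that sample are illustrative only, and the layout may differ between iproute2 versions.

```python
def parse_rdma_res_line(line):
    """Parse one summary line of the assumed form
    '<index>: <device>: <name> <count> <name> <count> ...'."""
    _index, name, rest = (part.strip() for part in line.split(":", 2))
    tokens = rest.split()
    # Pair every resource name with the count that follows it.
    counts = {key: int(value) for key, value in zip(tokens[0::2], tokens[1::2])}
    return name, counts


def parse_rdma_res(text):
    """Map each device name to its resource counts."""
    devices = {}
    for line in text.splitlines():
        if line.strip():
            name, counts = parse_rdma_res_line(line)
            devices[name] = counts
    return devices
```

If your iproute2 build supports JSON output for the rdma command, parsing that is likely more robust than splitting text.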
Diagnose and evaluate eRDMA with the diagnose tool
The diagnose tool supports basic function checks, HPC environment checks, and latency checks for eRDMA.
Possible results of a diagnose check
PASS: The check passed.
SKIP: The current system version does not support this check.
FAIL: The check tool is not installed or the check failed. The failed command is listed in fail info.
Other INFO messages: eRDMA configuration details, such as installation mode, driver version, and CC algorithm.
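When diagnose runs in automation, it can help to tally these result states from the captured output. The following Python sketch assumes only that each check is reported on a line containing the word PASS, SKIP, or FAIL. The real layout of the diagnose output is not specified here, so treat the matching rule as a placeholder to adjust.

```python
import re
from collections import Counter

# Assumption: each check result line contains one of these status words.
STATUS_RE = re.compile(r"\b(PASS|SKIP|FAIL)\b")


def summarize(report_text):
    """Count result states and collect the lines that report FAIL."""
    counts = Counter()
    failed_lines = []
    for line in report_text.splitlines():
        match = STATUS_RE.search(line)
        if match is None:
            continue  # INFO messages and other text are ignored
        status = match.group(1)
        counts[status] += 1
        if status == "FAIL":
            failed_lines.append(line.strip())
    return counts, failed_lines
```

A caller could treat any non-empty failed_lines as a reason to inspect fail info in the full output.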
Install diagnose
On an eRDMA-configured instance, download the diagnose tool:
Download from an internal URL
wget http://mirrors.cloud.aliyuncs.com/erdma/tools/diagnose.py
Download from a public URL
wget https://mirrors.aliyun.com/erdma/tools/diagnose.py
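If the same setup script must work on instances both with and without access to the internal mirror, the two URLs can be tried in order. This is a minimal Python sketch using only the standard library. The function names and the fallback order are illustrative choices, not part of the diagnose tool.

```python
import urllib.request

INTERNAL_URL = "http://mirrors.cloud.aliyuncs.com/erdma/tools/diagnose.py"
PUBLIC_URL = "https://mirrors.aliyun.com/erdma/tools/diagnose.py"


def fetch(url, timeout=10):
    """Download url and return the response body as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def download_first_available(urls, fetcher=fetch):
    """Try each URL in order and return (url, data) for the first success."""
    errors = []
    for url in urls:
        try:
            return url, fetcher(url)
        except OSError as exc:  # URLError and socket timeouts derive from OSError
            errors.append("%s: %s" % (url, exc))
    raise RuntimeError("all downloads failed: " + "; ".join(errors))


def save_diagnose(path="diagnose.py"):
    """Prefer the internal mirror, fall back to the public one."""
    url, data = download_first_available([INTERNAL_URL, PUBLIC_URL])
    with open(path, "wb") as handle:
        handle.write(data)
    return url
```

Calling save_diagnose() writes diagnose.py to the current directory and returns the URL that succeeded.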
Diagnose basic eRDMA functions
The basic functional test verifies driver installation, network connectivity, and the eRDMA kernel driver's installation mode.
Procedure:
Log on to the eRDMA-configured instance.
Download the diagnose tool.
Download from an internal URL
wget http://mirrors.cloud.aliyuncs.com/erdma/tools/diagnose.py
Download from a public URL
wget https://mirrors.aliyun.com/erdma/tools/diagnose.py
Check the basic functions of eRDMA:
python diagnose.py -d
The following is an example output. For diagnostic item descriptions, see eRDMA basic function diagnostic items.

Diagnose the eRDMA HPC environment
HPC applications in an eRDMA environment may require additional dependencies and configurations. The diagnose tool checks for these dependencies.
Procedure:
Log on to the eRDMA-configured instance.
Download the diagnose tool.
Download from an internal URL
wget http://mirrors.cloud.aliyuncs.com/erdma/tools/diagnose.py
Download from a public URL
wget https://mirrors.aliyun.com/erdma/tools/diagnose.py
Check HPC environment dependencies:
python diagnose.py --hpc-check
The following is an example output. For diagnostic item descriptions, see eRDMA HPC environment dependency checks.

Evaluate eRDMA network performance
Use the perftest feature in the diagnose tool to test network performance between instances.
Prerequisites
Before running the test:
eRDMA is installed and deployed on all test nodes. See Enable eRDMA for an ECS instance.
Passwordless SSH is configured among all test nodes. See Configure passwordless SSH logon.
Python paramiko is installed on all test nodes.
Note: The diagnose tool uses paramiko for remote connections.
Install paramiko with the following commands. Python 3 is recommended.
Alibaba Cloud Linux/CentOS
# python3
sudo python3 -m pip install --upgrade pip
sudo python3 -m pip install paramiko
# python2
# If the pip module is not installed for Python 2, install python2-pip.
sudo yum -y install python2-pip
sudo python2 -m pip install --upgrade pip==20.3.4
sudo python2 -m pip install paramiko
Ubuntu
# python3
sudo python3 -m pip install --upgrade pip
sudo python3 -m pip install paramiko
# python2
# If python2-pip is not installed on the current node, install it.
sudo apt install software-properties-common
sudo add-apt-repository universe
sudo apt update
sudo apt install python2
sudo curl https://bootstrap.pypa.io/pip/2.7/get-pip.py --output get-pip.py
sudo python2 get-pip.py
sudo python2 -m pip install --upgrade pip==20.3.4
sudo python2 -m pip install paramiko
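After installation, you can confirm that the interpreter you plan to use can find paramiko. The following is a small Python 3 sketch. It only checks whether the module can be located and does not verify that paramiko works against your nodes.

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module with this name can be located."""
    return importlib.util.find_spec(name) is not None


def describe(name):
    """Return a short human-readable status line for the module."""
    state = "available" if module_available(name) else "missing"
    return "%s: %s" % (name, state)
```

Running print(describe("paramiko")) with python3 on each test node shows whether the prerequisite is met for that interpreter.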
Procedure
Log on to the eRDMA-configured instance.
Download the diagnose tool.
Download from an internal URL
wget http://mirrors.cloud.aliyuncs.com/erdma/tools/diagnose.py
Download from a public URL
wget https://mirrors.aliyun.com/erdma/tools/diagnose.py
Test eRDMA latency:
python diagnose.py --perftest --hosts <n1> <n2> --user <username> --key-file </path/to/private_key>
Parameters:
--hosts <n1> <n2>: Test nodes, separated by spaces. Replace <n1> <n2> with the private IP address of the eRDMA-enabled ENI on each node.
--user <username>: Username for passwordless SSH.
--key-file </path/to/private_key>: Absolute path to the private key file for SSH.
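The parameter rules above can be checked before the test is launched. The following Python sketch builds the command line and rejects inputs that do not match the description: hosts that are not private IP addresses and a key path that is not absolute. The two-host minimum is an assumption based on the test running between instances, and the function name is illustrative.

```python
import ipaddress
import posixpath


def perftest_argv(hosts, user, key_file, python="python"):
    """Build the argument list for the perftest run described above."""
    if len(hosts) < 2:
        # Assumption: a test between instances needs at least two nodes.
        raise ValueError("at least two hosts are required")
    for host in hosts:
        # ip_address() raises ValueError for strings that are not IP addresses.
        if not ipaddress.ip_address(host).is_private:
            raise ValueError("not a private IP address: %s" % host)
    if not posixpath.isabs(key_file):
        raise ValueError("key file path must be absolute: %s" % key_file)
    return [
        python, "diagnose.py", "--perftest",
        "--hosts", *hosts,
        "--user", user,
        "--key-file", key_file,
    ]
```

The returned list can be passed to subprocess.run from the directory that contains diagnose.py.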
The following is an example output for a two-instance latency test. See eRDMA network performance tests.
Each table shows latency for different operations. Rows represent requesters and columns represent responders. Each cell shows the average latency in microseconds, with the 99.9th percentile in parentheses.
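To compare results across runs, each cell can be split into its two numbers. The sketch below assumes a cell is written as the average followed by the 99.9th percentile in parentheses, for example `8.25 (12.40)`. That exact text layout is an assumption, and the values are illustrative rather than measured.

```python
import re

# Assumed cell layout: "<average> (<99.9th percentile>)", both in microseconds.
CELL_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*\(\s*(\d+(?:\.\d+)?)\s*\)\s*$")


def parse_cell(cell):
    """Return (average_us, p999_us) for one latency table cell."""
    match = CELL_RE.match(cell)
    if match is None:
        raise ValueError("unrecognized latency cell: %r" % (cell,))
    return float(match.group(1)), float(match.group(2))


def slow_pairs(table, p999_limit_us):
    """Return (requester, responder) pairs whose 99.9th percentile
    latency exceeds the limit.

    table maps (requester, responder) tuples to cell strings.
    """
    return sorted(
        pair for pair, cell in table.items()
        if parse_cell(cell)[1] > p999_limit_us
    )
```

The limit passed to slow_pairs depends on your workload. This section does not define a pass or fail threshold for latency.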

