Enable eRDMA on GPU-accelerated instances - Elastic Compute Service

After you attach an Elastic RDMA Interface (ERI) to a GPU-accelerated instance, you can enable Remote Direct Memory Access (RDMA) to accelerate communication between GPU-accelerated instances in a Virtual Private Cloud (VPC). eRDMA transfers data more efficiently than traditional RDMA, improving communication between GPU-accelerated instances and reducing task processing time. This topic describes how to enable eRDMA on a GPU-accelerated instance.

Limitations

Item	Description
Instance type	The Elastic RDMA Interface (ERI) supports the following instance types: gn8is, ebmgn8is, gn8v, ebmgn8v, ecs.ebmgn9g, ecs.ebmgn9gc, ecs.ebmgn9ge
Image	Use one of the following images: Alibaba Cloud Linux 3 (recommended), CentOS 8.5/8.4/7.9, or Ubuntu 24.04/22.04/20.04/18.04.
Number of eRDMA devices	The gn8is and gn8v instance families support only one Elastic RDMA Interface. ebmgn8is, ebmgn8v, ecs.ebmgn9g, ecs.ebmgn9gc, and ecs.ebmgn9ge ECS Bare Metal Instances support two Elastic RDMA Interfaces.
Network limitations	After you enable the Elastic RDMA Interface for an elastic network interface, you cannot assign an IPv6 address to it. Communication between two instances using ERIs cannot pass through intermediate network devices, such as Server Load Balancer (SLB).

Procedure

To use the eRDMA feature, an instance must have the eRDMA software stack installed and an ERI-enabled elastic network interface attached.

Configure eRDMA during instance creation

Go to the Custom Launch page in the ECS console.
Create a GPU-accelerated instance that supports ERI.
When you create the instance, note the following configurations. For other parameters, see Create an instance by using the wizard.
- Instance Type: Select an instance type that supports ERI. For more information, see Limitations. This topic uses ebmgn8is as an example.
- Images: When you select a public image, the Auto-install GPU Driver and Install eRDMA Software Stack options are selected by default. After the instance is created, the system automatically installs the GPU driver, CUDA, cuDNN, and the eRDMA software stack.
  In this example, the operating system is Alibaba Cloud Linux 3.2104 LTS 64-bit, and the GPU software versions are Driver 550.127.08, CUDA 12.4.1, and cuDNN 9.2.0.82.
  Notes on installing the eRDMA software stack
  - On the Public Image tab, if the selected operating system and version support the Install eRDMA Software Stack option but you leave it unselected, you can still install the eRDMA software stack after the instance is created, either by using a script or manually.
  - On the Public Image tab, if the selected operating system and version do not support the Install eRDMA Software Stack option (the option cannot be selected), you cannot enable or use the eRDMA network interface after the instance is created, even if you install the software stack by using a script or manually.
  - On the Public Image tab, clearing the Install eRDMA Software Stack option makes more operating systems and versions available for selection.
- (Optional) Jumbo Frame: If the selected instance supports jumbo frames, you can enable this feature to improve eRDMA communication performance.
  Enabling jumbo frames allows you to set a larger MTU. When you use NCCL and the LL128 low-latency protocol for communication, the MTU must be 8500. If jumbo frames are not enabled, the MTU should be 1400. An incorrect MTU setting can cause data consistency issues.
- ENIs: By default, on the Bandwidths & Security Groups page of the configuration wizard, the GPU-accelerated instance is configured with an eRDMA-enabled primary NIC and an eRDMA-enabled secondary ENI, and the eRDMA Interface option to the right of each network interface is selected automatically.
  Note
  - You cannot enable or disable eRDMA for an elastic network interface while the instance is running.
  - The two eRDMA-enabled ENIs are automatically bound to different channels. You do not need to specify the channels.
  - The primary NIC cannot be detached from the GPU-accelerated instance. It is created and deleted with the instance.
Go to the details page of the created instance and click the ENIs tab to view the NIC type.
If the NIC Type of the primary NIC or a secondary ENI contains "(eRDMA Interface)", an ENI with the Elastic RDMA Interface (ERI) enabled is attached to the instance.

Configure eRDMA for an existing GPU-accelerated instance

Log on to the ECS console.
Find the target instance, go to its details page, and then click the ENIs tab to check whether an elastic network interface with the Elastic RDMA Interface (ERI) enabled is attached to the instance.
When eRDMA is enabled for an elastic network interface, the NIC Type column displays Primary NIC(eRDMA Interface) or Secondary NIC(eRDMA Interface).
- If ERI is enabled, skip the following steps.
- If ERI is not enabled, configure it for the primary NIC or a secondary ENI.

Configure eRDMA for the primary NIC or a secondary ENI.

Note

For instance types that support jumbo frames, you can enable jumbo frames to improve eRDMA communication performance.
Enabling jumbo frames allows you to set a larger MTU. When you use NCCL and the LL128 low-latency protocol for communication, the MTU must be 8500. If jumbo frames are not enabled, the MTU should be 1400. An incorrect MTU setting can cause data consistency issues.
If you did not select the eRDMA Interface option for either the primary NIC or the secondary ENI during instance creation, you can create and enable two eRDMA-enabled secondary ENIs after the instance is created.
If you selected the eRDMA Interface option for only one of the NICs (primary or secondary) when you created the GPU-accelerated instance, you can create and enable only one more eRDMA-enabled secondary ENI after the instance is created.
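The MTU rule in the preceding note can be captured in a small helper. This is a minimal sketch, not part of the official tooling: the function name `erdma_mtu` is hypothetical, and the usage lines assume the eRDMA-enabled ENI appears as `eth0` inside the instance and that you set the MTU with `ip` from iproute2.

```shell
# Hypothetical helper: print the MTU that this topic recommends.
# $1: "yes" if jumbo frames are enabled for the instance, anything else otherwise.
erdma_mtu() {
  if [ "$1" = "yes" ]; then
    echo 8500   # jumbo frames enabled (required for NCCL with the LL128 protocol)
  else
    echo 1400   # jumbo frames not enabled
  fi
}

# Usage on the instance (eth0 is an assumed interface name; adjust it):
# sudo ip link set dev eth0 mtu "$(erdma_mtu yes)"
# ip link show dev eth0    # check the "mtu" field
```

A value set with `ip link set` does not survive a restart, so persist it through your image's network configuration if you rely on it.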

Configure eRDMA for the primary NIC

Configure eRDMA for the primary NIC by calling the ModifyNetworkInterfaceAttribute operation.

Key parameters

Parameter	Description
RegionId	The ID of the region where the primary NIC resides.
NetworkInterfaceId	The ID of the primary NIC.
NetworkInterfaceTrafficMode	The communication mode of the primary NIC. Valid values: Standard: TCP communication mode. HighPerformance: RDMA communication mode with the Elastic RDMA Interface (ERI) enabled. In this procedure, set the value to `HighPerformance`.
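If you use the Alibaba Cloud CLI, the call can look like the following sketch. It assumes that the `aliyun` CLI is installed and configured with credentials; the wrapper name `enable_eri_on_eni` and the IDs in the usage line are placeholders.

```shell
# Hypothetical wrapper: switch an ENI to RDMA (HighPerformance) mode.
# $1: region ID, $2: ID of the ENI (for example, the primary NIC).
enable_eri_on_eni() {
  aliyun ecs ModifyNetworkInterfaceAttribute \
    --RegionId "$1" \
    --NetworkInterfaceId "$2" \
    --NetworkInterfaceTrafficMode HighPerformance
}

# Usage (placeholder IDs):
# enable_eri_on_eni cn-hangzhou eni-xxxxxxxxxxxx
```

Per the note in the instance-creation section, eRDMA cannot be enabled or disabled on an ENI while the instance is running, so stop the instance before you make this call. The same call with a secondary ENI ID corresponds to Method 2 later in this topic.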

Configure eRDMA for a secondary ENI

When you create and attach an eRDMA-enabled ENI in the ECS console, you cannot bind it to a specific channel, which can halve the total bandwidth of two eRDMA-enabled ENIs. We therefore recommend that you attach eRDMA-enabled ENIs by using OpenAPI.

OpenAPI

Method 1: Create and attach an eRDMA-enabled ENI

A GPU-accelerated instance supports at most two eRDMA devices (the gn8is and gn8v instance families support only one; see Limitations). If you attach two, use the NetworkCardIndex parameter to bind them to different channels.

Create an eRDMA-enabled ENI.

For more information, see CreateNetworkInterface.

Key parameters

Parameter	Description
RegionId	The ID of the region in which to create the ENI.
VSwitchId	The ID of the vSwitch in the Virtual Private Cloud (VPC). The ENI is assigned a private IP address from the vSwitch's CIDR block.
SecurityGroupId	The ID of a security group for the ENI. The security group and the ENI must be in the same VPC.
NetworkInterfaceTrafficMode	The communication mode of the ENI. Valid values: Standard: TCP communication mode. HighPerformance: RDMA communication mode with the Elastic RDMA Interface (ERI) enabled. In this procedure, set the value to `HighPerformance`.

After the call succeeds, record the NetworkInterfaceId value returned in the response. You need it to attach the ENI in the next step.

Attach the eRDMA-enabled ENI.

For more information, see AttachNetworkInterface.

Key parameters

Parameter	Description
RegionId	The ID of the region in which the instance resides.
NetworkInterfaceId	The ID of the eRDMA-enabled ENI that you created.
InstanceId	The ID of the instance.
NetworkCardIndex	The index of the physical NIC (channel) to which the ENI is bound. When you attach an eRDMA-enabled ENI to an instance, you must specify this value manually. Valid values: 0 and 1. If you attach two eRDMA-enabled ENIs, assign a different value to each; binding them to different channels is required to achieve maximum network bandwidth.

After the API call succeeds and the ENI is attached, you can view the two bound ENIs on the ENIs tab of the instance details page. Their NIC Type is displayed as Secondary NIC(eRDMA Interface) and their status is Bound.
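The two calls can be chained as in the following sketch. It assumes a configured `aliyun` CLI whose CreateNetworkInterface response is JSON containing a `NetworkInterfaceId` field; the function name `create_and_attach_eri` is hypothetical, and the ID is extracted with `grep` and `sed` so that `jq` is not required.

```shell
# Hypothetical helper: create an eRDMA-enabled ENI and attach it on a given channel.
# $1 region ID, $2 vSwitch ID, $3 security group ID, $4 instance ID,
# $5 NetworkCardIndex (0 or 1). Prints the new ENI ID on success.
create_and_attach_eri() {
  local resp eni_id
  # Step 1: create the ENI in HighPerformance (RDMA) mode.
  resp=$(aliyun ecs CreateNetworkInterface \
    --RegionId "$1" \
    --VSwitchId "$2" \
    --SecurityGroupId "$3" \
    --NetworkInterfaceTrafficMode HighPerformance) || return 1

  # Pull "NetworkInterfaceId": "eni-..." out of the JSON response.
  eni_id=$(printf '%s\n' "$resp" \
    | grep -o '"NetworkInterfaceId": *"[^"]*"' \
    | head -n 1 \
    | sed 's/.*"\([^"]*\)"$/\1/')
  [ -n "$eni_id" ] || { echo "no NetworkInterfaceId in response" >&2; return 1; }

  # Step 2: attach the ENI to the instance on the requested channel.
  aliyun ecs AttachNetworkInterface \
    --RegionId "$1" \
    --NetworkInterfaceId "$eni_id" \
    --InstanceId "$4" \
    --NetworkCardIndex "$5" >/dev/null || return 1

  echo "$eni_id"
}

# Usage (placeholder IDs): one ENI per channel.
# create_and_attach_eri cn-hangzhou vsw-xxx sg-xxx i-xxx 0
# create_and_attach_eri cn-hangzhou vsw-xxx sg-xxx i-xxx 1
```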

Method 2: Modify the attributes of an existing ENI

Note

This method does not let you specify the NetworkCardIndex parameter (the physical NIC index). If you use this method to configure a secondary ENI while two eRDMA-enabled ENIs are attached, you might not achieve maximum bandwidth.

For more information, see ModifyNetworkInterfaceAttribute.

Key parameters

Parameter	Description
RegionId	The ID of the region in which the secondary ENI resides.
NetworkInterfaceId	The ID of the secondary network interface.
NetworkInterfaceTrafficMode	The communication mode of the secondary ENI. Valid values: Standard: TCP communication mode. HighPerformance: RDMA communication mode with the Elastic RDMA Interface (ERI) enabled. In this procedure, set the value to `HighPerformance`.

After the API call succeeds, you can view the attached eRDMA-enabled ENI on the ENIs tab of the GPU-accelerated instance details page.

Console

Create a secondary ENI.
For more information, see Create an ENI. On the Create ENI page, set ENI Name, VPC, vSwitch, and Security Group, turn on the eRDMA Interface switch, configure the primary and secondary private IP addresses as needed, and then click Create ENI. The ERI uses the settings of the secondary ENI, including its IP address and the rules of its security group.
Attach the secondary ENI to the GPU-accelerated instance.
For more information, see Attach a secondary ENI.
Note
A single instance can have a maximum of two attached secondary ENIs with ERI enabled.
After you attach a secondary ENI with ERI enabled to a GPU-accelerated instance, you must stop the instance before you can detach the ENI. For more information, see Stop an instance.
Remotely connect to the GPU-accelerated instance.
For more information, see Connect to a Linux instance by using Workbench.
Run the ifconfig command to check whether the newly attached secondary ENI is available.
If the newly attached secondary ENI is not displayed, you must manually configure it. For more information, see Configure a secondary ENI. Otherwise, skip this step.
Note
Some images may not automatically recognize a newly attached secondary ENI, requiring you to configure it from within the instance.
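As an alternative to reading the full ifconfig output, the following sketch lists each non-loopback interface and its link state from sysfs, so that a missing or down secondary ENI stands out. It is an illustrative helper (the name `list_nics` is hypothetical), and interface names such as eth1 vary by image.

```shell
# Hypothetical helper: print "<interface> <operstate>" for each non-loopback NIC.
# $1 (optional): sysfs network directory, default /sys/class/net.
list_nics() {
  local net_dir="${1:-/sys/class/net}" dev name state
  for dev in "$net_dir"/*; do
    [ -e "$dev" ] || continue
    name=$(basename "$dev")
    [ "$name" = "lo" ] && continue
    state=$(cat "$dev/operstate" 2>/dev/null || echo unknown)
    echo "$name $state"
  done
}

# Usage on the instance:
# list_nics    # a newly attached ENI such as eth1 should appear, ideally "up"
```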

(Optional) Install the Elastic RDMA Interface (ERI) software stack on the instance.

If you did not select the Install eRDMA Software Stack option when you selected the public image, you must install the eRDMA software stack by using a script or by manual installation.

Script-based installation

After the GPU-accelerated instance is created, you can use the following example script to install the eRDMA software stack, GPU driver, CUDA, and cuDNN:

```
#!/bin/sh
# Specify the versions to install.
DRIVER_VERSION="570.133.20"
CUDA_VERSION="12.8.1"
CUDNN_VERSION="9.8.0.87"
IS_INSTALL_eRDMA="TRUE"
IS_INSTALL_RDMA="FALSE"
INSTALL_DIR="/root/auto_install"
# The GPU driver and CUDA are installed from .run packages.
auto_install_script="auto_install_v4.0.sh"
# Build the download URL from the mirror address returned by the instance metadata service.
script_download_url=$(curl -s http://100.100.100.200/latest/meta-data/source-address | head -1)"/opsx/ecs/linux/binary/script/${auto_install_script}"
echo "$script_download_url"
rm -rf "$INSTALL_DIR"
mkdir -p "$INSTALL_DIR"
cd "$INSTALL_DIR" && wget -t 10 --timeout=10 "$script_download_url" && bash "${INSTALL_DIR}/${auto_install_script}" "$DRIVER_VERSION" "$CUDA_VERSION" "$CUDNN_VERSION" "$IS_INSTALL_RDMA" "$IS_INSTALL_eRDMA"
```

Manual installation

After you create a GPU-accelerated instance, you can manually install the OFED driver, eRDMA driver, and GPU driver, and then load the nv_peer_mem service component.

Remotely connect to the GPU-accelerated instance.
For more information, see Connect to a Linux instance by using Workbench.

Install the OFED driver.

Run the commands for your operating system to install the required packages.

Alibaba Cloud Linux 3

```
yum install rpm-build flex iptables-devel systemd-devel gdb-headless elfutils-devel python3-Cython bison numactl-devel libmnl-devel libnl3-devel libdb-devel libselinux-devel perl-generators elfutils-libelf-devel kernel-rpm-macros valgrind-devel cmake lsof -y
```

CentOS 8.5/8.4/7.9

CentOS 8.5/8.4

```
wget http://mirrors.cloud.aliyuncs.com/opsx/ecs/linux/binary/erdma/centos8/python3-Cython-0.29.32-3.16.x86_64.rpm
yum install python3-Cython-0.29.32-3.16.x86_64.rpm -y
yum install kernel-rpm-macros perl-generators libmnl-devel valgrind-devel rpm-build systemd-devel libdb-devel iptables-devel lsof elfutils-devel bison libnl3-devel libselinux-devel flex cmake numactl-devel -y
```

CentOS 7.9

```
sudo yum install python-devel python3-Cython kernel-rpm-macros perl-generators libmnl-devel valgrind-devel rpm-build systemd-devel libdb-devel iptables-devel lsof elfutils-devel bison libnl3-devel libselinux-devel flex cmake numactl-devel -y
```

Ubuntu 24.04/22.04/20.04/18.04

Ubuntu 24.04

```
sudo apt-get update -y
sudo apt-get install -y pkg-config
```

Ubuntu 22.04

```
sudo apt-get update -y
sudo apt-get install -y pkg-config
```

Ubuntu 20.04

```
sudo apt-get update -y
sudo apt-get install -y pkg-config
```

Ubuntu 18.04

```
sudo apt-get update
sudo apt-get install -y pkg-config
sudo apt-get install -y make dh-python libdb-dev libselinux1-dev flex dpatch swig graphviz chrpath quilt python3-distutils bison libmnl-dev libelf-dev gcc sudo python3
```

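Run the commands for your operating system to download the OFED source package and its configuration file, and to replace the bundled kernel source package with the version provided on the Alibaba Cloud mirror.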

Alibaba Cloud Linux 3

```
sudo wget http://mirrors.cloud.aliyuncs.com/erdma/kernel-fix/MLNX_OFED_SRC-24.10-3.2.5.0.tgz
sudo tar -xvf MLNX_OFED_SRC-24.10-3.2.5.0.tgz && cd MLNX_OFED_SRC-24.10-3.2.5.0
sudo wget http://mirrors.cloud.aliyuncs.com/opsx/ecs/linux/binary/erdma/ofed/alibaba_cloud3/3/ofed_alibaba_cloud3.conf
sudo rm -rf SRPMS/mlnx-ofa_kernel-24.10-OFED.24.10.3.2.5.1.src.rpm
sudo wget http://mirrors.cloud.aliyuncs.com/erdma/kernel-fix/mlnx-ofa_kernel-24.10-OFED.24.10.3.2.5.1.egs.1.src.rpm -O SRPMS/mlnx-ofa_kernel-24.10-OFED.24.10.3.2.5.1.egs.1.src.rpm
```

CentOS 8.5/8.4/7.9

CentOS 8.5/8.4

```
cd /root
wget http://mirrors.cloud.aliyuncs.com/opsx/ecs/linux/binary/erdma/ofed/MLNX_OFED_SRC-5.4-3.5.8.0.tgz
tar -xvf MLNX_OFED_SRC-5.4-3.5.8.0.tgz && cd MLNX_OFED_SRC-5.4-3.5.8.0/
wget http://mirrors.cloud.aliyuncs.com/opsx/ecs/linux/binary/erdma/ofed/alibaba_cloud3/3/ofed_alibaba_cloud3.conf
rm -rf SRPMS/mlnx-ofa_kernel-5.4-OFED.5.4.3.5.8.1.src.rpm
wget http://mirrors.cloud.aliyuncs.com/erdma/kernel-fix/mlnx-ofa_kernel-5.4-OFED.5.4.3.5.8.1.egs.1.src.rpm -O SRPMS/mlnx-ofa_kernel-5.4-OFED.5.4.3.5.8.1.egs.1.src.rpm
```

CentOS 7.9

```
sudo wget http://mirrors.cloud.aliyuncs.com/opsx/ecs/linux/binary/erdma/ofed/MLNX_OFED_SRC-5.4-3.5.8.0.tgz
sudo tar -xvf MLNX_OFED_SRC-5.4-3.5.8.0.tgz && cd MLNX_OFED_SRC-5.4-3.5.8.0/
sudo wget http://mirrors.cloud.aliyuncs.com/opsx/ecs/linux/binary/erdma/ofed/alibaba_cloud3/3/ofed_alibaba_cloud3.conf
sudo rm -rf SRPMS/mlnx-ofa_kernel-5.4-OFED.5.4.3.5.8.1.src.rpm
sudo wget http://mirrors.cloud.aliyuncs.com/erdma/kernel-fix/mlnx-ofa_kernel-5.4-OFED.5.4.3.5.8.1.egs.1.src.rpm -O SRPMS/mlnx-ofa_kernel-5.4-OFED.5.4.3.5.8.1.egs.1.src.rpm
```

Ubuntu 24.04/22.04/20.04/18.04

Ubuntu 24.04/22.04/20.04

```
sudo wget http://mirrors.cloud.aliyuncs.com/erdma/kernel-fix/deb/MLNX_OFED_SRC-debian-24.10-3.2.5.0.tgz
sudo tar -xvf MLNX_OFED_SRC-debian-24.10-3.2.5.0.tgz && cd MLNX_OFED_SRC-24.10-3.2.5.0 && sudo curl -O http://mirrors.cloud.aliyuncs.com/erdma/kernel-fix/deb/ofed_debian.conf
sudo rm -rf SOURCES/mlnx-ofed-kernel_24.10.OFED.24.10.3.2.5.1.orig.tar.gz
sudo wget http://mirrors.cloud.aliyuncs.com/erdma/kernel-fix/deb/mlnx-ofed-kernel_24.10.egs.1.OFED.24.10.3.2.5.1.orig.tar.gz -O SOURCES/mlnx-ofed-kernel_24.10.egs.1.OFED.24.10.3.2.5.1.orig.tar.gz
```

Ubuntu 18.04

```
sudo wget http://mirrors.cloud.aliyuncs.com/opsx/ecs/linux/binary/erdma/ofed/MLNX_OFED_SRC-debian-5.4-3.6.8.1.tgz
sudo tar -xvf MLNX_OFED_SRC-debian-5.4-3.6.8.1.tgz && cd MLNX_OFED_SRC-5.4-3.6.8.1 && sudo curl -O http://mirrors.cloud.aliyuncs.com/erdma/kernel-fix/deb/ofed_debian.conf
sudo rm -rf SOURCES/mlnx-ofed-kernel_5.4.orig.tar.gz
sudo wget http://mirrors.cloud.aliyuncs.com/erdma/kernel-fix/deb/mlnx-ofed-kernel_5.4.egs.orig.tar.gz -O SOURCES/mlnx-ofed-kernel_5.4.egs.orig.tar.gz
```

Run the command for your operating system to install the OFED driver.

Alibaba Cloud Linux 3

```
sudo ./install.pl --config ./ofed_alibaba_cloud3.conf --distro RHEL8
sudo dracut -f
```

CentOS 8.5/8.4/7.9

CentOS 8.5/8.4

```
./install.pl --config ./ofed_alibaba_cloud3.conf --distro RHEL8
```

CentOS 7.9

```
sudo ./install.pl --config ./ofed_alibaba_cloud3.conf --distro RHEL7
```

Ubuntu 24.04/22.04/20.04/18.04

The dpkg command uses `lsb_release -s -r` to insert your Ubuntu version (for example, 24.04) into the package path, so no manual substitution is needed.

```
sudo curl -O http://mirrors.cloud.aliyuncs.com/erdma/kernel-fix/deb/ofed_debian.conf
sudo ./install.pl --config ./ofed_debian.conf --without-dkms --build-only --kernel-only
sudo /usr/bin/dpkg -i --force-confmiss DEBS/ubuntu`lsb_release -s -r`/x86_64/*.deb
sudo update-initramfs -u
```

Run the following command to check whether the /usr/src/ofa_kernel/`uname -r` directory exists.
- If the directory exists, proceed to the next step.
```
ls /usr/src/ofa_kernel/`uname -r`
```
- If the directory does not exist, run the following command to create a symbolic link, and then proceed to the next step.
```
sudo ln -s /usr/src/ofa_kernel/default /usr/src/ofa_kernel/`uname -r`
```
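The two cases above can be combined into one idempotent step. This is a sketch with a hypothetical function name; the optional arguments exist only so the logic can be exercised against a scratch directory, and on the instance you would call it without arguments from a root shell (for example, after `sudo -i`).

```shell
# Hypothetical helper: create <base>/<kernel release> as a symbolic link to
# <base>/default only if it does not already exist.
# $1 (optional): base directory, default /usr/src/ofa_kernel
# $2 (optional): kernel release, default $(uname -r)
ensure_ofa_kernel_link() {
  local base="${1:-/usr/src/ofa_kernel}" kver="${2:-$(uname -r)}"
  if [ -e "$base/$kver" ]; then
    echo "exists: $base/$kver"
  else
    ln -s "$base/default" "$base/$kver" && echo "linked: $base/$kver -> $base/default"
  fi
}

# Usage in a root shell on the instance:
# ensure_ofa_kernel_link
```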
Restart the instance.
After the OFED driver is installed, you must restart the instance for the new kernel module to take effect. For more information, see Restart an instance.

Install the eRDMA driver.
1. Download and install the eRDMA driver.
```
sudo wget http://mirrors.cloud.aliyuncs.com/erdma/env_setup.sh
sudo bash env_setup.sh --egs
```
2. Run the following command to verify the eRDMA driver installation by using the eadm tool.
```
eadm ver
```
  Output similar to the following indicates a successful installation.
```
[root@xxx ~]# eadm ver
Query kernel driver version: 0.2.35
```
  Note
  This topic uses driver version 0.2.35 as an example. If a "command not found" error is returned or the command fails to run, reinstall the eRDMA driver.
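To compare the installed driver against a minimum version in scripts, you can parse the `eadm ver` line shown above. This is a sketch that assumes the output keeps the `Query kernel driver version: X.Y.Z` form and that GNU `sort -V` is available; the function names are hypothetical. The 1.5.6 threshold in the usage comment comes from the MPCC row of the verification table later in this topic.

```shell
# Hypothetical helper: read `eadm ver` output on stdin and print the version number.
erdma_driver_version() {
  sed -n 's/.*driver version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: succeed if version $1 >= version $2 (GNU sort -V ordering).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage on the instance:
# ver=$(eadm ver | erdma_driver_version)
# version_ge "$ver" 1.5.6 && echo "driver $ver uses MPCC by default"
```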
Install the GPU driver.
For more information, see Manually install an NVIDIA GPU driver on a Linux instance.
Load the nv_peer_mem service component.
- (Recommended) GPU driver 470.xx.xx or later
  To enable GPUDirect RDMA, you must load the nv_peer_mem service component. NVIDIA GPU drivers 470.xx.xx and later include this component pre-installed, so you can load the nvidia_peermem module directly by running the following command.
```
sudo modprobe nvidia_peermem
# You can run the lsmod|grep nvidia command to check whether the nvidia_peermem module is loaded.
```
  Note
  If the instance is restarted, you must reload the nvidia_peermem module.
- GPU drivers earlier than 470.xx.xx
  You must manually download and install the service component. The following code shows how to download, compile, and install the component.
```
sudo git clone https://github.com/Mellanox/nv_peer_memory.git
# Compile and install nv_peer_mem.ko. The cloned directory is owned by root, so build with sudo.
cd nv_peer_memory && sudo make
sudo cp nv_peer_mem.ko /lib/modules/$(uname -r)/kernel/drivers/video
sudo depmod -a
sudo modprobe nv_peer_mem
# You can run the lsmod | grep nv_peer_mem command to check the result.
sudo service nv_peer_mem start
```
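To avoid reloading nvidia_peermem by hand after every restart, one option on systemd-based images (which include the images listed in Limitations) is a modules-load.d entry, which systemd-modules-load reads at boot. This is a sketch with a hypothetical function name; the optional directory argument exists only for testing.

```shell
# Hypothetical helper: register a kernel module to be loaded at every boot.
# $1: module name; $2 (optional): target directory, default /etc/modules-load.d
persist_module() {
  local mod="$1" dir="${2:-/etc/modules-load.d}"
  mkdir -p "$dir" && printf '%s\n' "$mod" > "$dir/$mod.conf"
}

# Usage in a root shell on the instance (GPU driver 470.xx.xx or later):
# persist_module nvidia_peermem
# After the next restart, check with: lsmod | grep nvidia_peermem
```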

Verify the bandwidth.

Remotely connect to the GPU-accelerated instance.
For more information, see Connect to a Linux instance by using Workbench.

Run the following command to check whether the two eRDMA devices are working as expected.

```
sudo ibv_devinfo
```

By default, the eRDMA driver installation script installs the latest driver version. If you need to install an earlier version of the eRDMA driver, submit a ticket for assistance.

This topic uses eRDMA driver version 0.2.37 or later as an example. The following output shows two eRDMA devices working correctly. A device is in a normal state if its port state is PORT_ACTIVE.

[ecs-xxx...xxx4gnd0hZ ~]$ sudo ibv_devinfo
hca_id:	erdma_0
	transport:			eRDMA (0)
	fw_ver:				0.2.0
	node_guid:			0216:3eff:fe36:1eb4
	sys_image_guid:			0216:3eff:fe36:1eb4
	vendor_id:			0x1ded
	vendor_part_id:			4223
	hw_ver:				0x0
	phys_port_cnt:			1
		port:	1
			state:			PORT_ACTIVE (4)
			max_mtu:		1024 (3)
			active_mtu:		1024 (3)
			sm_lid:			0
			port_lid:		0
			port_lmc:		0x00
			link_layer:		Ethernet
hca_id:	erdma_1
	transport:			eRDMA (0)
	fw_ver:				0.2.0
	node_guid:			0216:3eff:fe43:9c2a
	sys_image_guid:			0216:3eff:fe43:9c2a
	vendor_id:			0x1ded
	vendor_part_id:			4223
	hw_ver:				0x0
	phys_port_cnt:			1
		port:	1
			state:			PORT_ACTIVE (4)
			max_mtu:		1024 (3)
			active_mtu:		1024 (3)
			sm_lid:			0
			port_lid:		0
			port_lmc:		0x00
			link_layer:		Ethernet

Note

If the port state of an eRDMA device is not PORT_ACTIVE (for example, it shows an invalid state), the device is abnormal. First check whether the secondary ENI is configured correctly. For example, run the ifconfig command to verify that all NICs exist and have their IP addresses configured.
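The PORT_ACTIVE check can also be scripted. The following sketch reads `ibv_devinfo` output in the format shown above, prints each device's port state, and fails unless enough devices are present and every port is PORT_ACTIVE. The function name is hypothetical; pass 1 as the expected device count on gn8is and gn8v, which support only one eRDMA device.

```shell
# Hypothetical helper: validate `ibv_devinfo` output read on stdin.
# $1 (optional): minimum number of eRDMA devices expected, default 2.
check_erdma_ports() {
  awk -v want="${1:-2}" '
    $1 == "hca_id:" { dev = $2; ndev++ }
    $1 == "state:"  { print dev, $2; if ($2 != "PORT_ACTIVE") bad++ }
    END { if (ndev >= want && bad == 0) exit 0; exit 1 }
  '
}

# Usage on the instance:
# sudo ibv_devinfo | check_erdma_ports 2 && echo "all eRDMA ports are active"
```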

Run the following command to install the perftest tool. On Ubuntu, run `sudo apt-get install -y perftest` instead.
```
sudo yum install perftest -y
```

Run the following commands to test whether the RDMA network bandwidth meets hardware expectations.

On the server, run the following command to listen for connection requests from the client.
```
sudo ib_write_bw -d erdma_0 -F -q 16 --run_infinitely --report_gbits -p 18515
```
To send connection requests and data packets, run the following command on the client.
```
sudo ib_write_bw -d erdma_0 -F -q 16 --run_infinitely --report_gbits -p 18515 server_ip
```
In this command, server_ip is the private IP address of the eRDMA-enabled ENI on the server instance. To find this IP address, see View IP addresses.

Note

The preceding perftest benchmark uses one NIC for communication. If your service requires two NICs for communication, you must start two perftest processes. Then, use the -d parameter to specify an eRDMA device for each process and the -p parameter to specify different communication ports. For more information, see perftest details.
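For the two-NIC case described in the preceding note, the following sketch starts one `ib_write_bw` process per eRDMA device, each on its own port (18515 for erdma_0, 18516 for erdma_1), with the same options as the single-NIC commands above. The function name and port assignment are assumptions for illustration; on the client, it assumes you pass the server's eRDMA ENI IP addresses in device order.

```shell
# Hypothetical launcher: one ib_write_bw process per eRDMA device.
# Server: call with no arguments.
# Client: call with the server IPs for erdma_0 and erdma_1, in that order.
dual_ib_write_bw() {
  local i port ip
  for i in 0 1; do
    port=$((18515 + i))
    ip=""
    if [ "$#" -eq 2 ]; then
      if [ "$i" -eq 0 ]; then ip="$1"; else ip="$2"; fi
    fi
    # ${ip:+"$ip"} adds the server IP only in client mode.
    sudo ib_write_bw -d "erdma_$i" -F -q 16 --run_infinitely --report_gbits \
      -p "$port" ${ip:+"$ip"} &
  done
  wait    # with --run_infinitely, both processes run until interrupted (Ctrl+C)
}

# Usage:
# dual_ib_write_bw                              # on the server
# dual_ib_write_bw 192.168.1.146 192.168.1.147  # on the client (example IPs)
```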

The test results include the average bandwidth. Output similar to the following indicates that eRDMA communication is normal.

Command output details

---------------------------------------------------------------------------------------
                    RDMA_Write BW Test
 Dual-port       : OFF          Device         : erdma_0
 Number of qps   : 16           Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 PCIe relax order: ON
 ibv_wr* API     : OFF
 TX depth        : 128
 CQ Moderation   : 1
 Mtu             : 1024[B]
 Link type       : Ethernet
 GID index       : 1
 Max inline data : 0[B]
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0000 QPN 0x0002 PSN 0xa66b22 RKey 0x000100 VAddr 0x007f09922fd000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:01:146
 local address: LID 0000 QPN 0x0003 PSN 0x3b9364 RKey 0x000100 VAddr 0x007f099230d000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:01:146
 local address: LID 0000 QPN 0x0004 PSN 0x6b1ade RKey 0x000100 VAddr 0x007f099231d000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:01:146
 local address: LID 0000 QPN 0x0005 PSN 0x8c83d5 RKey 0x000100 VAddr 0x007f099232d000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:01:146
 local address: LID 0000 QPN 0x0006 PSN 0x1335c4 RKey 0x000100 VAddr 0x007f099233d000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:01:146
 local address: LID 0000 QPN 0x0007 PSN 0xc451d6 RKey 0x000100 VAddr 0x007f099234d000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:01:146
 local address: LID 0000 QPN 0x0008 PSN 0x4edd7d RKey 0x000100 VAddr 0x007f099235d000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:01:146
 local address: LID 0000 QPN 0x0009 PSN 0x93d832 RKey 0x000100 VAddr 0x007f099236d000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:01:146
 local address: LID 0000 QPN 0x000a PSN 0x16d2ee RKey 0x000100 VAddr 0x007f099237d000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:01:146
 local address: LID 0000 QPN 0x000b PSN 0x6820d8 RKey 0x000100 VAddr 0x007f099238d000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:01:146
 local address: LID 0000 QPN 0x000c PSN 0x9419c RKey 0x000100 VAddr 0x007f099239d000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:01:146
 local address: LID 0000 QPN 0x000d PSN 0xedd7ff RKey 0x000100 VAddr 0x007f09923ad000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:01:146
 local address: LID 0000 QPN 0x000e PSN 0x70ff7f RKey 0x000100 VAddr 0x007f09923bd000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:01:146
 local address: LID 0000 QPN 0x000f PSN 0x8ccc0 RKey 0x000100 VAddr 0x007f09923cd000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:01:146
 local address: LID 0000 QPN 0x0010 PSN 0x33327e RKey 0x000100 VAddr 0x007f09923dd000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:01:146
 local address: LID 0000 QPN 0x0011 PSN 0x9b836a RKey 0x000100 VAddr 0x007f09923ed000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:01:146
 remote address: LID 0000 QPN 0x0002 PSN 0x651666 RKey 0x000100 VAddr 0x007f5011099000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:01:149
 remote address: LID 0000 QPN 0x0003 PSN 0xf99758 RKey 0x000100 VAddr 0x007f50110a9000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:01:149
 remote address: LID 0000 QPN 0x0004 PSN 0xd001c2 RKey 0x000100 VAddr 0x007f50110b9000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:01:149
 remote address: LID 0000 QPN 0x0005 PSN 0x23aae9 RKey 0x000100 VAddr 0x007f50110c9000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:01:149
 remote address: LID 0000 QPN 0x0006 PSN 0xfad148 RKey 0x000100 VAddr 0x007f50110d9000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:01:149
 remote address: LID 0000 QPN 0x0007 PSN 0xca210a RKey 0x000100 VAddr 0x007f50110e9000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:01:149
 remote address: LID 0000 QPN 0x0008 PSN 0xe0cea1 RKey 0x000100 VAddr 0x007f50110f9000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:01:149
 remote address: LID 0000 QPN 0x0009 PSN 0x8ddc86 RKey 0x000100 VAddr 0x007f5011109000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:01:149
 remote address: LID 0000 QPN 0x000a PSN 0xde22b2 RKey 0x000100 VAddr 0x007f5011119000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:01:149
 remote address: LID 0000 QPN 0x000b PSN 0x9f2f4c RKey 0x000100 VAddr 0x007f5011129000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:01:149
 remote address: LID 0000 QPN 0x000c PSN 0x66a100 RKey 0x000100 VAddr 0x007f5011139000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:01:149
 remote address: LID 0000 QPN 0x000d PSN 0x934d93 RKey 0x000100 VAddr 0x007f5011149000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:01:149
 remote address: LID 0000 QPN 0x000e PSN 0xf70783 RKey 0x000100 VAddr 0x007f5011159000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:01:149
 remote address: LID 0000 QPN 0x000f PSN 0xfdce74 RKey 0x000100 VAddr 0x007f5011169000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:01:149
 remote address: LID 0000 QPN 0x0010 PSN 0xfca422 RKey 0x000100 VAddr 0x007f5011179000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:01:149
 remote address: LID 0000 QPN 0x0011 PSN 0xaa3e3e RKey 0x000100 VAddr 0x007f5011189000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:01:149
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]   MsgRate[Mpps]
 65536      910045           0.00               95.42              0.182003
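To record the result programmatically, you can pull the average bandwidth out of the summary table. This sketch assumes the layout shown above, where the first data row after the `#bytes` header carries BW average[Gb/sec] in the fourth column; the function name is hypothetical.

```shell
# Hypothetical helper: print the "BW average[Gb/sec]" value from ib_write_bw
# output read on stdin (4th column of the first row after the #bytes header).
ib_bw_average() {
  awk '/#bytes/ { hdr = 1; next } hdr && NF >= 4 { print $4; exit }'
}

# Usage: run the client without --run_infinitely so that it exits on its own,
# save the output, and then parse it:
# sudo ib_write_bw -d erdma_0 -F -q 16 --report_gbits -p 18515 server_ip | tee bw.log
# ib_bw_average < bw.log
```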

Test and verify

This topic uses nccl-tests as an example to demonstrate how to test the application performance of GPU instances with eRDMA networking. For more information, see nccl-tests.

Install NCCL with the following commands.
Note
You can also download an installation package from the official NVIDIA NCCL website and then install it.
This example installs NCCL to /usr/local/nccl. You can specify a different path based on your requirements.
```
# build nccl
cd /root
git clone https://github.com/NVIDIA/nccl.git
cd nccl/
make -j src.lib PREFIX=/usr/local/nccl
make install PREFIX=/usr/local/nccl
```
Verify the NCCL installation and the presence of the libnccl.so library by running the following commands.
```
# Check for NCCL
ls /usr/local/nccl
# Check for the libnccl.so library
ls /usr/local/nccl/lib
```

Build and install Open MPI, which also provides the MPI compiler wrappers such as mpicc, with the following commands.

```
wget https://download.open-mpi.org/release/open-mpi/v4.1/openmpi-4.1.3.tar.gz
tar -xzf openmpi-4.1.3.tar.gz
cd openmpi-4.1.3
./configure --prefix=/usr/local/openmpi
make -j && make install
```

Set the environment variables.
```
NCCL_HOME=/usr/local/nccl
CUDA_HOME=/usr/local/cuda
MPI_HOME=/usr/local/openmpi
export LD_LIBRARY_PATH=${NCCL_HOME}/lib:${CUDA_HOME}/lib64:${MPI_HOME}/lib:$LD_LIBRARY_PATH
export PATH=${CUDA_HOME}/bin:${MPI_HOME}/bin:$PATH
```
The preceding commands use the following example paths: NCCL_HOME points to the NCCL installation path (/usr/local/nccl), CUDA_HOME points to the CUDA installation path (/usr/local/cuda), and MPI_HOME points to the Open MPI installation path (/usr/local/openmpi). Replace these with your actual installation paths.
To make these settings persistent, add the preceding lines to the ~/.bashrc file, and then run the following command to apply the changes.
```
source ~/.bashrc
```
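Adding the settings to ~/.bashrc can be done idempotently, so that running the step twice does not duplicate the lines. This is a sketch with a hypothetical function name and marker comment; it writes the example paths from this topic, which you should replace with your actual installation paths.

```shell
# Hypothetical helper: append the NCCL/CUDA/Open MPI settings to an rc file once.
# $1 (optional): rc file, default ~/.bashrc
persist_nccl_env() {
  local rc="${1:-$HOME/.bashrc}" marker="# >>> nccl-env >>>"
  grep -qxF "$marker" "$rc" 2>/dev/null && return 0   # already present
  cat >> "$rc" <<'EOF'
# >>> nccl-env >>>
export NCCL_HOME=/usr/local/nccl
export CUDA_HOME=/usr/local/cuda
export MPI_HOME=/usr/local/openmpi
export LD_LIBRARY_PATH=${NCCL_HOME}/lib:${CUDA_HOME}/lib64:${MPI_HOME}/lib:$LD_LIBRARY_PATH
export PATH=${CUDA_HOME}/bin:${MPI_HOME}/bin:$PATH
# <<< nccl-env <<<
EOF
}

# Usage:
# persist_nccl_env && source ~/.bashrc
```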

Download and compile the test code.

```
git clone https://github.com/NVIDIA/nccl-tests
cd nccl-tests/
make MPI=1 CUDA_HOME=/usr/local/cuda MPI_HOME=/usr/local/openmpi NCCL_HOME=/usr/local/nccl
```

Establish passwordless SSH access between instances.

To establish passwordless SSH access, generate a public key on host1 and copy it to host2.

```
# On host1. "-t rsa" ensures that ~/.ssh/id_rsa.pub is the key that is generated.
ssh-keygen -t rsa
ssh-copy-id -i ~/.ssh/id_rsa.pub root@${host2}
# On host1, run this command to test the connection. A successful login without a password prompt confirms the setup.
ssh root@${host2}
```

Test the NCCL all-reduce performance with the following command.

```
# Replace host1 and host2 with the IP addresses of your instances.
mpirun --allow-run-as-root -np 16 -npernode 8 -H host1:8,host2:8 \
--bind-to none \
-mca btl_tcp_if_include eth0 \
-x NCCL_SOCKET_IFNAME=eth0 \
-x NCCL_GIN_TYPE=0 \
-x NCCL_DEBUG=INFO \
-x LD_LIBRARY_PATH \
-x PATH \
./build/all_reduce_perf -b 4M -e 4M -f 2 -g 1 -t 1 -n 20
```
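To compare runs, you can extract the summary value from the test output. This sketch assumes that nccl-tests ends its report with a line of the form `# Avg bus bandwidth    : <value>`; the function name is hypothetical.

```shell
# Hypothetical helper: print the "Avg bus bandwidth" value from nccl-tests
# output read on stdin.
nccl_avg_busbw() {
  awk -F':' '/Avg bus bandwidth/ { v = $2; gsub(/[ \t]/, "", v); print v; exit }'
}

# Usage: append "| tee nccl.log" to the mpirun command above, then:
# nccl_avg_busbw < nccl.log
```

On ecs.ebmgn9g, ecs.ebmgn9gc, and ecs.ebmgn9ge instances, also pass the NCCL_GRAPH_FILE variable to mpirun with -x, as described in the verification table in the next section.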

Verify eRDMA configuration

After configuring eRDMA, use the following checklist to verify that the feature functions correctly.

This section applies to the following eRDMA-capable GPU Elastic Bare Metal Instance types: ecs.ebmgn9g, ecs.ebmgn9gc, ecs.ebmgn9ge, ecs.ebmgn8is, and ecs.ebmgn8v.

One-click check script (Recommended)

Use this one-click script to quickly verify most of the items in the following table:

```
wget http://mirrors.cloud.aliyuncs.com/erdma/tools/env_check.py
python3 env_check.py -s egs_l20n
```

If all checks return PASS, the environment is configured correctly. To get the output in JSON format, add the --json parameter.

Note

The script does not check if GPU Access Control Services (ACS) is disabled; you must verify this manually. Additionally, you must set the NCCL_GRAPH_FILE environment variable when running NCCL.

Category	Item	Verification method	Description
Network interface configuration	The instance has two eRDMA network interfaces	Run the `ibv_devices` command and verify that the output contains two devices: `erdma_0` and `erdma_1`.	When you create an instance from the console, two eRDMA network interfaces are configured by default. No further action is required.
	Network interface state is PORT_ACTIVE	Run the following command and verify that the state of all ports is `PORT_ACTIVE`: `ibv_devinfo | grep state`	If the check fails, run the `ifconfig` command to verify that the corresponding Ethernet devices are in the UP state.
	The two network interfaces are attached to different NUMA nodes	Run the following commands and verify that the returned values are `0` and `1`: `cat /sys/class/infiniband/erdma_0/device/numa_node; cat /sys/class/infiniband/erdma_1/device/numa_node`	If both commands return 0, the secondary network interface is attached to the wrong NUMA node. To fix this, detach the secondary network interface, and then call the AttachNetworkInterface API operation to reattach it with the `NetworkCardIndex` parameter set to `1`.
	Jumbo frames are enabled	Run the following command and verify that the MTU of the network interfaces is 4096: `ibv_devinfo | grep mtu`	If the MTU is not 4096, enable jumbo frames and then run the check again.
	The MPCC congestion control algorithm is enabled	Run the following commands to check the congestion control algorithm: `eadm conf -d erdma_0 -t cc; eadm conf -d erdma_1 -t cc`	eRDMA driver 1.5.6 and later use the MPCC algorithm by default. If MPCC is not enabled, run the following commands to enable it: `eadm conf -d erdma_0 -t cc -v 4; eadm conf -d erdma_1 -t cc -v 4`
	No IP address conflicts exist for the eRDMA network interfaces	Run the `show_gids` command and verify that the IP addresses of the eRDMA network interfaces are unique.	This issue often occurs when using the Terway network plug-in for ACK. If you encounter this issue, see Terway whitelist configuration.
NCCL	Specify the NCCL topology file (required only for ecs.ebmgn9g, ecs.ebmgn9gc, and ecs.ebmgn9ge instance types)	Save the topology file (l20n.xml) to a local path on the instance, such as `/root/l20n.xml`. Before you start an NCCL task, set the following environment variable: `export NCCL_GRAPH_FILE=/root/l20n.xml`	Download l20n.xml from the public download URL or the internal network download address.
GPU	Disable GPU ACS to improve P2P communication (for ecs.ebmgn9g, ecs.ebmgn9gc, and ecs.ebmgn9ge instance types only)	Run the following command to check the ACS status: `sudo lspci -vvv | grep ACSCtl`. If `SrcValid-` is displayed, ACS is disabled and no action is required.	If `SrcValid+` is displayed, run the following script to manually disable ACS: `for BDF in $(lspci -d "::*" | awk '{print $1}'); do sudo setpci -v -s ${BDF} ECAP_ACS+0x6.w > /dev/null 2>&1; if [ $? -ne 0 ]; then continue; fi; sudo setpci -v -s ${BDF} ECAP_ACS+0x6.w=0000; done`
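The NUMA-node check in the table can be automated as in the following sketch. It reads the same sysfs files as the `cat` commands above, prints each device's NUMA node, and fails if fewer than two eRDMA devices exist or if two devices share a node. The function name is hypothetical, and the optional directory argument exists only for testing.

```shell
# Hypothetical helper: verify that the eRDMA devices sit on different NUMA nodes.
# $1 (optional): sysfs InfiniBand directory, default /sys/class/infiniband
check_erdma_numa() {
  local ib_dir="${1:-/sys/class/infiniband}" dev node seen="" count=0
  for dev in "$ib_dir"/erdma_*; do
    [ -e "$dev" ] || continue
    node=$(cat "$dev/device/numa_node") || return 1
    echo "$(basename "$dev") numa_node=$node"
    case " $seen " in
      *" $node "*) echo "eRDMA devices share NUMA node $node" >&2; return 1 ;;
    esac
    seen="$seen $node"
    count=$((count + 1))
  done
  # The instance types in this section are expected to expose two eRDMA devices.
  [ "$count" -ge 2 ]
}

# Usage on the instance:
# check_erdma_numa && echo "NUMA placement OK"
```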

References

To use eRDMA on enterprise-level ECS instances for an ultra-low-latency, high-throughput, elastic RDMA network service without changing your business network, see Enable eRDMA on enterprise-level instances.
For applications that require large-scale data transfer and high-performance network communication in containers, you can integrate eRDMA into the container (Docker) environment. This allows containerized applications to bypass the OS kernel and directly access the host's physical eRDMA devices, providing faster data transfer and more efficient communication. For more information, see Enable eRDMA in a container (Docker).
To monitor or diagnose eRDMA and track its real-time status, see Monitor and diagnose eRDMA.