After you attach an Elastic RDMA Interface (ERI) to a GPU-accelerated instance, you can enable Remote Direct Memory Access (RDMA) to accelerate communication between GPU-accelerated instances in a Virtual Private Cloud (VPC). eRDMA transfers data more efficiently than traditional RDMA, improving communication between GPU-accelerated instances and reducing task processing time. This topic describes how to enable eRDMA on a GPU-accelerated instance.
Limitations
Item | Description |
Instance type | The Elastic RDMA Interface (ERI) supports the following instance types:
|
Image | Use one of the following images:
|
Number of eRDMA devices |
|
Network limitations |
|
Procedure
To use the eRDMA feature, an instance must have the eRDMA software stack installed and an ERI-enabled elastic network interface attached.
Configure eRDMA during instance creation
Go to the Custom Launch page in the ECS console.
Create a GPU-accelerated instance that supports ERI.
When you create the instance, note the following configurations. For other parameters, see Create an instance by using the wizard.
Instance Type: Select an instance type that supports ERI. For more information, see Limitations. This topic uses ebmgn8is as an example.
Images: When you select a public image, the Auto-install GPU Driver and Install eRDMA Software Stack options are selected by default. After the instance is created, the system automatically installs the GPU driver, CUDA, cuDNN, and the eRDMA software stack.
In this example, the operating system is Alibaba Cloud Linux 3.2104 LTS 64-bit, and the installed versions are CUDA 12.4.1, GPU driver 550.127.08, and cuDNN 9.2.0.82.
(Optional) Jumbo Frame: If the selected instance supports jumbo frames, you can enable this feature to improve eRDMA communication performance. Enabling jumbo frames allows you to set a larger MTU. When you use NCCL and the LL128 low-latency protocol for communication, the MTU must be 8500. If jumbo frames are not enabled, the MTU should be 1400. An incorrect MTU setting can cause data consistency issues.
ENIs: When you create a GPU instance, an eRDMA primary NIC and an eRDMA secondary network interface are created by default on the Bandwidths & Security Groups page of the configuration wizard. The eRDMA Interface option to the right of the primary and secondary network interfaces is automatically selected.
Note: You cannot enable or disable eRDMA for an elastic network interface while the instance is running.
The two eRDMA-enabled ENIs are automatically bound to different channels. You do not need to specify the channels.
The primary NIC cannot be detached from the GPU-accelerated instance. It is created and deleted with the instance.
Go to the details page of the created instance and click the ENIs tab to view the NIC type.
If the NIC type for the primary NIC or a secondary ENI contains "(eRDMA Interface)", it indicates that an ENI with the Elastic RDMA Interface (ERI) enabled is attached to the instance.
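The MTU values given for the Jumbo Frame option (8500 with jumbo frames for NCCL with LL128, 1400 without) can be checked from inside the instance. The following is a minimal sketch, not an official tool: the function names expected_mtu and check_mtu are hypothetical, and the sketch assumes that the interface MTU is readable from /sys/class/net/<interface>/mtu, as on typical Linux systems.

```shell
#!/bin/sh
# Sketch only: expected_mtu and check_mtu are hypothetical helper names.

# Print the MTU this topic recommends: 8500 with jumbo frames, 1400 without.
expected_mtu() {
    case "$1" in
        on)  echo 8500 ;;
        off) echo 1400 ;;
        *)   echo "usage: expected_mtu on|off" >&2; return 2 ;;
    esac
}

# Compare an interface's current MTU with the expected value.
# $1 = interface name (for example eth0)
# $2 = on|off (whether jumbo frames are enabled)
# $3 = optional sysfs root, useful for testing (assumed default: /sys/class/net)
check_mtu() {
    iface=$1
    want=$(expected_mtu "$2") || return 2
    root=${3:-/sys/class/net}
    if [ ! -r "$root/$iface/mtu" ]; then
        echo "cannot read MTU for $iface" >&2
        return 2
    fi
    have=$(cat "$root/$iface/mtu")
    if [ "$have" -eq "$want" ]; then
        echo "$iface: MTU $have OK"
    else
        echo "$iface: MTU $have, expected $want"
        return 1
    fi
}
```

For example, check_mtu eth0 on reports whether eth0 currently uses an MTU of 8500. The interface name eth0 is only an illustration; use the names of your eRDMA-enabled interfaces.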
Configure eRDMA for an existing GPU-accelerated instance
Log on to the ECS console.
Find the target instance, go to its details page, and then click the ENIs tab to check whether an elastic network interface with the Elastic RDMA Interface (ERI) enabled is attached to the instance.
When eRDMA is enabled for an elastic network interface, the NIC Type column displays Primary NIC(eRDMA Interface) or Secondary NIC(eRDMA Interface).
If ERI is enabled, skip the following steps.
If ERI is not enabled, configure it for the primary NIC or a secondary ENI.
Configure eRDMA for the primary NIC or a secondary ENI.
Note: For instance types that support jumbo frames, you can enable jumbo frames to improve eRDMA communication performance. Enabling jumbo frames allows you to set a larger MTU. When you use NCCL and the LL128 low-latency protocol for communication, the MTU must be 8500. If jumbo frames are not enabled, the MTU should be 1400. An incorrect MTU setting can cause data consistency issues.
If you did not select the eRDMA Interface option for either the primary NIC or the secondary ENI during instance creation, you can create and enable two eRDMA-enabled secondary ENIs after the instance is created.
If you selected the eRDMA Interface option for only one of the NICs (primary or secondary) when you created the GPU-accelerated instance, you can create and enable only one more eRDMA-enabled secondary ENI after the instance is created.
(Optional) Install the Elastic RDMA Interface (ERI) software stack on the instance.
If you did not select the Install eRDMA Software Stack option when you selected the public image, you must install the eRDMA software stack by using a script or by manual installation.
Script-based installation
After the GPU-accelerated instance is created, you can use the following example script to install the eRDMA software stack, GPU driver, CUDA, and cuDNN:
#!/bin/sh
#Please input version to install
DRIVER_VERSION="570.133.20"
CUDA_VERSION="12.8.1"
CUDNN_VERSION="9.8.0.87"
IS_INSTALL_eRDMA="TRUE"
IS_INSTALL_RDMA="FALSE"
INSTALL_DIR="/root/auto_install"
#using .run to install driver and cuda
auto_install_script="auto_install_v4.0.sh"
script_download_url=$(curl http://100.100.100.200/latest/meta-data/source-address | head -1)"/opsx/ecs/linux/binary/script/${auto_install_script}"
echo $script_download_url
rm -rf $INSTALL_DIR
mkdir -p $INSTALL_DIR
cd $INSTALL_DIR && wget -t 10 --timeout=10 $script_download_url && bash ${INSTALL_DIR}/${auto_install_script} $DRIVER_VERSION $CUDA_VERSION $CUDNN_VERSION $IS_INSTALL_RDMA $IS_INSTALL_eRDMA
Manual installation
After you create a GPU-accelerated instance, you can manually install the OFED driver, eRDMA driver, and GPU driver, and then load the nv_peer_mem service component.
Remotely connect to the GPU-accelerated instance.
For more information, see Connect to a Linux instance by using Workbench.
Install the OFED driver.
Run the following command to install the required packages.
Alibaba Cloud Linux 3
yum install rpm-build flex iptables-devel systemd-devel gdb-headless elfutils-devel python3-Cython bison numactl-devel libmnl-devel libnl3-devel libdb-devel libselinux-devel perl-generators elfutils-libelf-devel kernel-rpm-macros valgrind-devel cmake lsof -y
CentOS 8.5/8.4/7.9
CentOS 8.5/8.4
wget http://mirrors.cloud.aliyuncs.com/opsx/ecs/linux/binary/erdma/centos8/python3-Cython-0.29.32-3.16.x86_64.rpm
yum install python3-Cython-0.29.32-3.16.x86_64.rpm -y
yum install kernel-rpm-macros perl-generators libmnl-devel valgrind-devel rpm-build systemd-devel libdb-devel iptables-devel lsof elfutils-devel bison libnl3-devel libselinux-devel flex cmake numactl-devel -y
CentOS 7.9
sudo yum install python-devel python3-Cython kernel-rpm-macros perl-generators libmnl-devel valgrind-devel rpm-build systemd-devel libdb-devel iptables-devel lsof elfutils-devel bison libnl3-devel libselinux-devel flex cmake numactl-devel -y
Ubuntu 24.04/22.04/20.04/18.04
Ubuntu 24.04
sudo apt-get update -y
sudo apt-get install -y pkg-config
Ubuntu 22.04
sudo apt-get update -y
sudo apt-get install -y pkg-config
Ubuntu 20.04
sudo apt-get update -y
sudo apt-get install -y pkg-config
Ubuntu 18.04
sudo apt-get update
sudo apt-get install -y pkg-config
sudo apt install -y make dh-python libdb-dev libselinux1-dev flex dpatch swig graphviz chrpath quilt python3-distutils bison libmnl-dev libelf-dev gcc sudo python3
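Because the package list differs by operating system, it can help to detect which group an instance belongs to before you run the commands above. The following is a minimal sketch: the function name os_group and the group labels it prints are hypothetical, and the sketch assumes that /etc/os-release reports ID values of alinux, centos, and ubuntu for the systems in this topic. Confirm the values on your own image.

```shell
#!/bin/sh
# Sketch only: os_group and its output labels are hypothetical.

# Map /etc/os-release fields to the OS groups used in this topic.
# $1 = optional path to an os-release file (defaults to /etc/os-release).
os_group() {
    file=${1:-/etc/os-release}
    # Source the file in a subshell so its variables do not leak into the caller.
    (
        . "$file" || exit 2
        case "$ID:$VERSION_ID" in
            alinux:3*)        echo "alinux3" ;;       # assumed ID for Alibaba Cloud Linux 3
            centos:8*)        echo "centos8" ;;
            centos:7*)        echo "centos7" ;;
            ubuntu:18.04)     echo "ubuntu18.04" ;;
            ubuntu:2[024].04) echo "ubuntu20.04+" ;;  # 20.04, 22.04, and 24.04 share commands
            *)                echo "unsupported: $ID $VERSION_ID" >&2; exit 1 ;;
        esac
    )
}
```

Calling os_group with no argument inspects the running system. A non-zero exit status means that the system is not one of the groups covered by this topic.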
Run the following commands to download the OFED source package and its configuration file.
Alibaba Cloud Linux 3
sudo wget http://mirrors.cloud.aliyuncs.com/erdma/kernel-fix/MLNX_OFED_SRC-24.10-3.2.5.0.tgz
sudo tar -xvf MLNX_OFED_SRC-24.10-3.2.5.0.tgz && cd MLNX_OFED_SRC-24.10-3.2.5.0
sudo wget http://mirrors.cloud.aliyuncs.com/opsx/ecs/linux/binary/erdma/ofed/alibaba_cloud3/3/ofed_alibaba_cloud3.conf
sudo rm -rf SRPMS/mlnx-ofa_kernel-24.10-OFED.24.10.3.2.5.1.src.rpm
sudo wget http://mirrors.cloud.aliyuncs.com/erdma/kernel-fix/mlnx-ofa_kernel-24.10-OFED.24.10.3.2.5.1.egs.1.src.rpm -O SRPMS/mlnx-ofa_kernel-24.10-OFED.24.10.3.2.5.1.egs.1.src.rpm
CentOS 8.5/8.4/7.9
CentOS 8.5/8.4
cd /root
wget http://mirrors.cloud.aliyuncs.com/opsx/ecs/linux/binary/erdma/ofed/MLNX_OFED_SRC-5.4-3.5.8.0.tgz
tar -xvf MLNX_OFED_SRC-5.4-3.5.8.0.tgz && cd MLNX_OFED_SRC-5.4-3.5.8.0/
wget http://mirrors.cloud.aliyuncs.com/opsx/ecs/linux/binary/erdma/ofed/alibaba_cloud3/3/ofed_alibaba_cloud3.conf
rm -rf SRPMS/mlnx-ofa_kernel-5.4-OFED.5.4.3.5.8.1.src.rpm
wget http://mirrors.cloud.aliyuncs.com/erdma/kernel-fix/mlnx-ofa_kernel-5.4-OFED.5.4.3.5.8.1.egs.1.src.rpm -O SRPMS/mlnx-ofa_kernel-5.4-OFED.5.4.3.5.8.1.egs.1.src.rpm
CentOS 7.9
sudo wget http://mirrors.cloud.aliyuncs.com/opsx/ecs/linux/binary/erdma/ofed/MLNX_OFED_SRC-5.4-3.5.8.0.tgz
sudo tar -xvf MLNX_OFED_SRC-5.4-3.5.8.0.tgz && cd MLNX_OFED_SRC-5.4-3.5.8.0/
sudo wget http://mirrors.cloud.aliyuncs.com/opsx/ecs/linux/binary/erdma/ofed/alibaba_cloud3/3/ofed_alibaba_cloud3.conf
sudo rm -rf SRPMS/mlnx-ofa_kernel-5.4-OFED.5.4.3.5.8.1.src.rpm
sudo wget http://mirrors.cloud.aliyuncs.com/erdma/kernel-fix/mlnx-ofa_kernel-5.4-OFED.5.4.3.5.8.1.egs.1.src.rpm -O SRPMS/mlnx-ofa_kernel-5.4-OFED.5.4.3.5.8.1.egs.1.src.rpm
Ubuntu 24.04/22.04/20.04/18.04
Ubuntu 24.04/22.04/20.04
sudo wget http://mirrors.cloud.aliyuncs.com/erdma/kernel-fix/deb/MLNX_OFED_SRC-debian-24.10-3.2.5.0.tgz
sudo tar -xvf MLNX_OFED_SRC-debian-24.10-3.2.5.0.tgz && cd MLNX_OFED_SRC-24.10-3.2.5.0 && curl -O http://mirrors.cloud.aliyuncs.com/erdma/kernel-fix/deb/ofed_debian.conf
sudo rm -rf SOURCES/mlnx-ofed-kernel_24.10.OFED.24.10.3.2.5.1.orig.tar.gz
wget http://mirrors.cloud.aliyuncs.com/erdma/kernel-fix/deb/mlnx-ofed-kernel_24.10.egs.1.OFED.24.10.3.2.5.1.orig.tar.gz -O SOURCES/mlnx-ofed-kernel_24.10.egs.1.OFED.24.10.3.2.5.1.orig.tar.gz
Ubuntu 18.04
sudo wget http://mirrors.cloud.aliyuncs.com/opsx/ecs/linux/binary/erdma/ofed/MLNX_OFED_SRC-debian-5.4-3.6.8.1.tgz
sudo tar -xvf MLNX_OFED_SRC-debian-5.4-3.6.8.1.tgz && cd MLNX_OFED_SRC-5.4-3.6.8.1 && curl -O http://mirrors.cloud.aliyuncs.com/erdma/kernel-fix/deb/ofed_debian.conf
sudo rm -rf SOURCES/mlnx-ofed-kernel_5.4.orig.tar.gz
sudo wget http://mirrors.cloud.aliyuncs.com/erdma/kernel-fix/deb/mlnx-ofed-kernel_5.4.egs.orig.tar.gz -O SOURCES/mlnx-ofed-kernel_5.4.egs.orig.tar.gz
Run the command for your operating system to install the OFED driver.
Alibaba Cloud Linux 3
sudo ./install.pl --config ./ofed_alibaba_cloud3.conf --distro RHEL8
sudo dracut -f
CentOS 8.5/8.4/7.9
CentOS 8.5/8.4
./install.pl --config ./ofed_alibaba_cloud3.conf --distro RHEL8
CentOS 7.9
sudo ./install.pl --config ./ofed_alibaba_cloud3.conf --distro RHEL7
Ubuntu 24.04/22.04/20.04/18.04
Replace ${VERSION_ID} with your Ubuntu version, such as 24.04.
sudo curl -O http://mirrors.cloud.aliyuncs.com/erdma/kernel-fix/deb/ofed_debian.conf
sudo ./install.pl --config ./ofed_debian.conf --without-dkms --build-only --kernel-only
sudo /usr/bin/dpkg -i --force-confmiss DEBS/ubuntu`lsb_release -s -r`/x86_64/*.deb
update-initramfs -u
Run the following command to check whether the /usr/src/ofa_kernel/`uname -r` directory exists.
ls /usr/src/ofa_kernel/`uname -r`
If the directory exists, proceed to the next step.
If the directory does not exist, run the following command to create a symbolic link, and then proceed to the next step.
sudo ln -s /usr/src/ofa_kernel/default /usr/src/ofa_kernel/`uname -r`
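The check-then-link step above can be combined into one idempotent helper. The following is a minimal sketch: the function name ensure_ofa_kernel_dir and its optional arguments are hypothetical and exist only so that the logic can be tried against a scratch directory. On a real instance the default path is used and the function typically needs root privileges.

```shell
#!/bin/sh
# Sketch only: ensure_ofa_kernel_dir is a hypothetical helper name.

# Make sure <base>/<kernel release> exists, linking it to <base>/default if needed.
# $1 = optional base directory (defaults to /usr/src/ofa_kernel)
# $2 = optional kernel release (defaults to the output of uname -r)
ensure_ofa_kernel_dir() {
    base=${1:-/usr/src/ofa_kernel}
    rel=${2:-$(uname -r)}
    if [ -e "$base/$rel" ]; then
        # Nothing to do: the directory (or an earlier link) is already present.
        echo "exists: $base/$rel"
    elif [ -d "$base/default" ]; then
        ln -s "$base/default" "$base/$rel" &&
            echo "linked: $base/$rel -> $base/default"
    else
        echo "missing: $base/default (is the OFED driver installed?)" >&2
        return 1
    fi
}
```

Running the function a second time prints the "exists" message and changes nothing, so it is safe to include in a provisioning script.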
Restart the instance.
After the OFED driver is installed, you must restart the instance for the new kernel module to take effect. For more information, see Restart an instance.
Install the eRDMA driver.
Download and install the eRDMA driver.
sudo wget http://mirrors.cloud.aliyuncs.com/erdma/env_setup.sh
sudo bash env_setup.sh --egs
Run the following command to verify the eRDMA driver installation by using the eadm tool.
eadm ver
An output similar to the following indicates a successful installation.
[root@xxx ~]# eadm ver
Query kernel driver version: 0.2.35
Note: This topic uses driver version 0.2.35 as an example. If a "command not found" error is returned or the command fails to run, reinstall the eRDMA driver.
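If you want to check the driver version in a script, the version number can be pulled out of the eadm ver output shown above. The following is a minimal sketch: the function names eadm_version and version_ge are hypothetical, and the parser assumes the output line has exactly the "Query kernel driver version: X.Y.Z" form shown in the example.

```shell
#!/bin/sh
# Sketch only: eadm_version and version_ge are hypothetical helper names.

# Extract the version number from `eadm ver` output read on stdin.
eadm_version() {
    sed -n 's/^Query kernel driver version:[[:space:]]*\([0-9][0-9.]*\).*$/\1/p' | head -n 1
}

# Succeed if version $1 is greater than or equal to version $2.
# Both versions are compared as three dot-separated numeric fields.
version_ge() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -t. -k1,1n -k2,2n -k3,3n | head -n 1)
    [ "$lowest" = "$2" ]
}
```

For example, v=$(eadm ver | eadm_version) stores the installed version, and version_ge "$v" 0.2.37 then succeeds only if the driver is at least version 0.2.37. An empty result from eadm_version suggests that eadm is missing or printed something unexpected.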
Install the GPU driver.
For more information, see Manually install a NVIDIA GPU driver on a Linux instance.
Load the nv_peer_mem service component.
(Recommended) GPU driver 470.xx.xx or later
To enable GPUDirect RDMA, you must load the nv_peer_mem service component. NVIDIA GPU drivers 470.xx.xx and later include this component pre-installed, and you can directly run the following commands to load the nvidia_peermem module.
sudo modprobe nvidia_peermem
# You can run the lsmod|grep nvidia command to check whether the nvidia_peermem module is loaded.
Note: If the instance is restarted, you must reload the nvidia_peermem module.
GPU drivers earlier than 470.xx.xx
You must manually download and install the service component. The following code shows how to download, compile, and install the component.
sudo git clone https://github.com/Mellanox/nv_peer_memory.git
# Compile and install nv_peer_mem.ko.
cd nv_peer_memory && make
cp nv_peer_mem.ko /lib/modules/$(uname -r)/kernel/drivers/video
depmod -a
modprobe nv_peer_mem
# You can run the lsmod|grep nv_peer_mem command to check the result.
service nv_peer_mem start
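The choice between the two module names depends only on the GPU driver version, so it can be expressed as a small helper. The following is a minimal sketch: the function name peermem_module is hypothetical, and the sketch takes the 470 threshold directly from the two cases described above.

```shell
#!/bin/sh
# Sketch only: peermem_module is a hypothetical helper name.

# Print the kernel module that provides GPUDirect RDMA peer memory support.
# $1 = NVIDIA driver version, for example 550.127.08
peermem_module() {
    major=${1%%.*}
    case "$major" in
        ''|*[!0-9]*) echo "invalid driver version: $1" >&2; return 2 ;;
    esac
    if [ "$major" -ge 470 ]; then
        echo nvidia_peermem   # bundled with driver 470.xx.xx and later
    else
        echo nv_peer_mem      # must be built from the nv_peer_memory sources
    fi
}
```

On an instance with the GPU driver installed, the version can be read with nvidia-smi --query-gpu=driver_version --format=csv,noheader and passed to peermem_module. The printed name is the one to give to modprobe.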
Verify the bandwidth.
Remotely connect to the GPU-accelerated instance.
For more information, see Connect to a Linux instance by using Workbench.
Run the following command to check whether the two eRDMA devices are working as expected.
sudo ibv_devinfo
By default, the eRDMA driver installation script installs the latest driver version. If you need to install an earlier version of the eRDMA driver, submit a ticket for assistance.
This topic uses eRDMA driver version 0.2.37 or later as an example. The following output shows two eRDMA devices working correctly. A device is in a normal state if its port state is PORT_ACTIVE.
[ecs-xxx...xxx4gnd0hZ ~]$ sudo ibv_devinfo
hca_id: erdma_0
    transport: eRDMA (0)
    fw_ver: 0.2.0
    node_guid: 0216:3eff:fe36:1eb4
    sys_image_guid: 0216:3eff:fe36:1eb4
    vendor_id: 0x1ded
    vendor_part_id: 4223
    hw_ver: 0x0
    phys_port_cnt: 1
        port: 1
            state: PORT_ACTIVE (4)
            max_mtu: 1024 (3)
            active_mtu: 1024 (3)
            sm_lid: 0
            port_lid: 0
            port_lmc: 0x00
            link_layer: Ethernet
hca_id: erdma_1
    transport: eRDMA (0)
    fw_ver: 0.2.0
    node_guid: 0216:3eff:fe43:9c2a
    sys_image_guid: 0216:3eff:fe43:9c2a
    vendor_id: 0x1ded
    vendor_part_id: 4223
    hw_ver: 0x0
    phys_port_cnt: 1
        port: 1
            state: PORT_ACTIVE (4)
            max_mtu: 1024 (3)
            active_mtu: 1024 (3)
            sm_lid: 0
            port_lid: 0
            port_lmc: 0x00
            link_layer: Ethernet
Note: If the port state of an eRDMA device is invalid state, the device is in an abnormal state. We recommend that you first check whether the secondary ENI is configured correctly. For example, run the ifconfig command to verify that the configurations and IP addresses for all NICs exist.
Run the following command to install the perftest tool.
sudo yum install perftest -y
Run the following commands to test whether the RDMA network bandwidth meets hardware expectations.
On the server, run the following command to listen for connection requests from the client.
sudo ib_write_bw -d erdma_0 -F -q 16 --run_infinitely --report_gbits -p 18515
To send connection requests and data packets, run the following command on the client.
sudo ib_write_bw -d erdma_0 -F -q 16 --run_infinitely --report_gbits -p 18515 server_ip
In this command, server_ip is the private IP address of the eRDMA-enabled ENI on the server instance. To find this IP address, see View IP addresses.
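When more than one eRDMA device takes part in the test, each perftest process needs its own device (-d) and its own port (-p). The following is a minimal sketch that only prints the command lines, reusing the options from the commands above. The function name perftest_cmds is hypothetical, and the sketch assumes that consecutive port numbers starting from the base port are free and that one server IP address is acceptable for every process. Adjust the address per device if your ENIs require it.

```shell
#!/bin/sh
# Sketch only: perftest_cmds is a hypothetical helper that prints commands
# without running them.

# Print one ib_write_bw command per eRDMA device, each on its own port.
# $1 = base port (for example 18515)
# $2 = server IP address; pass an empty string to print server-side commands
# remaining arguments = eRDMA device names (for example erdma_0 erdma_1)
perftest_cmds() {
    port=$1
    server_ip=$2
    shift 2
    for dev in "$@"; do
        cmd="sudo ib_write_bw -d $dev -F -q 16 --run_infinitely --report_gbits -p $port"
        if [ -n "$server_ip" ]; then
            cmd="$cmd $server_ip"
        fi
        echo "$cmd"
        port=$((port + 1))
    done
}
```

For example, perftest_cmds 18515 "" erdma_0 erdma_1 prints the two server-side commands, and passing the server IP address as the second argument prints the matching client-side commands.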
Note: The preceding perftest benchmark uses one NIC for communication. If your service requires two NICs for communication, you must start two perftest processes. Then, use the -d parameter to specify an eRDMA device for each process and the -p parameter to specify different communication ports. For more information, see perftest details.
The test results include the average bandwidth. Output similar to the following indicates that eRDMA communication is normal.
Test and verify
This topic uses nccl-tests as an example to demonstrate how to test the application performance of GPU instances with eRDMA networking. For more information, see nccl-tests.
Install NCCL with the following commands.
Note: You can also download an installation package from the official NVIDIA NCCL website and then install it.
This example installs NCCL to /usr/local/nccl. You can specify a different path based on your requirements.
# build nccl
cd /root
git clone https://github.com/NVIDIA/nccl.git
cd nccl/
make -j src.lib PREFIX=/usr/local/nccl
make install PREFIX=/usr/local/nccl
Verify the NCCL installation and the presence of the libnccl.so library by running the following commands.
# Check for NCCL
ls /usr/local/nccl
# Check for the libnccl.so library
ls /usr/local/nccl/lib
Install Open MPI and the required compilers with the following commands.
wget https://download.open-mpi.org/release/open-mpi/v4.1/openmpi-4.1.3.tar.gz
tar -xzf openmpi-4.1.3.tar.gz
cd openmpi-4.1.3
./configure --prefix=/usr/local/openmpi
make -j && make install
Set the environment variables.
NCCL_HOME=/usr/local/nccl
CUDA_HOME=/usr/local/cuda
MPI_HOME=/usr/local/openmpi
export LD_LIBRARY_PATH=${NCCL_HOME}/lib:${CUDA_HOME}/lib64:${MPI_HOME}/lib:$LD_LIBRARY_PATH
export PATH=${CUDA_HOME}/bin:${MPI_HOME}/bin:$PATH
The preceding commands use the following example paths: NCCL_HOME points to the NCCL installation path (/usr/local/nccl), CUDA_HOME points to the CUDA installation path (/usr/local/cuda), and MPI_HOME points to the Open MPI installation path (/usr/local/openmpi). Replace these with your actual installation paths.
After editing the ~/.bashrc file to set the PATH and LD_LIBRARY_PATH variables, run the following command to apply the changes.
source ~/.bashrc
Download and compile the test code.
git clone https://github.com/NVIDIA/nccl-tests
cd nccl-tests/
make MPI=1 CUDA_HOME=/usr/local/cuda MPI_HOME=/usr/local/openmpi
Establish passwordless SSH access between instances.
To establish passwordless SSH access, generate a public key on host1 and copy it to host2.
# On host1
ssh-keygen
ssh-copy-id -i ~/.ssh/id_rsa.pub ${host2}
# On host1, run this command to test the connection. A successful login without a password prompt confirms the setup.
ssh root@${host2}
Test the NCCL all-reduce performance with the following command.
# Replace host1 and host2 with the IP addresses of your instances.
mpirun --allow-run-as-root -np 16 -npernode 8 -H host1:8,host2:8 \
  --bind-to none \
  -mca btl_tcp_if_include eth0 \
  -x NCCL_SOCKET_IFNAME=eth0 \
  -x NCCL_GIN_TYPE=0 \
  -x NCCL_DEBUG=INFO \
  -x LD_LIBRARY_PATH \
  -x PATH \
  ./build/all_reduce_perf -b 4M -e 4M -f 2 -g 1 -t 1 -n 20
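In the mpirun command above, the values of -np, -npernode, and -H must agree with each other: two hosts with eight ranks each give 16 processes. The following is a minimal sketch that derives the three arguments from a per-node rank count and a host list. The function name mpirun_placement is hypothetical, and the sketch assumes that every host runs the same number of ranks.

```shell
#!/bin/sh
# Sketch only: mpirun_placement is a hypothetical helper name.

# Print the process-placement arguments for mpirun.
# $1 = ranks (GPUs) per node
# remaining arguments = host names or IP addresses
mpirun_placement() {
    per_node=$1
    shift
    hosts=""
    for h in "$@"; do
        # Append "host:slots", separating entries with commas.
        hosts="${hosts:+$hosts,}$h:$per_node"
    done
    echo "-np $(( per_node * $# )) -npernode $per_node -H $hosts"
}
```

For example, mpirun_placement 8 host1 host2 prints the same placement arguments that the command above uses, and adding a third host raises the process count to 24 without further edits.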
Verify eRDMA configuration
After configuring eRDMA, use the following checklist to verify that the feature functions correctly.
This section applies to the following eRDMA-capable GPU Elastic Bare Metal Instance types: ecs.ebmgn9g, ecs.ebmgn9gc, ecs.ebmgn9ge, ecs.ebmgn8is, and ecs.ebmgn8v.
Category | Item | Verification method | Description |
Network interface configuration | The instance has two eRDMA network interfaces | Run the | When you create an instance from the console, two eRDMA network interfaces are configured by default. No further action is required. |
Network interface state is PORT_ACTIVE | Run the following command and verify that the state of all ports is | If the check fails, run the | |
The two network interfaces are attached to different NUMA nodes | Run the following commands and verify that the returned values are | If both commands return 0, the secondary network interface is attached to the wrong NUMA node. To fix this, follow these steps:
| |
Jumbo frames are enabled | Run the following command and verify that the MTU of the network interfaces is 4096: | If the MTU is not 4096, enable jumbo frames and then run the check again. | |
The MPCC congestion control algorithm is enabled | Run the following commands to check the congestion control algorithm: | eRDMA driver 1.5.6 and later versions use the MPCC algorithm by default. If not, run the following commands to enable it: | |
No IP address conflicts exist for the eRDMA network interfaces | Run the | This issue often occurs when using the Terway network plug-in for ACK. If you encounter this issue, see Terway whitelist configuration. | |
NCCL | Specify the NCCL topology file (required only for ecs.ebmgn9g, ecs.ebmgn9gc, and ecs.ebmgn9ge instance types) | Save the topology file (l20n.xml) to a local path on the instance, such as Before you start an NCCL task, set the following environment variables: | Topology file download URL: |
GPU | Disable GPU ACS to improve P2P communication (for ecs.ebmgn9g, ecs.ebmgn9gc, and ecs.ebmgn9ge instance types only) | Run the following command to check the ACS status: If | If |
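The first two checklist items (two eRDMA network interfaces, all ports in the PORT_ACTIVE state) can be evaluated from ibv_devinfo output. The following is a minimal sketch: the function name erdma_port_summary is hypothetical, and the parser assumes the hca_id and state line layout shown in the ibv_devinfo example earlier in this topic.

```shell
#!/bin/sh
# Sketch only: erdma_port_summary is a hypothetical helper name.

# Read `ibv_devinfo` output on stdin and print "<eRDMA devices> <active ports>".
erdma_port_summary() {
    awk '
        # A new device section starts; remember whether it is an eRDMA device.
        $1 == "hca_id:" {
            in_erdma = ($2 ~ /^erdma_/)
            if (in_erdma) devices++
        }
        # Count only ports that belong to eRDMA devices and are active.
        in_erdma && $1 == "state:" && $2 == "PORT_ACTIVE" { active++ }
        END { printf "%d %d\n", devices, active }
    '
}
```

For example, sudo ibv_devinfo | erdma_port_summary prints 2 2 on an instance where both eRDMA devices are present and active. Any other result points to the corresponding checklist item.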
References
Configure eRDMA on enterprise-level ECS instances for an ultra-low latency, high-throughput, and elastic RDMA network service without changing your business network. For more information, see Enable eRDMA on enterprise-level instances.
For applications that require large-scale data transfer and high-performance network communication in containers, you can integrate eRDMA into the container (Docker) environment. This allows containerized applications to bypass the OS kernel and directly access the host's physical eRDMA devices, providing faster data transfer and more efficient communication. For more information, see Enable eRDMA in a container (Docker).
To monitor or diagnose eRDMA and track its real-time status, see Monitor and diagnose eRDMA.