eRDMA (Elastic Remote Direct Memory Access) is a high-performance network communication technology. Using eRDMA in a Docker container allows applications to bypass the operating system kernel and directly access the host's physical eRDMA devices. This provides faster data transmission and improves communication efficiency. This is suitable for scenarios that require large-scale data transmission and high-performance network communication in containers. This topic describes how to configure eRDMA in Docker containers. It also describes how to use the eRDMA Controller component to quickly configure eRDMA on pods in a self-managed Kubernetes cluster.
Limits
The eRDMA feature is supported only on the following Docker images.
Image sources for Alibaba Cloud Linux 3, Ubuntu 22.04, and Ubuntu 24.04 (supported on both ARM and x86 instances)
Image sources for Alibaba Cloud Linux 2, CentOS 7, CentOS 8, Ubuntu 18.04, and Ubuntu 20.04 (supported only on x86 instances)
Image sources for Anolis OS are not supported.
How it works
To use the eRDMA feature in a container environment, you can use the --device option of Docker to map the /dev/infiniband/rdma_cm and /dev/infiniband/uverbsX character devices to the container. This allows user mode programs inside the container to bypass the operating system kernel and directly access the eRDMA device to send and receive data.
/dev/infiniband/rdma_cm: A character device for eRDMA connection management. User mode programs can perform operations on this character device to establish, destroy, and manage connections with the eRDMA device. These operations include creating and destroying connections, and sending and receiving connection events.
/dev/infiniband/uverbsX: A character device for user space eRDMA operations. User mode programs can perform operations on this character device to communicate with the eRDMA device. These operations include opening the device, creating and destroying eRDMA communication endpoints, and registering and unregistering memory buffers.
Note: In /dev/infiniband/uverbsX, X is the device index number and may vary based on the system and configuration. You can run the ls /dev/infiniband | grep uverbs command to view the character device name.
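Because the uverbs index varies between systems, it can help to derive the --device flags from whatever the host actually exposes. The following is a minimal sketch, not part of any Alibaba Cloud tooling: erdma_device_flags is a hypothetical helper name, and the sketch assumes that every uverbs device found in the directory should be mapped to the container.

```shell
# Hypothetical helper: print one --device flag for rdma_cm and for each
# uverbs device found in a directory (default: /dev/infiniband).
# The parentheses run the body in a subshell, so variables stay local.
erdma_device_flags() (
    dir="${1:-/dev/infiniband}"
    for dev in "$dir"/rdma_cm "$dir"/uverbs*; do
        # Skip names that do not exist, including an unmatched glob pattern.
        [ -e "$dev" ] || continue
        printf '%s\n' "--device=$dev"
    done
)
```

Calling erdma_device_flags with no argument inspects /dev/infiniband on the host. If the function prints nothing, no eRDMA character devices are visible there.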
Configure eRDMA in a Docker container
Step 1: Configure eRDMA for the instance
Confirm that the instance on which Docker runs is of an instance type that supports eRDMA. Also, confirm that an Elastic RDMA Interface is attached, the eRDMA driver is deployed, and the eRDMA device is working correctly.
For enterprise-level CPU instances, see Enable eRDMA on an enterprise-level instance.
For GPU-accelerated instances, see Enable eRDMA on a GPU-accelerated instance.
Step 2: (Optional) Deploy Docker on the instance
If Docker is not deployed on your instance, install it.
Follow these steps:
Remotely connect to the instance.
For more information, see Log on to a Linux instance using Workbench.
Run the following command to check whether a Docker environment is deployed on the instance.
sudo docker -v
If Docker is deployed correctly, the command returns the Docker version number.
If Docker is not deployed or an error occurs, the command does not return a version number. In this case, see Deploy Docker on the instance.
Deploy Docker on the instance.
For Alibaba Cloud Linux or CentOS images, see Install Docker.
For Ubuntu images, see the Ubuntu documentation.
Step 3: (Optional) Deploy an image for Docker
If your Docker environment does not have any images, deploy one as needed.
This topic uses an Alibaba Cloud Linux base image as an example to show how to download an Alibaba Cloud Linux image in Docker. This operation requires Internet access.
Follow these steps:
Run the following command to download an Alibaba Cloud Linux Docker image:
sudo docker pull alibaba-cloud-linux-<image_version>-registry.<region_ID>.cr.aliyuncs.com/alinux<image_version>/alinux<image_version>:<TAG>
<image_version>: the Alibaba Cloud Linux version. Example: 2 or 3.
<region_ID>: the region ID of the Docker image. Example: cn-hangzhou.
<TAG>: optional. The tag of the Docker image. If you specify this parameter, the specified Docker image version is downloaded. Otherwise, the latest Docker image version is downloaded.
Sample commands:
Download version 220901.1 of the Alibaba Cloud Linux 3 Docker image in the China (Hangzhou) region:
sudo docker pull alibaba-cloud-linux-3-registry.cn-hangzhou.cr.aliyuncs.com/alinux3/alinux3:220901.1
Download the latest version of the Alibaba Cloud Linux 2 Docker image in the China (Hangzhou) region:
sudo docker pull alibaba-cloud-linux-2-registry.cn-hangzhou.cr.aliyuncs.com/alinux2/alinux2
Run the following command to check whether the images are downloaded:
sudo docker images
If both downloads succeeded, the output lists version 220901.1 of the Alibaba Cloud Linux 3 Docker image and the latest version of the Alibaba Cloud Linux 2 Docker image.
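The image address used in this step follows a fixed pattern, so it can be assembled from the version, region, and optional tag. The sketch below only illustrates that pattern. alinux_image_ref is a hypothetical helper name, and the function prints the reference without contacting any registry.

```shell
# Hypothetical helper: build the Alibaba Cloud Linux image reference
# from <image_version>, <region_ID>, and an optional <TAG>.
alinux_image_ref() (
    version="$1"
    region="$2"
    tag="${3:-}"
    ref="alibaba-cloud-linux-${version}-registry.${region}.cr.aliyuncs.com/alinux${version}/alinux${version}"
    # Without a tag, Docker pulls the latest image version.
    if [ -n "$tag" ]; then
        ref="${ref}:${tag}"
    fi
    printf '%s\n' "$ref"
)
```

For example, sudo docker pull "$(alinux_image_ref 3 cn-hangzhou 220901.1)" is equivalent to the first sample command in this step.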

Step 4: Start the container and attach the eRDMA device
You can use the --device option of Docker to map the /dev/infiniband/rdma_cm and /dev/infiniband/uverbsX character devices to the container. This allows user mode programs inside the container to bypass the operating system kernel and directly access the eRDMA device to send and receive data. For more information, see How it works.
Follow these steps:
Remotely connect to the ECS instance.
For more information, see Log on to a Linux instance using Workbench.
Run the following command to start a Docker container instance and map the eRDMA character devices to the container.
sudo docker run --net=host --device=/dev/infiniband/uverbsX --device=/dev/infiniband/rdma_cm --ulimit memlock=-1 -t -i <IMAGE ID> /bin/bash
The parameters are described as follows.
--net=host: Sets the network mode of the container to host. Applications in the container can directly use the network interfaces and network configurations of the host. This provides the same network communication capabilities as the host.
--device=/dev/infiniband/uverbsX and --device=/dev/infiniband/rdma_cm: Expose the eRDMA user mode character devices to the container.
Run the following command to view the character device name (the X in uverbsX):
ls /dev/infiniband | grep uverbs
--ulimit memlock=-1: Sets max locked memory to unlimited. This means there is no limit on the amount of memory that a non-root user can lock. This ensures that eRDMA applications can lock the required amount of memory when run by non-root users, allowing them to use the eRDMA feature effectively.
<IMAGE ID>: Replace this with the ID of your Docker image.
Run the following command to view the target image ID:
sudo docker images
In this example, the Alibaba Cloud Linux image deployed in Step 3 is used. Its ID is shown in the IMAGE ID column of the output.
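To show how the options in this step fit together, the following sketch assembles the docker run command from an image ID and a list of device paths. It is an illustration only: erdma_docker_run_cmd is a hypothetical helper name, and it prints the command rather than running it so that the result can be reviewed first.

```shell
# Hypothetical helper: print (do not run) the docker run command for an
# image ID followed by one or more eRDMA character device paths.
erdma_docker_run_cmd() (
    image="$1"
    shift
    cmd="sudo docker run --net=host"
    for dev in "$@"; do
        cmd="$cmd --device=$dev"
    done
    # memlock=-1 removes the locked-memory limit that eRDMA applications need.
    printf '%s\n' "$cmd --ulimit memlock=-1 -t -i $image /bin/bash"
)
```

Pass the image ID first and the device paths after it, for example the uverbs device reported by ls /dev/infiniband | grep uverbs followed by /dev/infiniband/rdma_cm.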

Step 5: Deploy the eRDMA driver package for the Docker container
To use eRDMA in a container, you must also deploy the eRDMA user mode driver package. Otherwise, the eRDMA device information cannot be detected from within the container. Alibaba Cloud provides yum and apt sources to help you easily deploy the required packages.
Follow these steps:
Remotely connect to the instance.
For more information, see Log on to a Linux instance using Workbench.
Enter the target container.
If you ran the command in Step 4 as shown in the example, you are already inside the Docker container. You can proceed to install the user mode driver package in the container.
Run the following command to view the target container ID.
sudo docker ps
This example uses the container started in Step 4. Its ID is shown in the CONTAINER ID column of the output.
Run the following command to enter the container.
sudo docker exec -it <CONTAINER ID> /bin/bash
Replace <CONTAINER ID> with the target container ID that you obtained in the previous step.
After you enter the container, install the user mode driver package.
Important: In the following examples, http://mirrors.cloud.aliyuncs.com is an internal source address. To access the source over the Internet, replace http://mirrors.cloud.aliyuncs.com with https://mirrors.aliyun.com. Using the Internet generates Internet traffic, which may incur additional fees. For more information about the billing rules for Internet traffic, see Public bandwidth billing.
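If the container can reach the source only over the Internet, the address replacement described above can be applied mechanically. The following sketch is a plain text filter and makes no network calls. to_public_mirror is a hypothetical helper name.

```shell
# Hypothetical helper: rewrite the internal source address to the public
# one in whatever text is piped through it.
to_public_mirror() {
    sed 's#http://mirrors\.cloud\.aliyuncs\.com#https://mirrors.aliyun.com#g'
}
```

It can be applied to a repository definition or to a single command line, for example printf '%s\n' "$line" | to_public_mirror.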
CentOS 7/CentOS 8
Run the following command in the container to create an erdma.repo source file in the /etc/yum.repos.d folder.
sudo vim /etc/yum.repos.d/erdma.repo
Add the following content to the erdma.repo file and save the file.
[erdma]
name = ERDMA Repository
baseurl = http://mirrors.cloud.aliyuncs.com/erdma/yum/redhat/$releasever/erdma/$basearch/
gpgcheck = 1
enabled = 1
gpgkey = http://mirrors.cloud.aliyuncs.com/erdma/GPGKEY
Run the following command to update the yum source cache.
sudo yum makecache
Run the following command to install the user mode driver package.
sudo yum install libibverbs rdma-core librdmacm libibverbs-utils -y
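When the container is set up from a script rather than interactively, the repository file can be written with a here-document instead of an editor. The sketch below is one way to do that under stated assumptions: write_erdma_repo is a hypothetical helper name, the target directory is passed as an argument so that the function can be tried in a scratch directory, and writing to /etc/yum.repos.d requires root privileges.

```shell
# Hypothetical helper: write erdma.repo into the given directory
# (default: /etc/yum.repos.d) without opening an editor.
write_erdma_repo() (
    dir="${1:-/etc/yum.repos.d}"
    # The quoted 'EOF' keeps $releasever and $basearch literal so that
    # yum, not the shell, expands them.
    cat > "$dir/erdma.repo" <<'EOF'
[erdma]
name = ERDMA Repository
baseurl = http://mirrors.cloud.aliyuncs.com/erdma/yum/redhat/$releasever/erdma/$basearch/
gpgcheck = 1
enabled = 1
gpgkey = http://mirrors.cloud.aliyuncs.com/erdma/GPGKEY
EOF
)
```

The quoting of the here-document delimiter matters: with an unquoted delimiter the shell would replace $releasever and $basearch with empty strings and the baseurl would be wrong.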
Alibaba Cloud Linux
Run the following command in the container to add the repository.
sudo yum-config-manager \
    --add-repo \
    http://mirrors.cloud.aliyuncs.com/erdma/yum/alinux/erdma.repo
Note: If the yum-config-manager command is not installed in your container, run sudo yum install -y yum-utils to install the yum-utils package. yum-config-manager is part of the yum-utils package and is used to manage yum configurations.
Run the following command to update the yum source cache.
sudo yum makecache
Run the following command to install the user mode driver package.
sudo yum install libibverbs rdma-core librdmacm libibverbs-utils -y
Ubuntu 18.04/20.04/22.04/24.04
Run the following command in the container to add the PGP signature.
Ubuntu 18.04/20.04
wget -qO - http://mirrors.cloud.aliyuncs.com/erdma/GPGKEY | sudo apt-key add -
Ubuntu 22.04/24.04
wget -qO - http://mirrors.cloud.aliyuncs.com/erdma/GPGKEY | sudo gpg --dearmour -o /etc/apt/trusted.gpg.d/erdma.gpg
Run the following command to add the apt source.
Ubuntu 18.04
echo "deb [ arch=amd64 ] http://mirrors.cloud.aliyuncs.com/erdma/apt/ubuntu bionic/erdma main" | sudo tee /etc/apt/sources.list.d/erdma.list
Ubuntu 20.04
echo "deb [ arch=amd64 ] http://mirrors.cloud.aliyuncs.com/erdma/apt/ubuntu focal/erdma main" | sudo tee /etc/apt/sources.list.d/erdma.list
Ubuntu 22.04
echo "deb [ ] http://mirrors.cloud.aliyuncs.com/erdma/apt/ubuntu jammy/erdma main" | sudo tee /etc/apt/sources.list.d/erdma.list
Ubuntu 24.04
echo "deb [ ] http://mirrors.cloud.aliyuncs.com/erdma/apt/ubuntu noble/erdma main" | sudo tee /etc/apt/sources.list.d/erdma.list
Run the following command to update the apt source.
sudo apt update
Run the following command to install the user mode driver.
sudo apt install libibverbs1 ibverbs-providers ibverbs-utils librdmacm1 -y
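The four apt source lines above differ only in the release codename and in whether the architecture is restricted to amd64. A script that targets several Ubuntu releases could select the line from the release number, as in the following sketch. erdma_apt_source_line is a hypothetical helper name, and the mapping simply restates the commands in this section.

```shell
# Hypothetical helper: print the eRDMA apt source line for an Ubuntu
# release number (18.04, 20.04, 22.04, or 24.04).
erdma_apt_source_line() (
    case "$1" in
        18.04) codename=bionic; options="[ arch=amd64 ]" ;;
        20.04) codename=focal;  options="[ arch=amd64 ]" ;;
        22.04) codename=jammy;  options="[ ]" ;;
        24.04) codename=noble;  options="[ ]" ;;
        *)
            echo "unsupported Ubuntu release: $1" >&2
            exit 1
            ;;
    esac
    printf 'deb %s http://mirrors.cloud.aliyuncs.com/erdma/apt/ubuntu %s/erdma main\n' "$options" "$codename"
)
```

The output can be piped to sudo tee /etc/apt/sources.list.d/erdma.list in the same way as the echo commands above.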
Run the following command to view the eRDMA device information from within the container.
ibv_devinfo
The output shows that the eRDMA device can be detected from within the container.
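In a scripted setup, the check above can be reduced to extracting the device names from the ibv_devinfo output. The following is a small sketch, assuming that each device is reported on a line that begins with the hca_id: label, which is how ibv_devinfo from rdma-core normally prints it. rdma_device_names is a hypothetical helper name.

```shell
# Hypothetical helper: read ibv_devinfo output on standard input and
# print the device name that follows each "hca_id:" label.
rdma_device_names() {
    awk '$1 == "hca_id:" { print $2 }'
}
```

Run it as ibv_devinfo | rdma_device_names inside the container. If nothing is printed, no device was detected, and the device mapping from Step 4 and the driver package from this step are the first things to recheck.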
After you configure eRDMA in the Docker container, you can integrate eRDMA into TCP applications inside the container using SMC-R or NetACC to achieve application acceleration. For more information, see Application adaptation overview.
Use eRDMA Controller to deploy an eRDMA pod on a self-managed Kubernetes cluster
Step 1: Install eRDMA Controller
Run the following commands on the master node to install Helm.
This tool is used to manage the installation and uninstallation of eRDMA Controller components. After the installation, run helm version to verify that Helm is installed.
curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
chmod 700 get_helm.sh
./get_helm.sh
Run the following command on the master node to download the eRDMA Controller source code.
git clone https://github.com/AliyunContainerService/alibabacloud-erdma-controller.git
The installation configuration file for eRDMA Controller is deploy/helm/values.yaml. You can modify this file to select a working mode and configure related parameters. Two working modes are supported:
Regular Mode: This mode is suitable for scenarios where eRDMA Controller must automatically manage Elastic RDMA Interfaces (ERIs) and dynamically allocate eRDMA resources.
Local Mode: This mode is suitable for scenarios where the eRDMA environment is already prepared on the Kubernetes nodes and the eRDMA devices are exposed to pods. This mode does not involve dynamic management of eRDMA resources.
Configuration items:
localERIDiscovery: Sets the working mode of the controller.
true: Local Mode.
false: Regular Mode (default).
credentials: In Regular Mode, the controller must access Alibaba Cloud APIs to query and configure ECS instances and ENIs. Therefore, you must first create a RAM role and grant the required permissions. AccessKey authentication is currently supported. Set type to access_key, and enter the AccessKey ID and secret.
credentials:
  type: "access_key"
  accessKeyID: "{access key}"
  accessKeySecret: "{access key secret}"
You do not need to configure this item in Local Mode.
preferDriver: Sets the eRDMA driver type used by the node.
default: Default driver mode.
compat: RoCE-compatible driver mode.
ofed: OFED-based driver mode, suitable for GPU-accelerated instance types.
allocateAllDevices: Sets the device allocation policy in Regular Mode.
true: Allocates all eRDMA devices on the node to the pod.
false: Allocates one eRDMA device to the pod based on the NUMA topology.
exposedLocalERIs: Sets the eRDMA devices on the node that must be exposed to pods in Local Mode. For the configuration format, see the following example.
exposedLocalERIs:
  - i-XXX erdma_0/erdma_1 # specify instance ID and erdma devices (erdma_0/erdma_1) to expose
  - i-* erdma_0 # specify erdma devices (erdma_0) to expose for all unspecified nodes
  - i-* erdma_* # expose all existing erdma devices for all unspecified nodes
Image configuration: The default erdma-agent image (registry.aliyuncs.com/erdma/agent) does not support Local Mode. To use Local Mode, you must build an agent image yourself and push it to a custom image repository.
Run the following commands to build and push the image. The example uses Alibaba Cloud Container Registry (ACR). You must create a namespace and an image repository in advance.
docker build --tag <REGISTRY_NAME>-registry.<REGION_ID>.cr.aliyuncs.com/<NAMESPACE>/agent:<TAG> --target agent .
docker push <REGISTRY_NAME>-registry.<REGION_ID>.cr.aliyuncs.com/<NAMESPACE>/agent:<TAG>
Use helm to install the erdma-controller component.
helm install -f deploy/helm/values.yaml --namespace kube-system alibaba-erdma-controller deploy/helm/
Verify the installation.
After the installation in Regular Mode is complete, verify that the erdma-agent and erdma-controller pods are created:
kubectl get pods -n kube-system | grep erdma
Query the eRDMA device resources on the node:
kubectl get erdmadevices
In Local Mode, only the erdma-agent pod is created. The erdma-controller pod is not created. Therefore, erdmadevices resources are not available.
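The pod check above can also be scripted so that it reports which expected components are absent. The sketch below works on the text of the pod listing and does not call the Kubernetes API itself. missing_erdma_pods is a hypothetical helper name, and it assumes that the pod names contain the strings erdma-agent and erdma-controller, as the verification step suggests.

```shell
# Hypothetical helper: read a pod listing on standard input and print
# each expected name fragment that does not appear in it.
missing_erdma_pods() (
    pods=$(cat)
    for name in "$@"; do
        printf '%s\n' "$pods" | grep -qF -- "$name" || printf '%s\n' "$name"
    done
)
```

In Regular Mode, run kubectl get pods -n kube-system | missing_erdma_pods erdma-agent erdma-controller. In Local Mode, pass only erdma-agent. Empty output means that every expected pod name was found. The function does not inspect the pod status.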
Step 2: Create a pod that supports eRDMA network acceleration
To create a pod that supports eRDMA network acceleration, declare aliyun/erdma: 1 in the resources.limits section of the container. The following example shows a sample configuration. Replace <ERDMA_POD_IMAGE> with the address of the container image that you use:
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: erdma
  name: erdma
spec:
  replicas: 1
  selector:
    matchLabels:
      app: erdma
  template:
    metadata:
      labels:
        app: erdma
      annotations:
    spec:
      containers:
      - command:
        - sleep
        - "360000"
        image: <ERDMA_POD_IMAGE>
        name: erdma
        resources:
          limits:
            aliyun/erdma: 1
To enable transparent acceleration with SMC-R, add the network.alibabacloud.com/erdma-smcr: "true" annotation. This feature requires the operating system to be Alibaba Cloud Linux 3 with a kernel version of 5.10.134-17 or later.
Save the preceding configuration as example.yaml, and then run the following command to create the Deployment. The Deployment starts the specified number of pod replicas.
kubectl apply -f example.yaml
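Before adding the SMC-R annotation, the kernel requirement stated above can be checked on each node. The following sketch compares a kernel release string with 5.10.134-17. It is illustrative only: kernel_meets_smcr_minimum is a hypothetical helper name, the function assumes a release string of the form major.minor.patch-build followed by optional suffixes, and it does not verify that the operating system is Alibaba Cloud Linux 3.

```shell
# Hypothetical helper: succeed if the given kernel release string is
# 5.10.134-17 or later. Only the first four numeric fields are compared.
kernel_meets_smcr_minimum() (
    IFS='.-'
    set -f        # disable globbing while the release string is split
    set -- $1     # split the release string on "." and "-" into fields
    for want in 5 10 134 17; do
        have="${1:-0}"
        [ $# -gt 0 ] && shift
        # Treat a missing or non-numeric field as 0.
        case "$have" in
            ''|*[!0-9]*) have=0 ;;
        esac
        [ "$have" -gt "$want" ] && exit 0
        [ "$have" -lt "$want" ] && exit 1
    done
    exit 0
)
```

On a node, call it as kernel_meets_smcr_minimum "$(uname -r)" and add the annotation only if the function succeeds.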
References
If you want to configure and use eRDMA in an ACK cluster, see Use eRDMA to accelerate container networks in ACK clusters.
