Elastic Compute Service: Enable eRDMA in Docker containers

Last Updated: Nov 18, 2025

eRDMA (Elastic Remote Direct Memory Access) is a high-performance network communication technology. With eRDMA enabled in a Docker container, applications can bypass the operating system kernel and directly access the host's physical eRDMA devices, which speeds up data transmission and improves communication efficiency. This suits containerized workloads that require large-scale data transmission and high-performance network communication. This topic describes how to configure eRDMA in Docker containers and how to use the eRDMA Controller component to quickly configure eRDMA for pods in a self-managed Kubernetes cluster.

Limits

The eRDMA feature is supported only on the following Docker images.

  • Image sources for Alibaba Cloud Linux 3, Ubuntu 22.04, and Ubuntu 24.04 (supported on both ARM and x86 instances)

  • Image sources for Alibaba Cloud Linux 2, CentOS 7, CentOS 8, Ubuntu 18.04, Ubuntu 20.04, and Ubuntu 24.04 (supported only on x86 instances)

Note

Image sources for Anolis OS are not supported.

How it works

To use the eRDMA feature in a container environment, you can use the --device option of Docker to map the /dev/infiniband/rdma_cm and /dev/infiniband/uverbsX character devices to the container. This allows user mode programs inside the container to bypass the operating system kernel and directly access the eRDMA device to send and receive data.

  • /dev/infiniband/rdma_cm: A character device for eRDMA connection management. User mode programs can perform operations on this character device to establish, destroy, and manage connections with the eRDMA device. These operations include creating and destroying connections, and sending and receiving connection events.

  • /dev/infiniband/uverbsX: A character device for user space eRDMA operations. User mode programs can perform operations on this character device to communicate with the eRDMA device. These operations include opening the device, creating and destroying eRDMA communication endpoints, and registering and unregistering memory buffers.

    Note

    In /dev/infiniband/uverbsX, X is the device index number and may vary based on the system and configuration. You can run the ls /dev/infiniband | grep uverbs command to view the character device name.
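
Before mapping the devices, it can help to confirm that both kinds of character device exist on the host. The following is a minimal sketch, not part of the eRDMA tooling: the function names list_uverbs and has_rdma_cm and the optional directory argument are illustrative assumptions. The directory defaults to /dev/infiniband.

```shell
# List every uverbsX character device in a directory (default: /dev/infiniband).
# The function name and the directory argument are illustrative.
list_uverbs() (
    dir="${1:-/dev/infiniband}"
    for path in "$dir"/uverbs*; do
        # Skip the unexpanded glob pattern when no device matches.
        [ -e "$path" ] || continue
        printf '%s\n' "$path"
    done
)

# Succeed only if the connection-management device rdma_cm is present.
has_rdma_cm() (
    dir="${1:-/dev/infiniband}"
    [ -e "$dir/rdma_cm" ]
)
```

For example, running list_uverbs prints one path per device, and has_rdma_cm || echo "rdma_cm is missing" reports a missing connection-management device.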

Configure eRDMA in a Docker container

Step 1: Configure eRDMA for the instance

Confirm that the instance on which Docker runs is of an instance type that supports eRDMA. Also, confirm that an Elastic RDMA Interface is attached, the eRDMA driver is deployed, and the eRDMA device is working correctly.

Step 2: (Optional) Deploy Docker on the instance

If Docker is not deployed on your instance, install it.

Follow these steps:

  1. Remotely connect to the instance.

    For more information, see Log on to a Linux instance using Workbench.

  2. Run the following command to check whether a Docker environment is deployed on the instance.

    sudo docker -v

    If Docker is deployed correctly, the command returns the Docker version number.

    If Docker is not deployed or an error occurs, the command returns an error instead of a version number. In this case, see Deploy Docker on the instance.

  3. Deploy Docker on the instance.

Step 3: (Optional) Deploy an image for Docker

If your Docker environment does not have any images, deploy one as needed.

This topic uses an Alibaba Cloud Linux base image as an example to show how to download an Alibaba Cloud Linux image in Docker. This operation requires Internet access.

Follow these steps:

  1. Install and use Docker and Docker Compose.

  2. Run the following command to download an Alibaba Cloud Linux Docker image:

    sudo docker pull alibaba-cloud-linux-<image_version>-registry.<region_ID>.cr.aliyuncs.com/alinux<image_version>/alinux<image_version>:<TAG>
    • <image_version>: the Alibaba Cloud Linux version. Example: 2 or 3.

    • <region_ID>: the region ID of the Docker image. Example: cn-hangzhou.

    • <TAG>: optional. The tag of the Docker image. If you specify this parameter, the specified Docker image version is downloaded. Otherwise, the latest Docker image version is downloaded.

    View the region and version information of an image

    1. Go to Container Registry - Artifact Center.

    2. To view information about the Alibaba Cloud Linux 2 Docker image, click alinux2/alinux2. To view information about the Alibaba Cloud Linux 3 Docker image, click alinux3/alinux3.

      The details page of each image shows the regions in which the Docker image is available and the version information of the Docker image.

    Sample commands:

    • Download version 220901.1 of the Alibaba Cloud Linux 3 Docker image in the China (Hangzhou) region:

      sudo docker pull alibaba-cloud-linux-3-registry.cn-hangzhou.cr.aliyuncs.com/alinux3/alinux3:220901.1
    • Download the latest version of the Alibaba Cloud Linux 2 Docker image in the China (Hangzhou) region:

      sudo docker pull alibaba-cloud-linux-2-registry.cn-hangzhou.cr.aliyuncs.com/alinux2/alinux2
  3. Run the following command to check whether the images are downloaded:

    sudo docker images

    If the download succeeded, the output lists the images. In this example, it lists version 220901.1 of the Alibaba Cloud Linux 3 Docker image and the latest version of the Alibaba Cloud Linux 2 Docker image.
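
The image address in the pull command follows a fixed pattern built from the version, the region ID, and the optional tag. The following sketch assembles that address from its parts. The function name alinux_image_ref is a hypothetical helper, not part of Docker or Container Registry.

```shell
# Build the Alibaba Cloud Linux image address from its parts.
# Usage: alinux_image_ref <image_version> <region_ID> [<TAG>]
# The function name is illustrative.
alinux_image_ref() (
    version="$1"
    region="$2"
    tag="${3:-}"
    ref="alibaba-cloud-linux-${version}-registry.${region}.cr.aliyuncs.com/alinux${version}/alinux${version}"
    # Append the tag only when one is given. Without a tag, Docker pulls the latest version.
    if [ -n "$tag" ]; then
        ref="${ref}:${tag}"
    fi
    printf '%s\n' "$ref"
)
```

For example, sudo docker pull "$(alinux_image_ref 3 cn-hangzhou 220901.1)" is equivalent to the first sample command above.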

Step 4: Start the container and attach the eRDMA device

You can use the --device option of Docker to map the /dev/infiniband/rdma_cm and /dev/infiniband/uverbsX character devices to the container. This allows user mode programs inside the container to bypass the operating system kernel and directly access the eRDMA device to send and receive data. For more information, see How it works.

Follow these steps:

  1. Remotely connect to the ECS instance.

    For more information, see Log on to a Linux instance using Workbench.

  2. Run the following command to start a Docker container instance and map the eRDMA character devices to the container.

    sudo docker run --net=host --device=/dev/infiniband/uverbsX --device=/dev/infiniband/rdma_cm --ulimit memlock=-1 -t -i <IMAGE ID> /bin/bash

    The parameters are described as follows.

    • --net=host: Sets the network mode of the container to host. Applications in the container directly use the network interfaces and network configurations of the host, which gives them the same network communication capabilities as the host.

    • --device=/dev/infiniband/uverbsX and --device=/dev/infiniband/rdma_cm: Expose the eRDMA user mode character devices to the container.

      Run the following command to view the character device name (the X in uverbsX):

      ls /dev/infiniband | grep uverbs

    • --ulimit memlock=-1: Sets max locked memory to unlimited. This removes the limit on the amount of memory that a non-root user can lock, so eRDMA applications run by non-root users can lock the memory that they need to use the eRDMA feature.

    • <IMAGE ID>: Replace this with the ID of your Docker image.

      Run the following command to view the target image ID:

      sudo docker images

      In this example, the Alibaba Cloud Linux image deployed in Step 3 is used. The IMAGE ID column of the output shows the image ID.

    Example command

    sudo docker run --net=host --device=/dev/infiniband/uverbs0 --device=/dev/infiniband/rdma_cm --ulimit memlock=-1 -t -i e8d9a60b6967 /bin/bash
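
When several devices must be mapped, the command can be assembled from a list of device paths. The following is a minimal sketch that only prints the command so that it can be reviewed before it is run. The function name erdma_docker_cmd is a hypothetical helper, and the device paths are passed in explicitly rather than discovered.

```shell
# Print (do not execute) a docker run command that maps the given devices.
# Usage: erdma_docker_cmd <IMAGE ID> <device path>...
# The function name is illustrative.
erdma_docker_cmd() (
    if [ "$#" -lt 1 ]; then
        echo "usage: erdma_docker_cmd <IMAGE ID> <device path>..." >&2
        exit 2
    fi
    image="$1"
    shift
    cmd="docker run --net=host"
    # Add one --device option per character device.
    for dev in "$@"; do
        cmd="$cmd --device=$dev"
    done
    printf '%s\n' "$cmd --ulimit memlock=-1 -t -i $image /bin/bash"
)
```

For example, erdma_docker_cmd e8d9a60b6967 /dev/infiniband/uverbs0 /dev/infiniband/rdma_cm prints the example command above without the sudo prefix.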

Step 5: Deploy the eRDMA driver package for the Docker container

To use eRDMA in a container, you must also deploy the eRDMA user mode driver package. Otherwise, the eRDMA device information cannot be detected from within the container. Alibaba Cloud provides yum and apt sources to help you easily deploy the required packages.

Follow these steps:

  1. Remotely connect to the instance.

    For more information, see Log on to a Linux instance using Workbench.

  2. Enter the target container.

    If you ran the command in Step 4 as shown in the example, you are already inside the Docker container and can skip to the next step to install the user mode driver package. Otherwise, perform the following steps:

    1. Run the following command to view the target container ID.

      sudo docker ps

      This example uses the container started in Step 4. The CONTAINER ID column of the output shows the container ID.

    2. Run the following command to enter the container.

      sudo docker exec -it <CONTAINER ID> /bin/bash

      Replace <CONTAINER ID> with the target container ID that you obtained in the previous step.

  3. After you enter the container, install the user mode driver package.

    Important

    In the following examples, http://mirrors.cloud.aliyuncs.com is an internal source address. To access the source over the Internet, replace http://mirrors.cloud.aliyuncs.com with https://mirrors.aliyun.com. Using the Internet generates Internet traffic, which may incur additional fees. For more information about the billing rules for Internet traffic, see Public bandwidth billing.

    CentOS 7/CentOS 8

    1. Run the following command in the container to create an erdma.repo source file in the /etc/yum.repos.d folder.

      sudo vim /etc/yum.repos.d/erdma.repo
    2. Add the following content to the erdma.repo file and save the file.

      [erdma]
      name = ERDMA Repository
      baseurl = http://mirrors.cloud.aliyuncs.com/erdma/yum/redhat/$releasever/erdma/$basearch/
      gpgcheck = 1
      enabled = 1
      gpgkey = http://mirrors.cloud.aliyuncs.com/erdma/GPGKEY
    3. Run the following command to update the yum source cache.

      sudo yum makecache
    4. Run the following command to install the user mode driver package.

      sudo yum install libibverbs rdma-core librdmacm libibverbs-utils -y

    Alibaba Cloud Linux

    1. Run the following command in the container to add the repository.

      sudo yum-config-manager \
       --add-repo \
       http://mirrors.cloud.aliyuncs.com/erdma/yum/alinux/erdma.repo
      Note

      If the yum-config-manager command is not installed in your container, run sudo yum install -y yum-utils to install the yum-utils package. yum-config-manager is part of the yum-utils package and is used to manage yum configurations.

    2. Run the following command to update the yum source cache.

      sudo yum makecache
    3. Run the following command to install the user mode driver package.

      sudo yum install libibverbs rdma-core librdmacm libibverbs-utils -y

    Ubuntu 18.04/20.04/22.04/24.04

    1. Run the following command in the container to add the GPG public key of the repository.

      • Ubuntu 18.04/Ubuntu 20.04

        wget -qO - http://mirrors.cloud.aliyuncs.com/erdma/GPGKEY | sudo apt-key add -
      • Ubuntu 22.04/24.04

        wget -qO - http://mirrors.cloud.aliyuncs.com/erdma/GPGKEY | sudo gpg --dearmour -o /etc/apt/trusted.gpg.d/erdma.gpg
    2. Run the following command to add the apt source.

      • Ubuntu 18.04

        echo "deb [ arch=amd64 ] http://mirrors.cloud.aliyuncs.com/erdma/apt/ubuntu bionic/erdma main" | sudo tee /etc/apt/sources.list.d/erdma.list
      • Ubuntu 20.04

        echo "deb [ arch=amd64 ] http://mirrors.cloud.aliyuncs.com/erdma/apt/ubuntu focal/erdma main" | sudo tee /etc/apt/sources.list.d/erdma.list
      • Ubuntu 22.04

        echo "deb [ ] http://mirrors.cloud.aliyuncs.com/erdma/apt/ubuntu jammy/erdma main" | sudo tee /etc/apt/sources.list.d/erdma.list
      • Ubuntu 24.04

        echo "deb [ ] http://mirrors.cloud.aliyuncs.com/erdma/apt/ubuntu noble/erdma main" | sudo tee /etc/apt/sources.list.d/erdma.list
    3. Run the following command to update the apt source.

      sudo apt update
    4. Run the following command to install the user mode driver.

      sudo apt install libibverbs1 ibverbs-providers ibverbs-utils librdmacm1 -y
  4. Run the following command to view the eRDMA device information from within the container.

    ibv_devinfo

    If the output lists the eRDMA device, the device can be detected from within the container.
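
For Ubuntu containers, the apt source line in the preceding steps differs only in the release codename and the architecture option. The following sketch selects the matching line from the Ubuntu version number. The function name erdma_apt_source is a hypothetical helper, and the version-to-codename mapping is taken from the commands listed above.

```shell
# Print the eRDMA apt source line for an Ubuntu version number.
# Usage: erdma_apt_source <18.04|20.04|22.04|24.04>
# The function name is illustrative.
erdma_apt_source() (
    case "$1" in
        18.04) codename=bionic; opts=" arch=amd64 " ;;
        20.04) codename=focal;  opts=" arch=amd64 " ;;
        22.04) codename=jammy;  opts=" " ;;
        24.04) codename=noble;  opts=" " ;;
        *)
            echo "unsupported Ubuntu version: $1" >&2
            exit 1
            ;;
    esac
    printf 'deb [%s] http://mirrors.cloud.aliyuncs.com/erdma/apt/ubuntu %s/erdma main\n' "$opts" "$codename"
)
```

Inside the container, the version number can be read from /etc/os-release, for example: . /etc/os-release; erdma_apt_source "$VERSION_ID" | sudo tee /etc/apt/sources.list.d/erdma.list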

After you configure eRDMA in the Docker container, you can integrate eRDMA into TCP applications inside the container using SMC-R or NetACC to achieve application acceleration. For more information, see Application adaptation overview.

Use eRDMA Controller to deploy an eRDMA pod on a self-managed Kubernetes cluster

Step 1: Install eRDMA Controller

  1. Run the following command on the master node to install Helm.

    Helm is used to manage the installation and uninstallation of the eRDMA Controller components. After the installation, run helm version to verify that Helm is installed.

    curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
    chmod 700 get_helm.sh
    ./get_helm.sh
  2. Run the following command on the master node to download the eRDMA Controller source code.

    git clone https://github.com/AliyunContainerService/alibabacloud-erdma-controller.git
  3. The installation configuration file for eRDMA Controller is deploy/helm/values.yaml. You can modify this file to select a working mode and configure related parameters. Two working modes are supported:

    • Regular Mode: This mode is suitable for scenarios where eRDMA Controller must automatically manage Elastic RDMA Interfaces (ERIs) and dynamically allocate eRDMA resources.

    • Local Mode: This mode is suitable for scenarios where the eRDMA environment is already prepared on the Kubernetes nodes and the eRDMA devices are exposed to pods. This mode does not involve dynamic management of eRDMA resources.

    Configuration items:

    • localERIDiscovery: Sets the working mode of the controller.

      • true: Local Mode.

      • false: Regular Mode (default).

    • credentials: In Regular Mode, the controller must access Alibaba Cloud APIs to query and configure ECS instances and ENIs. Therefore, you must first create a RAM role and grant the required permissions.

      AccessKey authentication is currently supported. Set type to access_key, and enter the AccessKey ID and secret.

      credentials:
        type: "access_key"
        accessKeyID: "{access key}"
        accessKeySecret: "{access key secret}"

      You do not need to configure this item in Local Mode.

    • preferDriver: Sets the eRDMA driver type used by the node.

      • default: Default driver mode.

      • compat: RoCE-compatible driver mode.

      • ofed: OFED-based driver mode, suitable for GPU-accelerated instance types.

    • allocateAllDevices: Sets the device allocation policy in Regular Mode.

      • true: Allocates all eRDMA devices on the node to the pod.

      • false: Allocates one eRDMA device to the pod based on the NUMA topology.

    • exposedLocalERIs: Sets the eRDMA devices on the node that must be exposed to pods in Local Mode. For the configuration format, see the example.

      exposedLocalERIs:
        - i-XXX erdma_0/erdma_1 # specify instance ID and erdma devices(erdma_0/erdma_1) to expose
        - i-* erdma_0           # specify erdma devices(erdma_0) to expose for all unspecified nodes
        - i-* erdma_*           # expose all existing erdma devices for all unspecified nodes
    • Image configuration: The default erdma-agent image (registry.aliyuncs.com/erdma/agent) does not support Local Mode. To use Local Mode, you must build an agent image yourself and push it to a custom image repository.

      Run the following commands to build and push the image. The example uses Alibaba Cloud Container Registry (ACR). You must create a namespace and an image repository in advance.

      docker build --tag <REGISTRY_NAME>-registry.<REGION_ID>.cr.aliyuncs.com/<NAMESPACE>/agent:<TAG> --target agent .
      
      docker push <REGISTRY_NAME>-registry.<REGION_ID>.cr.aliyuncs.com/<NAMESPACE>/agent:<TAG>
  4. From the root directory of the cloned repository, use Helm to install the erdma-controller component.

    helm install -f deploy/helm/values.yaml --namespace kube-system alibaba-erdma-controller deploy/helm/
  5. Verify the installation.

    • After the installation in Regular Mode is complete, verify that the erdma-agent and erdma-controller pods are created:

      kubectl get pods -n kube-system | grep erdma

      Query the eRDMA device resources on the node:

      kubectl get erdmadevices
    • In Local Mode, only the erdma-agent pod is created. The erdma-controller pod is not created. Therefore, erdmadevices resources are not available.
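
The Regular Mode check above can be scripted by filtering the kubectl output. The following is a minimal sketch that assumes the default kubectl get pods column layout (NAME, READY, STATUS, RESTARTS, AGE) and assumes that the pod names contain the strings erdma-agent and erdma-controller. The function name erdma_pods_running is a hypothetical helper.

```shell
# Read `kubectl get pods` output on stdin and succeed only if, for every
# name fragment given as an argument, at least one matching pod is Running.
# The function name is illustrative.
erdma_pods_running() (
    pods="$(cat)"
    for name in "$@"; do
        # Column 1 is the pod name and column 3 is the pod status.
        printf '%s\n' "$pods" |
            awk -v n="$name" 'index($1, n) && $3 == "Running" { found = 1 } END { exit !found }' ||
            exit 1
    done
)
```

For example, in Regular Mode: kubectl get pods -n kube-system | erdma_pods_running erdma-agent erdma-controller && echo "eRDMA Controller is running". In Local Mode, pass only erdma-agent.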

Step 2: Create a pod that supports eRDMA network acceleration

  1. To create a pod that supports eRDMA network acceleration, declare aliyun/erdma: 1 in the resources.limits section of the container. The following example shows a sample configuration. Replace <ERDMA_POD_IMAGE> with the address of the container image that you use:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      labels:
        app: erdma
      name: erdma
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: erdma
      template:
        metadata:
          labels:
            app: erdma
          annotations:
        spec:
          containers:
          - command:
            - sleep
            - "360000"
            image: <ERDMA_POD_IMAGE>
            name: erdma
            resources:
              limits:
                aliyun/erdma: 1

    To enable transparent acceleration with SMC-R, add the network.alibabacloud.com/erdma-smcr: "true" annotation to the annotations section of the pod template. This feature requires the operating system to be Alibaba Cloud Linux 3 with a kernel version of 5.10.134-17 or later.
  2. Save the preceding configuration as example.yaml, and then run the following command to create the Deployment. The Deployment starts the specified number of pod replicas.

    kubectl apply -f example.yaml
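
Because the eRDMA device is attached only when the container declares the aliyun/erdma resource, it can be worth checking the manifest before applying it. The following sketch is a coarse text check, not a YAML parser: it only looks for a line of the form aliyun/erdma: <number>. The function name declares_erdma_limit is a hypothetical helper.

```shell
# Succeed if the manifest contains a line such as "aliyun/erdma: 1".
# This is a plain text match and does not verify where the line appears.
# The function name is illustrative.
declares_erdma_limit() (
    grep -Eq '^[[:space:]]*aliyun/erdma:[[:space:]]*[0-9]+' "$1"
)
```

For example: declares_erdma_limit example.yaml && kubectl apply -f example.yaml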

References

If you want to configure and use eRDMA in an ACK cluster, see Use eRDMA to accelerate container networks in ACK clusters.