
Using RDMA on Container Service for Kubernetes

In this tutorial, we'll show you how to use remote direct memory access (RDMA) on Alibaba Cloud Container Service for Kubernetes.

What Is RDMA?

Remote direct memory access (RDMA) was developed to reduce the latency caused by data processing on servers during network transmission.

In RDMA, data is transferred directly from the memory of one computer to the memory of another, without involving either computer's operating system or kernel protocol stack. Because the data path bypasses the operating system and protocol stack, RDMA greatly lowers CPU usage, reduces memory copies in the kernel, and cuts down on context switches between user mode and kernel mode.

Common RDMA implementations include RDMA over Converged Ethernet (RoCE), InfiniBand, and iWARP.


Alibaba Cloud's Support for RDMA

Alibaba Cloud Super Computing Cluster (SCC) instances are connected to both a RoCE network and a Virtual Private Cloud (VPC) network. The RoCE network is dedicated to RDMA communication. SCC is mainly used for high-performance computing, artificial intelligence, machine learning, scientific computing, engineering computing, data analysis, audio and video processing, and similar workloads.

RoCE provides network performance comparable to that of InfiniBand, while supporting a broader range of Ethernet-based applications.

Learn more about Alibaba Cloud ECS Bare Metal Instance and Super Computing Clusters at https://www.alibabacloud.com/help/doc-detail/60576.htm

You can purchase SCC instances with an annual or monthly subscription directly on the Elastic Compute Service (ECS) console. For more information, visit https://www.alibabacloud.com/help/doc-detail/61978.htm

Container Service's Support for RDMA

Currently, Alibaba Cloud Container Service supports RDMA. You can add SCC ECS instances to a container cluster and deploy an RDMA device plug-in to support RDMA at the scheduling level.

To schedule a container onto an RDMA-capable ECS instance, set rdma/hca: 1 under resources.limits in the container spec.
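If you generate pod specifications programmatically, the scheduling requirement is just one entry under resources.limits. The following Python sketch builds a pod manifest as a dictionary and prints it as JSON, which kubectl apply -f - also accepts. The function name rdma_pod_manifest and its parameters are hypothetical; the rdma/hca resource, the IPC_LOCK capability, and the image are taken from the test pods used later in this tutorial.

```python
import json


def rdma_pod_manifest(name: str, image: str, hca_count: int = 1) -> dict:
    """Return a Pod manifest (as a dict) that requests RDMA devices.

    Hypothetical helper: resource name and capability mirror the
    test pods in this tutorial.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "OnFailure",
            "containers": [
                {
                    "name": "mofed-test-ctr",
                    "image": image,
                    # RDMA applications must lock (pin) memory pages.
                    "securityContext": {"capabilities": {"add": ["IPC_LOCK"]}},
                    # Extended resource advertised by the RDMA device plug-in;
                    # the scheduler only places the pod on nodes offering it.
                    "resources": {"limits": {"rdma/hca": str(hca_count)}},
                    "command": ["sh", "-c", "sleep 1000000"],
                }
            ],
        },
    }


if __name__ == "__main__":
    manifest = rdma_pod_manifest(
        "rdma-test-pod", "mellanox/centos_7_4_mofed_4_2_1_2_0_0_60"
    )
    print(json.dumps(manifest, indent=2))
```

For example, if the script were saved as make_rdma_pod.py (a hypothetical file name), python3 make_rdma_pod.py | kubectl apply -f - would create the pod.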

Create a Container Cluster

Log on to the Container Service console and create a Kubernetes cluster. Because SCC is currently available only in Shanghai, select China East 2 (Shanghai) as the region of the cluster. Set the other parameters, create the cluster, and wait until it is created successfully.

Deploy an RDMA Device Plug-in

On the Container Service console, deploy a device plug-in that supports RDMA from a template. Select the target cluster and namespace, and use the following template:

apiVersion: v1
kind: ConfigMap
metadata:
  name: rdma-devices
  namespace: kube-system
data:
  config.json: |
    {
        "mode" : "hca"
    }

---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: rdma-device-plugin
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: rdma-sriov-dp-ds
  template:
    metadata:
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ""
      labels:
        name: rdma-sriov-dp-ds
    spec:
      hostNetwork: true
      nodeSelector:
        aliyun.accelerator/rdma: "true"
      tolerations:
      - key: CriticalAddonsOnly
        operator: Exists
      containers:
      - image: registry.cn-shanghai.aliyuncs.com/acs/rdma-device-plugin
        name: k8s-rdma-device-plugin
        imagePullPolicy: IfNotPresent
        securityContext:
          privileged: true
        volumeMounts:
          - name: device-plugin
            mountPath: /var/lib/kubelet/device-plugins
          - name: config
            mountPath: /k8s-rdma-sriov-dev-plugin
      volumes:
        - name: device-plugin
          hostPath:
            path: /var/lib/kubelet/device-plugins
        - name: config
          configMap:
            name: rdma-devices
            items:
            - key: config.json
              path: config.json

Manually Add an SCC ECS Instance to the Cluster

  • Create an SCC ECS instance in the same VPC as the container cluster, and add the instance to the same security group as the cluster.
  • On the Container Service console, choose Cluster > More > Add Existing ECS to obtain the script for manually adding the RDMA ECS instance you created.
  • Log on to the ECS instance and run the script. If the instance is added successfully, the output is similar to the following figure.

    [Figure: output of the script after the instance is added]

  • Choose Node > Label Management. In the Add dialog box, add the aliyun.accelerator/rdma: true label to the RDMA node.

    [Figure: adding the aliyun.accelerator/rdma: true label to the RDMA node]

  • After the label is added, you can see that the device plug-in pod has been scheduled onto the SCC node.
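To check the same result from the command line, you can inspect the output of kubectl get nodes -o json. The Python sketch below assumes that the device plug-in advertises the rdma/hca resource in each node's status.allocatable (the field the scheduler matches against rdma/hca limits). It lists every node that carries the aliyun.accelerator/rdma=true label together with its allocatable rdma/hca value. The function name rdma_nodes and the script layout are hypothetical.

```python
import json
import sys
from typing import Dict, Optional

RDMA_LABEL = "aliyun.accelerator/rdma"  # node label used in this tutorial
RDMA_RESOURCE = "rdma/hca"  # extended resource requested by pods


def rdma_nodes(nodes_json: str) -> Dict[str, Optional[str]]:
    """Map each node labeled aliyun.accelerator/rdma=true to its allocatable rdma/hca.

    nodes_json is the text printed by `kubectl get nodes -o json`.
    None means the node is labeled but the device plug-in has not
    registered the rdma/hca resource on it (yet).
    """
    result: Dict[str, Optional[str]] = {}
    for node in json.loads(nodes_json).get("items", []):
        metadata = node.get("metadata", {})
        labels = metadata.get("labels") or {}
        if labels.get(RDMA_LABEL) != "true":
            continue  # not labeled as an RDMA node
        allocatable = node.get("status", {}).get("allocatable", {})
        result[metadata["name"]] = allocatable.get(RDMA_RESOURCE)
    return result


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: kubectl get nodes -o json > nodes.json; python3 rdma_nodes.py nodes.json
    with open(sys.argv[1]) as f:
        for name, count in rdma_nodes(f.read()).items():
            print(f"{name}\t{count if count is not None else 'rdma/hca not registered'}")
```

A labeled node that shows "rdma/hca not registered" usually means the plug-in pod is not yet running on it.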

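Deploy Two Test Pods

Deploy the following two pods. Each pod requests one RDMA device through rdma/hca: 1 and adds the IPC_LOCK capability, which RDMA applications need in order to lock (pin) memory.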

apiVersion: v1
kind: Pod
metadata:
  name: rdma-test-pod
spec:
  restartPolicy: OnFailure
  containers:
  - image: mellanox/centos_7_4_mofed_4_2_1_2_0_0_60
    name: mofed-test-ctr
    securityContext:
      capabilities:
        add: [ "IPC_LOCK" ]
    resources:
      limits:
        rdma/hca: 1
    command:
    - sh
    - -c
    - |
      ls -l /dev/infiniband /sys/class/net
      sleep 1000000
---

apiVersion: v1
kind: Pod
metadata:
  name: rdma-test-pod-1
spec:
  restartPolicy: OnFailure
  containers:
  - image: mellanox/centos_7_4_mofed_4_2_1_2_0_0_60
    name: mofed-test-ctr
    securityContext:
      capabilities:
        add: [ "IPC_LOCK" ]
    resources:
      limits:
        rdma/hca: 1
    command:
    - sh
    - -c
    - |
      ls -l /dev/infiniband /sys/class/net
      sleep 1000000

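Run ib_read_bw -q 30 in one container (-q 30 runs the test over 30 queue pairs). Without a server address, ib_read_bw starts as the server and waits for a client to connect.

[Figure: ib_read_bw waiting for a connection in the first container]

In the other container, run ib_read_bw -q 30 <IP address of the first container>.

[Figure: ib_read_bw results in the second container]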

The test results show that data can be transmitted between the two containers over RDMA. The measured bandwidth is about 5,500 MB/s, or roughly 44 Gbit/s.
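The conversion from the ib_read_bw figure (reported in megabytes per second) to gigabits per second is simple arithmetic. Assuming the tool's MB means 10^6 bytes:

```latex
5500\ \text{MB/s} \times 8\ \tfrac{\text{bit}}{\text{B}} = 44\,000\ \text{Mbit/s} = 44\ \text{Gbit/s}
```

If MB is instead read as 2^20 bytes, the same value corresponds to about 46.1 Gbit/s.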

Note: An RDMA connection is usually established through TCP or RDMA_CM. If an application uses RDMA_CM, the pod IP address assigned by the VPC network plug-in cannot be used as the RDMA_CM address. Instead, configure the container to use the host network and use the IP address of bond0 as the RDMA_CM communication address.
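Inside a container that uses the host network, the application still has to look up the bond0 address to pass to RDMA_CM. One way to read it, sketched below in Python for Linux only, is the SIOCGIFADDR ioctl. The function name interface_ipv4 is hypothetical, and the sketch assumes the interface is named bond0 (as in the note) and has an IPv4 address; running ip -4 addr show bond0 on the node shows the same information.

```python
import fcntl
import socket
import struct
import sys

SIOCGIFADDR = 0x8915  # Linux ioctl request: get an interface's IPv4 address


def interface_ipv4(ifname: str) -> str:
    """Return the IPv4 address of a network interface (Linux only)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # struct ifreq begins with a 16-byte, NUL-terminated interface name.
        request = struct.pack("256s", ifname[:15].encode())
        response = fcntl.ioctl(sock.fileno(), SIOCGIFADDR, request)
    # ifr_addr starts at offset 16; sin_addr is 4 bytes into sockaddr_in.
    return socket.inet_ntoa(response[20:24])


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage inside a hostNetwork container: python3 ifaddr.py bond0
    print(interface_ipv4(sys.argv[1]))
```

The ioctl raises OSError if the interface does not exist or has no IPv4 address, which is a useful signal that the container is not actually on the host network.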


Alibaba Container Service
