
Container Service for Kubernetes: Run GDR applications on eRDMA nodes in ACK clusters

Last Updated: Dec 20, 2024

GPU Direct RDMA (GDR) is a technology developed by NVIDIA for high-performance computing and deep learning. It allows GPUs to exchange data directly with other devices that support Remote Direct Memory Access (RDMA), such as other GPUs or accelerators, without involving the CPU. This topic describes how to run GDR applications on elastic RDMA (eRDMA) nodes in Container Service for Kubernetes (ACK) clusters.

Prerequisites

Procedure

  1. Use Arena to submit an MPI job that runs a PyTorch synthetic benchmark.

    arena submit mpijob \
      --name=mpi-allreduce-sync-erdma \
      --device=aliyun/erdma=1 \
      -e NCCL_DEBUG=TRACE \
      -e OMPI_ALLOW_RUN_AS_ROOT_CONFIRM=1 \
      -e OMPI_ALLOW_RUN_AS_ROOT=1 \
      --gpus=8 \
      --memory=16Gi \
      --hostNetwork true \
      --cpu=4 \
      --workers=2 \
      --image=registry.cn-beijing.aliyuncs.com/acs/horovod:0.28.1-tf2.9.2-torch1.12.1-py3.8-erdma \
      --toleration all \
      "mpirun -np 2 \
      --allow-run-as-root \
      --mca btl_tcp_if_include bond0 \
      --mca oob_tcp_if_include bond0 \
      --mca pml ob1 \
      --mca btl ^openib \
      python /examples/pytorch/pytorch_synthetic_benchmark.py"

    Expected output:

    iZ2zeg0kcgyxepyc5r63kgZ:17:28 [0] NCCL INFO NET/IB : Using [0]rocep26s0:1/RoCE ; OOB eth0:192.168.8.128<0>
    iZ2zeg0kcgyxepyc5r63kgZ:17:28 [0] NCCL INFO Using network IB
    iZ2zeg0kcgyxepyc5r63kgZ:18:27 [1] NCCL INFO NET/IB : Using [0]rocep26s0:1/RoCE ; OOB eth0:192.168.8.128<0>
    iZ2zeg0kcgyxepyc5r63kgZ:18:27 [1] NCCL INFO Using network IB

    The job log indicates that the NVIDIA Collective Communication Library (NCCL) identifies the eRDMA device rocep26s0 during initialization. The device runs in RoCE mode, and NCCL uses the InfiniBand (IB) network for communication over eRDMA.
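
To check the same thing without reading the log by eye, the device name can be pulled out of the `NET/IB` lines. The following is a minimal sketch, not part of the original procedure: the function name `nccl_ib_devices` is hypothetical, and it assumes the log lines have the shape shown in the expected output above.

```shell
# Hypothetical helper: reads NCCL log text on stdin and prints each RDMA
# device that appears first in an "NCCL INFO NET/IB : Using [n]<dev>:..." line.
# Assumption: log lines look like the sample in this topic. If NCCL lists
# several devices on one line, only the first one is reported.
nccl_ib_devices() {
  sed -n 's|.*NCCL INFO NET/IB : Using \[[0-9]*]\([^:]*\):.*|\1|p' | sort -u
}
```

If your Arena version provides the `arena logs` subcommand, one possible use is `arena logs mpi-allreduce-sync-erdma | nccl_ib_devices`. Empty output would suggest that NCCL did not select an RDMA device.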

  2. Query information about eRDMA interfaces (ERIs) on the host.

    $ ibv_devinfo
    hca_id:	rocep156s0
    	transport:			eRDMA
    	fw_ver:				0.2.0
    	node_guid:			0216:3eff:fe2c:b8f3
    	sys_image_guid:			0216:3eff:fe2c:b8f3
    	vendor_id:			0x1ded
    	vendor_part_id:			4223
    	hw_ver:				0x0
    	phys_port_cnt:			1
    		port:	1
    			state:			PORT_DOWN (1)
    			max_mtu:		1024 (3)
    			active_mtu:		1024 (3)
    			sm_lid:			0
    			port_lid:		0
    			port_lmc:		0x00
    			link_layer:		Ethernet
    
    hca_id:	rocep26s0
    	transport:			eRDMA
    	fw_ver:				0.2.0
    	node_guid:			0216:3eff:fe10:f8b0
    	sys_image_guid:			0216:3eff:fe10:f8b0
    	vendor_id:			0x1ded
    	vendor_part_id:			4223
    	hw_ver:				0x0
    	phys_port_cnt:			1
    		port:	1
    			state:			PORT_ACTIVE (4)
    			max_mtu:		1024 (3)
    			active_mtu:		1024 (3)
    			sm_lid:			0
    			port_lid:		0
    			port_lmc:		0x00
    			link_layer:		Ethernet
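
In the sample output, rocep26s0 reports PORT_ACTIVE while rocep156s0 reports PORT_DOWN. rocep26s0 is the device that NCCL selected in the job log. On hosts with several ERIs, a small filter can list only the devices with an active port. This is a sketch under the assumption that `ibv_devinfo` prints `hca_id:` and `state:` fields as shown above. The function name `active_rdma_devices` is hypothetical.

```shell
# Hypothetical helper: reads `ibv_devinfo` output on stdin and prints the
# hca_id of every device that has a port in the PORT_ACTIVE state.
# A device with more than one active port is printed once per port.
active_rdma_devices() {
  awk '
    $1 == "hca_id:" { dev = $2 }                         # remember the current device
    $1 == "state:" && $2 == "PORT_ACTIVE" { print dev }  # report it when a port is active
  '
}
```

For example, `ibv_devinfo | active_rdma_devices` should print rocep26s0 for the output shown in this step.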

  3. Use eadm to monitor eRDMA traffic on the host.

    $ eadm stat -d rocep26s0 -l
    Monitoring rocep26s0...    (press CTRL-C to stop)
    
     15:59:56  rx:           0 B/s     0 p/s          tx:           0 B/s     0 p/s
    
    
     rocep26s0  /  traffic statistics
    
                               rx         |       tx
    --------------------------------------+------------------
      bytes                    11.06 KiB  |       11.18 KiB
    --------------------------------------+------------------
              max            15.43 KiB/s  |     15.10 KiB/s
          average             4.03 KiB/s  |      4.07 KiB/s
              min                  0 B/s  |           0 B/s
    --------------------------------------+------------------
      packets                    8406769  |         8546764
    --------------------------------------+------------------
              max              38990 p/s  |       37488 p/s
          average               2988 p/s  |        3038 p/s
              min                  0 p/s  |           0 p/s
    --------------------------------------+------------------
      time                 33.78 minutes

    The preceding output indicates that eRDMA traffic is detected in real time.