You can run applications that use Google Remote Procedure Call (gRPC) and Remote Direct Memory Access (RDMA) verbs on Enhanced Remote Direct Memory Access (eRDMA) nodes so that the applications exchange data through RDMA instead of gRPC. This reduces the communication latency between parameter server (PS) nodes and worker nodes and accelerates distributed training.
Prerequisites
The Arena client is installed in hostNetwork mode.
ACK eRDMA Controller is installed and configured on nodes. For more information, see Use eRDMA to accelerate container networking.
Procedure
The following procedure uses a tf_cnn_benchmarks job as an example.
Submit the TensorFlow job that uses eRDMA.
    arena submit tfjob --name=tf-ps-benchmark \
        --gpus=8 --workers=1 --ps=1 \
        --device=aliyun/erdma=1 \
        --hostNetwork true \
        --psImage=registry.cn-beijing.aliyuncs.com/acs/tf-benchmark:1.0 \
        --image=registry.cn-beijing.aliyuncs.com/acs/tf-benchmark:1.0 \
        "CUDA_VISIBLE_DEVICES= python benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py \
        --server_protocol=grpc+verbs \
        --model=resnet50 \
        --batch_size=16 \
        --data_format=NHWC"

Query the eRDMA interface.
    $ ibv_devinfo
    hca_id: rocep156s0
            transport:          eRDMA
            fw_ver:             0.2.0
            node_guid:          0216:3eff:fe2c:b8f3
            sys_image_guid:     0216:3eff:fe2c:b8f3
            vendor_id:          0x1ded
            vendor_part_id:     4223
            hw_ver:             0x0
            phys_port_cnt:      1
                    port:   1
                            state:          PORT_DOWN (1)
                            max_mtu:        1024 (3)
                            active_mtu:     1024 (3)
                            sm_lid:         0
                            port_lid:       0
                            port_lmc:       0x00
                            link_layer:     Ethernet

    hca_id: rocep26s0
            transport:          eRDMA
            fw_ver:             0.2.0
            node_guid:          0216:3eff:fe10:f8b0
            sys_image_guid:     0216:3eff:fe10:f8b0
            vendor_id:          0x1ded
            vendor_part_id:     4223
            hw_ver:             0x0
            phys_port_cnt:      1
                    port:   1
                            state:          PORT_ACTIVE (4)
                            max_mtu:        1024 (3)
                            active_mtu:     1024 (3)
                            sm_lid:         0
                            port_lid:       0
                            port_lmc:       0x00
                            link_layer:     Ethernet

Monitor eRDMA traffic.
    $ eadm stat -d rocep26s0 -l
    Monitoring rocep26s0...    (press CTRL-C to stop)

    15:59:56   rx:  0 B/s  0 p/s   tx:  0 B/s  0 p/s

     rocep26s0  /  traffic statistics

                               rx         |       tx
    --------------------------------------+------------------
      bytes              11.06 GiB        |    11.18 GiB
    --------------------------------------+------------------
      max                52.43 MiB/s      |    52.10 MiB/s
      average             4.03 MiB/s      |     4.07 MiB/s
      min                    0 B/s        |        0 B/s
    --------------------------------------+------------------
      packets              8406769        |      8546764
    --------------------------------------+------------------
      max                  38990 p/s      |      37488 p/s
      average               2988 p/s      |       3038 p/s
      min                      0 p/s      |          0 p/s
    --------------------------------------+------------------
      time               46.88 minutes

The preceding output indicates that eRDMA traffic is identified in real time.
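In the ibv_devinfo output above, only rocep26s0 has a port in the PORT_ACTIVE state, which is why it is the device passed to eadm stat. The following is a minimal sketch of how that selection could be scripted. The function name active_rdma_devices is hypothetical and not part of any eRDMA tool, and the sketch assumes that ibv_devinfo prints the hca_id: and state: fields in the layout shown above.

```shell
#!/bin/sh
# Hypothetical helper (not part of eadm or ibv_devinfo).
# Reads `ibv_devinfo` output on stdin and prints the hca_id of every
# device that has a port in the PORT_ACTIVE state, one per line.
active_rdma_devices() {
    awk '
        # Remember the device whose section is currently being read.
        $1 == "hca_id:" { dev = $2 }
        # Print that device when one of its ports reports PORT_ACTIVE.
        $1 == "state:" && $2 == "PORT_ACTIVE" { print dev }
    '
}

# Possible usage on an eRDMA node (assumes both tools are installed):
#   dev=$(ibv_devinfo | active_rdma_devices | head -n 1)
#   eadm stat -d "$dev" -l
```

Applied to the sample output in this topic, the helper would print rocep26s0 only, because the port of rocep156s0 is in the PORT_DOWN state.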