This topic describes the basic concepts, benefits, scenarios, and specifications of Elastic Remote Direct Memory Access (eRDMA).
Introduction
What is eRDMA?
eRDMA is an elastic Remote Direct Memory Access (RDMA) network developed by Alibaba Cloud for the cloud. eRDMA reuses virtual private clouds (VPCs) as the underlying network and uses a congestion control (CC) algorithm developed by Alibaba Cloud. Because it is built on RDMA, eRDMA provides high throughput and low latency. Compared with traditional RDMA, eRDMA can build large-scale RDMA networks within seconds. eRDMA supports both traditional high-performance computing (HPC) applications and Transmission Control Protocol/Internet Protocol (TCP/IP) applications.
You can use eRDMA to deploy HPC applications in the cloud and build highly elastic, high-performance application clusters at low cost. You can also accelerate applications by running them over eRDMA instead of regular VPC networking.
Implementation of eRDMA capabilities
eRDMA is available only on instance types that support it. To provide large-scale RDMA networking, create eRDMA-capable elastic network interfaces (ENIs) and bind them to Elastic Compute Service (ECS) instances of these instance types.
Elastic RDMA Interfaces (ERIs) are virtual network interfaces that can be bound to ECS instances. An ERI depends on an ENI to provide an RDMA device and reuses the network to which that ENI belongs. This lets you use RDMA in your existing network and benefit from its low latency without modifying your business networking.
Benefits
eRDMA provides the following benefits:
High performance
RDMA bypasses the kernel network stack and transfers data directly from user-mode programs to the host channel adapter (HCA) for network transmission, which greatly reduces CPU load and latency. eRDMA retains the advantages of traditional RDMA interfaces and applies RDMA to VPCs, bringing the ultra-low latency of RDMA to cloud networks.
Note: An HCA is a hardware network interface card (NIC) that connects a server to a network and supports RDMA.
Inclusiveness
You can enable eRDMA free of charge. To enable eRDMA, you only need to select the Elastic RDMA Interface option when you purchase an ECS instance.
Large-scale deployment
Traditional RDMA relies on lossless networks, which makes large-scale deployment costly and difficult. eRDMA uses the CC algorithm developed by Alibaba Cloud to adapt to variations in VPC transmission quality, such as latency and packet loss, and therefore performs well on lossy networks.
Scalability
Traditional RDMA requires a dedicated hardware NIC. eRDMA instead uses a cloud-native RDMA HCA based on the Shenlong architecture. You can add eRDMA devices dynamically while an ECS instance is in use, and eRDMA supports live migration, which allows flexible deployment.
Shared VPCs
eRDMA depends on ENIs and reuses the networks to which the ENIs belong. This allows you to activate RDMA in existing networks without modifying service networking.
Scenarios
The TCP/IP protocol stack provides the mainstream network communication protocols on which many applications are built. As data center workloads grow, they place higher demands on network performance, such as lower latency and higher throughput. TCP/IP has become a bottleneck for communication performance because of overheads such as data copying, processing across multiple protocol layers, complex CC algorithms, and frequent context switching.
RDMA addresses these pain points. Features such as zero-copy and kernel bypass eliminate the overhead of copying data and frequently switching contexts. Compared with TCP/IP, RDMA provides lower latency, higher throughput, and lower CPU utilization. However, high hardware and O&M costs have limited RDMA to few scenarios.
Alibaba Cloud eRDMA is designed to be broadly compatible with diverse cloud environments. It provides low latency and lowers the barrier for a wide range of applications to run in the cloud with improved performance. Compared with traditional RDMA, eRDMA can be used in more scenarios, such as Redis-based cache databases, Spark-based big data analytics, the Weather Research and Forecasting (WRF) model in HPC, and AI training, and it delivers considerable performance gains in these scenarios.
Limits
Before you use eRDMA, make sure that the following conditions are met. For more information, see Configure eRDMA on an enterprise-level instance.
Basic specifications
This section describes the specifications of eRDMA. Make sure that your use of eRDMA stays within these specifications. Otherwise, your applications may not work as expected.
RDMA QP
Specification | Content | Description |
Maximum QPs (max_qp_num) | Up to 131,071 queue pairs (QPs) are supported. | The maximum number of QPs varies based on the instance type. |
Maximum outstanding WRs to the send queue (max_send_wr) | 8,192 | The maximum number of outstanding work requests (WRs) that can be posted to the send queue. |
Maximum outstanding WRs to the receive queue (max_recv_wr) | 32,768 | The maximum number of outstanding WRs that can be posted to the receive queue. |
Maximum SGEs in a send WR (max_send_sge) | 6 | The maximum number of scatter-gather elements (SGEs) in a send WR. |
Maximum SGEs in a receive WR (max_recv_sge) | 1 | The maximum number of SGEs in a receive WR. |
Shared receive queue (SRQ) | Not supported. | None. |
QP type | Reliable connected (RC) | None. |
Connection establishment method | RDMA_CM | None. |
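Applications request per-QP capacities when they create a QP, so it can help to check a request against the preceding table first. The following Python sketch is a hypothetical pre-flight check: `QpRequest`, `check_qp_request`, and `ERDMA_QP_LIMITS` are illustrative names that are not part of any eRDMA or libibverbs API. The capacity field names mirror those of `struct ibv_qp_cap` in libibverbs. In a real program, read the device limits at run time (for example, with `ibv_query_device()`) because some values, such as the maximum number of QPs, vary by instance type.

```python
from dataclasses import dataclass

# Per-QP limits copied from the eRDMA QP specification table.
# ERDMA_QP_LIMITS is a hypothetical name used only in this sketch.
ERDMA_QP_LIMITS = {
    "max_send_wr": 8192,
    "max_recv_wr": 32768,
    "max_send_sge": 6,
    "max_recv_sge": 1,
}


@dataclass
class QpRequest:
    """Hypothetical mirror of the capacities an application asks for when creating a QP."""
    max_send_wr: int
    max_recv_wr: int
    max_send_sge: int
    max_recv_sge: int
    qp_type: str = "RC"   # eRDMA supports only reliable connected (RC) QPs
    use_srq: bool = False  # eRDMA does not support SRQs


def check_qp_request(req: QpRequest) -> list[str]:
    """Return a list of problems; an empty list means the request fits the documented limits."""
    problems = []
    for field, limit in ERDMA_QP_LIMITS.items():
        value = getattr(req, field)
        if value > limit:
            problems.append(f"{field}={value} exceeds {limit}")
    if req.qp_type != "RC":
        problems.append(f"qp_type={req.qp_type} is not supported; only RC is")
    if req.use_srq:
        problems.append("SRQ is not supported")
    return problems


print(check_qp_request(QpRequest(1024, 4096, 4, 1)))  # → []
print(check_qp_request(QpRequest(16384, 4096, 4, 2, qp_type="UD")))
```

Collecting every violation, rather than stopping at the first one, lets an application report all oversized capacities at once before it calls into RDMA_CM to create the QP.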
RDMA CQ
Specification | Content | Description |
CQs | The number of completion queues (CQs) varies based on the instance type. The maximum number of CQs is twice the number of QPs. | None. |
Vectors in a CQ (vector_num) | The number of vectors in a CQ varies based on the instance type. The maximum number of vectors in a CQ is 31. The number of CPUs is related to the number of QPs. | None. |
Maximum completion event queue (CEQ) depth | 256 | None. |
Maximum CQ depth | 1,048,576 | None. |
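The CQ limits above can be turned into simple sizing arithmetic. The following Python sketch uses hypothetical helper names (`max_cqs`, `shared_cq_depth`). It assumes that every work request is signaled, so each outstanding WR can produce one completion, and it sizes a CQ shared by several QPs so that it cannot overrun. The `max_qp` value is an input because the QP limit varies by instance type.

```python
MAX_CQ_DEPTH = 1_048_576  # maximum CQ depth from the CQ specification table


def max_cqs(max_qp: int) -> int:
    """Upper bound on CQs: twice the instance's QP limit."""
    return 2 * max_qp


def shared_cq_depth(outstanding_wrs_per_qp: list[int]) -> int:
    """Depth for one CQ shared by several QPs, assuming every WR is signaled.

    Raises ValueError if the required depth exceeds the eRDMA maximum.
    """
    depth = sum(outstanding_wrs_per_qp)
    if depth > MAX_CQ_DEPTH:
        raise ValueError(f"required CQ depth {depth} exceeds {MAX_CQ_DEPTH}")
    return depth


print(max_cqs(131_071))               # → 262142
print(shared_cq_depth([8192] * 128))  # → 1048576
```

With one send CQ and one receive CQ per QP, an application uses exactly two CQs per QP, which stays within the twice-the-QPs bound. A send CQ shared by 128 QPs that each keep the maximum of 8,192 send WRs outstanding reaches the 1,048,576 depth limit exactly, so sharing it with a 129th such QP would fail the check.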
RDMA MR
Specification | Content |
MRs | The number of memory regions (MRs) varies based on the instance type. The maximum number of MRs is twice the number of QPs. |
Memory windows (MWs) | Not supported. |
Maximum MR size | The maximum MR size varies based on the underlying hardware and ranges from 2 GB to 64 GB. |
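Because the maximum MR size can be as small as 2 GB, a larger buffer may need to be registered as several MRs. The following Python sketch plans such a split. `plan_mr_chunks` is a hypothetical helper, and it assumes that "GB" in the table means 2**30 bytes. In practice, `max_mr_size` would come from the device (libibverbs reports it in `struct ibv_device_attr`), and each planned chunk would then be registered as its own MR.

```python
GB = 1 << 30  # assumption: the table's "GB" means 2**30 bytes


def plan_mr_chunks(buffer_len: int, max_mr_size: int) -> list[tuple[int, int]]:
    """Split a buffer into (offset, length) pieces that each fit in a single MR."""
    if max_mr_size <= 0:
        raise ValueError("max_mr_size must be positive")
    chunks = []
    offset = 0
    while offset < buffer_len:
        length = min(max_mr_size, buffer_len - offset)
        chunks.append((offset, length))
        offset += length
    return chunks


# A 5 GB buffer on hardware with a 2 GB MR limit needs three MRs.
print([(off // GB, length // GB) for off, length in plan_mr_chunks(5 * GB, 2 * GB)])
# → [(0, 2), (2, 2), (4, 1)]
```

Each chunk consumes one MR from the per-instance MR budget, so larger chunks are preferable. A single SGE must lie within one MR, so a transfer that crosses a chunk boundary needs one SGE per chunk. eRDMA allows up to 6 SGEs in a send WR but only 1 in a receive WR, so a single receive cannot span two MRs.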
Supported RDMA Verbs opcodes
Opcode | Supported |
RDMA Write | Yes. |
RDMA Write with Immediate | Yes. |
RDMA Read | Yes. |
Send | Yes. |
Send with Invalidate | Yes. |
Send with Immediate | Yes. |
Send with Solicited Event | Yes. |
Local Invalidate | Supported only in kernel-mode Verbs. |
Atomic Operation | No. |
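Before you port an existing RDMA application, you can check which of its opcodes the table above rules out. The following Python sketch encodes the table as a lookup. The keys are libibverbs `enum ibv_wr_opcode` names used here only as strings. `Support`, `ERDMA_OPCODES`, and `unsupported_opcodes` are hypothetical names. Send with Solicited Event has no entry because libibverbs expresses it as the `IBV_SEND_SOLICITED` flag on a Send rather than as a separate opcode.

```python
from enum import Enum


class Support(Enum):
    YES = "yes"
    KERNEL_ONLY = "kernel-mode Verbs only"
    NO = "no"


# Support matrix from the opcode table, keyed by libibverbs opcode names (as strings).
ERDMA_OPCODES = {
    "IBV_WR_RDMA_WRITE": Support.YES,
    "IBV_WR_RDMA_WRITE_WITH_IMM": Support.YES,
    "IBV_WR_RDMA_READ": Support.YES,
    "IBV_WR_SEND": Support.YES,
    "IBV_WR_SEND_WITH_INV": Support.YES,
    "IBV_WR_SEND_WITH_IMM": Support.YES,
    "IBV_WR_LOCAL_INV": Support.KERNEL_ONLY,
    "IBV_WR_ATOMIC_CMP_AND_SWP": Support.NO,
    "IBV_WR_ATOMIC_FETCH_AND_ADD": Support.NO,
}


def unsupported_opcodes(needed: list[str], user_space: bool = True) -> list[str]:
    """Return the opcodes in `needed` that eRDMA cannot execute in the given mode."""
    missing = []
    for op in needed:
        # Opcodes missing from the table are treated as unsupported.
        support = ERDMA_OPCODES.get(op, Support.NO)
        if support is Support.NO or (support is Support.KERNEL_ONLY and user_space):
            missing.append(op)
    return missing


print(unsupported_opcodes(["IBV_WR_RDMA_WRITE", "IBV_WR_LOCAL_INV", "IBV_WR_ATOMIC_FETCH_AND_ADD"]))
# → ['IBV_WR_LOCAL_INV', 'IBV_WR_ATOMIC_FETCH_AND_ADD']
```

A user-space application that relies on atomic operations or Local Invalidate needs an alternative design on eRDMA, for example replacing atomics with Send/Receive-based coordination.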