Basic concepts, benefits, use scenarios, and specifications of eRDMA - Elastic Compute Service

This topic describes the basic concepts, benefits, scenarios, and specifications of Elastic Remote Direct Memory Access (eRDMA).

Introduction

What is eRDMA?

eRDMA is an elastic Remote Direct Memory Access (RDMA) network that Alibaba Cloud developed for the cloud. eRDMA reuses virtual private clouds (VPCs) as the underlying link and uses a congestion control (CC) algorithm developed by Alibaba Cloud. Like traditional RDMA, eRDMA provides high throughput and low latency. Unlike traditional RDMA, eRDMA can set up large-scale RDMA networking within seconds. eRDMA supports traditional high-performance computing (HPC) applications and Transmission Control Protocol/Internet Protocol (TCP/IP) applications.

You can use eRDMA to deploy HPC applications in the cloud and build highly elastic, high-performance application clusters at low cost. You can also switch applications from VPC networking to eRDMA networking to accelerate them.

Implementation of eRDMA capabilities

eRDMA is available only on instance types that support it. To provide large-scale RDMA networking, create eRDMA-capable elastic network interfaces (ENIs) and bind them to Elastic Compute Service (ECS) instances of those instance types.

Elastic RDMA Interfaces (ERIs) are virtual network interfaces that can be bound to ECS instances. An ERI depends on an ENI to enable the RDMA device and reuses the network to which the ENI belongs. This allows you to use RDMA in your existing network and benefit from its low latency without modifying your service networking.

Benefits

eRDMA provides the following benefits:

High performance
RDMA bypasses the kernel stack and transfers data directly from user-mode programs to the Host Channel Adapter (HCA) for network transmission. This greatly reduces CPU load and latency. eRDMA retains the advantages of traditional RDMA interfaces and brings the ultra-low latency of RDMA to VPCs and cloud networks.
Note
An HCA is a hardware network interface card (NIC) that connects a server to a network and provides support for RDMA.
Inclusiveness
You can enable eRDMA free of charge. To enable eRDMA, you only need to select the Elastic RDMA Interface option when you purchase an ECS instance.
Large-scale deployment
Traditional RDMA relies on lossless networks, which makes large-scale deployment costly and difficult. eRDMA uses the CC algorithm developed by Alibaba Cloud to manage transmission-quality issues in VPCs, such as latency and packet loss. As a result, eRDMA performs well even in lossy networks.
Scalability
Traditional RDMA requires a separate hardware NIC. eRDMA instead uses an RDMA HCA that has cloud attributes and is based on the Shenlong architecture. eRDMA devices can be added dynamically while you use ECS, and eRDMA supports hot migration, which allows for flexible deployment.
Shared VPCs
eRDMA depends on ENIs and reuses networks to which ENIs belong. This allows you to activate the RDMA feature in legacy networks without the need to modify service networking.

Scenarios

The TCP/IP protocol stack provides the mainstream network communication protocols on which many applications are built. As data center workloads grow, they impose higher requirements on network performance, such as lower latency and higher throughput. TCP/IP has become a bottleneck for network performance because of high copy overheads, cross-protocol stack processing, complex CC algorithms, and frequent context switching.

RDMA addresses these pain points. Features such as zero-copy and kernel bypass eliminate the overheads of data copying and frequent context switching. Compared with TCP/IP communication, RDMA provides lower latency, higher throughput, and lower CPU utilization. However, high prices and O&M costs have limited RDMA to few use scenarios.

Alibaba Cloud eRDMA is designed to be broadly compatible with diverse cloud environments. eRDMA provides low latency and lowers the barrier for a wide range of applications to adapt to cloud environments and improve their performance. Compared with traditional RDMA, eRDMA can be used in a wider range of scenarios, such as Redis-based cache databases, Spark-based big data analytics, the Weather Research and Forecasting (WRF) model in HPC, and AI training. eRDMA offers considerable performance gains in these scenarios.

Limits

Before you use eRDMA, make sure that the following conditions are met. For more information, see Configure eRDMA on an enterprise-level instance.

Basic specifications

This section describes the specifications of eRDMA. When you use eRDMA, make sure that the service specification requirements are met. Otherwise, your applications may not work as expected.

RDMA QP

Specification	Content	Description
Maximum QPs (max_qp_num)	Up to 131,071 queue pairs (QPs) are supported.	The maximum number of QPs varies based on the instance type.
Maximum outstanding WRs to the send queue (max_send_wr)	8,192	The maximum number of outstanding work requests (WRs) that can be posted to the send queue.
Maximum outstanding WRs to the receive queue (max_recv_wr)	32,768	The maximum number of outstanding WRs that can be posted to the receive queue.
Maximum SGEs in a send WR (max_send_sge)	6	The maximum number of scatter-gather elements (SGEs) in a send WR.
Maximum SGEs in a receive WR (max_recv_sge)	1	The maximum number of SGEs in a receive WR.
SRQ	Not supported.	None.
QP type	Reliable connected (RC)	None.
Connection establishment method	RDMA_CM	None.
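
The limits in the preceding table can be checked before an application tries to create a QP. The following is a minimal sketch, not part of any eRDMA SDK: the QpRequest class and the check_qp_request function are hypothetical names introduced for illustration, and the default values are arbitrary. Only the numeric limits, the RC-only QP type, and the missing SRQ support come from the table.

```python
from dataclasses import dataclass
from typing import List

# Limits copied from the RDMA QP table above.
MAX_SEND_WR = 8192
MAX_RECV_WR = 32768
MAX_SEND_SGE = 6
MAX_RECV_SGE = 1
SUPPORTED_QP_TYPES = {"RC"}


@dataclass
class QpRequest:
    """Hypothetical description of a QP that an application wants to create."""
    qp_type: str = "RC"
    max_send_wr: int = 128
    max_recv_wr: int = 128
    max_send_sge: int = 1
    max_recv_sge: int = 1
    use_srq: bool = False


def check_qp_request(req: QpRequest) -> List[str]:
    """Return the violated limits; an empty list means the request fits."""
    problems = []
    if req.qp_type not in SUPPORTED_QP_TYPES:
        problems.append(f"QP type {req.qp_type} is not supported (RC only)")
    if req.use_srq:
        problems.append("SRQ is not supported")
    limits = [
        ("max_send_wr", req.max_send_wr, MAX_SEND_WR),
        ("max_recv_wr", req.max_recv_wr, MAX_RECV_WR),
        ("max_send_sge", req.max_send_sge, MAX_SEND_SGE),
        ("max_recv_sge", req.max_recv_sge, MAX_RECV_SGE),
    ]
    for name, value, limit in limits:
        if value > limit:
            problems.append(f"{name}={value} exceeds the limit of {limit}")
    return problems
```

For example, a request that scatters each receive WR across two SGEs is rejected because max_recv_sge is 1. An application ported from another RDMA NIC may need to post one contiguous receive buffer per WR instead.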

RDMA CQ

Specification	Content	Description
CQs	The number of completion queues (CQs) varies based on the instance type. The maximum number of CQs is twice the number of QPs.	None.
Vectors in a CQ (vector_num)	The number of vectors in a CQ varies based on the instance type. The maximum number of vectors in a CQ is 31. The number of CPUs is related to the number of QPs.	Each vector corresponds to a hardware interrupt. In actual usage, each CPU can be configured with up to one vector to meet communication requirements. Each vector is associated with a completion event queue (CEQ) in eRDMA.
Maximum CEQ depth	256	The maximum CEQ depth for version 0.2.34 is 256. If you use the event mode, we recommend that you do not bind more than 256 CQs to each vector. Otherwise, CEQ overflow may occur.
Maximum CQ depth	1,048,576	None.
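
The recommendation to bind no more than 256 CQs to each vector in event mode can be expressed as a simple placement rule. The following sketch spreads CQs across vectors in round-robin order and refuses a layout that would exceed the CEQ depth. The function name assign_cqs_to_vectors is hypothetical, and round-robin is only one possible policy. In libibverbs, the chosen vector index would be passed as the comp_vector argument of ibv_create_cq.

```python
from collections import Counter
from typing import List

MAX_CEQ_DEPTH = 256  # from the RDMA CQ table above (version 0.2.34)
MAX_VECTORS = 31     # upper bound on vectors from the same table


def assign_cqs_to_vectors(num_cqs: int, num_vectors: int) -> List[int]:
    """Return a vector index for each CQ, assigned in round-robin order.

    Raises ValueError if the layout would bind more than MAX_CEQ_DEPTH CQs
    to a single vector, which risks CEQ overflow in event mode.
    """
    if not 1 <= num_vectors <= MAX_VECTORS:
        raise ValueError(f"num_vectors must be between 1 and {MAX_VECTORS}")
    if num_cqs > num_vectors * MAX_CEQ_DEPTH:
        raise ValueError(
            f"{num_cqs} CQs do not fit on {num_vectors} vectors "
            f"at {MAX_CEQ_DEPTH} CQs per vector"
        )
    return [cq % num_vectors for cq in range(num_cqs)]


def cqs_per_vector(assignment: List[int]) -> Counter:
    """Count how many CQs landed on each vector."""
    return Counter(assignment)
```

Round-robin keeps the per-vector count within one of the average, so the check on the total is enough to guarantee that no single vector exceeds the limit.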

RDMA MR

Specification	Content
MRs	The number of memory regions (MRs) varies based on the instance type. The maximum number of MRs is twice the number of QPs.
MWs	Not supported
Max MR size	The maximum MR size varies based on the underlying hardware and ranges from 2 GB to 64 GB.
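
Because the MR size limit depends on the underlying hardware, a buffer that is larger than the limit has to be registered as several MRs. The following sketch only computes the (offset, length) ranges for such a split. The function name split_into_mrs is hypothetical, the limit is passed in as a parameter rather than queried from a device, and treating 1 GB as 1024³ bytes is an assumption.

```python
from typing import List, Tuple

GIB = 1024 ** 3  # assumption: the table's "GB" is treated as 2**30 bytes


def split_into_mrs(total_bytes: int, max_mr_bytes: int) -> List[Tuple[int, int]]:
    """Split a buffer into (offset, length) ranges that each fit in one MR."""
    if total_bytes < 0 or max_mr_bytes <= 0:
        raise ValueError("total_bytes must be >= 0 and max_mr_bytes must be > 0")
    chunks = []
    offset = 0
    while offset < total_bytes:
        # The last chunk may be shorter than the limit.
        length = min(max_mr_bytes, total_bytes - offset)
        chunks.append((offset, length))
        offset += length
    return chunks
```

Each range counts against the per-instance MR quota, so very large buffers on hardware with a small limit can consume a noticeable share of the available MRs.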

Supported RDMA Verbs opcode

Opcode	Supported
RDMA Write	Yes.
RDMA Write with Immediate	Yes.
RDMA Read	Yes.
Send	Yes.
Send with Invalidate	Yes.
Send with Immediate	Yes.
Send with Solicited Event	Yes.
Local Invalidate	Supported only for kernel-mode Verbs.
Atomic Operation	No.
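
An application that also runs on other RDMA devices may want to check the preceding table before it selects a code path, for example to avoid atomic operations on eRDMA. The following sketch encodes the table as a lookup. The Support enum and the is_usable function are hypothetical names introduced for illustration, and the opcode strings are the labels used in the table, not libibverbs constants.

```python
from enum import Enum


class Support(Enum):
    YES = "yes"
    KERNEL_ONLY = "kernel-only"
    NO = "no"


# Transcribed from the opcode table above.
OPCODE_SUPPORT = {
    "RDMA Write": Support.YES,
    "RDMA Write with Immediate": Support.YES,
    "RDMA Read": Support.YES,
    "Send": Support.YES,
    "Send with Invalidate": Support.YES,
    "Send with Immediate": Support.YES,
    "Send with Solicited Event": Support.YES,
    "Local Invalidate": Support.KERNEL_ONLY,
    "Atomic Operation": Support.NO,
}


def is_usable(opcode: str, kernel_mode: bool = False) -> bool:
    """Return True if the opcode can be used in the given execution mode.

    Raises KeyError for an opcode that is not listed in the table.
    """
    support = OPCODE_SUPPORT[opcode]
    if support is Support.YES:
        return True
    if support is Support.KERNEL_ONLY:
        return kernel_mode
    return False
```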