
Elastic High Performance Computing: Deploy an Elastic High Performance Computing cluster with eRDMA

Last Updated: Feb 06, 2026

This topic describes how to create an Elastic High Performance Computing (E-HPC) Cluster (formerly E-HPC NEXT) that supports elastic Remote Direct Memory Access (eRDMA). It also shows how to configure runtime parameters for the OSU-Benchmark application to accelerate communication for multi-node High Performance Computing (HPC) applications.

Background information

Using eRDMA technology, multi-node parallel HPC tasks in an E-HPC Cluster (formerly E-HPC NEXT) achieve high-speed network performance comparable to on-premises clusters. These tasks, such as climate forecasting, industrial simulation, and molecular dynamics, benefit from high bandwidth and low latency, which significantly improves the efficiency of numerical simulations. You can experience the benefits of RDMA on your existing network without deploying additional network interface controllers (NICs), which ensures seamless integration and ease of use.

Preparations

  1. Go to the Create Cluster page to create an E-HPC cluster. For more information, see Create a Standard Edition cluster.

    The following example cluster configuration is used.

    Cluster Configuration

    • Region: Shanghai
    • Network and Zone: Select Zone L
    • Series: Standard Edition
    • Deployment Mode: Public Cloud Cluster
    • Cluster Type: SLURM
    • Control Plane Node:
      • Instance type: ecs.c7.xlarge. This instance type provides 4 vCPUs and 8 GiB of memory.
      • Image: aliyun_2_1903_x64_20G_alibase_20240628.vhd

      Note: The osu-benchmark installation package is built on the Alibaba Cloud Linux 2.1903 LTS 64-bit image.

    Compute Nodes and Queues

    • Number of Queue Nodes: 2 initial nodes.
    • Inter-node Interconnection: eRDMA Network

      Note: Only some node specifications support Elastic RDMA Interconnect (ERI). For more information, see elastic Remote Direct Memory Access (eRDMA) and Enable eRDMA on enterprise-level instances.

    • Instance Type Group:
      • Instance type: ecs.c8ae.xlarge or other AMD instances of the same generation.
      • Image: aliyun_2_1903_x64_20G_alibase_20240628.vhd

    Shared File Storage

    • /home cluster mount directory
    • /opt cluster mount directory

      By default, the /home and /opt directories of the control plane node are mounted to a file system and used as shared storage directories.

    Software and Service Components

    • Software to Install: erdma-installer and mpich-aocc
    • Installable Service Components: Logon Node
      • Instance type: ecs.c7.xlarge. This instance type provides 4 vCPUs and 8 GiB of memory.
      • Image: aliyun_2_1903_x64_20G_alibase_20240628.vhd

  2. Create a cluster user. For more information, see User management.

Check the eRDMA environment

Check if the eRDMA configuration of the compute nodes is correct.

  1. Log on to the Elastic High Performance Computing console and click the destination cluster.

  2. On the Nodes & Queues > Nodes page, select all compute nodes in the cluster and click Send Command.


  3. Check the eRDMA network status and the RDMA hardware and software support on the compute nodes.

    1. Send the following command to all compute nodes.

      hpcacc erdma check


    2. If no abnormal message is returned, the eRDMA configuration is correct.

    3. If an abnormal message is returned, run the following command to fix the issue.

      hpcacc erdma repair
    4. After the issue is fixed, run the hpcacc erdma check command again to confirm that the eRDMA configuration is correct.
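
You can also verify the eRDMA environment manually on a compute node. The following commands are a minimal sketch, assuming the standard rdma-core utilities (such as ibv_devinfo) are available on the node, which erdma-installer typically provides; the exact output depends on your instance type and driver version.

    # Confirm that the erdma kernel driver is loaded
    lsmod | grep erdma
    # List RDMA devices and their attributes; an eRDMA-enabled node reports a verbs device
    ibv_devinfo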

OSU-Benchmark test

OSU-Benchmark is used to evaluate the communication performance of HPC clusters and distributed systems. This topic uses the following two benchmarks to test communication performance based on different network protocols (TCP vs. RDMA):

  • Network latency test (osu_latency): Measures the one-way latency of point-to-point communication, which is the time taken to send a message from one process to another, excluding the response time. This test focuses on the communication efficiency of small messages, from 1 B to several kilobytes. Small message latency reflects the underlying performance of network hardware, such as RDMA acceleration, and the optimization level of the MPI library. It is a core indicator of HPC system responsiveness. For example, low latency significantly reduces communication overhead in real-time simulations or machine learning parameter synchronization.

  • Network bandwidth test (osu_bw): Measures the sustainable bandwidth of point-to-point communication, which is the amount of data transferred per unit of time. This test focuses on the transfer efficiency of large messages, from several kilobytes to several megabytes. Bandwidth performance directly affects the efficiency of large data transfers, such as matrix exchanges in scientific computing or file I/O scenarios. If the measured bandwidth is much lower than the theoretical value, optimize the MPI configuration for multi-threaded communication or check network settings such as MTU and throttling.

The test procedure is as follows:

  1. Connect to the E-HPC cluster as the user you created. For more information, see Connect to a cluster.

  2. Run the following command to check if the required environment components are installed correctly.

    module avail
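
    The job script in a later step loads the aocc, gcc, libfabric, and mpich-aocc modules. As an optional quick check, you can filter the module list for these names. This is a minimal sketch; module avail typically writes to stderr, which is why 2>&1 is used, and module versions may differ on your cluster.

    module avail 2>&1 | grep -Ei 'aocc|libfabric|mpich'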
  3. Run the following command to download and decompress the precompiled osu-benchmark installation package.

    cd ~ && wget https://ehpc-perf.oss-cn-hangzhou.aliyuncs.com/AMD-Genoa/osu-bin.tar.gz
    tar -zxvf osu-bin.tar.gz
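
    After extraction, you can optionally confirm that the benchmark binaries are present. This is a quick sanity check; the exact file layout depends on the downloaded package.

    # osu_latency and osu_bw should appear in the listing
    ls ~/pt2pt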
  4. Run the following command to navigate to the test working directory and edit the Slurm job script.

    cd ~/pt2pt 
    vim slurm.job

    The test script is as follows:

    #!/bin/bash
    #SBATCH --job-name=osu-bench
    #SBATCH --ntasks-per-node=1
    #SBATCH --nodes=2
    #SBATCH --partition=comp
    #SBATCH --output=%j.out
    #SBATCH --error=%j.out
    
    # Load environment modules
    module purge
    module load aocc/4.0.0 gcc/12.3.0 libfabric/1.16.0 mpich-aocc/4.0.3
    
    # Run MPI latency test: eRDMA
    echo -e "++++++ use erdma for osu_lat: START"
    mpirun -np 2 -ppn 1 -genv FI_PROVIDER="verbs;ofi_rxm" ./osu_latency
    echo -e "------ use erdma for osu_lat: END\n"
    # Run MPI latency test: TCP
    echo -e "++++++ use tcp for osu_lat: START"
    mpirun -np 2 -ppn 1 -genv FI_PROVIDER="tcp;ofi_rxm" ./osu_latency
    echo -e "------ use tcp for osu_lat: END\n"
    
    # Run MPI bandwidth test: eRDMA
    echo -e "++++++ use erdma for osu_bw: START"
    mpirun -np 2 -ppn 1 -genv FI_PROVIDER="verbs;ofi_rxm" ./osu_bw
    echo -e "------ use erdma for osu_bw: END\n"
    # Run MPI bandwidth test: TCP
    echo -e "++++++ use tcp for osu_bw: START"
    mpirun -np 2 -ppn 1 -genv FI_PROVIDER="tcp;ofi_rxm" ./osu_bw
    echo -e "------ use tcp for osu_bw: END\n"

    Note
    • -np 2: Specifies the total number of processes. A value of 2 means the MPI job starts two processes.

    • -ppn 1: Specifies the number of processes per node. A value of 1 means one process runs on each node.

    • -genv: Sets an environment variable that applies to all processes.

      • FI_PROVIDER="tcp;ofi_rxm": Uses the TCP protocol and enhances communication reliability through the RXM framework.

      • FI_PROVIDER="verbs;ofi_rxm": Prioritizes the high-performance Verbs protocol (based on RDMA) and optimizes message transmission through the RXM framework. Alibaba Cloud eRDMA provides a high-performance elastic RDMA network.
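
    Before you submit the job, you can optionally confirm which libfabric providers are visible on a compute node. This is a minimal sketch that assumes the fi_info utility shipped with libfabric is available after the module is loaded; the verbs provider appears only on nodes with eRDMA enabled.

    module load libfabric/1.16.0
    # List the available providers; the list should include verbs and tcp
    fi_info -l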

  5. Run the following command to submit the test job.

    sbatch slurm.job

    The command line displays the job ID.

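    If the job is submitted successfully, sbatch typically prints a message in the following form, where the number is the job ID assigned by Slurm:

    Submitted batch job <job ID>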

  6. Run the following command to view the job status. During the test, you can also view monitoring information for E-HPC in the console, such as storage, job, and node status. For more information, see View monitoring information.

    squeue


    In the current directory, you can view the log file that corresponds to the job ID. The output is as follows:

    • Network latency test results: These results show the relationship between message size (in bytes, from 1 B to 4 MB) and average latency. The job log contains one latency table for the Verbs protocol (based on eRDMA) and one for the TCP protocol.

      The test data shows that for small messages (1 B to 8 KB), the latency of eRDMA is significantly lower than that of TCP.

    • Network bandwidth test results: These results show the relationship between message size (in bytes, from 1 B to 4 MB) and bandwidth. The job log contains one bandwidth table for the Verbs protocol (based on eRDMA) and one for the TCP protocol.

      The test data shows that for message sizes from 16 KB to 64 KB, eRDMA fully utilizes the network bandwidth, whereas the TCP protocol stack introduces additional processing overhead.
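
    To compare the eRDMA and TCP numbers side by side, you can extract the result rows from the job log. The following command is a minimal sketch that assumes the log file is named after the job ID, as configured by --output=%j.out in the job script; replace <job ID> with the actual ID.

    # Print the section markers written by the script and the numeric result rows (message size, latency or bandwidth)
    grep -E '^(\+\+\+\+\+\+|------|[0-9])' <job ID>.out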