Elastic Compute Service:Deploy a 3FS cluster based on ECS instances

Last Updated:Mar 26, 2026

Fire-Flyer File System (3FS) is a high-performance distributed storage system developed by DeepSeek. It is designed and optimized specifically for large AI model training, big data analysis, and high-performance computing (HPC) scenarios. 3FS combines modern SSD storage with Remote Direct Memory Access (RDMA) to overcome traditional network performance bottlenecks. This topic describes how to deploy a 3FS cluster based on Elastic Compute Service (ECS) instances over Alibaba Cloud elastic RDMA (eRDMA). The deployment solution achieves low-latency and high-throughput data transmission and meets the performance requirements of AI training and large-scale data analysis.

Deployment solution

This solution builds a 3FS cluster on ECS instances from the i4 instance family with local SSDs, connects the nodes over Alibaba Cloud's high-performance eRDMA network, and uses Alibaba Cloud AI Containers (AC2) images to provide a secure and reliable containerized deployment.

Important

Alibaba Cloud does not provide technical support for 3FS and does not guarantee the data integrity, data correctness, software functionality, or software performance of 3FS. If issues occur, contact the 3FS community maintainers on GitHub.

  • eRDMA is an elastic Remote Direct Memory Access (RDMA) network developed by Alibaba Cloud for the cloud. eRDMA reuses virtual private clouds (VPCs) as the underlying link and uses a congestion control (CC) algorithm developed by Alibaba Cloud. eRDMA provides the high throughput and low latency of RDMA. Unlike traditional RDMA, eRDMA can set up large-scale RDMA networks within seconds. eRDMA supports traditional HPC applications, AI applications, and Transmission Control Protocol/Internet Protocol (TCP/IP) applications.

    For more information, see eRDMA.

  • The i4 instance family uses Peripheral Component Interconnect Express (PCIe) Gen4 Non-Volatile Memory Express (NVMe) SSDs and the Cloud Infrastructure Processing Unit (CIPU) local disk virtualization architecture. The virtualization layer causes almost no loss in the read and write bandwidth of the SSDs and provides I/O-level O&M monitoring capabilities. For more information, see Local disk categories.

  • Alibaba Cloud AI Containers (AC2) is a collection of Artificial Intelligence (AI) container images that are provided by Alibaba Cloud. AC2 has different built-in hardware acceleration libraries, AI runtimes, and AI frameworks to meet deployment requirements in different scenarios. AC2 is deeply optimized for the Alibaba Cloud infrastructure to provide better AI performance and experience on Alibaba Cloud.

    End-to-end security and reliability are ensured for AC2 images in the production and release processes. AC2 images are built using independently selected software, incorporating Common Vulnerabilities and Exposures (CVE) update policies and image security scanning mechanisms, which maximize the security of the images. For more information, see AC2 overview.

    Important
    • All 3FS components mentioned in this topic are pre-built in AC2 images. For information about how to independently build them, see 3FS GitHub.

    • AC2 images are provided free of charge. However, you may be charged for other resources, such as vCPUs, memory, storage, public bandwidth, and snapshots, when you use the images.

  • Before you use AC2 images, you must set up the Docker runtime environment. The following information describes how to enable eRDMA in a container:

    To use the eRDMA feature in a container environment, you can use the --device option of Docker to map the /dev/infiniband/rdma_cm and /dev/infiniband/uverbsX character devices to the container. This allows user mode programs inside the container to bypass the operating system kernel and directly access the eRDMA device to send and receive data.

    • /dev/infiniband/rdma_cm: A character device for eRDMA connection management. User mode programs can perform operations on this character device to establish, destroy, and manage connections with the eRDMA device. These operations include creating and destroying connections, and sending and receiving connection events.

    • /dev/infiniband/uverbsX: A character device for user space eRDMA operations. User mode programs can perform operations on this character device to communicate with the eRDMA device. These operations include opening the device, creating and destroying eRDMA communication endpoints, and registering and unregistering memory buffers.

      Note

      In /dev/infiniband/uverbsX, X is the device index number and may vary based on the system and configuration. You can run the ls /dev/infiniband | grep uverbs command to view the character device name.
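The device mapping described above can be sketched as a small shell helper. This is a minimal sketch, not part of Docker or eRDMA: the function name erdma_device_flags is hypothetical, and it assumes that the character devices live in /dev/infiniband as described in this topic.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a --device flag for each eRDMA character device
# found in a directory (default: /dev/infiniband).
erdma_device_flags() {
  local dir="${1:-/dev/infiniband}"
  local flags=() dev
  # rdma_cm is used for connection management; uverbsX is used for
  # user-space operations. X varies by system, so a glob is used.
  for dev in "$dir"/rdma_cm "$dir"/uverbs*; do
    [ -e "$dev" ] && flags+=("--device=$dev")
  done
  echo "${flags[*]}"
}

# Example usage (the unquoted expansion is split into separate flags):
#   docker run -d --network=host $(erdma_device_flags) <image> <command>
```

On an instance with a single eRDMA device, the helper is expected to print the same two --device flags that the docker run commands in this topic specify by hand.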

Key 3FS components

The following information describes the key components of 3FS and their functions:

  • Metadata service (meta): A stateless service that handles file system metadata requests. It implements atomic operations by using FoundationDB at the underlying layer.

  • Storage service (storage): runs on storage nodes, with data stored in blocks on high-performance NVMe SSDs. It uses the Chain Replication with Apportioned Queries (CRAQ) protocol to manage replicas and provides write-all-read-any semantics.

  • Cluster manager (mgmtd): manages cluster configuration information and storage node status, and is responsible for electing the primary node and synchronizing updates to other components.

  • Client: 3FS provides two types of clients. You can use the Filesystem in Userspace (FUSE) client to implement standard access and the User Space Block I/O (USRBIO) client to implement high-performance transmission. This balances compatibility and efficiency and makes 3FS an ideal storage interface for AI and big data scenarios.

    • FUSE client: mounts the storage cluster as a local directory by using the user-space file system interface. It provides file operation interfaces (read, write, and mkdir) compatible with Portable Operating System Interface (POSIX), without the need to modify the application code.

    • USRBIO client: achieves microsecond-level latency and ultra-high throughput by using the user-space I/O stack and RDMA, meeting AI and HPC requirements.

All components communicate with each other over RDMA, bypassing the kernel protocol stack. This significantly reduces the CPU load and network latency.

Deployment example

In this example, an ecs.g8i.48xlarge instance from the g8i general-purpose instance family is used as the meta node. The meta service, cluster manager (mgmtd), client, and monitor service are deployed on the meta node. Five ecs.i4.32xlarge instances from the i4 instance family with local SSDs are used as storage nodes to provide high-performance NVMe SSDs. eRDMA is configured on each node to allow eRDMA communication and complete the 3FS cluster deployment. All the ECS instances reside in the same VPC and zone.

Note

To simplify the process, this example deploys the metadata service (meta), cluster manager (mgmtd), client, and monitor components on a single node. In a production environment, we recommend that you use a multi-node distributed architecture for service decoupling and resource isolation based on your business scale and high availability requirements.

image

Step 1: Prepare the environment

Prepare ECS instances for deploying 3FS cluster nodes and configure eRDMA on the instances.

Create an ECS instance to serve as the meta and client nodes, and create five ECS instances to serve as the storage nodes. All nodes can communicate with each other over the internal network. All nodes can access the Internet to download AC2 image resources.

During creation, take note of the following parameters for all ECS instances. For information about other parameters, see Custom launch ECS instances.

  • Region: In this example, the China (Hangzhou) region is used.

  • Instance type: Select an instance type that supports eRDMA.

  • Image: Select an Ubuntu 22 image that supports eRDMA.

  • eRDMA Interface (ERI): Enable ERI for an elastic network interface (ENI) on an instance to enable eRDMA communication mode.

    Important

    The IP addresses in this example are the primary private IP addresses of the ENIs with ERI enabled. In this example, ERI is enabled for the primary ENI of each instance. If you test RDMA communication on secondary ENIs, change the IP addresses accordingly.

    image

Step 2: Deploy the meta node

  1. Connect to the instance used as the meta node.

    For more information, see Connect to Linux.

  2. Install the eRDMA driver:

    curl -O http://mirrors.cloud.aliyuncs.com/erdma/env_setup.sh
    sudo sh -c "/bin/bash env_setup.sh > /var/log/erdma_install.log 2>&1"

    The script automatically installs the required dependencies and then the eRDMA driver. Wait for the script execution to complete.

  3. Check whether eRDMA is properly configured:

    ibv_devinfo

    The following command output indicates that eRDMA is properly enabled on the instance (the eRDMA driver is correctly installed and ERI is correctly configured). The installation process of the eRDMA driver requires a period of time to complete. If an error is returned, see Verify the correctness of eRDMA configurations for troubleshooting.

    image

  4. Set the RDMA connection mode to Compat mode.

    Important
    • By default, CPU-based instance families that support eRDMA install the eRDMA kernel-mode driver in Standard mode. In this mode, only the RDMA_CM connection establishment method is supported.

    • eRDMA provides Compat mode for out-of-band (OOB) connection establishment scenarios. In Compat mode, 16 TCP ports in the range of 30608 to 30623 are additionally occupied.

    sudo sh -c "echo 'options erdma compat_mode=Y' >> /etc/modprobe.d/erdma.conf"
    sudo rmmod erdma
    sudo modprobe erdma compat_mode=Y
  5. Install Docker.

    Note

    Before you use AC2 images, you must set up the Docker runtime environment. The Docker installation steps vary based on the operating system. For more information, see Install and use Docker and Docker Compose. In this example, Ubuntu is used.

    apt update
    apt install docker.io -y

    After the installation is complete, run the docker -v command to view the Docker version and check whether Docker is installed.

    image

  6. Deploy and start ClickHouse to persist cluster runtime metrics, such as node status, resource utilization, and I/O:

    docker run -d --net=host --name clickhouse-server --ulimit nofile=262144:262144 ac2-registry.cn-hangzhou.cr.aliyuncs.com/ac2/clickhouse:25.3.1.2703-ubuntu22.04
    • --net=host: Sets the communication mode of the container to host. Applications in the container can directly use the network interfaces and network configurations of the host. This provides the same network communication capabilities as the host.

    • --ulimit nofile: specifies the file descriptor limit for processes in the container, which is the maximum number of files a process can open simultaneously. This prevents system instability caused by excessive file handle usage by processes.

  7. Deploy and start FoundationDB, the transactional key-value store in which the metadata service (meta) and the cluster manager (mgmtd) persist their data:

    docker run -d --network=host --name fdb ac2-registry.cn-hangzhou.cr.aliyuncs.com/ac2/fdb:7.3.63-ubuntu22.04
  8. Deploy and start 3FS monitor to collect and analyze various metrics during system runtime, such as throughput, latency, and resource usage:

    docker run -d --network=host --name monitor --ulimit memlock=-1 --privileged --device=/dev/infiniband/uverbs0 --device=/dev/infiniband/rdma_cm ac2-registry.cn-hangzhou.cr.aliyuncs.com/ac2/3fs:b71ffc55-fdb7.3.63-fuse3.16.2-ubuntu22.04 ./monitor.sh
    • --ulimit memlock=-1: Sets the max locked memory limit to unlimited, so there is no limit on the amount of memory that a process can lock. This ensures that eRDMA applications can lock the memory that they require, even when run by non-root users, and can use the eRDMA feature effectively.

    • --device=/dev/infiniband/uverbsX and --device=/dev/infiniband/rdma_cm: Expose the eRDMA user mode character devices to the container.

      Run the following command to view the character device name (the X in uverbsX):

      ls /dev/infiniband | grep uverbs

      image

  9. Deploy and start the 3FS cluster manager (mgmtd) to manage storage nodes and resource allocation in the cluster:

    docker run -d --network=host --name mgmtd --ulimit memlock=-1 --privileged --device=/dev/infiniband/uverbs0 --device=/dev/infiniband/rdma_cm --env FDB_CLUSTER=`docker exec fdb cat /etc/foundationdb/fdb.cluster` --env REMOTE_IP="172.16.20.172:10000" --env MGMTD_SERVER_ADDRESSES="RDMA://172.16.20.172:8000" ac2-registry.cn-hangzhou.cr.aliyuncs.com/ac2/3fs:b71ffc55-fdb7.3.63-fuse3.16.2-ubuntu22.04 ./mgmtd.sh
    • --env REMOTE_IP: specifies the address of the monitor service, which is the primary private IP address of the node on which the service resides. In this example, the service resides on the same node as the meta service.

    • --env MGMTD_SERVER_ADDRESSES: specifies the address of the mgmtd service, which is the primary private IP address of the node on which the service is started. In this example, the service resides on the same node as the meta and monitor services.

  10. Deploy and start the 3FS metadata service (meta):

    docker run -d --network=host --name meta --ulimit memlock=-1 --privileged --device=/dev/infiniband/uverbs0 --device=/dev/infiniband/rdma_cm --env FDB_CLUSTER=`docker exec fdb cat /etc/foundationdb/fdb.cluster` --env META_NODE_ID=100 --env REMOTE_IP="172.16.20.172:10000" --env MGMTD_SERVER_ADDRESSES="RDMA://172.16.20.172:8000" ac2-registry.cn-hangzhou.cr.aliyuncs.com/ac2/3fs:b71ffc55-fdb7.3.63-fuse3.16.2-ubuntu22.04 ./meta.sh
    • --env META_NODE_ID: specifies the sequential number of the meta node. In this example, one meta node is used, and its sequential number is set to 100.

    • --env REMOTE_IP: specifies the address of the monitor service, which is the primary private IP address of the node on which the service resides. In this example, the service resides on the same node as the meta service.

    • --env MGMTD_SERVER_ADDRESSES: specifies the address of the mgmtd service, which is the primary private IP address of the node on which the service is started. In this example, the service resides on the same node as the meta and monitor services.

  11. View the started node services (Docker containers):

    docker ps

    image
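To check the docker ps output without reading it by eye, the expected container names from this step can be compared against the running ones. The following is a minimal sketch. The function name missing_containers is hypothetical, and the container names are the ones passed to --name in this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read running container names from stdin (one per line)
# and print every expected name (passed as arguments) that is not running.
missing_containers() {
  local running name
  running="$(cat)"
  for name in "$@"; do
    # -x matches the whole line, -F treats the name as a fixed string.
    grep -qxF -- "$name" <<<"$running" || echo "$name"
  done
}

# Example usage on the meta node:
#   docker ps --format '{{.Names}}' | \
#     missing_containers clickhouse-server fdb monitor mgmtd meta
```

Empty output means that all five services of the meta node are running. Any printed name identifies a container to inspect, for example by using docker logs.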

Step 3: Deploy storage nodes

Perform the following steps on each of the five storage nodes:

  1. Connect to an instance used as a storage node.

    For more information, see Connect to Linux.

  2. Install the eRDMA driver:

    curl -O http://mirrors.cloud.aliyuncs.com/erdma/env_setup.sh
    sudo sh -c "/bin/bash env_setup.sh > /var/log/erdma_install.log 2>&1"

    The script automatically installs the required dependencies and then the eRDMA driver. Wait for the script execution to complete.

  3. Check whether eRDMA is correctly configured:

    ibv_devinfo

    The following command output indicates that eRDMA is properly enabled on the instance (the eRDMA driver is correctly installed and ERI is correctly configured). The installation process of the eRDMA driver requires a period of time to complete. If an error is returned, see Verify the correctness of eRDMA configurations for troubleshooting.

    image

  4. Set the RDMA connection mode to Compat mode.

    Important
    • By default, CPU-based instance families that support eRDMA install the eRDMA kernel-mode driver in Standard mode. In this mode, only the RDMA_CM connection establishment method is supported.

    • eRDMA provides Compat mode for out-of-band (OOB) connection establishment scenarios. In Compat mode, 16 TCP ports in the range of 30608 to 30623 are additionally occupied.

    sudo sh -c "echo 'options erdma compat_mode=Y' >> /etc/modprobe.d/erdma.conf"
    sudo rmmod erdma
    sudo modprobe erdma compat_mode=Y
  5. Format and mount NVMe SSDs. In this example, the eight local disks on the storage node are formatted as XFS file systems and then mounted. XFS provides high performance and is suitable for large files.

    mkdir -p /storage/data{0..7}
    for i in {0..7};do mkfs.xfs -L data${i} /dev/nvme${i}n1;mount -o noatime,nodiratime -L data${i} /storage/data${i};done;
    mkdir -p /storage/data{0..7}/3fs

    If you repeat this step, unmount the file systems first and then specify the -f option of mkfs.xfs to force the disks to be reformatted. Reformatting erases the disks, so recreate the 3fs directories afterward.

    for i in {0..7};do umount /storage/data${i};done
    for i in {0..7};do mkfs.xfs -f -L data${i} /dev/nvme${i}n1;mount -o noatime,nodiratime -L data${i} /storage/data${i};done;
    mkdir -p /storage/data{0..7}/3fs

    After the mounting is complete, run the following command to check the partition status.

    image

  6. Install Docker.

    Note

    Before you use AC2 images, you must set up the Docker runtime environment. The Docker installation steps vary based on the operating system. For more information, see Install and use Docker and Docker Compose. In this example, Ubuntu is used.

    apt update
    apt install docker.io -y

    After the installation is complete, run the docker -v command to view the Docker version and check whether Docker is installed.

    image

  7. Start the storage service (storage):

    docker run -d --network=host --name storage --ulimit memlock=-1 --privileged -v /storage:/storage --device=/dev/infiniband/uverbs0 --device=/dev/infiniband/rdma_cm --env STORAGE_NODE_ID=10001 --env TARGET_PATHS="/storage/data0/3fs","/storage/data1/3fs","/storage/data2/3fs","/storage/data3/3fs","/storage/data4/3fs","/storage/data5/3fs","/storage/data6/3fs","/storage/data7/3fs" --env REMOTE_IP="172.16.20.172:10000" --env MGMTD_SERVER_ADDRESSES="RDMA://172.16.20.172:8000" ac2-registry.cn-hangzhou.cr.aliyuncs.com/ac2/3fs:b71ffc55-fdb7.3.63-fuse3.16.2-ubuntu22.04 ./storage.sh
    • --device=/dev/infiniband/uverbsX and --device=/dev/infiniband/rdma_cm: Expose the eRDMA user mode character devices to the container.

      Run the following command to view the character device name (the X in uverbsX):

      ls /dev/infiniband | grep uverbs

      image

    • --env STORAGE_NODE_ID: specifies the sequential number of the storage node. For the five storage nodes, the numbers are 10001, 10002, 10003, 10004, and 10005, respectively. Replace the values with the actual numbers.

    • --env TARGET_PATHS: specifies the storage directories for 3FS. In this example, the directories created on the local disks in the preceding step are used.

    • --env REMOTE_IP: specifies the address of the monitor service, which is the primary private IP address of the node on which the service resides. In this example, the service resides on the same node as the meta service.

    • --env MGMTD_SERVER_ADDRESSES: specifies the address of the mgmtd service, which is the primary private IP address of the node on which the service is started. In this example, the service resides on the same node as the meta and monitor services.

  8. View the started node services (Docker containers):

    docker ps

    image
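Because the storage command differs between nodes only in STORAGE_NODE_ID, the per-node values can be generated instead of typed. The following is a minimal sketch with hypothetical function names (target_paths and storage_node_id). It assumes the directory layout and the ID scheme of this example (10001 to 10005).

```shell
#!/usr/bin/env bash
# Hypothetical helper: join /storage/data0/3fs ... /storage/data<N-1>/3fs
# with commas, which is the value that TARGET_PATHS receives in this example.
target_paths() {
  local count="$1" i paths=""
  for ((i = 0; i < count; i++)); do
    paths+="${paths:+,}/storage/data${i}/3fs"
  done
  echo "$paths"
}

# Hypothetical helper: map a 1-based node index to the ID scheme of this
# example (node 1 -> 10001, node 5 -> 10005).
storage_node_id() {
  echo $((10000 + $1))
}

# Example usage on the third storage node:
#   --env STORAGE_NODE_ID="$(storage_node_id 3)" \
#   --env TARGET_PATHS="$(target_paths 8)"
```

The shell removes the double quotes in the original TARGET_PATHS value before Docker sees it, so the comma-separated string that target_paths 8 prints is the same value that the command in this step passes.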

Step 4: View the connected storage nodes on the meta node

  1. Connect to the instance used as the meta node.

    For more information, see Connect to Linux.

  2. View the status of the nodes connected to the meta node:

    docker exec -it meta  /opt/3fs/bin/admin_cli -cfg /opt/3fs/etc/admin_cli.toml --config.mgmtd_client.mgmtd_server_addresses '["RDMA://172.16.20.172:8000"]' "list-nodes"

    Replace the IP address with the IP address of the mgmtd service.

    image

    admin_cli is a command-line tool for managing and maintaining 3FS. You can use it for cluster configuration, status monitoring, and troubleshooting. For more information, run the following command:

    docker exec -it meta  /opt/3fs/bin/admin_cli -cfg /opt/3fs/etc/admin_cli.toml --config.mgmtd_client.mgmtd_server_addresses '["RDMA://172.16.20.172:8000"]' "help"
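The admin_cli invocation is long and recurs in later steps, so a thin wrapper can reduce typing errors. The following is a minimal sketch. The function name fs3_admin and the MGMTD_ADDR variable are hypothetical, and the flags are exactly the ones used in the commands above.

```shell
#!/usr/bin/env bash
# Address of the mgmtd service. Replace the default with your own address.
MGMTD_ADDR="${MGMTD_ADDR:-RDMA://172.16.20.172:8000}"

# Hypothetical wrapper: run an admin_cli subcommand inside the meta container
# with the configuration file and mgmtd address used in this topic.
fs3_admin() {
  docker exec -it meta /opt/3fs/bin/admin_cli \
    -cfg /opt/3fs/etc/admin_cli.toml \
    --config.mgmtd_client.mgmtd_server_addresses "[\"${MGMTD_ADDR}\"]" \
    "$@"
}

# Example usage:
#   fs3_admin list-nodes
#   fs3_admin help
```

The wrapper only forwards its arguments, so any subcommand that admin_cli accepts can be passed unchanged.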

Step 5: Configure storage node-related parameters on the meta node

Configure storage node-related information, such as the number of replicas and the number of disks per storage node.

  1. Connect to the instance used as the meta node.

    For more information, see Connect to Linux.

  2. Configure 3FS:

    docker exec \
      --env STORAGE_NODE_NUM=5 \
      --env STORAGE_NODE_BEGIN=10001 \
      --env STORAGE_NODE_END=10005 \
      --env REPLICATION_FACTOR=3 \
      --env NUM_DISKS_PER_NODE=8 \
      --env MGMTD_SERVER_ADDRESSES="RDMA://172.16.20.172:8000" \
      meta \
      ./config_3fs.sh
    • --env STORAGE_NODE_NUM: specifies the number of storage nodes, which is 5 in this example.

    • --env STORAGE_NODE_BEGIN: specifies the start sequential number of storage nodes, which is 10001 in this example.

    • --env STORAGE_NODE_END: specifies the end sequential number of storage nodes, which is 10005 in this example.

    • --env REPLICATION_FACTOR: specifies the number of replicas for stored data. In this example, 3 is used, which indicates three replicas.

    • --env NUM_DISKS_PER_NODE: specifies the number of disks per storage node, which is 8 in this example.

    • --env MGMTD_SERVER_ADDRESSES: specifies the address of the mgmtd service.

    • --env NUM_TARGETS_PER_DISK: optional and not set in this example. Specifies the number of storage targets expected to be created per physical disk. Default value: 12, which indicates that each SSD is divided into 12 storage targets.

    • --env MIN_TARGETS_PER_DISK: optional and not set in this example. Specifies the minimum number of storage targets allowed per physical disk. Default value: 12, which indicates that each SSD must maintain at least 12 available storage targets.

    The config_3fs.sh script performs the following operations:

    • Creates the root administrator and generates an authentication token. The token is saved to /opt/3fs/etc/token.txt to authenticate subsequent operations.

    • Generates a data distribution policy based on input parameters such as STORAGE_NODE_NUM and REPLICATION_FACTOR, generates a chain table from this policy, and uploads the chain table to the cluster manager (mgmtd) to deploy the target and chain structures for the storage nodes.

  3. Run the following command to confirm that the chain has been created and its status is normal.

    docker exec -it meta  /opt/3fs/bin/admin_cli -cfg /opt/3fs/etc/admin_cli.toml --config.mgmtd_client.mgmtd_server_addresses '["RDMA://172.16.20.172:8000"]' "list-chains"

    Replace the IP address with the address of the configured mgmtd service.

    The following figure shows the chain information. The chain status must be normal before you run the subsequent fio tests.

    image
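Before running config_3fs.sh, it can help to sanity-check the parameters against each other. The following is a minimal sketch with hypothetical function names. The chain estimate assumes that each storage target belongs to exactly one chain of REPLICATION_FACTOR targets. Treat it as an approximation and rely on the list-chains output for the actual number.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the ID range BEGIN..END contains
# exactly NUM node IDs (STORAGE_NODE_NUM, STORAGE_NODE_BEGIN, STORAGE_NODE_END).
check_node_range() {
  local num="$1" begin="$2" end="$3"
  [ $((end - begin + 1)) -eq "$num" ]
}

# Total storage targets = nodes x disks per node x targets per disk.
total_targets() {
  echo $(($1 * $2 * $3))
}

# Estimated chains = total targets / replication factor
# (assumption: one chain per group of REPLICATION_FACTOR targets).
estimated_chains() {
  echo $(($1 / $2))
}

# Example with the values of this topic:
#   check_node_range 5 10001 10005 && echo "range ok"
#   total_targets 5 8 12        # 480 targets with the default of 12 per disk
#   estimated_chains 480 3      # 160 chains under the stated assumption
```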

Step 6: Configure the FUSE client

Configure the FUSE client on the meta node to allow you to mount remote storage and perform file operations as if operating on a local system.

  1. Connect to the instance used as the meta node.

    For more information, see Connect to Linux.

  2. Obtain the token that the client uses for authenticated access, and save it in the token shell variable that the next step references:

    token=$(docker exec meta cat /opt/3fs/etc/token.txt)
  3. Start the FUSE client:

    docker run -d --network=host --name fuse --shm-size=200g --ulimit memlock=-1 --privileged \
      --device=/dev/infiniband/uverbs0 --device=/dev/infiniband/rdma_cm \
      --env REMOTE_IP="172.16.20.172:10000" \
      --env MGMTD_SERVER_ADDRESSES="RDMA://172.16.20.172:8000" \
      --env TOKEN=${token} \
      ac2-registry.cn-hangzhou.cr.aliyuncs.com/ac2/3fs:b71ffc55-fdb7.3.63-fuse3.16.2-ubuntu22.04  \
      ./fuse.sh
    • --shm-size: specifies the size of the /dev/shm shared memory area in the container.

      • If FUSE is used, you can set the --shm-size parameter to a smaller value, such as 2g.

      • If USRBIO is used, the test process needs to share I/O buffers with the FUSE process by using shared memory to achieve zero copy. We recommend that you increase the value of the --shm-size parameter, such as 200g.

    • --device=/dev/infiniband/uverbsX and --device=/dev/infiniband/rdma_cm: Expose the eRDMA user mode character devices to the container.

      Run the following command to view the character device name (the X in uverbsX):

      ls /dev/infiniband | grep uverbs

      image

    • --env REMOTE_IP: specifies the address of the monitor service, which is the primary private IP address of the node on which the service resides. In this example, the service resides on the same node as the meta service.

    • --env MGMTD_SERVER_ADDRESSES: specifies the address of the mgmtd service, which is the primary private IP address of the node on which the service is started. In this example, the service resides on the same node as the meta and monitor services.

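    • --env TOKEN: specifies the token string obtained in the previous step. If the token shell variable is not set in the current session, replace ${token} with the token string.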

  4. View the started node services (Docker containers):

    docker ps

    image

  5. View the file system mounting information and disk usage in the FUSE container:

    docker exec fuse df -hT | grep 3fs

    image
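The two settings that most often need attention when starting the FUSE container are the shared memory size and the token. The following is a minimal sketch of a helper that derives both arguments. The function name fuse_run_args is hypothetical, and the sizes are the 2g and 200g values suggested in this step.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the --shm-size and TOKEN arguments for the FUSE
# container. CLIENT is "fuse" (small shared memory) or "usrbio" (large shared
# memory, because I/O buffers are shared with the FUSE process).
fuse_run_args() {
  local client="$1" token="$2" shm
  # An empty token would start the container without valid credentials.
  [ -n "$token" ] || { echo "token is empty" >&2; return 1; }
  case "$client" in
    fuse)   shm="2g" ;;
    usrbio) shm="200g" ;;
    *)      echo "unknown client type: $client" >&2; return 1 ;;
  esac
  echo "--shm-size=${shm} --env TOKEN=${token}"
}

# Example usage:
#   token=$(docker exec meta cat /opt/3fs/etc/token.txt)
#   docker run -d --network=host --name fuse $(fuse_run_args usrbio "$token") ...
```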

Use fio to test performance

After the cluster deployment is complete, the FUSE container has a 3FS-based high-performance distributed file system that uses eRDMA for inter-node communication.

Flexible I/O tester (fio) is an open source storage performance benchmarking tool designed to evaluate the I/O capabilities of storage systems, such as hard drives, SSDs, and distributed file systems. Its core value lies in its ability to simulate load models in real business scenarios and precisely quantify key performance metrics of storage devices, such as throughput, IOPS, and latency, by using highly configurable test parameters.

The following section describes how to use fio to perform performance tests on the deployed 3FS file system.

Use USRBIO (user-space RDMA engine)

In the FUSE container, start 100 jobs to simulate a high-concurrency, large-file sequential write scenario and test the extreme performance of 3FS over the eRDMA network. The hf3fs_usrbio.so engine enables user-space RDMA communication, achieving zero-copy RDMA optimization and significant performance improvements compared with the kernel-space FUSE.

  1. Connect to the instance used as the meta node.

    For more information, see Connect to Linux.

  2. Start the test:

    docker exec -it fuse \
     fio -numjobs=100 -fallocate=none \
     -ioengine=external:/usr/lib/hf3fs_usrbio.so \
     -direct=1 \
     -rw=write \
     -bs=4MB \
     --group_reporting \
     -filesize=500MB \
     --nrfiles=100 \
     -iodepth=1 \
     -name=/3fs/test \
     -mountpoint=/3fs \
     -ior_depth=1
    • -numjobs=100: starts 100 concurrent jobs (threads or processes) to simulate a multi-threaded concurrent write scenario and test the system's throughput and lock contention under high concurrency.

    • -ioengine=external:/usr/lib/hf3fs_usrbio.so: uses the 3FS-specific RDMA engine.

    • -direct=1: enables direct I/O to bypass the operating system cache.

    • -rw=write: sets the test mode to sequential write (sequential writing of large files) to evaluate continuous write bandwidth.

    • -bs=4MB: sets the block size to 4 MB, simulating large block data write scenarios such as video streams and batch data processing.

    • --group_reporting: merges and summarizes the results of all jobs for easier viewing of overall performance, rather than displaying individual job data.

    • -filesize=500MB: sets the size of each file to 500 MB.

    • --nrfiles=100: sets the number of files that each job uses to 100 to test multi-file concurrent write performance. In fio, nrfiles and filesize apply to each job, so each job covers 100 × 500 MB = 50 GB of files.

    • -iodepth=1: sets the I/O queue depth of each job to 1, so each job keeps at most one I/O request in flight at a time.

    • -mountpoint=/3fs: sets the 3FS file system mount point to the /3fs directory.

    • -ior_depth=1: specifies the RDMA transmit (Tx) queue depth. In this example, 1 is used, which specifies the synchronous I/O mode.

  3. During the test, open the monitoring page of the instance on which the FUSE container resides and check the eRDMA traffic. For more information, see Monitor network bandwidth.

    image

  4. Wait for the test to complete and view the test results.

    image

    The test results for core metrics:

    • Throughput:

      • IOPS=2621: 2,621 4-MB write operations per second.

      • BW=10.2GiB/s (11.0GB/s): The actual test bandwidth reaches 10.2 GiB/s.

    • System resource consumption:

      cpu: usr=1.21%, sys=0.02%: ultra-low CPU utilization (significant advantage of eRDMA).
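The reported bandwidth can be cross-checked against the reported IOPS, because bandwidth is IOPS multiplied by the block size. The following is a minimal sketch. The function name bw_gib_per_s is hypothetical, and it assumes that fio interprets the 4MB block size as 4 MiB, which is its default unit handling.

```shell
#!/usr/bin/env bash
# Hypothetical helper: bandwidth in GiB/s from IOPS and a block size in MiB.
bw_gib_per_s() {
  awk -v iops="$1" -v bs="$2" 'BEGIN { printf "%.1f\n", iops * bs / 1024 }'
}

# Example with the result above: 2621 IOPS x 4 MiB = 10484 MiB/s
#   bw_gib_per_s 2621 4    # prints 10.2
```

The result matches the BW=10.2GiB/s value in the fio output, which confirms that the IOPS and bandwidth figures describe the same workload.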

Use POSIX (kernel-space libaio engine)

In the FUSE container, use the Linux asynchronous I/O engine to start two jobs to simulate concurrent writes, inject 4-MB data blocks to test bandwidth in sequential write scenarios, and continuously run the test for 5 minutes to test 3FS stability.

  1. Connect to the instance used as the meta node.

    For more information, see Connect to Linux.

  2. Start the test:

    docker exec -it fuse \
      fio -numjobs=2 -fallocate=none \
      -iodepth=2 \
      -ioengine=libaio \
      -direct=1 \
      -rw=write \
      -bs=4M \
      --group_reporting \
      -size=100M \
      -time_based \
      -runtime=300 \
      -name=2depth_128file_4M_direct_write_bw \
      -directory=/3fs
    • -numjobs=2: starts two concurrent jobs (threads or processes) to simulate concurrent writes at a low level of concurrency.

    • -iodepth=2: sets the I/O queue depth for each job to 2, so each job keeps up to two I/O requests in flight at a time.

    • -ioengine=libaio: uses the Linux asynchronous I/O engine (libaio), which supports non-blocking I/O.

    • -direct=1: enables direct I/O to bypass the operating system cache.

    • -rw=write: sets the test mode to sequential write (sequential writing of large files) to evaluate continuous write bandwidth.

    • -bs=4M: sets the block size to 4 MB, simulating large block data write scenarios such as video streams and batch data processing.

    • --group_reporting: merges and summarizes the results of all jobs for easier viewing of overall performance, rather than displaying individual job data.

    • -size=100M: specifies each job to write 100 MB of data (total data volume = 100 MB × 2 jobs = 200 MB).

    • -time_based and -runtime=300: configures the jobs to run for 300 seconds (5 minutes). The jobs continue running for the full test period, even if they have written the specified amount of data before the period ends, to test long-term write stability.

    • -directory=/3fs: creates the test files in the /3fs directory, which is the mount point of the 3FS file system.

  3. During the test, open the monitoring page of the instance on which the FUSE container resides and check the eRDMA traffic. For more information, see Monitor network bandwidth.

    image

  4. Wait for the test to complete and view the test results.

    image

    The test results for core metrics:

    • Throughput:

      • IOPS=715: 715 4-MB write operations per second (715 × 4 MiB = 2,860 MiB/s).

      • BW=2862MiB/s (3001MB/s): The actual bandwidth reaches about 2.795 GiB/s.

    • System resource consumption:

      cpu: usr=18.66%, sys=2.55% : high user-space CPU utilization (libaio engine overhead).
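fio prints bandwidth twice: first in binary units (MiB/s or GiB/s) and then, in parentheses, in decimal units (MB/s or GB/s). The following is a minimal sketch of the conversion between the two, with hypothetical function names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert MiB/s to GiB/s (1 GiB = 1024 MiB).
mib_s_to_gib_s() {
  awk -v v="$1" 'BEGIN { printf "%.3f\n", v / 1024 }'
}

# Hypothetical helper: convert MiB/s to decimal MB/s
# (1 MiB = 1,048,576 bytes, 1 MB = 1,000,000 bytes).
mib_s_to_mb_s() {
  awk -v v="$1" 'BEGIN { printf "%.0f\n", v * 1048576 / 1000000 }'
}

# Example with the libaio result above (BW=2862MiB/s):
#   mib_s_to_gib_s 2862    # prints 2.795
#   mib_s_to_mb_s 2862     # prints 3001
```

Both values agree with the fio output: 2862 MiB/s corresponds to about 2.795 GiB/s and to the 3001 MB/s shown in parentheses.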