Optimization of Parallel File Systems on Client

This article explains how Alibaba Cloud CPFS responds to the challenges facing parallel file systems, covering both the technical exploration and how it was put into practice.

As the high-performance branch of file storage, the parallel file system has a history of about 20 years. It is widely used in high-performance computing fields such as weather prediction, oil exploration, high-energy physics, automobile manufacturing, chip manufacturing, autonomous driving, and film and television rendering. In the AI era, GPU parallel computing is taking off, and Alibaba Cloud CPFS has officially entered its 2.0 era with a series of innovations and practices on top of the traditional parallel file system architecture.

1. The Evolution of Parallel File Systems

The traditional parallel file system was born in the era of CPUs and physical machines. In 2012, Alex Krizhevsky (former Google computer scientist) used deep learning on GPUs to win the ImageNet ILSVRC-2012 image recognition competition, raising recognition accuracy from roughly 74% to 85%. GPUs have been in the spotlight ever since. With their large numbers of stream processors and vector processing units, they have become the first-choice accelerator for AI computing. The computing power of GPU chips has multiplied from V100 to A100 to H100, and the demand for data throughput has grown with it, at times exceeding what the PCIe bus could deliver.

As the Kubernetes container platform has matured, AI training has moved from virtual machines and physical machines to containers running on cloud computing platforms. Against this background, the traditional parallel file system faces great challenges.

This series explains how Alibaba Cloud CPFS responds to these challenges, and how the resulting techniques were explored and put into practice, from the following aspects:

  • From dedicated clients to NFS standard protocol services
  • From a closed-loop file storage system to an open ecosystem embracing cloud data lakes
  • From CPU large file computing to GPU small file computing
  • Faster still: the evolution of server-side cache and compute-side cache

This article covers the first of these: making the CPFS client lightweight.

2. Problems with the Dedicated Client

Because general-purpose protocols were immature (NFS at the beginning of the century, for example), traditional parallel file systems shipped with dedicated clients. The dedicated client became a hallmark of the high-performance parallel file system.


Dedicated clients are essential to how parallel file systems achieve high performance. They provide MPI-IO interfaces, connections to multiple backend servers, load balancing, and standalone data caching. However, with the advent of the container era, the dedicated client has shown several problems:

  1. Heavy clients mostly run in kernel mode, which binds them tightly to the operating system. At the beginning of the century, professional HPC applications were few, and most were developed and run in supercomputing centers by specialist companies, so heavy clients did not seem to be a problem. In the AI era, GPU application development has matured and developers have varied habits, so restrictions on the operating system or kernel version have become a major barrier.
  2. Elastic containers bring rapid application deployment and elastic scaling, pushing the utilization of computing resources to the limit. Dedicated clients deploy slowly and carry many software dependencies, which slows application deployment and limits container elasticity.
  3. Application-oriented data management is replacing physical machine-oriented data management. In the container era, the user-facing service interface has moved from physical machines and virtual machines to applications. Heavy clients treat the entire file system as a single unified namespace, and complex permissions can only be configured through traditional ACLs. In addition, heavy clients cannot work with the static and dynamic PVs of Kubernetes to fully isolate the data that each containerized application can access.

3. New Ideas for Lightweight NFS Clients

Solving the problems of the dedicated client means slimming the client down and making access lightweight through the NFS protocol. First, decoupling the client from the operating system lets any Linux distribution use CPFS easily, which unburdens developers. Second, the client must still deliver the high performance of a distributed file system. Finally, the client must support Kubernetes elastic PVs with strict data isolation between PVs. The specific methods fall into the following three aspects:


1. Lightweight Access Based on NFS Protocol

NFS is the most widely used protocol in the file storage field. It is mature, general-purpose, and easy to use, and most customers already accept it. CPFS must be compatible with NFS to lower the barrier to using CPFS.

The heavy clients of traditional parallel file systems often require a specific operating system and kernel version. After a kernel upgrade, the client must be reinstalled, which leads to high operation and maintenance costs. The CPFS-NFS client, by contrast, runs in user mode and does not depend on the kernel version. This brings two benefits:

  1. It supports all mainstream Linux distributions. The CPFS-NFS client supports Alibaba Cloud Linux, CentOS, Ubuntu, Debian, and others.
  2. When the user's operating system is upgraded, the CPFS-NFS client continues to work without being upgraded itself.

Traditional parallel file system clients require complex configuration to perform well. For example, Lustre requires tuning the concurrency and block size of the network component (LNET), the metadata component (MDC), and the data component (OSC), which increases maintenance costs for users. The CPFS-NFS client is simple to use and requires only one mount command. The client applies its default configuration itself, which lowers the barrier for users.

Parallel file systems usually move file system logic onto the client. For example, Lustre's OSC needs to know which storage servers hold each file shard (stripe) in order to read data. This increases CPU and memory overhead on the client. The CPFS-NFS client is lightweight: it only transmits data and performs the necessary metadata operations, and its CPU overhead is usually less than one logical core.
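To make concrete the kind of layout logic a heavy client carries, here is a minimal sketch of round-robin (RAID-0 style) striping, the general scheme behind stripe-aware clients. The `StripeLayout` type, the `locate` function, and the stripe values are hypothetical illustrations, not Lustre or CPFS code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StripeLayout:
    """Hypothetical layout descriptor a stripe-aware client must track per file."""
    stripe_size: int   # bytes in one stripe unit
    stripe_count: int  # number of storage servers the file is striped across


def locate(layout: StripeLayout, offset: int) -> tuple[int, int]:
    """Map a file byte offset to (server_index, offset_within_server_object)."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    stripe_number = offset // layout.stripe_size
    # Stripe units are dealt out to servers in round-robin order.
    server_index = stripe_number % layout.stripe_count
    # Each full pass over all servers adds one stripe unit to every server object.
    full_rounds = stripe_number // layout.stripe_count
    object_offset = full_rounds * layout.stripe_size + offset % layout.stripe_size
    return server_index, object_offset


MIB = 1024 * 1024
layout = StripeLayout(stripe_size=MIB, stripe_count=4)
server, object_offset = locate(layout, 5 * MIB + 10)
print(server, object_offset)
```

A client that speaks a standard protocol to a protocol node does not need this mapping at all, which is one reason its CPU and memory footprint can stay small.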

2. Optimize the Protocol to Guarantee High-Performance Client-Side Access

Building on the parallel I/O and fully symmetric distributed architecture of CPFS, the NFS protocol service delivers high cluster-level throughput and IOPS, far exceeding the performance of a traditional standalone NAS architecture. For example, under the 200 MB/s/TiB specification, the NFS protocol service provides 200 MB/s of throughput for each TiB of capacity. The maximum throughput is 20 GB/s, and the maximum IOPS is close to 1 million.
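The arithmetic behind the specification can be sketched as follows. This assumes throughput scales linearly with capacity until it reaches the stated maximum and treats 20 GB/s as 20,000 MB/s. Both are assumptions for illustration, and the function name is hypothetical.

```python
def provisioned_throughput_mb_s(
    capacity_tib: float,
    per_tib_mb_s: float = 200.0,   # the 200 MB/s/TiB specification from the text
    cap_mb_s: float = 20_000.0,    # 20 GB/s maximum, assumed to equal 20,000 MB/s
) -> float:
    """Estimate throughput for a given capacity, assuming linear scaling up to the cap."""
    if capacity_tib < 0:
        raise ValueError("capacity must be non-negative")
    return min(capacity_tib * per_tib_mb_s, cap_mb_s)


for capacity in (10, 50, 100, 200):
    print(capacity, "TiB ->", provisioned_throughput_mb_s(capacity), "MB/s")
```

Under these assumptions, a file system of 100 TiB or more would already sit at the 20 GB/s maximum.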

The NFS protocol service forms a protocol cluster that scales horizontally with the capacity of the CPFS file system. CPFS-NFS balances load between clients and protocol nodes: when a client mounts, the best protocol node is selected for the connection based on protocol node load (number of connections, idle bandwidth, and CPU). This effectively avoids the performance degradation caused by hot, heavy clients crowding onto a single protocol node.
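A minimal sketch of load-based node selection at mount time is shown below. The source names the three inputs (connections, idle bandwidth, CPU) but not how they are combined, so the equal-weight score, the `NodeLoad` type, and the node names are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeLoad:
    """Hypothetical load snapshot for one protocol node."""
    name: str
    connections: int
    idle_bandwidth_mb_s: float
    cpu_utilization: float  # fraction in the range 0.0 to 1.0


def load_score(node: NodeLoad, max_connections: int, max_idle_bw: float) -> float:
    """Lower is better. Equal weighting of the three signals is an assumption."""
    conn_pressure = node.connections / max_connections if max_connections else 0.0
    bw_pressure = 1.0 - node.idle_bandwidth_mb_s / max_idle_bw if max_idle_bw else 0.0
    return (conn_pressure + bw_pressure + node.cpu_utilization) / 3.0


def pick_node(nodes: list[NodeLoad]) -> NodeLoad:
    """Return the least loaded node according to load_score."""
    if not nodes:
        raise ValueError("no protocol nodes available")
    # Normalize against the busiest and most idle nodes in the current snapshot.
    max_connections = max(n.connections for n in nodes)
    max_idle_bw = max(n.idle_bandwidth_mb_s for n in nodes)
    return min(nodes, key=lambda n: load_score(n, max_connections, max_idle_bw))


nodes = [
    NodeLoad("node-a", connections=100, idle_bandwidth_mb_s=500.0, cpu_utilization=0.9),
    NodeLoad("node-b", connections=10, idle_bandwidth_mb_s=4000.0, cpu_utilization=0.2),
    NodeLoad("node-c", connections=50, idle_bandwidth_mb_s=2000.0, cpu_utilization=0.5),
]
print(pick_node(nodes).name)
```

Whatever the real weighting is, the effect described in the text is the same: a new mount lands on a lightly loaded node instead of piling onto one that already serves hot clients.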

3. Support Multiple Mount Methods, Large-Scale Mounting, and Directory-Level Mount Points

CPFS supports multiple mounting methods to meet the requirements of Kubernetes elastic PVs and implement strict data isolation between PVs, including:

  • Large-Scale Container Mount

Traditional parallel file system clients are usually stateful, which limits how many clients a cluster can support. For example, open files and read-write locks are stored on the client, and clients issue and recall this state among themselves to keep data consistent. The more clients there are, the more resources this consumes, which caps the number of clients.

The CPFS-NFS client is stateless. Each client connects only to the storage node, so the load on a client does not grow as the number of clients increases. CPFS-NFS supports 10,000 clients or pods mounting and accessing data at the same time.

  • CSI Plug-in Support for Static and Dynamic Volumes

The CPFS-NFS client is deeply integrated with Alibaba Cloud Container Service for Kubernetes (ACK). The CSI plug-in supports mounting CPFS volumes as both static and dynamic volumes. Please see Statically provisioned CPFS volumes and Dynamically provisioned CPFS volumes for more information.

  • Directory-Level Mount Point

Directory-level mount points provide access isolation on the client side. When a container mounts the file system, only a subdirectory is mounted, which prevents container applications from accessing the entire file system directly and creating data security risks. CPFS can provide stronger directory isolation using Fileset and ACL. Fileset will support quotas in the future, letting you limit the number of files in a directory subtree and its total capacity. ACL configures user access permissions.
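The effect of mounting only a subdirectory can be modeled conceptually as path confinement: every path the container requests is resolved relative to the mounted subdirectory and cannot escape it. The sketch below is an illustrative model only. In practice the mount itself enforces this, and the function name and example paths are hypothetical.

```python
import posixpath


def resolve_in_mount(mounted_subdir: str, requested: str) -> str:
    """Resolve a container path against the mounted subdirectory, rejecting escapes."""
    root = posixpath.normpath("/" + mounted_subdir.strip("/"))
    candidate = posixpath.normpath(posixpath.join(root, requested.lstrip("/")))
    # Anything that normalizes to a location outside the subtree is refused.
    prefix = root if root.endswith("/") else root + "/"
    if candidate != root and not candidate.startswith(prefix):
        raise PermissionError(f"{requested!r} is outside the mounted subtree")
    return candidate


print(resolve_in_mount("/teams/a", "data/x.bin"))

try:
    resolve_in_mount("/teams/a", "../b/secret")
except PermissionError as err:
    print("denied:", err)
```

With one subdirectory per PV, two applications mounted at different subdirectories cannot see each other's data, which is the isolation property the text describes.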


Standard NFS access to CPFS lets customers who previously could not use CPFS on the cloud because of their operating system version migrate their business to the cloud flexibly. Combined with Alibaba Cloud Container Service for Kubernetes (ACK), it also gives customers dynamic scaling of hundreds of pods per second, so they can expand quickly during busy hours, release resources quickly during idle hours, and reduce the cost of idle GPU resources.

NFS protocol support is an important improvement to CPFS file storage: containers and virtual machines can easily access the high-performance CPFS parallel file system regardless of the Linux version. This helps bring autonomous driving scenarios into production faster.
