On Client-Side Optimization of Parallel File Systems
As the high-performance branch of file storage, parallel file systems have been used for more than 20 years in high-performance computing fields such as weather forecasting, oil exploration, high-energy physics, automobile manufacturing, chip manufacturing, autonomous driving, and film and television rendering. In the AI era, GPU parallel computing is in full swing, and Alibaba Cloud CPFS has formally evolved into its 2.0 era, bringing a series of innovations and practices to the technical foundations of traditional parallel file systems.
1. The Evolution of Parallel File Systems
Traditional parallel file systems were born in the CPU-plus-physical-machine era at the beginning of the 21st century and dominated into its second decade. In 2012, Alex Krizhevsky (who later joined Google as a computer scientist) won the ImageNet LSVRC-2012 image recognition challenge in one stroke with a deep learning plus GPU approach, remarkably raising the recognition success rate from roughly 74% to 85%. GPUs rose to fame from that contest and, with their massive numbers of stream processors and vector processing units, became the accelerator of choice for AI computing. GPU compute power has doubled generation over generation, from V100 to A100 to H100, and the demand for data throughput has doubled along with it, even exceeding the capacity of the PCIe bus.
At the same time, with the maturing of the Kubernetes (K8s) container platform, the operating platform for AI training has shifted from virtual and physical machines to containers on cloud computing platforms. Against this background, traditional parallel file systems face great challenges.
This series of articles shares how Alibaba Cloud CPFS addresses these challenges and how it explores and evolves the parallel file system, from the following angles:
• From dedicated clients to standard NFS protocol services
• From a self-contained file storage system to an open ecosystem embracing the cloud data lake
• From CPU-oriented large-file workloads to GPU-oriented small-file workloads
• Faster and faster: the evolution of server-side and compute-side caching
In this installment, we focus on how CPFS made its client lightweight.
2. The Problems with Dedicated Clients
As is well known, because common protocols such as NFS were immature at the beginning of the century, traditional parallel file systems designed and shipped dedicated clients, which became something of an identity badge for high-performance parallel file systems.
The dedicated client is a key component of how a parallel file system achieves high performance. It provides the MPI-IO interface, connectivity to multiple back-end servers, and load balancing; some dedicated clients also provide single-machine data caching. However, with the advent of the container era, dedicated clients have exposed several problems:
First, heavy clients mostly run in kernel mode, which deeply binds them to the operating system. At the beginning of the century this hardly mattered: professional HPC applications were few and were mostly developed and run in supercomputing centers for specialist companies. With the advent of the AI era, however, GPU application development has flourished, developers' habits differ widely, and restricting the operating system or kernel version has become a huge barrier.
Second, elastic containers enable rapid application deployment and elastic scaling, pushing compute resource utilization to the extreme. The slow installation and extra software dependencies of a dedicated client slow down application deployment and limit container elasticity.
Third, application-oriented data management has replaced machine-oriented data management. In the container era, the user's business interface moves from physical and virtual machines to applications. A heavy client treats the entire file system as one unified namespace, which can only be partitioned through complex traditional ACL configuration and cannot integrate with Kubernetes through dynamic and static PVs to cleanly isolate the data each containerized application can access.
3. A New Approach: Lightweight NFS Clients
To solve the problems of the dedicated client, the client must be "slimmed down" into a lightweight NFS protocol endpoint. First, decoupling from the operating system lets any Linux system use CPFS easily, freeing developers. Second, the high-performance advantages of the distributed file system must be preserved. Finally, Kubernetes elastic PVs with strict data isolation between PVs must be supported. The approach has the following three aspects:
1. Lightweight client access based on the NFS protocol
NFS is the most widely used protocol in the file storage field. It is mature, universal, and easy to use, and is accepted by the majority of users. To lower the barrier to using CPFS, CPFS needs to be NFS-compatible.
Traditional parallel file system heavy clients often pin the operating system and kernel version; after a kernel upgrade, the client must be reinstalled, which drives up operations and maintenance costs. The CPFS-NFS client runs in user mode and does not depend on the kernel version. This brings two benefits: first, it supports all mainstream operating systems, including Alibaba Cloud Linux, CentOS, Ubuntu, and Debian; second, when the operating system is upgraded, the CPFS-NFS client keeps working without being upgraded itself.
Traditional parallel file system clients require complex configuration to run well. For example, Lustre needs concurrency and block-size tuning for its network component (LNET), metadata component (MDC), and data component (OSC), which increases users' maintenance costs. The CPFS-NFS client is simple to use: a single mount command suffices, and the client applies sensible defaults itself, lowering the barrier for users.
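As a sketch of what that single command looks like: the hostname and export path below are placeholders (real values come from the CPFS console), and the actual CPFS-NFS client package may register its own filesystem type rather than plain `nfs`.

```shell
# Hypothetical mount of a CPFS file system via the NFS protocol.
# "cpfs-xxxx.region.nas.aliyuncs.com" and "/share" are placeholders;
# the real mount target is shown in the CPFS console.
sudo mkdir -p /mnt/cpfs
sudo mount -t nfs -o vers=3,nolock cpfs-xxxx.region.nas.aliyuncs.com:/share /mnt/cpfs
```

No LNET/MDC/OSC-style tuning appears here: concurrency and block-size defaults are handled by the client itself.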
Parallel file systems usually push file system logic down into the client. For example, Lustre's OSC must know which storage servers hold each file stripe in order to read data, which adds CPU and memory overhead on the client. The CPFS-NFS client's resource footprint is light: it handles only data transfer and essential metadata operations, and its CPU overhead is usually less than one logical core.
2. Protocol optimizations to keep client access fast
Built on the foundation of CPFS's parallel I/O and fully symmetric distributed architecture, the NFS protocol layer also delivers high-throughput, high-IOPS cluster performance, far exceeding what a traditional single-node NAS architecture can offer. For example, under the 200 MB/s/TiB specification, the NFS protocol layer likewise delivers 200 MB/s of throughput per TiB of capacity, up to a maximum throughput of 20 GB/s and close to 1 million IOPS.
The NFS protocol service runs as a protocol cluster that scales out horizontally with the capacity of the CPFS file system. The CPFS-NFS client and the protocol nodes support load balancing: at mount time, the client selects the best protocol node to connect to based on protocol node load (connection count, idle bandwidth, CPU, and so on), effectively avoiding the performance degradation caused by hot, heavy clients piling onto a single protocol node.
3. Multiple mount methods, large-scale mounting, and directory-level mount points
To meet the requirements of Kubernetes elastic PVs and achieve strict data isolation between PVs, CPFS supports multiple mount methods, including:
Large-scale container mounting
Traditional parallel file system clients are usually stateful — for example, they hold open-file and read/write-lock state on the client. To keep data consistent, clients grant and recall state among themselves, so the larger the client fleet, the more inter-client interaction and resource consumption, which caps the number of clients.
The CPFS-NFS client is stateless: it connects only to the storage nodes, and its load does not grow with the number of clients. The CPFS-NFS client supports 10,000 clients/pods mounting and accessing data simultaneously.
CSI plug-in supporting static and dynamic volumes
The CPFS-NFS client is deeply integrated with Alibaba Cloud Container Service for Kubernetes (ACK). The CSI plug-in supports mounting both static and dynamic storage volumes; see the CPFS static volume and CPFS dynamic volume documentation for details.
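As an illustrative sketch of the static-volume model only: the real ACK CPFS CSI plug-in defines its own driver name and parameters, so this example uses Kubernetes' built-in NFS volume type, and the server address is a placeholder.

```shell
# Hypothetical static PV/PVC pair for a CPFS share. Kubernetes' built-in
# NFS volume type is used here purely to illustrate the static-volume
# model; the ACK CPFS CSI plug-in has its own driver and parameters.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolume
metadata:
  name: cpfs-pv
spec:
  capacity:
    storage: 100Gi
  accessModes: ["ReadWriteMany"]
  nfs:
    server: cpfs-xxxx.region.nas.aliyuncs.com   # placeholder mount target
    path: /share
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cpfs-pvc
spec:
  accessModes: ["ReadWriteMany"]
  resources:
    requests:
      storage: 100Gi
  volumeName: cpfs-pv
EOF
```

A dynamic volume replaces the hand-written PV with a StorageClass, so PVs are provisioned on demand as pods claim them.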
Directory-level mount points
Directory-level mount points provide access isolation on the client side. A container mounts only a subdirectory, preventing the containerized application from accessing the entire file system and creating data security risks. Combined with filesets and ACLs, CPFS provides even stronger directory isolation: filesets will support quotas, limiting the file count and total capacity of a directory subtree, while ACLs configure users' access rights.
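For illustration (hostname and paths are placeholders), a directory-level mount point simply exposes one subdirectory as the mount root, so the mounting container never sees its siblings:

```shell
# Hypothetical directory-level mount: only /share/team-a is visible
# under /mnt/team-a; the rest of the file system stays inaccessible
# to this client.
sudo mkdir -p /mnt/team-a
sudo mount -t nfs -o vers=3,nolock cpfs-xxxx.region.nas.aliyuncs.com:/share/team-a /mnt/team-a
```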
CPFS's standard NFS protocol access is now generally available. It has already helped customers who previously could not use CPFS in the cloud because of their operating system version to achieve business elasticity there. Combined with the Alibaba Cloud ACK container service, it gives customers dynamic scaling of hundreds of pods per second, enabling rapid scale-out at busy times and rapid release at idle times, reducing the idle cost of GPU resources.
This major improvement in CPFS's NFS protocol support means that both containers and virtual machines, regardless of Linux version, can easily access the high-performance CPFS parallel file system, which undoubtedly helps accelerate the adoption of autonomous driving scenarios. For more information, visit https://yqh.aliyun.com/live/detail/28624 to register for the live broadcast.
In future installments, we will continue to share CPFS's technical evolution in data lake ecosystem integration, small-file computing, caching technology, and more. We hope you will keep following this series.
Knowledge Base Team