How CNFS breaks the container persistent storage dilemma
According to a Forrester prediction, by 2022 global enterprises and organizations will be running containerized applications in production. Looking at how containers and Kubernetes are used today, two trends stand out. First, managed Kubernetes on the cloud has become the preferred way for enterprises to run containers. Second, the way users use containers is changing: from stateless applications to core enterprise applications to data intelligence applications, more and more enterprises use containers to deploy production-grade, highly complex, high-performance stateful applications, such as web services, content repositories, databases, and even DevOps and AI/big data applications.
In the cloud native era, how do we handle orchestration and storage for massive numbers of containers? And how do we improve the performance and stability of container storage?
Evolution of storage capabilities under the trend of containerized applications
As infrastructure gradually evolves from physical machines to virtual machines, then to container environments represented by Kubernetes, and even to Serverless, computing and applications are changing dramatically. The most obvious change is in granularity: in the past an application in a virtual machine had exclusive use of a CPU and memory partition, whereas today an application in a Serverless environment serves users at the function level.
Under such a technical system, storage capabilities also need to change, mainly in the following aspects:
1. High density
In the virtual machine era, each virtual machine had its own complete storage space, which held all the data a single application needed to store and access. Today, storage in K8s and Serverless environments is shared: a container accesses a huge storage resource pool. The consequence is very high storage density, and the demands placed on concurrent access to the same storage become higher.
2. Elasticity
When we create a physical or virtual machine, the storage media it uses are typically accessed over a relatively stable period. In today's container environments, however, front-end compute services scale elastically very quickly and may need to go from dozens of instances to hundreds in an instant, so storage must be highly elastic as well.
3. Data isolation
In K8s and Serverless, it is difficult for an application to have exclusive use of memory and storage resources, because storage resources, computing resources, and even the operating system and some base dependency packages are shared in a container environment. Security isolation is therefore needed at the infrastructure level, and data isolation is also needed at the upper application level through sound security policies and mechanisms. This is a very big change and challenge.
What storage capabilities do enterprises need in a container environment?
Block storage, file storage, and object storage are common container storage solutions. What file storage capabilities do enterprises need in a container environment?
1. Application compatibility
It is difficult for an enterprise to change its overall application model quickly. In many scenarios, enterprises use shared or distributed storage clusters, so the compatibility of storage with existing applications is very important. Whether storage behaves consistently in container and non-container environments, so that applications need as little modification as possible, or none at all, is an urgent and important requirement.
2. Extreme elasticity
A major feature of container deployment is that it must scale rapidly with the peaks and troughs of the business. When the upper computing layer becomes elastic, the underlying storage needs to follow quickly, rather than spending a lot of time synchronizing the underlying data.
3. Data sharing
In big data, high-performance computing, and similar scenarios, application data sets are very large, often at the TB level or above 10 TB, and in some scenarios reaching hundreds of TB. If data at this scale cannot be shared and instead has to be synchronized by copying in an elastic container environment, both the cost pressure and the loss of timeliness become hard to bear.
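To make the cost of copying concrete, here is a rough back-of-the-envelope sketch in Python. The dataset size, pod count, and per-pod bandwidth are illustrative assumptions, not figures from CNFS or from any benchmark, and the model ignores protocol overhead and contention.

```python
def copy_hours(dataset_tb: float, per_pod_gbps: float) -> float:
    """Hours for one pod to pull a full private copy of the data set.

    Assumes a steady per-pod bandwidth in gigabits per second and
    decimal units (1 TB = 1000 GB).
    """
    dataset_gbit = dataset_tb * 1000 * 8  # TB -> GB -> gigabits
    return dataset_gbit / per_pod_gbps / 3600


def copied_capacity_tb(dataset_tb: float, pods: int) -> float:
    """Total capacity consumed when every pod keeps its own copy."""
    return dataset_tb * pods


if __name__ == "__main__":
    # Hypothetical workload: 10 TB data set, 100 pods, 10 Gbit/s per pod.
    print(f"copy time per pod: {copy_hours(10, 10):.1f} h")
    print(f"capacity with private copies: {copied_capacity_tb(10, 100):.0f} TB")
    print("capacity with one shared file system: 10 TB")
```

Under these assumptions each new pod waits a little over two hours before it can start work, and 100 private copies occupy about 1 PB, whereas a shared mount avoids both the wait and the duplicated capacity.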
4. Safe and reliable
No matter how the underlying infrastructure changes, whether it is a physical machine, a virtual machine, a K8s container, or Serverless, and no matter how abstract it becomes, the most fundamental requirement of business applications is security: applications must not contaminate one another. Storage must therefore ensure data security while providing data sharing.
5. Optimize costs
Enterprises pursue cost optimization tirelessly in almost all application scenarios. Even in the most core scenarios, costs still need to be controlled, because business today grows and changes very quickly, and so does data. How to optimize costs while data grows rapidly is also a great challenge for storage.
Alibaba Cloud Container Network File System (CNFS)
In response to the advantages and challenges of using file storage in containers, Alibaba Cloud launched the Container Network File System (CNFS), which is built into ACK, Alibaba Cloud's managed Kubernetes service. CNFS abstracts Alibaba Cloud file storage into a K8s object (CRD) that is managed independently, covering operations such as creation, deletion, description, mounting, monitoring, and expansion. Users get the convenience of file storage in containers together with better file storage performance and data security, all under declarative management consistent with containers.
CNFS is deeply optimized for container storage in terms of elasticity, performance, accessibility, observability, data protection, and declarative management, which gives it the following advantages over similar solutions:
1. Storage types: supports file storage and object storage, currently covering Alibaba Cloud's NAS, CPFS, and OSS products
2. Supports Kubernetes-compatible declarative lifecycle management, so containers and storage can be managed in one place
3. Supports online and automatic PV capacity expansion, optimized for the elastic scaling of containers
4. Provides better data protection in combination with Kubernetes, including PV snapshots, recycle bin, deletion protection, data encryption, and data disaster recovery
5. Supports application-level, application-consistent snapshots, automatically analyzing application configuration and storage dependencies for one-click backup and one-click restore
6. Supports PV-level monitoring
7. Supports better access control to improve permission security on shared file systems, including directory-level quotas and ACLs
8. Performance optimization: provides microsecond-level optimization for small-file reads and writes on file storage
9. Cost optimization: provides low-frequency storage media and conversion policies to reduce storage costs
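As a minimal sketch of what "declarative" means in practice, the Python below builds a StorageClass and a shared PersistentVolumeClaim as plain dictionaries and prints them as JSON. The StorageClass and PVC fields are standard Kubernetes. The object names, the provisioner string, and the `parameters` keys that point at a CNFS object are assumptions for illustration and should be checked against the current ACK documentation before use.

```python
import json

# Hypothetical name of a CNFS object that already exists in the cluster.
CNFS_NAME = "cnfs-nas-filesystem"


def storage_class(name: str = "cnfs-nas-sc") -> dict:
    """StorageClass that carves one NAS subdirectory per PV."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        # Assumption: provisioner and parameter keys as used in ACK CSI
        # examples; verify against the official docs.
        "provisioner": "nasplugin.csi.alibabacloud.com",
        "parameters": {
            "volumeAs": "subpath",
            "containerNetworkFileSystem": CNFS_NAME,
            "path": "/",
        },
        "reclaimPolicy": "Retain",
        # Standard Kubernetes field; PVs can only be expanded if it is true.
        "allowVolumeExpansion": True,
    }


def shared_pvc(name: str, size: str, sc_name: str = "cnfs-nas-sc") -> dict:
    """PVC that many pods can mount read-write at the same time."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteMany"],
            "storageClassName": sc_name,
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    items = [storage_class(), shared_pvc("shared-data", "100Gi")]
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2))
```

The printed JSON can be piped to `kubectl apply -f -`. `ReadWriteMany` is what lets every replica of a workload share the same volume, and `allowVolumeExpansion` is the standard switch that permits a PV to be resized after creation.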
Typical use scenarios and best practices
1. Extremely elastic container application scenario
Take bursty Internet and large financial services applications as an example. In this scenario, a large number of containers must be scaled out in a short time, and the requirements for resource elasticity are high. Container storage therefore needs general-purpose elasticity and rapid scalability. Typical applications of such scenarios include: media/entertainment/live broadcast, web services/content management, financial services, games, continuous integration, machine learning, high-performance computing, etc.
In this scenario, Pods need to mount and unmount storage PVs flexibly. Storage mounting must keep pace with fast container startup, and there is a large amount of file I/O. As massive persistent data grows rapidly, the pressure on storage costs also increases. The recommended combination is ACK+CNFS+NAS, where CNFS provides the following optimizations:
• A built-in file storage class, which allows thousands of containers to start in a short time and file storage PVs to be mounted in milliseconds
• Built-in NAS file systems, which provide shared read and write access for massive numbers of containers and make it quick to achieve high availability for container applications and data
• Optimization for low latency and small files, achieving microsecond-level read and write performance and meeting the file storage performance requirements of highly concurrent container access
• File storage lifecycle management with automatic hot/cold tiering to reduce storage costs
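The hot/cold tiering idea in the last bullet can be illustrated with a small Python sketch. It is a toy model, not CNFS's or NAS's actual policy engine: the 14-day threshold, the `FileInfo` type, and the per-GB prices are all hypothetical values chosen for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileInfo:
    path: str
    size_gb: float
    last_access: datetime


def split_hot_cold(files, now, cold_after_days=14):
    """Files not accessed within cold_after_days are treated as cold."""
    cutoff = now - timedelta(days=cold_after_days)
    hot = [f for f in files if f.last_access >= cutoff]
    cold = [f for f in files if f.last_access < cutoff]
    return hot, cold


def monthly_cost(files, now, hot_price, cold_price, cold_after_days=14):
    """Monthly bill if cold files sit on the cheaper tier (prices per GB)."""
    hot, cold = split_hot_cold(files, now, cold_after_days)
    hot_gb = sum(f.size_gb for f in hot)
    cold_gb = sum(f.size_gb for f in cold)
    return hot_gb * hot_price + cold_gb * cold_price


if __name__ == "__main__":
    now = datetime(2022, 1, 31)
    files = [
        FileInfo("/live/today.mp4", 100, datetime(2022, 1, 30)),
        FileInfo("/archive/old.mp4", 900, datetime(2022, 1, 1)),
    ]
    # Hypothetical prices: 0.5 per GB on the hot tier, 0.25 on the cold tier.
    print("tiered:", monthly_cost(files, now, 0.5, 0.25))
    print("all hot:", monthly_cost(files, now, 0.5, 0.5))
```

The point of the sketch is the shape of the saving: when most of the data is rarely accessed, moving it to a cheaper tier cuts the bill roughly in proportion to the cold share and the price gap.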
2. AI container application scenario
More and more AI workloads now run training and inference in containers. The combination of cloud computing infrastructure and IDC also gives AI more flexible scheduling of computing power. When AI workloads run training and inference on the cloud, the application data sets are very large. In autonomous driving, for example, a data set can reach 10 PB or even more than 100 PB. Training on such a huge amount of data while keeping training timely means containerized AI mainly faces the following challenges:
• AI data flows are complex, and the storage system becomes an I/O bottleneck
• AI training and inference require high-performance computing and storage
• AI computing power must be coordinated: cloud and IDC resources and applications need unified scheduling
For this scenario, the recommended combination is ACK NMS cluster+CNFS+file storage NAS/CPFS, which provides the following optimizations:
• Optimized read and write performance of file storage NAS, providing high-performance shared storage that suits AI scenarios well, supports access to massive numbers of small files, and accelerates AI training and inference
• Computing clusters adapted to the container environment, such as GPU cloud servers and bare metal servers (DPCA), provide ultra-high throughput and ultra-high IOPS; CPFS also supports hybrid on-cloud/off-cloud deployment
• ACK NM cluster, which supports Kubernetes clusters built in the IDC, forms a unified on-cloud/off-cloud resource pool and schedules heterogeneous resources and applications uniformly to make the most of the computing advantages of cloud infrastructure
3. Gene computing application scenario
Gene detection technology has gradually matured and has been adopted by many hospitals, which sequence patients' genes to address complex diseases more accurately and quickly. For each individual, the volume of gene sampling data is very large, often tens of GB. However, targeted gene analysis cannot rely on individual samples alone; it may need 100,000 or even millions of samples, which brings great challenges to container storage, including:
• Data mining of large-scale samples requires massive computing and storage resources; data grows fast, storage costs are high, and management is difficult
• Massive data needs to be distributed quickly and safely to many places in China, and multiple data centers need shared access
• Batch sample processing takes a long time and demands high performance; peaks and valleys in resource demand are pronounced, which makes planning difficult
For gene computing scenarios, the recommended combination is ACK+AGS+CNFS+file storage NAS+OSS, which addresses these problems as follows:
• The file storage class built into CNFS makes it quick to build a fast, low-cost, and high-precision gene computing container environment that meets the needs of gene sequencing and data sharing
• CNFS supports OSS-type (object storage) PVs, which can store offline data, post-assembly data, and analysis results for data distribution, archiving, and delivery, so that large numbers of users can upload and download data at the same time and data delivery becomes more efficient. Massive storage space is also available, and cold data is archived through lifecycle management to reduce storage costs
• AGS performs GPU-accelerated computation on hot gene computing data, improving performance by 100 times compared with the traditional approach and quickly reducing the time and cost of gene sequencing
Of course, beyond these three typical scenarios, CNFS can provide in-depth optimization of the container and storage combination for businesses in many other scenarios. To learn more, see: https://help.aliyun.com/document_detail/264678.html
Case: Use CNFS and file storage to build modern enterprise applications
Through deep integration with CNFS, Alibaba Cloud file storage NAS has become an ideal solution for container storage. The following real customer cases show more directly how to use Alibaba Cloud Container Service ACK and file storage to build modern enterprise applications.
Baijia Cloud is a leading one-stop video service provider in China. During the epidemic, Baijia Cloud's traffic soared and its business volume increased dozens of times in a short time. Such rapid expansion had to be completed without customers noticing. In addition, Baijia Cloud's business scenarios involve a large volume of reads and writes. The computing cluster was also expanded horizontally by four clusters, and during recording and transcoding the original storage system hit an I/O bottleneck, which severely tested Baijia Cloud's ability to handle heavy traffic and high concurrency.
The storage requirements included adapting quickly to the elastic scaling of container applications and fast data access after scaling. In the end, by combining Alibaba Cloud Container Service ACK with file storage NAS, the container cluster architecture was optimized and a 10-fold elastic expansion of resources was achieved in 3 days.
File storage NAS scales elastically on demand and, with Container Service ACK, can scale automatically and on a schedule. Thousands of containers can be started in a short time, matching the elasticity of container applications. File storage NAS provides a standard access interface compatible with mainstream transcoding software, and video editing workstations can mount it easily. Baijia Cloud's K8s cluster has extremely high performance requirements. High-performance NAS provides up to 10GB of throughput, which resolved the I/O bottleneck, handled Baijia Cloud's heavy-traffic, high-concurrency scenarios, and ensured the smooth launch of its live recording business during the epidemic.
The second case is a typical customer in the automotive industry. This customer is a leading intelligent vehicle manufacturer in China and a technology company that integrates cutting-edge Internet and artificial intelligence innovation. Its products include a number of AI-powered services, such as a voice assistant and autonomous driving.
The problem the enterprise faced was that, in autonomous driving scenarios, the training material typically consists of hundreds of millions of small images of about 100 KB each, totaling up to hundreds of terabytes. During training, the GPUs repeatedly and randomly access a subset of the images in the training set, so the file system must provide high-IOPS file access to speed up training. In addition, the stability and performance of the large-scale storage system could not scale linearly with its size, and the rapid growth of storage resources brought high costs and complex operations and maintenance.
By using Alibaba Cloud file storage, the customer built a high-performance computing platform that supports its intelligent driving workloads and ultimately improved training speed for random small-file access by 60%. Files are stored across multiple data nodes in the cluster and accessed by multiple clients at the same time, which supports parallel scaling. Alibaba Cloud file storage also supports data flow across multiple storage tiers, greatly simplifying the collection, transmission, and storage of autonomous driving data.
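To see why a small-file training workload like this one is bound by IOPS rather than by capacity, a short Python calculation helps. Only the 100 KB image size comes from the case above. The GPU count and the per-GPU image rate are hypothetical, and the model assumes one read request per image with no client-side caching.

```python
def required_read_rate(gpus: int, images_per_gpu_per_s: int, image_kb: int = 100):
    """Return (IOPS, throughput in MB/s) needed to keep every GPU fed.

    Assumes one read request per image and decimal units (1 MB = 1000 KB).
    """
    iops = gpus * images_per_gpu_per_s
    throughput_mb_s = iops * image_kb / 1000
    return iops, throughput_mb_s


if __name__ == "__main__":
    # Hypothetical cluster: 64 GPUs, each consuming 1,000 images per second.
    iops, mb_s = required_read_rate(64, 1000)
    print(f"{iops} random reads/s, {mb_s:.0f} MB/s")
```

With these assumed numbers the file system has to sustain 64,000 random reads per second while moving only about 6.4 GB/s, so per-request latency and metadata handling matter far more than raw bandwidth.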
Finally, there is a case from the gene computing scenario. The customer is a world-leading life science institution. The problems it faced: data grew quickly, the existing storage could not meet the requirement for linear scaling of capacity and performance, and gene computing hit an I/O bottleneck; storing large volumes of sample data was costly and hard to manage.
By mounting file storage NAS on the container cluster as shared storage for high-performance genetic data analysis, and using it to store offline data, assembled data, and intermediate data produced during processing, the institution obtained low-latency, high-IOPS container storage. Storage performance improved from 1GB/s to 10GB/s, and data is processed end-to-end within 12 hours, including uploading data to the cloud and distributing the results.
File storage NAS provides elastic scaling and high throughput bandwidth. NAS allocates capacity according to business scale and provides matching bandwidth, which meets business elasticity requirements and also lowers TCO. With a unified process and unified resources, heterogeneous computing resources on and off the cloud are scheduled together, and gene computing tasks are completed at low cost and high efficiency.
Knowledge Base Team