By Alibaba Cloud CNFS team
The percentage of containerized applications is growing rapidly because of the popularization of cloud-native. Kubernetes has become an infrastructure in the cloud-native era.
According to Forrester, containerized applications will be running in the production environments of enterprises and organizations worldwide by 2022. Two trends stand out in how containers and Kubernetes are used today. First, hosting Kubernetes on the cloud has become a priority for enterprises that migrate to the cloud and run containers. Second, the way users use containers is changing from stateless applications and core enterprise applications to data intelligence applications. More enterprises are using containers to deploy stateful, highly complex, high-performance computing applications in production. These stateful applications include Web services, content databases, databases, and even DevOps, AI, and big data applications.
In the cloud-native era, how can we solve the problem of orchestration and storage of massive containers? How can we improve the performance and stability of container storage?
As infrastructure gradually evolves from physical machines to virtual machines, then to container environments represented by Kubernetes, and on to Serverless, computing and applications are changing significantly. In the past, an application occupied an exclusive CPU and memory partition in a virtual machine. Today, applications can serve users at the function level in Serverless.
Under such a technical system, storage capabilities also need to change accordingly, mainly in the following aspects:
1. High Density
In the era of virtual machines, one virtual machine corresponds to one complete storage space that serves all of an application's data access and storage needs. However, in Kubernetes and Serverless environments, storage is shared, and containers access a huge storage resource pool. As a consequence, storage density is very high, and the demands on access capability are higher as well.
2. Elasticity
When we create a physical or virtual machine, we usually access and use a storage medium over a relatively stable period. However, in today's container environment, frontend computing services scale very quickly, with hundreds of machines involved in an instant. Thus, highly elastic storage is needed.
3. Data Isolation
In Kubernetes and Serverless, it is difficult to occupy memory and storage resources exclusively because storage resources, computing resources, the operating system, and some basic dependency packages are shared in the container environment. Therefore, security isolation must be achieved at the infrastructure level. At the upper application level, data isolation must also be achieved through sound security policies and mechanisms. This is a significant change and a challenge.
Block storage, file storage, and object storage are common container storage solutions. However, what kind of file storage capabilities do enterprises need in a container environment?
1. Application Compatibility
It is difficult to change the overall application mode of an enterprise quickly. In many scenarios, enterprises use shared or distributed storage clusters, so storage compatibility with applications is very important. We need to keep the container and non-container environments consistent so that applications need as little modification as possible, ideally none. This is an urgent and important demand to meet.
2. Extreme Elasticity
One of the major features of container deployment is that it must scale rapidly with the peaks and troughs of the business. When the upper-level computing becomes elastic, the lower-level storage also needs to follow quickly, instead of spending a lot of time synchronizing the underlying data.
3. Data Sharing
In scenarios such as big data and high-performance computing, application data sets are very large and can reach dozens or hundreds of terabytes. If data of this size cannot be shared and must instead be synchronized by copying and transmitting it in an elastic container environment, it is difficult to meet cost and timeliness requirements.
4. Security and Reliability
No matter how the underlying infrastructure changes and is abstracted, the most fundamental demand of business applications is security. Applications must not pollute each other. Therefore, storage must ensure data security on top of its data sharing capabilities.
5. Cost Optimization
The pursuit of cost optimization by enterprises exists in all application scenarios. Even in the most core application scenarios, we still need to control costs. Today, business growth and changes are very rapid, and the growth of data is also very fast. Learning how to optimize the cost while the data is growing rapidly is also a very big challenge for storage.
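To make the point above about sharing large data sets rather than copying them more concrete, the rough arithmetic below estimates how long one full copy would take. The 100-terabyte size comes from the range mentioned above; the 10 Gbit/s per-node bandwidth is an assumption chosen for illustration, not a measured figure.

```python
def copy_hours(dataset_tb: float, bandwidth_gbit_s: float) -> float:
    """Hours needed to copy dataset_tb terabytes at bandwidth_gbit_s gigabits per second."""
    dataset_bits = dataset_tb * 1e12 * 8                # decimal terabytes -> bits
    seconds = dataset_bits / (bandwidth_gbit_s * 1e9)   # bits / (bits per second)
    return seconds / 3600


if __name__ == "__main__":
    # Assumed values: 100 TB data set, 10 Gbit/s network link per node.
    hours = copy_hours(100, 10)
    print(f"{hours:.1f} hours per full copy")  # prints "22.2 hours per full copy"
```

Under these assumptions, every newly scaled-out node that needs its own copy would wait close to a day, which is the cost that shared storage avoids.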
Alibaba Cloud launched Container Network File System (CNFS) to build on the advantages of NAS in containers and address its challenges. It is built into Alibaba Cloud ACK. CNFS abstracts Alibaba Cloud NAS into a Kubernetes object (CRD) for independent management, including O&M operations such as creating, deleting, describing, mounting, monitoring, and scaling out. CNFS makes NAS convenient to use in containers, improves NAS performance and data security, and provides declarative management consistent with containers.
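As a rough illustration of what declarative management looks like from the application side, the sketch below builds a standard Kubernetes PersistentVolumeClaim that requests shared (ReadWriteMany) file storage and prints it as JSON, a format that `kubectl apply -f` accepts. It uses only generic Kubernetes fields and does not reproduce the CNFS CRD schema; the claim name and the storage class name `example-cnfs-nas` are hypothetical placeholders.

```python
import json


def shared_pvc(name: str, storage_class: str, size: str) -> dict:
    """Build a PersistentVolumeClaim manifest for storage shared by many Pods."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # ReadWriteMany lets Pods on different nodes mount the same volume.
            "accessModes": ["ReadWriteMany"],
            # Hypothetical storage class; a real cluster defines its own.
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    manifest = shared_pvc("example-shared-data", "example-cnfs-nas", "100Gi")
    print(json.dumps(manifest, indent=2))
```

The claim states only the desired result (shared access, a size, a storage class) and leaves provisioning to the cluster.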
CNFS is deeply optimized for container storage in terms of auto scaling, performance optimization, accessibility, observability, data protection, and declaration. As such, CNFS has the following prominent advantages compared with similar solutions:
1. Container Application Scenarios with Extreme Elasticity
Let's take applications with burst traffic on the Internet and large-scale financial services as an example. A large number of containers need to be scaled out elastically in a short time in these scenarios. This has a high demand for auto scaling capability of resources. Therefore, container storage must have general elasticity and fast scaling capabilities. Typical scenarios include media, entertainment, livestreaming, web services, content management, financial services, gaming, continuous integration, machine learning, and high-performance computing.
In these scenarios, Pods need to mount and unmount storage PVs flexibly. PV mounting must keep pace with container startup to achieve a quick start, and I/O operations on a large number of files are involved. When a large amount of persistent data grows rapidly, the pressure on storage costs becomes relatively large. We recommend that you use a combination of ACK + CNFS + NAS. Combined with CNFS, the following optimizations can be achieved:
2. Application Scenarios of AI Container
More and more AI services are now trained and served for inference in containers. The combination of massive infrastructures on the cloud and IDCs also allows more flexible scheduling of computing power for AI. When AI services are trained and run for inference on the cloud, application data sets are very large. For example, in the autonomous driving field, a data set can reach tens of petabytes or even more than 100 petabytes. Training on such a large amount of data must still finish in a timely manner, so AI in containers mainly faces the following challenges:
For this scenario, we recommend using the combination of ACK + CNFS + NAS/CPFS to get the following optimizations:
3. Application Scenarios of Genetic Computing
Genetic testing technology has gradually matured and has been introduced in many hospitals to help treat complex diseases more accurately and quickly through testing patients' genes. The sampling data of genes is very large, reaching dozens of gigabytes. When performing a certain type of targeted genetic analysis, individual samples are far from enough. It may be necessary to collect thousands (or millions) of samples. This poses significant challenges to container storage, including:
For the genetic computing scenario, we recommend using the combination of ACK + AGS + CNFS + NAS + OSS.
The built-in file storage class of NAS can build a fast, low-cost, and high-precision container environment for genetic computing to meet the requirements of gene sequencing computing and data sharing.
CNFS supports OSS-type PV. It can save off-machine data, data after assembling, and analysis results for data distribution, archiving, and delivery. This ensures that a large number of users can upload and download data at the same time, improving data delivery efficiency. It also provides great storage space and archives cold data through lifecycle management to reduce storage costs.
AGS performs GPU-accelerated computing for hot data in genetic computing. Compared with the traditional model, performance improves by a factor of 100, which reduces the time and cost of gene sequencing significantly.
In addition to the three typical scenarios above, CNFS can provide a deeply optimized solution that combines containers and storage for businesses in many scenarios.
Alibaba Cloud NAS has become an ideal solution for container storage because of its deep integration with CNFS. The following section introduces a few real customer cases to help you understand how to use Alibaba Cloud ACK and NAS to build modern enterprise applications.
Baijiayun is the leading all-in-one video service provider in China. During the COVID-19 pandemic, Baijiayun's traffic soared, and its business volume increased substantially in a short time. As a result, rapid expansion had to be completed without customers noticing. In addition, Baijiayun's business scenarios involve a large number of read and write requests, so four clusters were added by scaling out the computing clusters horizontally. The original storage system hit an I/O bottleneck during recording and transcoding, which was a severe test of Baijiayun's ability to handle heavy traffic and high concurrency.
This required storage that could scale out quickly to match container applications and still provide fast data access afterward. Finally, the container cluster architecture was optimized through the combination of Alibaba Cloud ACK and NAS, and the resources were scaled out ten times within three days.
NAS scales resources elastically on demand and, working with ACK, scales out automatically. Thousands of containers can be started in a short time, which matches the elasticity of container applications. NAS provides standard access interfaces compatible with mainstream transcoding software and can be mounted to video editing workstations easily. The Kubernetes clusters of Baijiayun have extremely high performance requirements. Using high-performance NAS, Baijiayun can reach high throughput of up to ten gigabytes. This solved the I/O bottlenecks, handled the challenge of high-traffic and high-concurrency scenarios, and ensured the smooth launch of live recording services during the COVID-19 pandemic.
The second case is from one of our clients in the automotive industry. They are China's leading smart car manufacturer and a technology company that integrates the Internet and cutting-edge innovations in artificial intelligence. Its products are equipped with multiple artificial intelligence technologies and services, such as a voice assistant and autonomous driving.
The company faces the following problems:
In autonomous driving scenarios, the training material usually consists of hundreds of millions of pictures of about 100 kilobytes each, with a total size of hundreds of terabytes. During AI training, the GPU usually needs to access pictures in the training set repeatedly and randomly, so the file system needs to provide high-IOPS file access to accelerate training. In addition, the stability and performance of large-scale storage systems do not scale linearly, and the rapid growth of storage resources causes problems such as high costs and complex O&M management.
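The link between small-file training and IOPS can be sketched with simple arithmetic. Using the roughly 100-kilobyte picture size mentioned above, the snippet below estimates how many file reads per second the file system must serve to keep a set of GPUs fed. The GPU count and the per-GPU image rate are hypothetical values for illustration only.

```python
def required_read_rate(gpus: int, images_per_gpu_per_s: int, image_kb: int = 100):
    """Return (file reads per second, throughput in MB/s) needed to feed the GPUs."""
    reads_per_s = gpus * images_per_gpu_per_s
    throughput_mb_s = reads_per_s * image_kb / 1000  # decimal KB -> MB
    return reads_per_s, throughput_mb_s


if __name__ == "__main__":
    # Assumed values: 64 GPUs, each consuming 1,000 images per second.
    reads, mb_s = required_read_rate(gpus=64, images_per_gpu_per_s=1000)
    print(f"{reads} file reads/s, about {mb_s:.0f} MB/s")
    # prints "64000 file reads/s, about 6400 MB/s"
```

With these assumed numbers, the storage must serve tens of thousands of random file reads per second, which suggests why the rate of file operations, rather than raw bandwidth alone, tends to limit training on small files.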
Alibaba Cloud NAS supports the customer's high-performance computing platform for intelligent driving. The training speed for random access to small files increased by 60%. Files are stored across multiple data nodes in a cluster and accessed by multiple clients at the same time, which supports parallel expansion. In addition, Alibaba Cloud NAS supports the flow of stored data across multiple tiers, simplifying the collection, transmission, and storage of autonomous driving data.
The final case is from the genetic computing scenario. The customer is the world's leading frontier organization in life sciences. This customer faces the following problems:
The current storage cannot meet the linear expansion requirements of capacity and performance with rapid data growth. Genetic computing performance encounters I/O bottlenecks. Storage costs of large-scale sample data are high, and management is difficult.
We mounted NAS to the container cluster as shared storage for the high-performance computing used in gene data analysis. The shared storage held the off-machine data, the assembled data, and the intermediate data produced along the way. In this way, we provided the customer with low-latency, high-IOPS container storage. Storage performance improved from one gigabyte per second to ten gigabytes per second, and the end-to-end processing of data was completed in 12 hours, including data migration to the cloud and off-cloud distribution of result data.
NAS provides elastic expansion and high-throughput bandwidth. According to the scale of each business, NAS allocates the capacity on-demand and provides matching bandwidth, which meets the requirements of business elasticity and saves TCO. NAS uses unified processes and unified scheduling of heterogeneous computing resources on and off the cloud to complete genetic computing tasks efficiently at a low cost.