Alibaba Cloud CNFS Breaks the Dilemma of Container Persistent Storage for Enterprise-Level Cloud Native

This article discusses the importance and future of containerized applications and Kubernetes applications.

By Alibaba Cloud CNFS team

The share of containerized applications is growing rapidly as cloud-native technology becomes mainstream. Kubernetes has become the infrastructure of the cloud-native era.

According to Forrester, containerized applications will be applied to the production environment of enterprises and organizations worldwide by 2022. Two common features stand out in how containers and Kubernetes are used today. First, hosting Kubernetes on the cloud has become a priority for enterprises that migrate to the cloud and run containers. Second, the way users use containers is changing, from stateless applications to core enterprise applications and data intelligence applications. More enterprises are using containers to deploy complex, high-performance stateful applications in production. These stateful applications include web services, content databases, databases, and even DevOps, AI, and big data applications.

In the cloud-native era, how can we solve the problem of orchestration and storage of massive containers? How can we improve the performance and stability of container storage?

The Evolution of Storage Capabilities under the Trend of Application Containerization

As infrastructure gradually evolves from physical machines to virtual machines, then to container environments represented by Kubernetes, and on to Serverless, computing and applications are facing significant changes. In the past, an application occupied an exclusive CPU and memory partition in a virtual machine. Today, in Serverless, applications can serve users at the function level.

Under such a technical system, storage capabilities also need to change accordingly, mainly in the following aspects:

1.  High Density

In the era of virtual machines, one virtual machine corresponded to one complete storage space that held all the data an application needed to access and store. Today, however, storage in Kubernetes and Serverless environments is shared, and containers need to access a huge storage resource pool. As a result, storage density is very high, and the requirements for access capability are higher as well.

2.  Elasticity

With a physical or virtual machine, a storage medium is usually accessed and used over a relatively stable period. In today's container environment, however, frontend computing services scale very quickly and can involve hundreds of machines in an instant. Storage therefore needs to be highly elastic.

3.  Data Isolation

In Kubernetes and Serverless, it is difficult for an application to occupy memory and storage resources exclusively, because storage resources, computing resources, the operating system, and some basic dependency packages are shared in the container environment. Security isolation must therefore be achieved at the infrastructure level. At the upper application level, data isolation must also be achieved through comprehensive security policies and measures. This is a significant change and a challenge.

What Kind of Storage Capacity Does an Enterprise Need in a Container Environment?

Block storage, file storage, and object storage are common container storage solutions. However, what kind of file storage capabilities do enterprises need in a container environment?

1.  Application Compatibility

It is difficult to change an enterprise's overall application model quickly. In many scenarios, enterprises use shared or distributed storage clusters, so storage compatibility with applications is very important. Storage must behave consistently in container and non-container environments so that applications need as little modification as possible, or none at all. This is an urgent and important demand.

2.  Extreme Elasticity

One of the major features of container deployment is that it needs to meet rapid elasticity requirements with the peaks and troughs of the business. When the upper-level computing becomes elastic, the lower-level storage also needs to be able to follow up quickly, instead of spending a lot of time synchronizing the underlying data.

3.  Sharing

Application data sets are very large in scenarios such as big data and high-performance computing, and can reach dozens or hundreds of terabytes. If data of this size cannot be shared and must instead be synchronized by copying and transferring it in an elastic container environment, it is difficult to meet cost and timeliness requirements.
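
To make the cost of copying concrete, here is a back-of-the-envelope sketch in Python. The throughput figure and the function name `copy_time_hours` are illustrative assumptions, not measured CNFS or NAS values, and decimal units (1 TB = 1000 GB) are used.

```python
def copy_time_hours(dataset_tb: float, throughput_gb_per_s: float) -> float:
    """Estimate the hours needed to copy a data set at a sustained throughput.

    Uses decimal units: 1 TB = 1000 GB. Real transfers are usually slower
    because of protocol overhead and small-file effects.
    """
    if throughput_gb_per_s <= 0:
        raise ValueError("throughput must be positive")
    seconds = dataset_tb * 1000 / throughput_gb_per_s
    return seconds / 3600


if __name__ == "__main__":
    # Hypothetical example: a 100 TB data set copied at a sustained 1 GB/s.
    for copies in (1, 10):
        hours = copy_time_hours(100, 1.0) * copies
        print(f"{copies} sequential copies: about {hours:.1f} hours")
```

Under these assumptions, a single 100 TB copy at 1 GB/s already takes more than a day, which is why sharing the data set in place fits elastic scaling better than copying it for each new group of containers.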

4.  Security and Reliability

No matter how the underlying infrastructure changes and is abstracted, the most fundamental demand of business applications is security. Applications must not pollute each other's data. Therefore, storage must ensure data security on top of its data sharing capabilities.

5.  Cost Optimization

The pursuit of cost optimization by enterprises exists in all application scenarios. Even in the most core application scenarios, we still need to control costs. Today, business growth and changes are very rapid, and the growth of data is also very fast. Learning how to optimize the cost while the data is growing rapidly is also a very big challenge for storage.

Alibaba Cloud Container Network File System (CNFS)

Alibaba Cloud launched Container Network File System (CNFS) to use the advantages and address the challenges of NAS in containers. It is built into Alibaba Cloud ACK. CNFS abstracts Alibaba Cloud NAS into a Kubernetes object (CRD) for independent management, including O&M operations, such as creating, deleting, describing, mounting, monitoring, and scale-out. CNFS enables users to enjoy the convenience of using NAS in containers, improves NAS performance and data security, and provides container-consistent declarative management.

CNFS is deeply optimized for container storage in terms of auto scaling, performance optimization, accessibility, observability, data protection, and declarative management. As such, CNFS has the following prominent advantages compared with similar solutions:

  • CNFS supports file storage and object storage. Currently, Alibaba Cloud NAS, CPFS, and OSS are supported.
  • CNFS supports Kubernetes-compatible declarative lifecycle management to manage containers and storage in an all-in-one manner.
  • CNFS supports online and autonomous scale-out of PV. It is optimized for the auto scaling features of containers.
  • CNFS supports better data protection combined with Kubernetes using PV snapshot, recycle bin, deletion protection, data encryption, data disaster recovery, and other measures.
  • CNFS supports application-level consistent snapshots, autonomous analysis of application configuration and storage dependency, one-click backup, and one-click restoration.
  • CNFS supports PV-level monitoring.
  • CNFS supports better access control and improves the permission security of shared file systems, including directory-level Quota and ACL.
  • CNFS provides performance optimization for file storage, including optimized reads and writes of small files and microsecond-level performance optimization.
  • CNFS provides cost optimization. It reduces storage costs with low-frequency media and conversion strategies.
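
As an illustration of the declarative style, the sketch below builds a standard Kubernetes PersistentVolumeClaim manifest in Python. It uses only generic Kubernetes `v1` PersistentVolumeClaim fields, not the CNFS CRD itself, and the claim name `shared-data` and StorageClass name `example-nas-sc` are hypothetical placeholders.

```python
import json


def build_shared_pvc(name: str, storage_class: str, size: str) -> dict:
    """Return a PersistentVolumeClaim manifest for shared file storage.

    ReadWriteMany lets many Pods mount the same volume for reads and writes,
    which is the access pattern shared file storage is meant for.
    """
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteMany"],
            "storageClassName": storage_class,  # hypothetical class name
            "resources": {"requests": {"storage": size}},
        },
    }


def request_expansion(pvc: dict, new_size: str) -> dict:
    """Return a copy of the manifest that requests a larger volume.

    In Kubernetes, raising spec.resources.requests.storage asks for expansion;
    whether it is honored depends on the StorageClass and the storage driver.
    """
    expanded = json.loads(json.dumps(pvc))  # simple deep copy
    expanded["spec"]["resources"]["requests"]["storage"] = new_size
    return expanded


if __name__ == "__main__":
    pvc = build_shared_pvc("shared-data", "example-nas-sc", "100Gi")
    print(json.dumps(request_expansion(pvc, "200Gi"), indent=2))
```

The printed JSON can be applied like any other manifest, since Kubernetes accepts JSON as well as YAML. The desired state lives in the manifest, and scaling out becomes a one-field change rather than a manual storage operation.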

Typical Scenarios and Best Practices

1.  Container Application Scenarios with Extreme Elasticity

Let's take applications with burst traffic on the Internet and large-scale financial services as an example. A large number of containers need to be scaled out elastically in a short time in these scenarios. This has a high demand for auto scaling capability of resources. Therefore, container storage must have general elasticity and fast scaling capabilities. Typical scenarios include media, entertainment, livestreaming, web services, content management, financial services, gaming, continuous integration, machine learning, and high-performance computing.

In these scenarios, Pods need to mount and unmount storage PVs flexibly. PV mounting must keep pace with container startup so that containers start quickly, and I/O operations on a large number of files are involved. When a large amount of persistent data grows rapidly, the pressure on storage costs is relatively high. We recommend a combination of ACK + CNFS + NAS. With CNFS, the following optimizations can be achieved:

  • Built-in file storage classes allow thousands of containers to be started in a short time, and the file storage PV can be mounted in milliseconds.
  • The built-in NAS can provide shared read and write capabilities for a large number of containers. It can achieve the high availability of container applications and data quickly.
  • Microsecond-level read and write are provided, with low latency and optimization for small files. It also meets the demands on file storage performance imposed by high-concurrent access to containers.
  • Lifecycle management of file storage is provided. Automatic hot/cold grading is adopted to reduce storage costs.
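
The hot/cold grading idea in the last item can be sketched as a simple age-based rule. This is a minimal illustration, not the actual CNFS or NAS lifecycle policy: the 14-day threshold, the `FileInfo` type, and the function name `split_hot_cold` are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileInfo:
    path: str
    size_bytes: int
    last_access: datetime


def split_hot_cold(files, now, cold_after_days=14):
    """Partition files into (hot, cold) lists by time since last access.

    Files not accessed for at least cold_after_days are treated as cold and
    would be candidates for a cheaper, low-frequency storage tier.
    """
    threshold = timedelta(days=cold_after_days)
    hot, cold = [], []
    for f in files:
        (cold if now - f.last_access >= threshold else hot).append(f)
    return hot, cold


if __name__ == "__main__":
    now = datetime(2022, 1, 31)
    files = [
        FileInfo("/data/live.ts", 5_000_000, datetime(2022, 1, 30)),
        FileInfo("/data/archive.ts", 9_000_000, datetime(2021, 12, 1)),
    ]
    hot, cold = split_hot_cold(files, now)
    print("hot:", [f.path for f in hot])
    print("cold:", [f.path for f in cold])
```

A real lifecycle policy would typically also consider file size, directory scope, and how cold data is read back, but the basic decision is the same: classify by access recency, then move cold data to cheaper media.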

2.  Application Scenarios of AI Container

More AI services now run training and inference in containers. The combination of massive infrastructure on the cloud and IDCs also allows more flexible scheduling of computing power for AI. When AI services run training and inference on the cloud, the application data sets are very large. For example, in the autonomous driving field, a data set can reach 10 petabytes or even more than 100 petabytes. Training on such a large amount of data must still finish in a timely manner, so AI in containers mainly faces the following challenges:

  • The data flow of AI is complex, and there is an I/O bottleneck in the storage system.
  • AI training and inference require high-performance computing and storage.
  • AI requires computing power coordination and unified scheduling of cloud and IDC resources and applications.

For this scenario, we recommend using the combination of ACK + CNFS + NAS/CPFS to get the following optimizations:

  • Better read and write performance of NAS and high-performance shared storage match perfectly with AI scenarios. Access to massive small files is supported to accelerate AI training and improve inference performance.
  • The computing clusters of GPU Cloud Computing and ECS Bare Metal Instance are adapted to the container environment, providing ultra-high throughput and IOPS capabilities. CPFS can also support hybrid deployment.
  • ACK can manage Kubernetes clusters built by IDCs, forming a unified resource pool on and off the cloud. Unified scheduling of heterogeneous resources and applications makes full use of the computing advantages of massive infrastructure on the cloud.

3.  Application Scenarios of Genetic Computing

Genetic testing technology has gradually matured and has been introduced in many hospitals to help treat complex diseases more accurately and quickly through testing patients' genes. The sampling data of genes is very large, reaching dozens of gigabytes. When performing a certain type of targeted genetic analysis, individual samples are far from enough. It may be necessary to collect thousands (or millions) of samples. This poses significant challenges to container storage, including:

  • Data mining for large-scale samples requires massive computing resources and storage resources. Data growth is fast, storage costs are high, and management is difficult.
  • Massive data needs to be distributed to many places throughout the country quickly and securely. Multiple data centers need shared access.
  • The processing time of batch samples is long, with high performance requirements. The peaks and troughs of resource requirements are clear, making it difficult to plan.

For the genetic computing scenario, we recommend using the combination of ACK + AGS + CNFS + NAS + OSS.

The built-in file storage class of NAS can build a fast, low-cost, and high-precision container environment for genetic computing to meet the requirements of gene sequencing computing and data sharing.

CNFS supports OSS-type PVs. They can store off-machine data, assembled data, and analysis results for data distribution, archiving, and delivery. This ensures that a large number of users can upload and download data at the same time, improving data delivery efficiency. OSS also provides massive storage space and archives cold data through lifecycle management to reduce storage costs.

AGS performs accelerated GPU computing for hot data in genetic computing. Compared with the traditional model, the performance is improved 100 times, reducing the time and cost of gene sequencing significantly.

In addition to the three typical scenarios above, CNFS can provide a deeply optimized solution that combines containers and storage for businesses in many scenarios. For more information, please visit this link

Example: Building Modern Enterprise Applications with CNFS and NAS

Alibaba Cloud NAS has become an ideal solution for container storage because of its deep integration with CNFS. The following section introduces a few real customer cases to help you understand how to use Alibaba Cloud ACK and NAS to build modern enterprise applications.

Video Service

Baijiayun is the leading all-in-one video service provider in China. During the COVID-19 pandemic, Baijiayun's traffic soared, and its business volume increased substantially in a short time. As a result, rapid expansion had to be completed without customers noticing. In addition, Baijiayun's business scenarios involve a large number of read and write requests, so four clusters were scaled out horizontally in the computing tier. The original storage system hit an I/O bottleneck during recording and transcoding, which was a severe test of Baijiayun's ability to handle large traffic and high concurrency.

This required storage that could scale out in step with the container applications and make data quickly accessible after scaling. Finally, the container cluster architecture was optimized through the combination of Alibaba Cloud ACK and NAS, and the resources were scaled out tenfold within three days.

NAS scales resources out elastically on demand and scales out automatically with ACK. Thousands of containers can be started in a short time, which matches the elasticity of container applications. NAS provides standard access interfaces that are compatible with mainstream transcoding software, and it can be mounted to video editing workstations easily. The Kubernetes clusters of Baijiayun have extremely high performance requirements, and high-performance NAS gives Baijiayun throughput of up to ten gigabytes. This resolved the I/O bottlenecks, handled the challenge of high-traffic and high-concurrency scenarios, and ensured the smooth launch of live recording services during the COVID-19 pandemic.

Autonomous Driving

The second case is from one of our clients in the automotive industry. They are China's leading smart car manufacturer and a technology company that integrates the Internet and cutting-edge innovations in artificial intelligence. Its products are equipped with multiple artificial intelligence technologies and services, such as a voice assistant and autonomous driving.

The company faces the following problems:

In autonomous driving scenarios, the training material is usually composed of hundreds of millions of pictures of 100 kilobytes each, with a total size of hundreds of terabytes. During AI training, the GPU usually needs to access pictures in the training set repeatedly and randomly. As a result, the file system needs to provide high-IOPS file access to accelerate training. The stability and performance of large-scale storage systems cannot be expanded linearly. The rapid growth of storage resources also causes problems, such as high costs and complex O&M management.
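
A rough calculation shows why this workload is bound by IOPS rather than by bandwidth alone. The sketch below assumes each picture is fetched with a single read of 100 KB (decimal units) and ignores metadata operations and caching; the target throughput values and the function name are hypothetical.

```python
def file_reads_per_second(throughput_bytes_per_s: float, file_size_bytes: int) -> float:
    """File reads per second needed to sustain a throughput with fixed-size files."""
    if file_size_bytes <= 0:
        raise ValueError("file size must be positive")
    return throughput_bytes_per_s / file_size_bytes


if __name__ == "__main__":
    file_size = 100 * 1000  # 100 KB per picture, as in the scenario above
    for gb_per_s in (1, 10):  # hypothetical target throughputs
        reads = file_reads_per_second(gb_per_s * 10**9, file_size)
        print(f"{gb_per_s} GB/s needs about {reads:,.0f} file reads per second")
```

With files this small, every additional gigabyte per second of training input translates into roughly ten thousand more random file reads per second, so the file system's IOPS capability becomes the limiting factor well before raw bandwidth does.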

Alibaba Cloud NAS fully supports the customer's high-performance computing platform for intelligent driving. The training speed for random access to small files increased by 60%. Files are stored across multiple data nodes in a cluster and accessed by multiple clients at the same time, which supports parallel expansion. In addition, Alibaba Cloud NAS supports a multi-level flow of stored data, simplifying the collection, transmission, and storage of autonomous driving data.

Genetic Computing

The final case is from the genetic computing scenario. The customer is a world-leading organization at the frontier of life sciences. This customer faces the following problems:

The current storage cannot meet the linear expansion requirements of capacity and performance with rapid data growth. Genetic computing performance encounters I/O bottlenecks. Storage costs of large-scale sample data are high, and management is difficult.

We mounted NAS to the container cluster as shared storage for the high-performance computing used in gene data analysis. The shared storage held the off-machine data, the assembled data, and the intermediate data generated during processing. As such, we provided the customer with container storage capabilities with low latency and high IOPS. The storage performance improved from one gigabyte per second to ten gigabytes per second, and the end-to-end processing of data was completed in 12 hours, including data migration to the cloud and off-cloud distribution of result data.

NAS provides elastic expansion and high-throughput bandwidth. According to the scale of each business, NAS allocates capacity on demand and provides matching bandwidth, which meets the requirements of business elasticity and reduces TCO. NAS uses unified processes and unified scheduling of heterogeneous computing resources on and off the cloud to complete genetic computing tasks efficiently at a low cost.
