
Born for Data Elasticity, Alibaba Cloud Cloud-native Storage Speeds Up Again

This article explores challenges faced by enterprises running AI and big data applications on Kubernetes, focusing on the decoupling of computing and storage architecture.

By Zhihao and Zhanyi

An increasing number of enterprises are now running AI and big data applications on Kubernetes, which enhances resource elasticity and development efficiency. However, the decoupling of computing and storage architecture introduces challenges such as high network latency, expensive network costs, and inadequate storage service bandwidth.

Let's consider high-performance computing cases like AI training and gene computing. In these scenarios, a large number of computations need to be executed concurrently within a short timeframe, and multiple computing instances share access to the same data source from the file system. To address this, many enterprises utilize the Apsara File Storage NAS or CPFS service, mounting it to computing tasks executed by Alibaba Cloud Container Service for Kubernetes (ACK). This implementation enables high-performance shared access to thousands of computing nodes.
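In practice, such a NAS file system is usually exposed to ACK workloads as a statically provisioned volume. The following is a rough sketch, assuming the ACK NAS CSI driver (nasplugin.csi.alibabacloud.com); the mount target address and resource names are placeholders, and the exact fields may differ in your environment:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: nas-pv
spec:
  capacity:
    storage: 10Ti
  accessModes:
    - ReadWriteMany
  csi:
    driver: nasplugin.csi.alibabacloud.com
    volumeHandle: nas-pv
    volumeAttributes:
      server: "<file-system-id>.<region>.nas.aliyuncs.com"  # NAS mount target (placeholder)
      path: "/"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nas-pvc
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Ti
  volumeName: nas-pv       # bind directly to the PV above
  storageClassName: ""

Computing pods can then mount nas-pvc as a shared, POSIX-accessible volume.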

As cloud-native machine learning and big data scenarios witness an increase in computing power, model size, and workload complexity, high-performance computing demands faster and more elastic access to parallel file systems.

Consequently, providing an elastic and swift experience for containerized computing engines has become a new storage challenge.

To tackle this challenge, we have launched the Elastic File Client (EFC) to establish a cloud-native storage system. Our aim is to leverage the high extensibility, native POSIX interface, and high-performance directory tree structure of Alibaba Cloud's file storage service to build this cloud-native storage system. Moreover, we integrate EFC with Fluid, a cloud-native data orchestration and acceleration system, to enable visibility, auto-scaling, data migration, and computing acceleration of datasets. This combination provides a reliable, efficient, and high-performance solution for shared access to file storage in cloud-native AI and big data applications.

New Data Abstraction of Cloud-native Fluid

Fluid [1] is a cloud-native distributed data orchestration and acceleration system designed for data-intensive applications like big data and AI.

Unlike traditional storage-oriented Persistent Volume Claims (PVCs), Fluid introduces the concept of elastic datasets (Datasets) from the application's perspective, abstracting the "process of utilizing data on Kubernetes."

Fluid enables flexible and efficient movement, replication, eviction, transformation, and management of data between various storage sources (such as NAS, CPFS, OSS, and Ceph) and upper-level Kubernetes applications.

Fluid can implement CRUD operations, permission control, and access acceleration for datasets. Users can directly access abstracted data in the same way as they access native Kubernetes data volumes. Fluid currently focuses on two important scenarios: dataset orchestration and application orchestration.

• In terms of dataset orchestration, Fluid can cache data of a specified dataset to a Kubernetes node with specified features to improve data access speed.
• In terms of application orchestration, Fluid can schedule specified applications to nodes that have stored specified datasets to reduce data transfer costs and improve computing efficiency.

The two can also be combined to form a collaborative orchestration scenario, where dataset and application requirements are considered for node resource scheduling.
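For instance, dataset orchestration can be expressed directly on the Dataset resource. The following minimal sketch, based on Fluid's Dataset API (field names may vary slightly across versions), asks Fluid to place the dataset's cache only on nodes carrying a hypothetical cache-node=true label; the mount address is a placeholder:

apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: demo
spec:
  mounts:
    - mountPoint: "nfs://<nas_url>:<nas_dir>"   # placeholder storage source
      name: demo
  nodeAffinity:                                 # restrict cache placement to labeled nodes
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: cache-node
              operator: In
              values: ["true"]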


Fluid offers an efficient and convenient data abstraction layer for cloud-native AI and big data applications. It encompasses the following core features related to abstracted data:

  • Unified Abstraction of Application-Oriented Datasets:

Fluid abstracts datasets not only by consolidating data from multiple storage sources but also by describing data mobility and characteristics. Additionally, it provides observability features such as total dataset volume, current cache space size, and cache hit ratio. With this information, users can evaluate the need for scaling up or down the cache system.

  • Extensible Data Engine Plug-ins:

While Dataset is a unified abstract concept, different storage systems have distinct runtime interfaces, requiring different runtime implementations for actual data operations. Fluid provides Cache Runtime and Thin Runtime. Cache Runtime facilitates cache acceleration using various distributed cache runtimes such as Alluxio, JuiceFS, Alibaba Cloud EFC, Jindo, and Tencent Cloud GooseFS. Thin Runtime offers unified access interfaces (e.g., s3fs, nfs-fuse) to integrate with third-party storage systems.

  • Automated Data Manipulation:

Fluid enables various operations, including data prefetching, data migration, and data backup, using Custom Resource Definitions (CRDs). It supports one-time, scheduled, and event-driven operations, allowing integration with automated operations and maintenance (O&M) systems (a prefetching sketch follows this feature list).

  • General Data Acceleration:

Combining distributed data caching technology with features like autoscaling, portability, observability, and scheduling capabilities, Fluid improves data access performance.

  • Runtime Platform Independence:

Fluid supports multiple Kubernetes forms, including native Kubernetes, edge Kubernetes, serverless Kubernetes, and multi-cluster Kubernetes. It can run in diverse environments such as public clouds, edge environments, and hybrid clouds. Depending on the environment, Fluid can be run using either the CSI Plugin or sidecar mode for the storage client.
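As an example of the automated data manipulation described above, prefetching a dataset's hot directory can be declared with Fluid's DataLoad resource. The following is a minimal sketch; the dataset name and target path are placeholders, and the exact fields may vary with the Fluid version:

apiVersion: data.fluid.io/v1alpha1
kind: DataLoad
metadata:
  name: demo-preload
spec:
  dataset:
    name: demo          # the Dataset whose cache should be warmed (placeholder)
    namespace: default
  target:
    - path: /train      # sub-path to prefetch into the cache (placeholder)
      replicas: 1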


EFC for Cloud-native Storage, Elastic Acceleration to Ensure Business Stability

After implementing cloud-native modernization, enterprise applications can build more flexible services. The question is: how can the storage of application data become cloud-native at the same pace?

What is cloud-native storage?

Cloud-native storage is not merely a storage system built on the cloud or deployed in Kubernetes containers. It refers to a storage service that seamlessly integrates with Kubernetes environments to meet the requirements of business elasticity and agility.

Cloud-native storage must fulfill the following requirements:

  1. Stable storage service: Each node in the system must be stable and able to recover by itself. For example, with file storage, a failure in the NFS client or FUSE used to affect only one ECS instance; in a cloud-native architecture, a single point of storage failure can affect many pods in a container cluster.
  2. Elastic storage capacity and performance: Traditional distributed storage performance grows with capacity. In cloud-native environments, however, storage performance requirements change rapidly as pods scale out, so the storage system must deliver elastic performance when the computing scale grows quickly.
  3. Large-scale scaling of computing pods: Cloud-native application scenarios demand highly agile and flexible services. Many scenarios require fast container startup and flexible scheduling, and it is common for 1,000 to 2,000 pods to be launched within a minute. Volumes must therefore be mounted quickly as pods come and go.
  4. Observability at pod granularity: Most storage services provide sufficient monitoring at the file system level. From a cloud-native perspective, however, monitoring at the PV and dataset granularity is what truly helps platform administrators.
  5. Near-local storage performance under the separation of storage and computing: Separating storage and computing brings elasticity and agility, but the overhead of network latency and remote access protocols significantly reduces the I/O performance of pods accessing storage. New techniques are needed to reduce this performance penalty.

However, none of the above requirements can be met by relying on the storage backend service or the client alone.

Therefore, Alibaba Cloud launched the Elastic File Client (EFC), which combines the high extensibility, native POSIX interface, and high-performance directory tree structure of Alibaba Cloud's file storage services to build a cloud-native storage system. EFC replaces the traditional kernel-mode NFS client of NAS and provides acceleration capabilities such as multi-link access, metadata caching, and distributed data caching. It also provides client-side performance monitoring, QoS, and hot upgrade capabilities.

At the same time, EFC avoids the problem that POSIX clients built on open-source FUSE cannot fail over within seconds, ensuring stability during large-scale computing.

EFC Runtime Core Capabilities, Tailored for Data-Intensive Applications

EFC Runtime is a runtime implementation that accelerates Dataset access, using EFC as its cache engine. By managing and scheduling EFC Runtime, Fluid enables visibility, auto-scaling, data migration, and computing acceleration for datasets. Deploying and using EFC Runtime on Fluid is simple: it is compatible with the native Kubernetes environment and can improve data throughput automatically and in a controlled manner.

By accessing Apsara File Storage through EFC Runtime, you obtain the following capabilities in addition to the enterprise-level basic features of Apsara File Storage:

  1. POSIX compliance: EFC provides standard POSIX interfaces on top of Apsara File Storage NAS and CPFS, so container applications can access shared data through POSIX semantics.
  2. Failover in seconds: When the FUSE process crashes or is upgraded, EFC is pulled up again automatically within seconds, so business I/O is almost unaffected.
  3. Strongly consistent semantics: EFC uses a strongly consistent distributed lease mechanism to provide strong consistency for files and directories. Files written by one pod can be read by other pods immediately, and a newly created file is visible to all other clients right away. This makes it easier to manage data across multiple nodes.
  4. Client-side caching: EFC optimizes the FUSE caching logic to provide better read and write performance for small files. Compared with the traditional NFS client, EFC improves performance by more than 50%.
  5. Distributed caching: EFC incorporates distributed cache technology developed by Alibaba Cloud, combining the memory of multiple nodes into a large cache pool. Hot data does not need to be read from the remote end every time, and both throughput and cache capacity grow as the computing scale increases.
  6. Small-file prefetching: EFC prefetches hot data from hot directories in a targeted manner to reduce the overhead of pulling data.


Training Time Can Be Shortened by 87%, and Performance Is Better than That of Open-source NFS

We use the insightface (ms1m-ibug) dataset [2] and Arena [3] on a Kubernetes cluster to verify concurrent read speed on this dataset. With EFC Runtime and the local cache enabled, performance is significantly better than that of open-source NFS, and the training time is shortened by 87%.
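For reference, such a training job can be submitted with Arena and pointed at the dataset's PVC roughly as follows. This is only a sketch: the job name, image, training command, and PVC name are placeholders, and flag names may differ across Arena versions.

arena submit tf \
  --name=insightface-demo \
  --gpus=1 \
  --image=<training-image> \
  --data=<dataset-pvc>:/data \
  "python train.py --data-dir=/data"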


How to Get Started Quickly Using EFC Runtime?

The following example uses Apsara File Storage NAS to show how to use Fluid EFC Runtime to accelerate access to NAS files.

First, you need to prepare the Alibaba Cloud ACK Pro cluster and the Alibaba Cloud NAS file system.

Then, creating the required EFC Runtime environment takes only about 5 minutes. The process is simple; follow the procedure below to deploy the EFC Runtime environment.

Step 1: Create a Dataset and an EFC Runtime

Create a dataset.yaml file that contains the following two parts:

  1. Dataset custom resource information. In the Dataset, declare the URL of the Apsara File Storage NAS file system to be mounted (replace <nas_url>) and the sub-path in NAS (replace <nas_dir>).
  2. An EFC Runtime, which is equivalent to starting an EFC distributed cluster to provide caching services.
apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: efc-demo
spec:
  placement: Shared
  mounts:
    - mountPoint: "nfs://<nas_url>:<nas_dir>"
      name: efc
      path: "/"
---
apiVersion: data.fluid.io/v1alpha1
kind: EFCRuntime
metadata:
  name: efc-demo
spec:
  replicas: 3
  master:
    networkMode: ContainerNetwork
  worker:
    networkMode: ContainerNetwork
  fuse:
    networkMode: ContainerNetwork
  tieredstore:
    levels:
      - mediumtype: MEM
        path: /dev/shm
        quota: 15Gi
  1. mountPoint: the path of the NAS or CPFS file system. The NAS format is nfs://<nas_url>:<nas_dir>, and the CPFS format is cpfs://<cpfs_url>:<cpfs_dir>. If no subdirectory is required, the root directory can be used. For more information, please refer to the document [4].
  2. replicas: the number of cache workers in the created distributed cluster. This parameter can be adjusted based on the memory configuration of compute nodes and the size of the dataset. It is recommended that the product of quota and replicas be greater than the total size of the dataset to be cached.
  3. networkMode: the optional values are ContainerNetwork and HostNetwork. We recommend ContainerNetwork in the ACK environment because it doesn't introduce additional performance loss.
  4. mediumtype: the cache type. Only one of HDD, SSD, and MEM is supported, where MEM represents memory. We recommend MEM. When MEM is used, the cache data storage directory specified by path must be an in-memory file system (for example, tmpfs).
  5. path: the cache data storage directory of the EFC cache system worker. It is recommended to keep /dev/shm.
  6. quota: the maximum cache capacity provided by a single Worker component. This parameter can be adjusted based on the memory configuration of compute nodes and the size of datasets. It is recommended that the product of quota and replicas be greater than the total size of the dataset to be cached.
Run the following command to create the Dataset and the EFCRuntime:

kubectl create -f dataset.yaml

To view the Dataset:

$ kubectl get dataset efc-demo

The expected output is:

NAME       UFS TOTAL SIZE   CACHED   CACHE CAPACITY   CACHED PERCENTAGE   PHASE   AGE
efc-demo                                                                  Bound   24m
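You can also check that the cache cluster started by the EFCRuntime is ready before running workloads (the exact output columns depend on the Fluid version):

kubectl get efcruntime efc-demo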

Step 2: Create an application container to experience the acceleration effect

You can create an application container to use the EFC acceleration service, or submit machine learning jobs to experience related features.

Next, we will create two application containers to access the same 10GB file in the dataset. You can also use another file for testing, which needs to be pre-stored in the NAS file system.

Define the following app.yaml file:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: efc-app
  labels:
    app: nginx
spec:
  serviceName: nginx
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        command: ["/bin/bash"]
        args: ["-c", "sleep inf"]
        volumeMounts:
        - mountPath: "/data"
          name: data-vol
      volumes:
        - name: data-vol
          persistentVolumeClaim:
            claimName: efc-demo
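Deploy the StatefulSet so that both pods mount the dataset through the PVC named efc-demo that Fluid created for the Dataset:

kubectl create -f app.yaml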

Run the following command to view the size of the data file to be accessed:

kubectl exec -it efc-app-0 -- du -h /data/allzero-demo
10G     /data/allzero-demo

Run the following command to check the read time of the file in the first application container. If you use your own real data file, replace /data/allzero-demo with the real file path.

kubectl exec -it efc-app-0 -- bash -c "time cat /data/allzero-demo > /dev/null"

The expected output is:

real    0m15.792s
user    0m0.023s
sys     0m2.404s

Then, in another container, test the time taken to read the same 10 GB file. If you use your own data file, replace /data/allzero-demo with the real file path:

kubectl exec -it efc-app-1 -- bash -c "time cat /data/allzero-demo > /dev/null"

The expected output is:

real    0m9.970s
user    0m0.012s
sys     0m2.283s

From the above output, the throughput improves from 648 MiB/s to 1034.3 MiB/s, and the read efficiency for the same file improves by 59.5%.

Summary and Outlook

By combining Fluid with EFC, you can better support AI and big data services in cloud-native scenarios. This combination can improve data usage efficiency and enhance the integration of automated O&M through standardized operations such as data preheating and migration.

In addition, we will also support running in serverless scenarios to provide a better distributed file storage access experience for serverless containers.

Related Links

[1] Fluid
https://github.com/fluid-cloudnative/fluid
[2] insightface(ms1m-ibug) dataset
https://github.com/deepinsight/insightface/tree/master/recognition/_datasets_#ms1m-ibug-85k-ids38m-images-56
[3] Arena
https://www.alibabacloud.com/help/en/doc-detail/212117.html
[4] Use EFC to accelerate access to NAS
https://www.alibabacloud.com/help/en/doc-detail/600930.html
