Fluid: An Important Piece of the Puzzle for Big Data and AI to Embrace Cloud Native

Thanks to the efficient deployment and agile iteration enabled by containerization, and to cloud computing's natural advantages in resource cost and elastic scaling, cloud-native orchestration frameworks represented by Kubernetes are attracting more and more AI and big data applications to deploy and run on them. However, the Cloud Native Computing Foundation (CNCF) landscape has been missing a native component that helps these data-intensive applications access data efficiently, securely, and conveniently in cloud-native scenarios.


How to make big data and AI applications run efficiently in cloud-native scenarios is a challenging problem with both theoretical significance and practical value. On the one hand, solving it requires addressing a series of theoretical and technical issues such as collaborative application orchestration, scheduling optimization, and data caching in complex scenarios. On the other hand, a solution can effectively promote the adoption of big data and AI across a broad range of cloud service scenarios. To tackle these problems systematically, academia and industry have worked closely together: Dr. Gu Rong, an associate researcher at the PASALab of Nanjing University, Che Yang, a senior technical expert at Alibaba Cloud Container Service, and Dr. Fan Bin, a founding member of the Alluxio project, jointly launched the Fluid open source collaboration project.

What is Fluid?

Fluid is an open source cloud-native infrastructure project. Against the backdrop of the separation of computing and storage, Fluid's goal is to provide an efficient and convenient data abstraction layer for AI and big data applications in cloud-native environments. By abstracting data away from storage, it achieves the following:

  1. Through data affinity scheduling and distributed cache engine acceleration, data is fused with computing, thereby accelerating computing's access to data.
  2. Data is managed independently of storage, and resources are isolated via Kubernetes namespaces, achieving secure data isolation.
  3. Data from different storage systems can be combined for computing, offering the opportunity to break the data silos caused by the differences between storage systems.

Through the data-layer abstraction provided as Kubernetes services, data can be moved, replicated, evicted, transformed, and managed flexibly and efficiently, like a fluid, between storage sources such as HDFS, OSS, and Ceph and the cloud-native applications computing on top of Kubernetes. The concrete data operations are transparent to users, who no longer need to worry about the efficiency of accessing remote data, the convenience of managing data sources, or how to help Kubernetes make operations and scheduling decisions. Users simply access the abstracted data in the most natural Kubernetes way, through native data volumes, and Fluid handles the remaining tasks and underlying details.
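To make the "native data volume" access pattern concrete, here is a minimal sketch of a pod consuming a Fluid dataset. It assumes a Dataset named `demo-dataset` has already been created, for which Fluid exposes a PersistentVolumeClaim of the same name; the pod name, image, and mount path are illustrative.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo-app
spec:
  containers:
    - name: demo
      image: nginx
      volumeMounts:
        # The application sees the dataset as an ordinary directory.
        - mountPath: /data
          name: demo-data
  volumes:
    - name: demo-data
      persistentVolumeClaim:
        # PVC created by Fluid for the Dataset of the same name (assumed here).
        claimName: demo-dataset
```

The application code needs no Fluid-specific logic; it reads `/data` like any local path, while caching and data movement happen underneath.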


The Fluid project currently focuses on two important scenarios: dataset orchestration and application orchestration. Dataset orchestration caches the data of a specified dataset onto Kubernetes nodes with specified characteristics, while application orchestration schedules applications onto nodes that already hold (or can hold) the specified dataset. The two can also be combined into a collaborative orchestration scenario, in which node resources are scheduled by jointly considering the requirements of both datasets and applications.
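A dataset orchestration setup can be sketched with Fluid's CRDs as follows. This is an illustrative example, not the project's canonical manifest: the OSS bucket path, node label, and sizing values are all hypothetical, and field names follow the `data.fluid.io/v1alpha1` API as described in the Fluid documentation.

```yaml
apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: demo-dataset
spec:
  mounts:
    # Hypothetical remote data source; could also be HDFS, Ceph, etc.
    - mountPoint: oss://my-bucket/training-data
      name: training-data
  # Dataset orchestration: prefer caching on nodes with this (assumed) label.
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: gpu-node
              operator: In
              values:
                - "true"
---
apiVersion: data.fluid.io/v1alpha1
kind: AlluxioRuntime
metadata:
  # Must match the Dataset name to bind the cache runtime to it.
  name: demo-dataset
spec:
  replicas: 2
  tieredstore:
    levels:
      # Cache up to 2 GiB of hot data in memory on each worker.
      - mediumtype: MEM
        path: /dev/shm
        quota: 2Gi
        high: "0.95"
        low: "0.7"
```

The Dataset declares *what* data to abstract and *where* it should live; the AlluxioRuntime declares *how* it is cached. Application orchestration then follows from scheduling pods onto the nodes holding the cache.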

Why Does Cloud Native Need Fluid?

There are inherent differences in design philosophy and mechanisms between cloud-native environments and earlier big data processing frameworks. The Hadoop big data ecosystem, deeply influenced by Google's three papers on GFS, MapReduce, and BigTable, has believed in and practiced the concept of "moving computation rather than data" since its birth. Therefore, data-intensive computing frameworks represented by Spark, Hive, and MapReduce, and their applications, were designed around data locality in order to reduce data transfer. As times have changed, however, architectures that separate computing and storage have become popular in newer cloud-native environments, for the sake of both resource elasticity and cost. A component like Fluid is therefore needed in cloud-native environments to make up for the loss of data locality that comes with big data frameworks embracing cloud native.

In addition, in cloud-native environments applications are usually deployed as stateless microservices and are not centered on data processing, whereas data-intensive frameworks and applications usually organize their computation and task allocation around data abstractions. When data-intensive frameworks are integrated into cloud-native environments, they likewise need a data-abstraction-centric scheduling and allocation framework like Fluid to work with them.

To address Kubernetes's lack of intelligent awareness of, and scheduling optimization for, application data, and the limitation that data orchestration engines such as Alluxio find it difficult to directly control the cloud-native infrastructure layer, Fluid proposes a series of innovative methods, including collaborative data-application orchestration, intelligent awareness, and joint optimization, forming an efficient support platform for data-intensive applications in cloud-native scenarios.
The specific structure is shown in the figure below:

Demo

We provide a video demo showing how to use Fluid to improve the speed of AI model training on the cloud. In this demo, with the same ResNet-50 test code, Fluid acceleration shows clear advantages over direct access through the native ossfs, in both per-second training speed and total training time; training time is shortened by 69%.

Video demo

Experience Fluid quickly

Fluid requires Kubernetes v1.14 or above with CSI storage support. The Fluid Operator is deployed and managed via Helm v3, the package management tool for the Kubernetes platform. Before running Fluid, please make sure Helm has been correctly installed in the Kubernetes cluster. You can refer to the documentation to install and use Fluid.
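The installation typically looks like the following sketch. The chart repository URL and release names are taken from the Fluid project's published docs at the time of writing and may change, so treat them as assumptions and check the official documentation first.

```shell
# Add the Fluid Helm chart repository (URL per the project's docs; may change)
helm repo add fluid https://fluid-cloudnative.github.io/charts
helm repo update

# Install Fluid into its own namespace with Helm v3
helm install fluid fluid/fluid -n fluid-system --create-namespace

# Verify that the Fluid controllers are up before creating Datasets
kubectl get pods -n fluid-system
```

Once the controller pods are Running, you can create `Dataset` and runtime objects and mount them into applications as ordinary data volumes.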
