Container Service for Kubernetes: Overview of Fluid

Last Updated: Sep 04, 2023

Fluid is an open source Kubernetes-native distributed dataset orchestrator and accelerator for data-intensive applications in cloud-native scenarios, such as big data applications and AI applications. This topic introduces Fluid and its core features.

Features

Fluid implements its features through two types of objects, Dataset and Runtime, as shown in the following figure.

[Figure: fluid-arch]
  • Fluid provides native support for dataset abstraction. This feature provides fundamental support for data-intensive applications, enables efficient data access, and improves the cost-effectiveness of data management in multiple aspects.

  • Fluid provides an extensible data engine plug-in with a unified interface for integration with third-party storage services. A variety of runtimes are supported.

  • Fluid automates data operations and supports multiple modes to integrate with automated O&M systems.

  • Fluid accelerates data access by combining data caching with elastic scaling and data-affinity scheduling.

  • Fluid is independent of runtime platforms and supports Kubernetes clusters, Container Service for Kubernetes (ACK) edge clusters, and ACK Serverless clusters. Fluid is also suitable for multi-cluster scenarios and hybrid cloud scenarios.

Concepts

  • Dataset: a set of logically related data that is used by computing engines. For example, Apache Spark uses datasets in big data scenarios and TensorFlow uses datasets in AI scenarios. Datasets are the input that intelligent applications depend on to deliver value across industries. Dataset management involves multiple aspects, including security, version management, and data acceleration.

  • Runtime: the execution engine that implements security, version management, and data acceleration for datasets. Runtime also defines a series of lifecycle interfaces. These interfaces are used to manage and accelerate datasets.

  • AlluxioRuntime: the execution engine of open source Alluxio. AlluxioRuntime provides dataset management and caching. AlluxioRuntime supports persistent volume claims (PVCs), Ceph, and Cloud Parallel File System (CPFS) acceleration. You can use AlluxioRuntime in hybrid cloud scenarios.

  • JuiceFSRuntime: a distributed cache acceleration engine developed based on JuiceFS. JuiceFSRuntime supports scenario-specific data caching and acceleration. For more information about JuiceFS, see Introduction to JuiceFS.

  • JindoRuntime: the execution engine of JindoFS developed by the Alibaba Cloud Elastic MapReduce (EMR) team. JindoRuntime is implemented in C++ and provides dataset management and caching. JindoRuntime supports data caching and acceleration for Object Storage Service (OSS), OSS-HDFS, and Hadoop Distributed File System (HDFS).

  • EFCRuntime: the runtime for the EFC elastic acceleration client developed by the Apsara File Storage NAS (NAS) technical team. EFCRuntime can accelerate access to NAS and CPFS, and supports hot updates and fault tolerance.

  • ThinRuntime: an extensible general-purpose runtime that allows users to access various storage systems in a low-code way. ThinRuntime reuses the data orchestration and management capabilities and the core capabilities that Fluid provides for integrating with runtime platforms.
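
To make the pairing of a Dataset with a Runtime more concrete, the following minimal Python sketch builds one manifest of each kind and prints them as JSON, a format that kubectl also accepts. The API group data.fluid.io/v1alpha1 and the field names follow the open source Fluid project as understood here and are not taken from this topic. The name demo, the mount point oss://example-bucket/data, and the cache settings are hypothetical placeholders. Verify all of them against the Fluid version in your cluster before use.

```python
import json

# Assumption: Fluid custom resources use this API group and version.
FLUID_API_VERSION = "data.fluid.io/v1alpha1"


def build_dataset(name: str, mount_point: str) -> dict:
    """Describe the data to access: a named mount of a storage location."""
    return {
        "apiVersion": FLUID_API_VERSION,
        "kind": "Dataset",
        "metadata": {"name": name},
        "spec": {"mounts": [{"name": name, "mountPoint": mount_point}]},
    }


def build_runtime(name: str, kind: str = "JindoRuntime", replicas: int = 1) -> dict:
    """Describe the engine that manages and caches the dataset of the same name."""
    return {
        "apiVersion": FLUID_API_VERSION,
        "kind": kind,
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            # Hypothetical cache tier: 2 GiB of memory on each worker.
            "tieredstore": {
                "levels": [
                    {"mediumtype": "MEM", "path": "/dev/shm", "quota": "2Gi"}
                ]
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical names and paths, used for illustration only.
    dataset = build_dataset("demo", "oss://example-bucket/data")
    runtime = build_runtime("demo")
    for manifest in (dataset, runtime):
        print(json.dumps(manifest, indent=2))
```

In this sketch the Dataset and the Runtime share a name, which is how the two objects are assumed to be matched. The printed JSON could be saved to a file and submitted with kubectl apply -f.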

| Feature | Alluxio | JuiceFS | Jindo | EFC |
| --- | --- | --- | --- | --- |
| Bottom-layer storage | PVC, Ceph, HDFS, CPFS, Network File System (NFS), and OSS | JuiceFS | OSS, OSS-HDFS, and PVC | NAS and CPFS |
| Supported by | Open source projects | Open source projects | Alibaba Cloud services | Alibaba Cloud services |
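
As a rough aid for reading the runtime comparison above, the following Python sketch encodes the bottom-layer storage row as a lookup table and lists the runtimes that accelerate a given storage type. The dictionary contents are copied from the comparison. The names SUPPORTED_STORAGE and runtimes_for are hypothetical helpers and are not part of Fluid.

```python
from typing import List

# Bottom-layer storage per runtime, copied from the comparison above.
SUPPORTED_STORAGE = {
    "Alluxio": {"PVC", "Ceph", "HDFS", "CPFS", "NFS", "OSS"},
    "JuiceFS": {"JuiceFS"},
    "Jindo": {"OSS", "OSS-HDFS", "PVC"},
    "EFC": {"NAS", "CPFS"},
}


def runtimes_for(storage: str) -> List[str]:
    """Return the runtimes whose bottom-layer storage includes `storage`.

    Matching ignores case and surrounding whitespace; results are sorted by name.
    """
    wanted = storage.strip().upper()
    return sorted(
        runtime
        for runtime, backends in SUPPORTED_STORAGE.items()
        if wanted in {backend.upper() for backend in backends}
    )


if __name__ == "__main__":
    for storage in ("OSS", "CPFS", "NAS"):
        print(storage, "->", runtimes_for(storage))
```

A lookup like this only narrows the candidates. The choice between, for example, Alluxio and Jindo for OSS still depends on whether you prefer an open source project or an Alibaba Cloud service.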