Data Lake practice explained [AI training acceleration] Lecture 17: Accelerating training data on HDFS with Fluid + JindoFS - Alibaba Cloud Developer Community

Introduction: [AI training acceleration] Lecture 17

Subject: Fluid + JindoFS accelerates training data on HDFS

Lecturer: Chen Shan, EMR technical expert, Alibaba Computing Platform Division

Outline:

  • What is Fluid + JindoFS (JindoRuntime)?
  • Why use JindoRuntime to accelerate HDFS?
  • How to use JindoRuntime
  • Demo

Replay link (Lecture 17): 4-

1. What is Fluid + JindoFS (JindoRuntime)

Introduction to Fluid

CNCF Fluid is an open-source, Kubernetes-native distributed dataset orchestration and acceleration engine. It mainly serves data-intensive applications in cloud-native scenarios, such as big data and AI applications.

Reference URL:

Fluid functional concepts

Fluid does not accelerate and manage an entire storage system; it accelerates and manages the datasets that applications actually use.

Fluid JindoRuntime

Background: in cloud-native environments, JindoRuntime uses the JindoFS cache acceleration engine to orchestrate cached datasets together with the applications that use them.

2. Why use JindoRuntime to accelerate HDFS?

HDFS storage and AI training

Problems HDFS faces in AI training scenarios

Accelerating access to HDFS with Fluid JindoRuntime

Features supported by JindoRuntime

  • The master supports Raft-based high availability.
  • Supports data affinity scheduling (nodeAffinity) to select appropriate cache nodes.
  • Supports data preloading through the DataLoad CRD.
  • Lets you specify the Fuse user used to access HDFS.
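As a sketch of the nodeAffinity feature, the Python below builds a Fluid `Dataset` manifest as a plain dict and prints it as JSON (which `kubectl apply -f` also accepts), restricting cache placement to labeled nodes. The field layout follows the `data.fluid.io/v1alpha1` Dataset CRD as we understand it; the dataset name, the HDFS address `hdfs://namenode:9000/...`, and the node label `fluid-cache=true` are hypothetical placeholders, so check the field names against the Fluid release you run.

```python
import json


def dataset_with_affinity(name: str, hdfs_uri: str, label_key: str, label_value: str) -> dict:
    """Build a Fluid Dataset manifest whose cache may only land on labeled nodes.

    Field names assume the data.fluid.io/v1alpha1 Dataset CRD; verify them
    against your Fluid release. All argument values are caller-supplied.
    """
    return {
        "apiVersion": "data.fluid.io/v1alpha1",
        "kind": "Dataset",
        "metadata": {"name": name},
        "spec": {
            # Mount only the directory the training job needs, not all of HDFS.
            "mounts": [{"mountPoint": hdfs_uri, "name": name}],
            # Standard Kubernetes NodeSelector: cache workers may be scheduled
            # only onto nodes carrying label_key=label_value.
            "nodeAffinity": {
                "required": {
                    "nodeSelectorTerms": [
                        {
                            "matchExpressions": [
                                {"key": label_key, "operator": "In", "values": [label_value]}
                            ]
                        }
                    ]
                }
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    manifest = dataset_with_affinity(
        "hadoop", "hdfs://namenode:9000/datasets/train", "fluid-cache", "true"
    )
    print(json.dumps(manifest, indent=2))
```

In this sketch, the chosen nodes would first be labeled (for example `kubectl label nodes <node-name> fluid-cache=true`), and the printed JSON then applied with `kubectl apply -f`.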

Reference URL:

3. How to use JindoRuntime

Accelerating HDFS with JindoRuntime

  • Download and install Fluid: https:// /aliyun/alibabacloud-jindodata/blob/master/docs/jindo_fluid/
  • Create a Dataset
  • Create a JindoRuntime
  • Preload the cache with DataLoad
  • Run the AI training job
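The four steps after installation can be sketched as one script that emits the corresponding manifests as a Kubernetes `List` in JSON. This is a minimal sketch assuming the `data.fluid.io/v1alpha1` CRDs and that Fluid exposes a bound Dataset as a PersistentVolumeClaim with the same name as the Dataset; the resource names, HDFS address, cache path, quota, replica count, container image, and training command are all hypothetical placeholders.

```python
import json

API = "data.fluid.io/v1alpha1"  # assumed Fluid CRD group/version


def dataset(name: str, hdfs_uri: str) -> dict:
    # Step 1: describe the HDFS directory to accelerate.
    return {"apiVersion": API, "kind": "Dataset", "metadata": {"name": name},
            "spec": {"mounts": [{"mountPoint": hdfs_uri, "name": name}]}}


def jindo_runtime(name: str, replicas: int, cache_path: str, quota: str) -> dict:
    # Step 2: a JindoRuntime with the same name as the Dataset binds to it;
    # tieredstore describes where cached blocks are kept on each worker.
    level = {"mediumtype": "HDD", "path": cache_path, "quota": quota,
             "high": "0.99", "low": "0.8"}
    return {"apiVersion": API, "kind": "JindoRuntime", "metadata": {"name": name},
            "spec": {"replicas": replicas, "tieredstore": {"levels": [level]}}}


def data_load(name: str, dataset_name: str, namespace: str = "default") -> dict:
    # Step 3: warm the cache before training starts.
    return {"apiVersion": API, "kind": "DataLoad", "metadata": {"name": name},
            "spec": {"dataset": {"name": dataset_name, "namespace": namespace}}}


def training_pod(name: str, dataset_name: str, image: str, command: list) -> dict:
    # Step 4: the job reads the dataset through the PVC that Fluid creates
    # for it (assumed to share the Dataset's name), mounted at /data.
    container = {"name": "train", "image": image, "command": command,
                 "volumeMounts": [{"name": "data", "mountPath": "/data"}]}
    volume = {"name": "data", "persistentVolumeClaim": {"claimName": dataset_name}}
    return {"apiVersion": "v1", "kind": "Pod", "metadata": {"name": name},
            "spec": {"restartPolicy": "Never", "containers": [container],
                     "volumes": [volume]}}


def workflow(name: str, hdfs_uri: str) -> dict:
    # All concrete values below are hypothetical placeholders.
    items = [
        dataset(name, hdfs_uri),
        jindo_runtime(name, replicas=2, cache_path="/mnt/disk1", quota="100Gi"),
        data_load(f"{name}-warmup", name),
        training_pod(f"{name}-train", name, "example.com/train:latest",
                     ["python", "train.py", "--data-dir", "/data"]),
    ]
    return {"apiVersion": "v1", "kind": "List", "items": items}


if __name__ == "__main__":
    print(json.dumps(workflow("hadoop", "hdfs://namenode:9000/datasets/train"), indent=2))
```

Emitting everything as one `List` only keeps the sketch compact; in practice you would apply the manifests in order, wait for the Dataset to become Bound and the DataLoad to finish, and only then start the training pod, so that training reads from a warm cache.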

4. Demonstration

Using Fluid JindoRuntime

Environment requirements:

  • Kubernetes later than 1.14, with CSI support
  • Golang 1.12+
  • Helm 3
  • Fluid 0.6.0
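Before the demo, the version requirements above can be checked mechanically. The helper below is a hypothetical sketch, not part of Fluid: it takes version strings you have collected yourself (for example from `kubectl version`, `go version`, and `helm version`), reads "> 1.14" as "minor version 15 or later" and treats the demo's Fluid 0.6.0 as a minimum, both of which are interpretations. CSI support cannot be inferred from a version string, so it is not checked.

```python
import re


def major_minor(version: str) -> tuple:
    """Extract (major, minor) from strings such as 'v1.18.8', 'go1.16.3', '0.6.0'."""
    match = re.search(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def check_environment(versions: dict) -> list:
    """Return the names of components that miss the stated requirements."""
    failures = []
    # Kubernetes: interpreted as strictly newer than 1.14 (i.e. 1.15+).
    if major_minor(versions["kubernetes"]) <= (1, 14):
        failures.append("kubernetes")
    # Golang 1.12 or later.
    if major_minor(versions["go"]) < (1, 12):
        failures.append("go")
    # Helm 3.x.
    if major_minor(versions["helm"])[0] != 3:
        failures.append("helm")
    # Fluid: the demo used 0.6.0; treating it as a minimum is an assumption.
    if major_minor(versions["fluid"]) < (0, 6):
        failures.append("fluid")
    return failures


if __name__ == "__main__":
    # Hypothetical versions for illustration only.
    print(check_environment({"kubernetes": "v1.18.8", "go": "go1.16.3",
                             "helm": "v3.5.4", "fluid": "0.6.0"}))
```

An empty list means every listed component meets the requirement as interpreted here.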



Demo: accelerating data access on HDFS

Related documents:

  • Fluid JindoRuntime reference (o_fluid/jindo_fl)

  • Embracing cloud native: a user guide to accelerating HDFS with Fluid and JindoFS (d_jindofs_hdfs_int)

  • InsightFace dataset acceleration test

Click the replay link to watch the video of Lecture 17, including the lecturer's walkthrough of the examples: 4-

GitHub link:

Don't miss any of the live broadcasts. To discuss more technical topics around Data Lake JindoFS and OSS, scan the QR code to join the DingTalk group!
