Data Lake in Practice [AI Training Acceleration] Lecture 17: Accelerating Training on HDFS Data with Fluid + JindoFS - Alibaba Cloud Developer Community

Introduction: [AI training acceleration] Lecture 17

Subject: Fluid + JindoFS: accelerating training on HDFS data

Lecturer: Chen Shan, EMR technical expert, Alibaba Computing Platform Division

Outline:

  • What is Fluid + JindoFS (JindoRuntime)?
  • Why use JindoRuntime to accelerate HDFS?
  • How to use JindoRuntime
  • Demo

Live playback link (Lecture 17):

https://developer.aliyun.com/live/24703 4-

1. What is Fluid + JindoFS (JindoRuntime)

Introduction to Fluid

CNCF Fluid is an open-source Kubernetes-native distributed dataset orchestration and acceleration engine that mainly serves data-intensive applications in cloud-native scenarios, such as big data applications and AI applications.

Reference URL: https://github.com/fluid-cloudnative/fluid

Fluid functional concepts

Fluid does not accelerate or manage an entire storage system. It accelerates and manages the datasets that applications actually use.

Fluid JindoRuntime

Background: JindoRuntime uses the JindoFS cache acceleration engine to orchestrate cached datasets and the applications that consume them in a cloud-native environment.

2. Why use JindoRuntime to accelerate HDFS?

HDFS storage and AI training

Problems faced by HDFS in AI training scenarios

Using Fluid JindoRuntime to accelerate access to HDFS

Features supported by JindoRuntime

  • The Master supports Raft-based high availability.
  • Data affinity scheduling (nodeAffinity) is supported, so that appropriate cache nodes are selected.
  • Data preloading is supported through the DataLoad CRD.
  • A Fuse user can be specified for accessing HDFS.
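As a rough illustration of two of the features above (nodeAffinity and DataLoad), the following Python sketch builds the corresponding manifests as plain dictionaries and prints them as JSON, which kubectl accepts. The field layout follows the Fluid `data.fluid.io/v1alpha1` CRDs as commonly documented and should be checked against the Fluid version in use. The dataset name `hadoop`, the node label `cache-node`, and the address `hdfs://namenode.example:8020/` are hypothetical placeholders, not values from the lecture.

```python
import json

API_VERSION = "data.fluid.io/v1alpha1"


def dataset_with_affinity(name, mount_point, label_key):
    """Dataset whose cache may only be placed on nodes labelled label_key=true."""
    return {
        "apiVersion": API_VERSION,
        "kind": "Dataset",
        "metadata": {"name": name},
        "spec": {
            "mounts": [{"mountPoint": mount_point, "name": name}],
            "nodeAffinity": {
                "required": {
                    "nodeSelectorTerms": [
                        {
                            "matchExpressions": [
                                {"key": label_key, "operator": "In", "values": ["true"]}
                            ]
                        }
                    ]
                }
            },
        },
    }


def dataload(name, dataset_name, namespace="default", path="/", replicas=1):
    """DataLoad that asks the runtime to warm the cache for one path of a dataset."""
    return {
        "apiVersion": API_VERSION,
        "kind": "DataLoad",
        "metadata": {"name": name},
        "spec": {
            "dataset": {"name": dataset_name, "namespace": namespace},
            "loadMetadata": True,
            "target": [{"path": path, "replicas": replicas}],
        },
    }


if __name__ == "__main__":
    # Hypothetical names and address; replace with real cluster values.
    ds = dataset_with_affinity("hadoop", "hdfs://namenode.example:8020/", "cache-node")
    dl = dataload("hadoop-dataload", "hadoop")
    print(json.dumps(ds, indent=2))
    print(json.dumps(dl, indent=2))
```

Each printed document could be saved to a file and applied with `kubectl apply -f`. The DataLoad is only meaningful once the Dataset is bound to a runtime.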

Reference URL: https://github.com/aliyun/alibabacloud-jindofs/blob/master/docs/jindo_fluid/jindo_fluid_overview.md

3. How to use JindoRuntime

Accelerating HDFS with JindoRuntime

  • Download and install Fluid: https://github.com/aliyun/alibabacloud-jindodata/blob/master/docs/jindo_fluid/jindo_fluid_jindofs_hdfs_introduce.md
  • Create a Dataset
  • Create a JindoRuntime
  • Preload the cache with DataLoad
  • Run the AI training job
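The steps above can be sketched in Python by generating the manifests for a Dataset, a JindoRuntime with the same name, and a training Pod that mounts the resulting volume. This is a minimal sketch, not the lecturer's demo: the field names follow the Fluid `data.fluid.io/v1alpha1` CRDs as commonly documented, and the HDFS address, cache path, quota, container image, and mount path are all hypothetical placeholders. It relies on Fluid's convention of exposing a bound dataset as a PersistentVolumeClaim named after the Dataset.

```python
import json

API_VERSION = "data.fluid.io/v1alpha1"


def dataset(name, mount_point):
    """Dataset describing where the source data lives (here, an HDFS path)."""
    return {
        "apiVersion": API_VERSION,
        "kind": "Dataset",
        "metadata": {"name": name},
        "spec": {"mounts": [{"mountPoint": mount_point, "name": name}]},
    }


def jindo_runtime(name, replicas, cache_path, quota):
    """JindoRuntime with one cache tier; it must share the Dataset's name to bind."""
    return {
        "apiVersion": API_VERSION,
        "kind": "JindoRuntime",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "tieredstore": {
                "levels": [
                    {
                        "mediumtype": "HDD",
                        "path": cache_path,
                        "quota": quota,
                        "high": "0.9",
                        "low": "0.8",
                    }
                ]
            },
        },
    }


def training_pod(pod_name, image, dataset_name, mount_path="/data"):
    """Pod that reads the cached data through the PVC created for the dataset."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": pod_name},
        "spec": {
            "containers": [
                {
                    "name": "train",
                    "image": image,
                    "volumeMounts": [{"name": "dataset", "mountPath": mount_path}],
                }
            ],
            "volumes": [
                {"name": "dataset", "persistentVolumeClaim": {"claimName": dataset_name}}
            ],
        },
    }


if __name__ == "__main__":
    # All values below are illustrative assumptions.
    manifests = [
        dataset("hadoop", "hdfs://namenode.example:8020/"),
        jindo_runtime("hadoop", replicas=2, cache_path="/mnt/disk1", quota="100Gi"),
        training_pod("train-job", "example.com/training:latest", "hadoop"),
    ]
    for m in manifests:
        print(json.dumps(m, indent=2))
```

In practice the Dataset and JindoRuntime would be applied first, and the training Pod only after the dataset reports that it is bound, optionally after a DataLoad has warmed the cache.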

4. Demonstration

Using Fluid JindoRuntime

Environment requirements:

  • Kubernetes version > 1.14 with CSI support
  • Golang 1.12+
  • Helm 3
  • Fluid 0.6.0

Reference: https://github.com/aliyun/alibabacloud-jindofs/blob/master/docs/jindo_fluid/jindo_fluid_overview.md

Issues: https://github.com/aliyun/alibabacloud-jindofs/issues

Demo: accelerating data access on HDFS

Reference: https://github.com/aliyun/alibabacloud-jindodata/blob/master/docs/jindo_fluid/jindo_fluid_jindofs_hdfs_introduce.md

Links to related documents:

  • Fluid JindoRuntime reference

https://github.com/aliyun/alibabacloud-jindofs/blob/master/docs/jindo_fluid/jindo_fluid_overview.md

  • Embracing cloud native: a user guide to accelerating HDFS with Fluid and JindoFS

https://github.com/aliyun/alibabacloud-jindodata/blob/master/docs/jindo_fluid/jindo_fluid_jindofs_hdfs_introduce.md

https://github.com/aliyun/alibabacloud-jindofs/blob/master/docs/jindo_fluid/jindo_fluid_resnet50_example.md

  • InsightFace dataset acceleration test

https://github.com/aliyun/alibabacloud-jindofs/blob/master/docs/jindo_fluid/jindo_fluid_cache_performance_report.md

Click the playback link to watch the recording of Lecture 17, including the lecturer's walkthrough of the examples:

   https://developer.aliyun.com/live/24703 4-

GitHub link:

https://github.com/aliyun/alibabacloud-jindofs

To keep up with every live broadcast and discuss more technical questions about data lakes, JindoFS, and OSS, scan the QR code to join the DingTalk discussion group.
