In hybrid cloud environments, pods often need fast access to self-managed storage — such as NFS — mounted as a hostPath volume on worker nodes. Without caching, every read crosses the network, which creates latency and limits throughput.
JindoRuntime is a Fluid runtime engine developed by Alibaba Cloud E-MapReduce (EMR) based on JindoFS, a C++ file system. It caches data from hostPath volumes into a distributed cache layer on local memory or disk, so subsequent reads are served from the cache rather than the remote file system.
This topic describes how to use JindoRuntime to accelerate access to hostPath volumes in an ACK cluster.
Prerequisites
Before you begin, make sure you have:
An ACK Pro cluster running on non-ContainerOS nodes, with Kubernetes 1.18 or later
Important: ack-fluid is not supported on ContainerOS.
The ack-fluid component (version later than 1.0.6) deployed in the cluster
If you haven't installed the cloud-native AI suite yet, enable Fluid acceleration when installing it. For more information, see Deploy the cloud-native AI suite.
If the cloud-native AI suite is already installed, log on to the ACK console and deploy ack-fluid from the Cloud-native AI Suite page.
Important: If you have open-source Fluid installed, uninstall it before installing ack-fluid.
How it works
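The setup uses three kinds of Fluid objects that work together: a Dataset, a JindoRuntime (which runs as master, worker, and FUSE components), and an optional DataLoad: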
| Component | Role |
|---|---|
| Dataset | Defines the data source (a hostPath directory) and how it is mounted inside pods. |
| JindoRuntime master | Coordinates cache metadata. |
| JindoRuntime worker | Stores cached data on each node (memory or disk, depending on your tieredstore config). Scales horizontally to add cache capacity. |
| JindoRuntime FUSE | Presents the cached data as a POSIX file system to application pods. |
| DataLoad (optional) | Prefetches data into the cache before pods start, eliminating cold-read latency. |
Pods consume the dataset via a PersistentVolumeClaim (PVC) whose name matches the Dataset object.
Step 1: Prepare the hostPath directories
JindoRuntime's master and worker pods must run on nodes that have the hostPath directory pre-created. Create the directory on each target node, then label those nodes so Kubernetes schedules JindoRuntime components only there.
Create the hostPath directory on a node. Run this on each node where JindoRuntime will run:

    mkdir /mnt/demo-remote-fs

If your nodes are accessible via SSH, create the directory remotely. Replace the node names with your actual node names:

    ssh cn-beijing.192.168.1.45 "mkdir -p /mnt/demo-remote-fs"
    ssh cn-beijing.192.168.2.234 "mkdir -p /mnt/demo-remote-fs"

Label the nodes to restrict JindoRuntime scheduling to those nodes:

    kubectl label node cn-beijing.192.168.1.45 demo-remote-fs=true
    kubectl label node cn-beijing.192.168.2.234 demo-remote-fs=true

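When more than a couple of nodes are involved, the directory creation and labeling can be wrapped in one loop. The following is a minimal sketch, not part of the official procedure: the function name `prepare_nodes` is hypothetical, and it assumes passwordless SSH to every node and a kubectl context that points at the cluster.

```shell
# Create the hostPath directory on each node and label the node.
# Assumptions: passwordless SSH to every node, kubectl already configured.
prepare_nodes() {
  dir="$1"; label="$2"; shift 2
  for node in "$@"; do
    # -p keeps the call idempotent if the directory already exists
    ssh "$node" "mkdir -p $dir" || return 1
    # --overwrite lets the command succeed when the label is already set
    kubectl label node "$node" "$label" --overwrite || return 1
  done
}

# Usage with the example node names from this step:
# prepare_nodes /mnt/demo-remote-fs demo-remote-fs=true \
#   cn-beijing.192.168.1.45 cn-beijing.192.168.2.234
```

The function stops at the first failure, so a node that cannot be reached is not left labeled without its directory.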
Step 2: Create a Dataset and JindoRuntime
Create a file named dataset.yaml with the following content:

    apiVersion: data.fluid.io/v1alpha1
    kind: Dataset
    metadata:
      name: hostpath-demo-dataset
    spec:
      mounts:
        - mountPoint: local:///mnt/demo-remote-fs
          name: data
          path: /
      accessModes:
        - ReadOnlyMany
    ---
    apiVersion: data.fluid.io/v1alpha1
    kind: JindoRuntime
    metadata:
      name: hostpath-demo-dataset
    spec:
      master:
        nodeSelector:
          demo-remote-fs: "true"
      worker:
        nodeSelector:
          demo-remote-fs: "true"
      fuse:
        nodeSelector:
          demo-remote-fs: "true"
      replicas: 2
      tieredstore:
        levels:
          - mediumtype: MEM
            volumeType: emptyDir
            path: /dev/shm
            quota: 10Gi
            high: "0.99"
            low: "0.99"

The following table describes the key parameters:
| Parameter | Description |
|---|---|
| mountPoint | The data source in local://<path> format, where <path> is an absolute path on the host. |
| nodeSelector | Restricts master, worker, and FUSE pods to nodes that have the hostPath directory. Apply the same selector to all three components. |
| replicas | Number of worker pods to deploy. Increase this to add cache capacity. |
| mediumtype | Cache storage type. Supported values: HDD, SSD, MEM. |
| volumeType | How the cache medium is mounted. Use emptyDir for memory (/dev/shm) or local system disks to avoid leaving residual data on the node. Use hostPath for dedicated data disks and set path to the disk mount path. Default value: hostPath. |
| path | Directory where worker pods store cached data. /dev/shm (tmpfs) gives the highest throughput for memory-based caching. |
| quota | Maximum cache size per worker. |
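Because quota applies per worker, the total cache capacity can be estimated as replicas multiplied by quota. The following is a rough sizing sketch under the assumption that every worker contributes its full quota. The function name `cache_capacity_gib` is hypothetical and is not part of Fluid.

```shell
# Estimate total cache capacity in GiB.
# Assumption: every worker contributes its full quota.
cache_capacity_gib() {
  replicas="$1"; quota_gib="$2"
  echo $((replicas * quota_gib))
}

# Values from dataset.yaml: 2 workers with a 10Gi quota each.
cache_capacity_gib 2 10   # prints 20
```

This matches the 20.00GiB shown in the CACHE CAPACITY column of the expected output in this step.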
Choose a cache medium:
| Storage available | mediumtype | volumeType | path |
|---|---|---|---|
| Memory or system disk | MEM or SSD | emptyDir | /dev/shm for memory, or a directory on the system disk |
| Dedicated local data disk | SSD or HDD | hostPath | Mount path of the data disk on the host |
For detailed recommendations, see Policy 2: Select proper cache media.
Apply the configuration:

    kubectl create -f dataset.yaml

Verify the Dataset is bound:

    kubectl get dataset hostpath-demo-dataset

Expected output:

    NAME                    UFS TOTAL SIZE   CACHED   CACHE CAPACITY   CACHED PERCENTAGE   PHASE   AGE
    hostpath-demo-dataset   1.98GiB          0.00B    20.00GiB         0.0%                Bound   3m54s

When PHASE is Bound, JindoFS is running and pods can access the dataset.
JindoFS pulls a container image on first launch. This may take 2 to 3 minutes depending on network conditions.
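Because the first launch can take a few minutes, a script may prefer to poll instead of rerunning `kubectl get` by hand. The following is a minimal sketch. The function name `wait_for_phase` is hypothetical, and it assumes that the PHASE column reflects the object's `.status.phase` field. Confirm this against your ack-fluid version before relying on it.

```shell
# Poll a Fluid object until its phase equals the wanted value.
# Assumption: the PHASE column of `kubectl get` reflects .status.phase.
wait_for_phase() {
  kind="$1"; name="$2"; want="$3"; tries="${4:-60}"; interval="${5:-5}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    phase=$(kubectl get "$kind" "$name" -o jsonpath='{.status.phase}' 2>/dev/null)
    if [ "$phase" = "$want" ]; then
      echo "$kind/$name is $want"
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  echo "timed out waiting for $kind/$name to reach $want (last phase: ${phase:-unknown})" >&2
  return 1
}

# Usage: wait_for_phase dataset hostpath-demo-dataset Bound
```

With the defaults, the function tries 60 times at 5-second intervals, which comfortably covers the 2 to 3 minutes an image pull may take. Under the same assumption, the same function can wait for a DataLoad to reach the Complete phase.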
(Optional) Step 3: Prefetch data with DataLoad
By default, the cache is populated passively as pods read data — the first read for any file goes to the remote file system. For latency-sensitive workloads where cold-read misses are unacceptable, create a DataLoad object to prefetch the entire dataset into the cache before your application starts.
Create a file named dataload.yaml:

    apiVersion: data.fluid.io/v1alpha1
    kind: DataLoad
    metadata:
      name: dataset-warmup
    spec:
      dataset:
        name: hostpath-demo-dataset
        namespace: default
      loadMetadata: true
      target:
        - path: /
          replicas: 1

The following table describes the parameters:
| Parameter | Description |
|---|---|
| dataset.name | Name of the Dataset to prefetch. |
| dataset.namespace | Namespace of the Dataset. Must match the DataLoad's namespace. |
| loadMetadata | Set to true for JindoRuntime to sync metadata before prefetching. |
| target[*].path | Relative path within the Dataset's mount point to prefetch. |
| target[*].replicas | Number of worker pods used to cache the prefetched data. |
Create the DataLoad object:

    kubectl create -f dataload.yaml

Monitor prefetch progress:

    kubectl get dataload dataset-warmup

Expected output when complete:

    NAME             DATASET                 PHASE      AGE   DURATION
    dataset-warmup   hostpath-demo-dataset   Complete   62s   9s

Verify the dataset is fully cached:

    kubectl get dataset

Expected output:

    NAME                    UFS TOTAL SIZE   CACHED    CACHE CAPACITY   CACHED PERCENTAGE   PHASE   AGE
    hostpath-demo-dataset   1.98GiB          1.98GiB   20.00GiB         100.0%              Bound   7m24s

When CACHED equals UFS TOTAL SIZE and CACHED PERCENTAGE is 100.0%, the entire dataset is in the cache.
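The same check can be scripted so that an application starts only once the cache is warm. The following is a minimal sketch. The function name `is_fully_cached` is hypothetical, and it assumes that the dataset row contains the seven whitespace-separated fields shown in the expected output, which makes CACHED PERCENTAGE the fifth field.

```shell
# Succeed when the named dataset row reports 100.0% cached.
# Assumption: the row has seven whitespace-separated fields, as in the
# expected output, so CACHED PERCENTAGE is field 5.
is_fully_cached() {
  awk -v name="$1" '
    $1 == name { found = 1; ok = ($5 == "100.0%") }
    END { exit ((found && ok) ? 0 : 1) }
  '
}

# Usage:
# kubectl get dataset | is_fully_cached hostpath-demo-dataset && echo "cache is warm"
```

The function reads the table from standard input, so it can be tried against saved output without a cluster. It returns a non-zero status if the dataset is missing or only partially cached.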
Step 4: Access the cached data from a pod
Mount the Dataset as a PVC in your application pod. The claimName must match the Dataset name from Step 2.
Create a file named pod.yaml:

    apiVersion: v1
    kind: Pod
    metadata:
      name: nginx
    spec:
      containers:
        - name: nginx
          image: anolis-registry.cn-zhangjiakou.cr.aliyuncs.com/openanolis/nginx:1.14.1-8.6
          command:
            - "bash"
            - "-c"
            - "sleep inf"
          volumeMounts:
            - mountPath: /data
              name: data-vol
      volumes:
        - name: data-vol
          persistentVolumeClaim:
            claimName: hostpath-demo-dataset # Must match the Dataset name

Create the pod:

    kubectl create -f pod.yaml

Log in to the pod and read data:

    kubectl exec -it nginx -- bash

Inside the pod, verify the data is accessible and measure read performance:

    # List files in the mounted directory
    ls -lh /data

Expected output:

    total 2.0G
    -rwxrwxr-x 1 root root 2.0G Jun  9 04:02 demo-file

Measure read throughput:

    time cat /data/demo-file > /dev/null

Expected output:

    real    0m2.061s
    user    0m0.015s
    sys     0m0.581s

Reads are served directly from the local JindoFS cache rather than fetched from the remote file system, reducing data transmission latency.
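To compare runs, it can help to convert the `real` time into a throughput figure. The following is a small arithmetic sketch. The function name `throughput_mib_s` is hypothetical, and it assumes that you supply the file size in MiB and the elapsed time in seconds yourself.

```shell
# Convert a file size (MiB) and an elapsed time (seconds) into MiB/s.
throughput_mib_s() {
  awk -v mib="$1" -v secs="$2" 'BEGIN {
    if (secs <= 0) {
      print "elapsed time must be positive" > "/dev/stderr"
      exit 1
    }
    printf "%.1f\n", mib / secs
  }'
}

# 1.98 GiB is about 2027.5 MiB; 2.061 s is the real time from the output above.
throughput_mib_s 2027.5 2.061
```

For the sample numbers, the result is just under 1 GiB/s. Repeating the measurement before and after prefetching shows how much the cache contributes in your own environment.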
What's next
Policy 2: Select proper cache media. Choose the right mediumtype and volumeType for your storage hardware.
Create an ACK Pro cluster. Set up the cluster required for JindoRuntime.
Deploy the cloud-native AI suite. Install the ack-fluid component.