Container Service for Kubernetes: Accelerate access to hostPath volumes

Last Updated: Mar 26, 2026

In hybrid cloud environments, pods often need fast access to self-managed storage — such as NFS — mounted as a hostPath volume on worker nodes. Without caching, every read crosses the network, which creates latency and limits throughput.

JindoRuntime is a Fluid runtime engine developed by Alibaba Cloud E-MapReduce (EMR) based on JindoFS, a C++ file system. It caches data from hostPath volumes into a distributed cache layer on local memory or disk, so subsequent reads are served from the cache rather than the remote file system.

This topic describes how to use JindoRuntime to accelerate access to hostPath volumes in an ACK cluster.

Prerequisites

Before you begin, make sure you have:

  • An ACK Pro cluster running Kubernetes 1.18 or later, on nodes that do not use ContainerOS

    Important

    ack-fluid is not supported on ContainerOS.

  • The ack-fluid component (version later than 1.0.6) deployed in the cluster

    • If you haven't installed the cloud-native AI suite yet, enable Fluid acceleration when installing it. For more information, see Deploy the cloud-native AI suite.

    • If the cloud-native AI suite is already installed, log on to the ACK console and deploy ack-fluid from the Cloud-native AI Suite page.

    Important

    If you have open-source Fluid installed, uninstall it before installing ack-fluid.

How it works

The setup uses three Fluid objects that work together: a Dataset, a JindoRuntime (which runs master, worker, and FUSE components), and an optional DataLoad.

Component             Role
Dataset               Defines the data source (a hostPath directory) and how it is mounted inside pods.
JindoRuntime master   Coordinates cache metadata.
JindoRuntime worker   Stores cached data on each node (memory or disk, depending on your tieredstore configuration). Scales horizontally to add cache capacity.
JindoRuntime FUSE     Presents the cached data as a POSIX file system to application pods.
DataLoad (optional)   Prefetches data into the cache before pods start, eliminating cold-read latency.

Pods consume the dataset via a PersistentVolumeClaim (PVC) whose name matches the Dataset object.

Step 1: Prepare the hostPath directories

JindoRuntime's master and worker pods must run on nodes that have the hostPath directory pre-created. Create the directory on each target node, then label those nodes so Kubernetes schedules JindoRuntime components only there.

  1. Create the hostPath directory on a node. Run this on each node where JindoRuntime will run:

    mkdir -p /mnt/demo-remote-fs
  2. Alternatively, if your nodes are accessible via SSH, create the directory remotely instead of logging on to each node. Replace the node names with your actual node names:

    ssh cn-beijing.192.168.1.45 "mkdir -p /mnt/demo-remote-fs"
    ssh cn-beijing.192.168.2.234 "mkdir -p /mnt/demo-remote-fs"
  3. Label the nodes to restrict JindoRuntime scheduling to those nodes:

    kubectl label node cn-beijing.192.168.1.45 demo-remote-fs=true
    kubectl label node cn-beijing.192.168.2.234 demo-remote-fs=true
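
With more than a handful of nodes, the directory creation and labeling above can be scripted. The following is a minimal sketch, assuming each node is reachable over SSH under its Kubernetes node name, as in the example above. The names prepare_nodes, run, and DRY_RUN are illustrative and not part of ack-fluid. By default the script only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch: create the hostPath directory on each node and label it.
# prepare_nodes, run, and DRY_RUN are illustrative names, not part of ack-fluid.

HOSTPATH_DIR="/mnt/demo-remote-fs"
NODE_LABEL="demo-remote-fs=true"

# run CMD...: print the command when DRY_RUN=1 (the default), execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# prepare_nodes NODE...: create the directory over SSH, then label the node.
# Stops at the first command that fails.
prepare_nodes() {
  local node
  for node in "$@"; do
    run ssh "$node" "mkdir -p $HOSTPATH_DIR" || return 1
    run kubectl label node "$node" "$NODE_LABEL" --overwrite || return 1
  done
}
```

Calling prepare_nodes cn-beijing.192.168.1.45 cn-beijing.192.168.2.234 prints the four commands for review. Setting DRY_RUN=0 before the call executes them. The --overwrite flag lets the script be rerun on nodes that already carry the label.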

Step 2: Create a Dataset and JindoRuntime

Create a file named dataset.yaml with the following content:

apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: hostpath-demo-dataset
spec:
  mounts:
    - mountPoint: local:///mnt/demo-remote-fs
      name: data
      path: /
  accessModes:
    - ReadOnlyMany
---
apiVersion: data.fluid.io/v1alpha1
kind: JindoRuntime
metadata:
  name: hostpath-demo-dataset
spec:
  master:
    nodeSelector:
      demo-remote-fs: "true"
  worker:
    nodeSelector:
      demo-remote-fs: "true"
  fuse:
    nodeSelector:
      demo-remote-fs: "true"
  replicas: 2
  tieredstore:
    levels:
      - mediumtype: MEM
        volumeType: emptyDir
        path: /dev/shm
        quota: 10Gi
        high: "0.99"
        low: "0.99"

The following table describes the key parameters:

Parameter      Description
mountPoint     The data source in local://<path> format, where <path> is an absolute path on the host.
nodeSelector   Restricts master, worker, and FUSE pods to nodes that have the hostPath directory. Apply the same selector to all three components.
replicas       Number of worker pods to deploy. Increase this to add cache capacity.
mediumtype     Cache storage type. Supported values: HDD, SSD, MEM.
volumeType     How the cache medium is mounted. Use emptyDir for memory (/dev/shm) or local system disks to avoid leaving residual data on the node. Use hostPath for dedicated data disks and set path to the disk mount path. Default value: hostPath.
path           Directory where worker pods store cached data. /dev/shm (tmpfs) gives the highest throughput for memory-based caching.
quota          Maximum cache size per worker.

Choose a cache medium:

Storage available           mediumtype   volumeType   path
Memory or system disk       MEM or SSD   emptyDir     /dev/shm or a tmpfs path
Dedicated local data disk   SSD or HDD   hostPath     Mount path of the data disk on the host

For detailed recommendations, see Policy 2: Select proper cache media.

Apply the configuration:

kubectl create -f dataset.yaml

Verify the Dataset is bound:

kubectl get dataset hostpath-demo-dataset

Expected output:

NAME                    UFS TOTAL SIZE   CACHED   CACHE CAPACITY   CACHED PERCENTAGE   PHASE   AGE
hostpath-demo-dataset   1.98GiB          0.00B    20.00GiB         0.0%                Bound   3m54s

When PHASE is Bound, JindoFS is running and pods can access the dataset.

Note

JindoFS pulls a container image on first launch. This may take 2 to 3 minutes depending on network conditions.
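
In the expected output above, CACHE CAPACITY is 20.00GiB because the configuration deploys 2 workers with a 10Gi quota each. The sketch below makes that arithmetic explicit and extracts the PHASE column from the table output. The function names are illustrative, and the parsing assumes every column in the row is populated, so that PHASE is the sixth whitespace-separated field.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check the `kubectl get dataset` output.
# Function names are illustrative, not part of Fluid.

# expected_capacity_gib REPLICAS QUOTA_GIB
# Total cache capacity is the worker count times the per-worker quota.
expected_capacity_gib() {
  echo $(( $1 * $2 ))
}

# dataset_phase NAME
# Reads `kubectl get dataset` table output on stdin and prints the PHASE
# column of the row whose first field is NAME.
# Assumes all seven columns of the row are populated.
dataset_phase() {
  awk -v name="$1" '$1 == name { print $6 }'
}
```

For example, expected_capacity_gib 2 10 prints 20, and kubectl get dataset | dataset_phase hostpath-demo-dataset prints Bound once the runtime is ready.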

(Optional) Step 3: Prefetch data with DataLoad

By default, the cache is populated passively as pods read data — the first read for any file goes to the remote file system. For latency-sensitive workloads where cold-read misses are unacceptable, create a DataLoad object to prefetch the entire dataset into the cache before your application starts.

  1. Create a file named dataload.yaml:

    apiVersion: data.fluid.io/v1alpha1
    kind: DataLoad
    metadata:
      name: dataset-warmup
    spec:
      dataset:
        name: hostpath-demo-dataset
        namespace: default
      loadMetadata: true
      target:
        - path: /
          replicas: 1

    The following table describes the key parameters:

    Parameter            Description
    dataset.name         Name of the Dataset to prefetch.
    dataset.namespace    Namespace of the Dataset. Must match the DataLoad's namespace.
    loadMetadata         Set to true for JindoRuntime to sync metadata before prefetching.
    target[*].path       Relative path within the Dataset's mount point to prefetch.
    target[*].replicas   Number of worker pods used to cache the prefetched data.
  2. Create the DataLoad object:

    kubectl create -f dataload.yaml
  3. Monitor prefetch progress:

    kubectl get dataload dataset-warmup

    Expected output when complete:

    NAME             DATASET                 PHASE      AGE   DURATION
    dataset-warmup   hostpath-demo-dataset   Complete   62s   9s
  4. Verify the dataset is fully cached:

    kubectl get dataset

    Expected output:

    NAME                    UFS TOTAL SIZE   CACHED    CACHE CAPACITY   CACHED PERCENTAGE   PHASE   AGE
    hostpath-demo-dataset   1.98GiB          1.98GiB   20.00GiB         100.0%              Bound   7m24s

    When CACHED equals UFS TOTAL SIZE and CACHED PERCENTAGE is 100.0%, the entire dataset is in the cache.
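
The check in the last step can be automated, for example to hold back an application rollout until the cache is warm. The following is a sketch under the assumption that the table layout matches the expected output above (seven populated columns per row). is_fully_cached is an illustrative name, not a Fluid command.

```shell
#!/usr/bin/env bash
# Sketch: decide from `kubectl get dataset` output whether a dataset is
# fully cached. is_fully_cached is an illustrative name.

# is_fully_cached NAME
# Reads the table output on stdin. Exits 0 only if the row for NAME shows
# CACHED (field 3) equal to UFS TOTAL SIZE (field 2) and a
# CACHED PERCENTAGE (field 5) of 100.0%. Exits 1 otherwise, including
# when no row for NAME is present.
is_fully_cached() {
  awk -v name="$1" '
    $1 == name { found = 1; ok = ($2 == $3 && $5 == "100.0%") }
    END { exit !(found && ok) }
  '
}

# Possible use: poll until the prefetch has filled the cache.
#   until kubectl get dataset | is_fully_cached hostpath-demo-dataset; do
#     sleep 5
#   done
```

Parsing the human-readable table keeps the sketch short. A script meant for long-term use would more robustly read the Dataset status in a structured format such as -o json.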

Step 4: Access the cached data from a pod

Mount the Dataset as a PVC in your application pod. The claimName must match the Dataset name from Step 2.

  1. Create a file named pod.yaml:

    apiVersion: v1
    kind: Pod
    metadata:
      name: nginx
    spec:
      containers:
        - name: nginx
          image: anolis-registry.cn-zhangjiakou.cr.aliyuncs.com/openanolis/nginx:1.14.1-8.6
          command:
          - "bash"
          - "-c"
          - "sleep inf"
          volumeMounts:
            - mountPath: /data
              name: data-vol
      volumes:
        - name: data-vol
          persistentVolumeClaim:
            claimName: hostpath-demo-dataset  # Must match the Dataset name
  2. Create the pod:

    kubectl create -f pod.yaml
  3. Open a shell in the pod and read data:

    kubectl exec -it nginx -- bash

    Inside the pod, verify the data is accessible and measure read performance:

    # List files in the mounted directory
    ls -lh /data

    Expected output:

    total 2.0G
    -rwxrwxr-x 1 root root 2.0G Jun  9 04:02 demo-file

    # Measure read throughput
    time cat /data/demo-file > /dev/null

    Expected output:

    real    0m2.061s
    user    0m0.015s
    sys     0m0.581s

    Reads are served directly from the local JindoFS cache rather than fetched from the remote file system, reducing data transmission latency.
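
To turn the timing above into a throughput figure, divide the file size by the elapsed real time. The helpers below are a small sketch with illustrative names. They assume the bash time format shown above (for example 0m2.061s) and a nonzero duration. Because ls -lh rounds the file size, the result is approximate.

```shell
#!/usr/bin/env bash
# Sketch: convert the `time` output into an approximate read throughput.
# Function names are illustrative.

# real_to_seconds REAL
# Converts a bash `time` value such as 0m2.061s into seconds.
real_to_seconds() {
  echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# throughput_gib_per_s SIZE_GIB SECONDS
# Prints SIZE_GIB / SECONDS with two decimals. SECONDS must be nonzero.
throughput_gib_per_s() {
  awk -v size="$1" -v secs="$2" 'BEGIN { printf "%.2f\n", size / secs }'
}
```

With the values from the example, throughput_gib_per_s 2.0 "$(real_to_seconds 0m2.061s)" prints 0.97, that is, roughly 0.97 GiB/s for the cached read. Running the same measurement before the cache is populated gives a baseline for comparison.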
