Container Service for Kubernetes:Accelerate data access for Job applications

Last Updated: Mar 26, 2026

When a Kubernetes Job reads training data or batch input directly from Object Storage Service (OSS), each file fetch crosses the network on every run. Fluid (deployed as the ack-fluid Helm chart) places a JindoFS-backed caching layer between your ACK Serverless pods and OSS. The first time a pod reads a file, JindoFS fetches it from OSS and writes it to local cache. Every subsequent read comes from cache — cutting access time from tens of seconds to under one second for repeated workloads.

This topic walks you through deploying Fluid, configuring a Dataset and JindoRuntime backed by an OSS bucket, and running a Kubernetes Job that reads from cache.

Prerequisites

Before you begin, ensure that you have:

  • An ACK Serverless cluster, with kubectl configured to connect to it.

  • An OSS bucket that stores the data to accelerate, and an AccessKey pair with read access to the bucket. The AccessKey is stored in a Kubernetes Secret in Step 2.

  • ossutil installed, for uploading test data to OSS.

Limitations

Fluid's data access acceleration conflicts with the virtual node scheduling feature of ACK Serverless clusters. You cannot use both at the same time. See Enable the virtual node scheduling policy for a cluster.

To avoid this conflict, all JindoRuntime cache worker pods and application pods must include the alibabacloud.com/burst-resource: eci_only annotation, which disables virtual node scheduling on those pods. This annotation appears in the YAML examples later in this topic.

Deploy the Fluid control plane

Important

If you have already installed open-source Fluid, uninstall it before deploying the ack-fluid component.
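
For example, if the open-source version was installed as a Helm release, removing it typically looks like the following sketch. The release name and namespace here are assumptions; use the values from your own installation:

    helm uninstall fluid -n fluid-system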

  1. In the ACK console, click Clusters in the left navigation pane.

  2. Click the name of your cluster. In the left navigation pane, click Applications > Helm.

  3. On the Helm page, click Deploy.

  4. In the Basic Information step, set the following parameters, then click Next.

    • Source — Marketplace

    • Chart — Search for and select ack-fluid

    The default release name is ack-fluid and the default namespace is fluid-system. If you specify different values, a Confirm dialog appears. Click Yes to revert to the defaults.

  5. In the Parameters step, click OK.

  6. Verify that the Fluid control plane is running:

    kubectl get pod -n fluid-system

    Expected output:

    NAME                                  READY   STATUS    RESTARTS   AGE
    dataset-controller-d99998f79-dgkmh    1/1     Running   0          2m48s
    fluid-webhook-55c6d9d497-dmrzb        1/1     Running   0          2m49s

    The two components serve distinct roles:

    • Dataset Controller — manages the full lifecycle of Dataset Custom Resources (CRs) introduced by Fluid.

    • Fluid Webhook — injects a sidecar container into application pods that need data access, enabling transparent caching in serverless scenarios.

    The Fluid control plane also includes controller components for JindoFS, JuiceFS, and Alluxio. These controllers are not created during initial deployment — the pod for each caching system scales on demand only when you configure that system.
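
    Optionally, you can also confirm that the Fluid custom resource definitions were installed. The Dataset and JindoRuntime kinds used later in this topic belong to the data.fluid.io API group:

    kubectl get crd | grep data.fluid.io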

Accelerate data access

Step 1: Upload test data to OSS

  1. Create a 2 GB test file. This topic uses a file named test as an example.

  2. Upload the file to your OSS bucket using ossutil. See Install ossutil.
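
    A minimal sketch of both steps on Linux, assuming ossutil is already configured with your AccessKey pair and endpoint, and using the same bucket placeholders as the Dataset in Step 2:

    # Generate a 2 GB file named "test" to use as the sample data
    dd if=/dev/zero of=test bs=1M count=2048

    # Upload the file to the OSS path that the Dataset will mount
    ossutil cp test oss://<bucket_name>/<bucket_path>/test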

Step 2: Create the Dataset and JindoRuntime resources

Fluid represents your data source as two Custom Resources (CRs):

  • Dataset — declares the URL of data in the external storage system.

  • JindoRuntime — declares the caching system and its configuration.

Fluid uses lazy loading: on first access, it fetches data from OSS and writes it to local cache. Jobs that access the same data repeatedly benefit most from this approach — the first run warms the cache; subsequent runs read entirely from cache. To eliminate first-run latency, pre-warm the cache before submitting your Job.

  1. Create a Secret to store the OSS credentials:

    kubectl create secret generic oss-access-key \
      --from-literal=fs.oss.accessKeyId=<access_key_id> \
      --from-literal=fs.oss.accessKeySecret=<access_key_secret>
  2. Create a file named dataset.yaml with the following content:

    apiVersion: data.fluid.io/v1alpha1
    kind: Dataset
    metadata:
      name: demo-dataset
    spec:
      mounts:
        - mountPoint: oss://<bucket_name>/<bucket_path>
          name: demo
          path: /
          options:
            fs.oss.endpoint: oss-<region>.aliyuncs.com
          encryptOptions:
            - name: fs.oss.accessKeyId
              valueFrom:
                secretKeyRef:
                  name: oss-access-key
                  key: fs.oss.accessKeyId
            - name: fs.oss.accessKeySecret
              valueFrom:
                secretKeyRef:
                  name: oss-access-key
                  key: fs.oss.accessKeySecret
    ---
    apiVersion: data.fluid.io/v1alpha1
    kind: JindoRuntime
    metadata:
      name: demo-dataset
    spec:
      # Number of cache worker nodes
      replicas: 2
      worker:
        podMetadata:
          annotations:
            # Required: disable virtual node scheduling (conflicts with Fluid — see Limitations)
            alibabacloud.com/burst-resource: eci_only
            # ECI instance spec for the JindoFS cache worker pod
            k8s.aliyun.com/eci-use-specs: <eci_instance_spec>
            # Enable instance image cache to speed up pod startup
            k8s.aliyun.com/eci-image-cache: "true"
      tieredstore:
        levels:
          # 10 GiB of memory cache per worker node
          - mediumtype: MEM
            volumeType: emptyDir
            path: /dev/shm
            quota: 10Gi
            high: "0.99"
            low: "0.99"

    Key parameters:

    • mountPoint — OSS path to mount, in the format oss://<bucket_name>/<bucket_path>. Set path to / for a single mount point.

    • fs.oss.endpoint — OSS bucket endpoint. Use an internal endpoint (e.g., oss-cn-hangzhou-internal.aliyuncs.com) for better performance when the cluster and bucket are in the same region. Use a public endpoint (e.g., oss-cn-hangzhou.aliyuncs.com) otherwise.

    • replicas — Number of JindoRuntime cache worker pods. Controls the total cache capacity available to the distributed caching system.

    • alibabacloud.com/burst-resource: eci_only — Disables virtual node scheduling on the cache worker pods. Required because Fluid conflicts with the virtual node scheduling feature (see Limitations).

    • k8s.aliyun.com/eci-use-specs — Elastic Container Instance (ECI) instance spec for each cache worker pod.

    • k8s.aliyun.com/eci-image-cache — Enables instance image cache to speed up pod startup.

    • tieredstore.levels.mediumtype — Cache medium. Supported values: MEM (memory), SSD (solid-state drive), HDD (hard disk drive). See Strategy 2: Select a cache medium.

    • tieredstore.levels.volumeType — Volume type for the cache medium. Use emptyDir for memory or system disk (prevents residual cache from affecting node availability). Use hostPath for data disks, and set path to the disk's mount point on the host. Default: hostPath.

    • tieredstore.levels.path — Path for the cache medium. Supports only a single path.

    • tieredstore.levels.quota — Maximum cache capacity per worker, for example 10Gi.

    • tieredstore.levels.high / low — High and low watermarks for cache eviction.

  3. Apply the manifest:

    kubectl create -f dataset.yaml
  4. Wait about one to two minutes for the caching system to deploy, then verify the Dataset status:

    If you run this command immediately after applying the manifest, PHASE may show NotBound while the caching system is still initializing. Wait one to two minutes and run the command again.
    kubectl get dataset demo-dataset

    Expected output:

    NAME           UFS TOTAL SIZE   CACHED   CACHE CAPACITY   CACHED PERCENTAGE   PHASE   AGE
    demo-dataset   1.16GiB          0.00B    20.00GiB         0.0%                Bound   2m58s

    PHASE: Bound confirms the Dataset deployed successfully. The other columns show how much data is in OSS, how much is already cached, and the total cache capacity across all worker nodes.
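
    You can also look at the caching runtime and the PersistentVolumeClaim (PVC) that Fluid creates for the Dataset; both share the Dataset name in this walkthrough. The exact output columns depend on your Fluid version:

    # Status of the JindoRuntime cache workers
    kubectl get jindoruntime demo-dataset

    # PVC through which application pods mount the cached data
    kubectl get pvc demo-dataset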

Step 3 (Optional): Pre-warm the cache

Because Fluid uses lazy loading, the first Job run fetches data from OSS — which can take tens of seconds for large datasets. If your application is latency-sensitive on first access, or if you know exactly which files will be needed, pre-warming pulls data into cache before any Job runs so that even the first run reads from cache.

  1. Create a file named dataload.yaml:

    apiVersion: data.fluid.io/v1alpha1
    kind: DataLoad
    metadata:
      name: data-warmup
    spec:
      dataset:
        name: demo-dataset
        namespace: default
      loadMetadata: true
  2. Start the pre-warm job:

    kubectl create -f dataload.yaml

    Monitor progress until the status shows Complete:

    kubectl get dataload data-warmup

    Expected output:

    NAME          DATASET        PHASE      AGE   DURATION
    data-warmup   demo-dataset   Complete   99s   58s

    The output shows that the data cache warm-up took about 58s.
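
    After the warm-up completes, you can re-run the Dataset query from Step 2 to confirm that the CACHED and CACHED PERCENTAGE columns now reflect the pre-loaded data:

    kubectl get dataset demo-dataset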

Step 4: Create a Job application

All pods that mount the demo-dataset PersistentVolumeClaim (PVC) read from the JindoFS cache automatically — no application code changes needed. The alibabacloud.com/fluid-sidecar-target: eci label tells Fluid Webhook to inject the caching sidecar into the pod.

  1. Create a file named job.yaml:

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: demo-app
    spec:
      template:
        metadata:
          labels:
            alibabacloud.com/fluid-sidecar-target: eci
          annotations:
            # Required: disable virtual node scheduling (conflicts with Fluid — see Limitations)
            alibabacloud.com/burst-resource: eci_only
            # ECI instance spec for the application pod
            k8s.aliyun.com/eci-use-specs: ecs.g7.4xlarge
        spec:
          containers:
            - name: demo
              image: anolis-registry.cn-zhangjiakou.cr.aliyuncs.com/openanolis/nginx:1.14.1-8.6
              command:
                - /bin/bash
              args:
                - -c
                - du -sh /data && time cp -r /data/ /tmp
              volumeMounts:
                - mountPath: /data
                  name: demo
          restartPolicy: Never
          volumes:
            - name: demo
              persistentVolumeClaim:
                claimName: demo-dataset
      backoffLimit: 4
  2. Submit the Job:

    kubectl create -f job.yaml
  3. Check the Job logs after it completes. Replace demo-app-jwktf with the name of the pod that the Job created in your cluster:

    kubectl logs demo-app-jwktf -c demo

    Expected output:

    1.2G    /data
    
    real    0m0.992s
    user    0m0.004s
    sys     0m0.674s

    The real time for copying the 1.2 GB of data is only 0m0.992s, because the Job reads it from the JindoFS cache instead of fetching it from OSS over the network.
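
    Because the pod name suffix is generated by the Job controller, an alternative is to select the pod through the job-name label that Kubernetes adds to pods created by a Job:

    kubectl logs -c demo -l job-name=demo-app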

Step 5: Clean up

Clean up resources to avoid incurring unnecessary charges.

  1. Delete the Job:

    kubectl delete job demo-app
  2. Delete the Dataset. This also removes the associated caching system components:

    Important

    Cleanup takes about one minute. Wait until all caching system pods are fully deleted before proceeding. A quick check is shown after this procedure.

    kubectl delete dataset demo-dataset
  3. Scale down the Fluid control plane:

    kubectl get deployments.apps -n fluid-system | awk 'NR>1 {print $1}' | xargs kubectl scale deployments -n fluid-system --replicas=0

    To use the data access feature again, scale the control plane back up before creating new Dataset and JindoRuntime resources:

    kubectl scale -n fluid-system deployment dataset-controller --replicas=1
    kubectl scale -n fluid-system deployment fluid-webhook --replicas=1
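
To confirm that the cleanup in step 2 completed before you scale down the control plane, check that the PVC and the caching pods created for the Dataset are gone. This sketch assumes the cache worker pods carry the Dataset name, as the JindoRuntime created in this topic does:

    # Both commands should report that nothing is found once cleanup has finished
    kubectl get pvc demo-dataset
    kubectl get pods | grep demo-dataset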