Container Service for Kubernetes:Accelerate data access for Job applications

Last Updated: Mar 26, 2026

When a Kubernetes Job reads training data or batch input directly from Object Storage Service (OSS), each file fetch crosses the network on every run. Fluid (deployed as the ack-fluid Helm chart) places a JindoFS-backed caching layer between your ACK Serverless pods and OSS. The first time a pod reads a file, JindoFS fetches it from OSS and writes it to local cache. Every subsequent read comes from cache — cutting access time from tens of seconds to under one second for repeated workloads.

This topic walks you through deploying Fluid, configuring a Dataset and JindoRuntime backed by an OSS bucket, and running a Kubernetes Job that reads from cache.

Prerequisites

Before you begin, ensure that you have:

  • An ACK Serverless cluster, with kubectl configured to connect to it.

  • An OSS bucket that stores the data to accelerate, and an AccessKey pair with read access to the bucket. The AccessKey is stored in a Kubernetes Secret in Step 2.

  • ossutil installed, for uploading test data to OSS.

Limitations

Fluid's data access acceleration conflicts with the virtual node scheduling feature of ACK Serverless clusters. You cannot use both at the same time. See Enable the virtual node scheduling policy for a cluster.

To avoid this conflict, all JindoRuntime cache worker pods and application pods must include the alibabacloud.com/burst-resource: eci_only annotation, which disables virtual node scheduling on those pods. This annotation appears in the YAML examples later in this topic.

Deploy the Fluid control plane

Important

If you have already installed open-source Fluid, uninstall it before deploying the ack-fluid component.
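
For example, if the open-source version was installed as a Helm release, removing it typically looks like the following sketch. The release name and namespace here are assumptions; use the values from your own installation:

    helm uninstall fluid -n fluid-system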

  1. In the ACK console, click Clusters in the left navigation pane.

  2. Click the name of your cluster. In the left navigation pane, click Applications > Helm.

  3. On the Helm page, click Deploy.

  4. In the Basic Information step, set the following parameters, then click Next.

    • Source — Marketplace

    • Chart — Search for and select ack-fluid

    The default release name is ack-fluid and the default namespace is fluid-system. If you specify different values, a Confirm dialog appears. Click Yes to revert to the defaults.

  5. In the Parameters step, click OK.

  6. Verify that the Fluid control plane is running:

    kubectl get pod -n fluid-system

    Expected output:

    NAME                                  READY   STATUS    RESTARTS   AGE
    dataset-controller-d99998f79-dgkmh    1/1     Running   0          2m48s
    fluid-webhook-55c6d9d497-dmrzb        1/1     Running   0          2m49s

    The two components serve distinct roles:

    • Dataset Controller — manages the full lifecycle of Dataset Custom Resources (CRs) introduced by Fluid.

    • Fluid Webhook — injects a sidecar container into application pods that need data access, enabling transparent caching in serverless scenarios.

    The Fluid control plane also includes controller components for JindoFS, JuiceFS, and Alluxio. These controllers are not created during initial deployment — the pod for each caching system scales on demand only when you configure that system.
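
    Optionally, you can also confirm that the Fluid custom resource definitions were installed. The Dataset and JindoRuntime kinds used later in this topic belong to the data.fluid.io API group:

    kubectl get crd | grep data.fluid.io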

Accelerate data access

Step 1: Upload test data to OSS

  1. Create a 2 GB test file. This topic uses a file named test as an example.

  2. Upload the file to your OSS bucket using ossutil. See Install ossutil.
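
    A minimal sketch of both steps on Linux, assuming ossutil is already configured with your AccessKey pair and endpoint, and using the same bucket placeholders as the Dataset in Step 2:

    # Generate a 2 GB file named "test" to use as the sample data
    dd if=/dev/zero of=test bs=1M count=2048

    # Upload the file to the OSS path that the Dataset will mount
    ossutil cp test oss://<bucket_name>/<bucket_path>/test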

Step 2: Create the Dataset and JindoRuntime resources

Fluid represents your data source as two Custom Resources (CRs):

  • Dataset — declares the URL of data in the external storage system.

  • JindoRuntime — declares the caching system and its configuration.

Fluid uses lazy loading: on first access, it fetches data from OSS and writes it to local cache. Jobs that access the same data repeatedly benefit most from this approach — the first run warms the cache; subsequent runs read entirely from cache. To eliminate first-run latency, pre-warm the cache before submitting your Job.

  1. Create a Secret to store the OSS credentials:

    kubectl create secret generic oss-access-key \
      --from-literal=fs.oss.accessKeyId=<access_key_id> \
      --from-literal=fs.oss.accessKeySecret=<access_key_secret>
  2. Create a file named dataset.yaml with the following content:

    apiVersion: data.fluid.io/v1alpha1
    kind: Dataset
    metadata:
      name: demo-dataset
    spec:
      mounts:
        - mountPoint: oss://<bucket_name>/<bucket_path>
          name: demo
          path: /
          options:
            fs.oss.endpoint: oss-<region>.aliyuncs.com
          encryptOptions:
            - name: fs.oss.accessKeyId
              valueFrom:
                secretKeyRef:
                  name: oss-access-key
                  key: fs.oss.accessKeyId
            - name: fs.oss.accessKeySecret
              valueFrom:
                secretKeyRef:
                  name: oss-access-key
                  key: fs.oss.accessKeySecret
    ---
    apiVersion: data.fluid.io/v1alpha1
    kind: JindoRuntime
    metadata:
      name: demo-dataset
    spec:
      # Number of cache worker nodes
      replicas: 2
      worker:
        podMetadata:
          annotations:
            # Required: disable virtual node scheduling (conflicts with Fluid — see Limitations)
            alibabacloud.com/burst-resource: eci_only
            # ECI instance spec for the JindoFS cache worker pod
            k8s.aliyun.com/eci-use-specs: <eci_instance_spec>
            # Enable instance image cache to speed up pod startup
            k8s.aliyun.com/eci-image-cache: "true"
      tieredstore:
        levels:
          # 10 GiB of memory cache per worker node
          - mediumtype: MEM
            volumeType: emptyDir
            path: /dev/shm
            quota: 10Gi
            high: "0.99"
            low: "0.99"

    Key parameters:

    • mountPoint — OSS path to mount, in the format oss://<bucket_name>/<bucket_path>. Set path to / for a single mount point.

    • fs.oss.endpoint — OSS bucket endpoint. Use an internal endpoint (e.g., oss-cn-hangzhou-internal.aliyuncs.com) for better performance when the cluster and bucket are in the same region. Use a public endpoint (e.g., oss-cn-hangzhou.aliyuncs.com) otherwise.

    • replicas — Number of JindoRuntime cache worker pods. Controls the total cache capacity available to the distributed caching system.

    • alibabacloud.com/burst-resource: eci_only — Disables virtual node scheduling on the cache worker pods. Required because Fluid conflicts with the virtual node scheduling feature (see Limitations).

    • k8s.aliyun.com/eci-use-specs — Elastic Container Instance (ECI) instance spec for each cache worker pod.

    • k8s.aliyun.com/eci-image-cache — Enables instance image cache to speed up pod startup.

    • tieredstore.levels.mediumtype — Cache medium. Supported values: MEM (memory), SSD (solid-state drive), HDD (hard disk drive). See Strategy 2: Select a cache medium.

    • tieredstore.levels.volumeType — Volume type for the cache medium. Use emptyDir for memory or system disk (prevents residual cache from affecting node availability). Use hostPath for data disks, and set path to the disk's mount point on the host. Default: hostPath.

    • tieredstore.levels.path — Path for the cache medium. Supports only a single path.

    • tieredstore.levels.quota — Maximum cache capacity per worker, for example 10Gi.

    • tieredstore.levels.high / low — High and low watermarks for cache eviction.

  3. Apply the manifest:

    kubectl create -f dataset.yaml
  4. Wait about one to two minutes for the caching system to deploy, then verify the Dataset status:

    If you run this command immediately after applying the manifest, PHASE may show NotBound while the caching system is still initializing. Wait one to two minutes and run the command again.
    kubectl get dataset demo-dataset

    Expected output:

    NAME           UFS TOTAL SIZE   CACHED   CACHE CAPACITY   CACHED PERCENTAGE   PHASE   AGE
    demo-dataset   1.16GiB          0.00B    20.00GiB         0.0%                Bound   2m58s

    PHASE: Bound confirms the Dataset deployed successfully. The other columns show how much data is in OSS, how much is already cached, and the total cache capacity across all worker nodes.
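
    You can also look at the caching runtime and the PersistentVolumeClaim (PVC) that Fluid creates for the Dataset; both share the Dataset name in this walkthrough. The exact output columns depend on your Fluid version:

    # Status of the JindoRuntime cache workers
    kubectl get jindoruntime demo-dataset

    # PVC through which application pods mount the cached data
    kubectl get pvc demo-dataset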

Step 3 (Optional): Pre-warm the cache

Because Fluid uses lazy loading, the first Job run fetches data from OSS — which can take tens of seconds for large datasets. If your application is latency-sensitive on first access, or if you know exactly which files will be needed, pre-warming pulls data into cache before any Job runs so that even the first run reads from cache.

  1. Create a file named dataload.yaml:

    apiVersion: data.fluid.io/v1alpha1
    kind: DataLoad
    metadata:
      name: data-warmup
    spec:
      dataset:
        name: demo-dataset
        namespace: default
      loadMetadata: true
  2. Start the pre-warm job:

    kubectl create -f dataload.yaml

    Monitor progress until the status shows Complete:

    kubectl get dataload data-warmup

    Expected output:

    NAME          DATASET        PHASE      AGE   DURATION
    data-warmup   demo-dataset   Complete   99s   58s

    The output shows that the data cache warm-up took about 58s.
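
    After the warm-up completes, you can re-run the Dataset query from Step 2 to confirm that the CACHED and CACHED PERCENTAGE columns now reflect the pre-loaded data:

    kubectl get dataset demo-dataset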

Step 4: Create a Job application

All pods that mount the demo-dataset PersistentVolumeClaim (PVC) read from the JindoFS cache automatically — no application code changes needed. The alibabacloud.com/fluid-sidecar-target: eci label tells Fluid Webhook to inject the caching sidecar into the pod.

  1. Create a file named job.yaml:

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: demo-app
    spec:
      template:
        metadata:
          labels:
            alibabacloud.com/fluid-sidecar-target: eci
          annotations:
            # Required: disable virtual node scheduling (conflicts with Fluid — see Limitations)
            alibabacloud.com/burst-resource: eci_only
            # ECI instance spec for the application pod
            k8s.aliyun.com/eci-use-specs: ecs.g7.4xlarge
        spec:
          containers:
            - name: demo
              image: anolis-registry.cn-zhangjiakou.cr.aliyuncs.com/openanolis/nginx:1.14.1-8.6
              command:
                - /bin/bash
              args:
                - -c
                - du -sh /data && time cp -r /data/ /tmp
              volumeMounts:
                - mountPath: /data
                  name: demo
          restartPolicy: Never
          volumes:
            - name: demo
              persistentVolumeClaim:
                claimName: demo-dataset
      backoffLimit: 4
  2. Submit the Job:

    kubectl create -f job.yaml
  3. Check the Job logs after it completes. Replace demo-app-jwktf with the name of the pod that the Job created in your cluster:

    kubectl logs demo-app-jwktf -c demo

    Expected output:

    1.2G    /data
    
    real    0m0.992s
    user    0m0.004s
    sys     0m0.674s

    The real time for copying the 1.2 GB of data is only 0m0.992s, because the Job reads it from the JindoFS cache instead of fetching it from OSS over the network.
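
    Because the pod name suffix is generated by the Job controller, an alternative is to select the pod through the job-name label that Kubernetes adds to pods created by a Job:

    kubectl logs -c demo -l job-name=demo-app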

Step 5: Clean up

Clean up resources to avoid incurring unnecessary charges.

  1. Delete the Job:

    kubectl delete job demo-app
  2. Delete the Dataset. This also removes the associated caching system components:

    Important

    Cleanup takes about one minute. Wait until all caching system pods are fully deleted before proceeding. A quick check is shown after this procedure.

    kubectl delete dataset demo-dataset
  3. Scale down the Fluid control plane:

    kubectl get deployments.apps -n fluid-system | awk 'NR>1 {print $1}' | xargs kubectl scale deployments -n fluid-system --replicas=0

    To use the data access feature again, scale the control plane back up before creating new Dataset and JindoRuntime resources:

    kubectl scale -n fluid-system deployment dataset-controller --replicas=1
    kubectl scale -n fluid-system deployment fluid-webhook --replicas=1
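
To confirm that the cleanup in step 2 completed before you scale down the control plane, check that the PVC and the caching pods created for the Dataset are gone. This sketch assumes the cache worker pods carry the Dataset name, as the JindoRuntime created in this topic does:

    # Both commands should report that nothing is found once cleanup has finished
    kubectl get pvc demo-dataset
    kubectl get pods | grep demo-dataset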