Fluid uses JindoRuntime to cache Object Storage Service (OSS) data locally within your ACK Pro cluster, so Argo workflow containers read from in-cluster memory instead of fetching from OSS on every access. In cache mode, the first workflow run prefetches data into a JindoFS memory cache. Subsequent runs read directly from that cache, cutting file-copy time from 24.966 seconds to 1.948 seconds (about 13 times faster).
This guide walks you through setting up cache-mode acceleration for Argo workflows running as elastic container instances (ECI) or Alibaba Cloud Container Compute Service (ACS) pods.
Considerations
Before you start, be aware of the following constraints:
- This feature is mutually exclusive with the elastic scheduling feature of ACK. See Configure priority-based resource scheduling.
- ack-fluid does not support ContainerOS nodes. Use node pools that do not run ContainerOS.
- If you have a previous open-source Fluid installation, uninstall it before deploying ack-fluid. The open-source and ACK versions cannot coexist.
- ack-ai-pipeline is incompatible with Argo Workflows. Deselect ack-ai-pipeline when deploying the Cloud-native AI Suite.
- ACS pods require ack-fluid v1.0.11 or later, and the ACS pod advanced features must be enabled through a support ticket before deployment.
Prerequisites
Before you begin, ensure that you have:
- Argo Workflows installed via the Argo quick-start guide or the ack-workflow component. See Argo Workflows.
- Virtual nodes deployed in your ACK Pro cluster. See Schedule pods to elastic container instances through virtual nodes.
- An ACK Pro cluster that runs Kubernetes 1.18 or later on non-ContainerOS nodes. See Create an ACK Pro cluster.
- The ack-fluid component deployed (see Deploy ack-fluid below).
- kubectl connected to your ACK Pro cluster. See Connect to a cluster by using kubectl.
- An OSS bucket with data to accelerate. See Activate OSS and Create buckets.
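Before you continue, you can confirm that kubectl can reach the cluster and that the virtual nodes are registered. This is an optional check; the virtual node names depend on your setup (names like virtual-kubelet-* are typical but not guaranteed):

# List all nodes; virtual nodes used for ECI or ACS scheduling appear alongside regular nodes
kubectl get nodes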
Deploy ack-fluid
If you have not installed the Cloud-native AI Suite, install it and enable Fluid under Data Access Acceleration. See Deploy Cloud-native AI Suite.
If the Cloud-native AI Suite is already installed, go to Cloud-native AI Component Set in the ACK console and deploy the ack-fluid component.
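To confirm that the Fluid control plane is running before you continue, you can list its pods. This is a minimal check that assumes the default fluid-system namespace used by Fluid installations:

# All Fluid controller pods should be in the Running state
kubectl get pods -n fluid-system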
Step 1: Upload the test dataset to OSS
Create a 2 GB test dataset and upload it to your OSS bucket. This guide uses the BERT wwm_uncased_L-24_H-1024_A-16 dataset as an example.
Upload the dataset using ossutil. See Install ossutil.
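The following is a minimal upload sketch. It assumes ossutil is already configured with your credentials and that the dataset sits in a local directory named wwm_uncased_L-24_H-1024_A-16; adjust the path to match your environment:

# Recursively upload the local dataset directory to the OSS path referenced later in dataset.yaml
ossutil cp -r ./wwm_uncased_L-24_H-1024_A-16 oss://<oss_bucket>/<bucket_dir>/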
Step 2: Create a Dataset and JindoRuntime
The Dataset tells Fluid where your OSS data lives. The JindoRuntime manages a JindoFS cache cluster that stores that data in local memory. Together they make the data available to workflow pods as a standard Kubernetes PersistentVolumeClaim (PVC).
- Create secret.yaml with your OSS credentials:

  apiVersion: v1
  kind: Secret
  metadata:
    name: access-key
  stringData:
    fs.oss.accessKeyId: <your-access-key-id>
    fs.oss.accessKeySecret: <your-access-key-secret>

- Deploy the Secret:

  kubectl create -f secret.yaml

- Create dataset.yaml. Replace <oss_bucket>, <bucket_dir>, and <oss_endpoint> with your values.

  Important: The default access mode is read-only. To use read/write mode, see Configure the access mode of a dataset.

  apiVersion: data.fluid.io/v1alpha1
  kind: Dataset
  metadata:
    name: serverless-data
  spec:
    mounts:
      - mountPoint: oss://<oss_bucket>/<bucket_dir>
        name: demo
        path: /
        options:
          fs.oss.endpoint: <oss_endpoint>
        encryptOptions:
          - name: fs.oss.accessKeyId
            valueFrom:
              secretKeyRef:
                name: access-key
                key: fs.oss.accessKeyId
          - name: fs.oss.accessKeySecret
            valueFrom:
              secretKeyRef:
                name: access-key
                key: fs.oss.accessKeySecret
  ---
  apiVersion: data.fluid.io/v1alpha1
  kind: JindoRuntime
  metadata:
    name: serverless-data
  spec:
    replicas: 1
    tieredstore:
      levels:
        - mediumtype: MEM
          volumeType: emptyDir
          path: /dev/shm
          quota: 5Gi
          high: "0.95"
          low: "0.7"

  Key parameters:

  - mountPoint: The OSS path to mount, in the format oss://<oss_bucket>/<bucket_dir>. Do not include the endpoint. Example: oss://mybucket/path/to/dir. Set path to / when using a single mount target.
  - fs.oss.endpoint: The public or private endpoint of your OSS bucket. Use the private endpoint for better security, and make sure your ACK cluster is in the same region as your OSS bucket. Example private endpoint: oss-cn-hangzhou-internal.aliyuncs.com.
  - fs.oss.accessKeyId: The AccessKey ID used to access the bucket.
  - fs.oss.accessKeySecret: The AccessKey secret used to access the bucket.
  - replicas: The number of JindoFS worker nodes to create.
  - mediumtype: The cache storage medium: HDD, SSD, or MEM. See Policy 2: Select proper cache media.
  - volumeType: The volume type of the cache medium: emptyDir (recommended for memory or local system disks) or hostPath (for dedicated data disks). Default: hostPath. See Policy 2: Select proper cache media.
  - path: The cache storage path on the node. Only a single path is supported.
  - quota: The maximum cache size. Example: 5Gi limits the cache to 5 GiB.
  - high: The upper threshold for cache eviction, as a fraction of quota.
  - low: The lower threshold for cache eviction, as a fraction of quota.
- Deploy the Dataset and JindoRuntime:

  kubectl create -f dataset.yaml

- Verify that the Dataset is bound:

  kubectl get dataset serverless-data

  Expected output:

  NAME              UFS TOTAL SIZE   CACHED   CACHE CAPACITY   CACHED PERCENTAGE   PHASE   AGE
  serverless-data   1.16GiB          0.00B    5.00GiB          0.0%                Bound   2m8s

  PHASE: Bound confirms that the Dataset is ready.
- Verify that the JindoRuntime is ready:

  kubectl get jindo serverless-data

  Expected output:

  NAME              MASTER PHASE   WORKER PHASE   FUSE PHASE   AGE
  serverless-data   Ready          Ready          Ready        2m51s

  FUSE PHASE: Ready confirms that the JindoRuntime is running.
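Fluid exposes the bound Dataset to workloads as a PersistentVolume and a PersistentVolumeClaim named after the Dataset. As an optional sanity check before mounting the PVC in a workload (the capacity and storage class shown depend on your cluster), confirm that the claim exists and is bound:

# The PVC name matches the Dataset name and is referenced by claimName in the workload specs below
kubectl get pvc serverless-data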
(Optional) Step 3: Prefetch data
Prefetching loads OSS data into the JindoFS cache before your workflow runs, so the first workflow execution reads from cache rather than OSS. Skip this step if you don't need to optimize first-run latency.
- Create dataload.yaml:

  apiVersion: data.fluid.io/v1alpha1
  kind: DataLoad
  metadata:
    name: serverless-data-warmup
  spec:
    dataset:
      name: serverless-data
      namespace: default
    loadMetadata: true

- Start the data prefetch:

  kubectl create -f dataload.yaml

- Monitor prefetch progress:

  kubectl get dataload

  Wait until PHASE shows Complete:

  NAME                     DATASET           PHASE      AGE     DURATION
  serverless-data-warmup   serverless-data   Complete   2m49s   45s

- Confirm that the cache is fully populated before proceeding:

  kubectl get dataset

  Expected output after the prefetch completes:

  NAME              UFS TOTAL SIZE   CACHED    CACHE CAPACITY   CACHED PERCENTAGE   PHASE   AGE
  serverless-data   1.16GiB          1.16GiB   5.00GiB          100.0%              Bound   5m20s

  Proceed to the next step only after CACHED PERCENTAGE shows 100.0%.
Step 4: Deploy a workflow to access OSS data
Create workflow.yaml based on your compute target—ECI or ACS. Both examples mount serverless-data as a volume and use Fluid to serve cached OSS data to the container.
Deploy on ECI (elastic container instances)
Add the alibabacloud.com/eci: "true" label to schedule the pod to an elastic container instance, and the alibabacloud.com/fluid-sidecar-target: eci label so that Fluid automatically adapts the pod spec for ECI. No manual changes to the rest of the spec are needed.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-serving
spec:
  selector:
    matchLabels:
      app: model-serving
  template:
    metadata:
      labels:
        app: model-serving
        alibabacloud.com/fluid-sidecar-target: eci
        alibabacloud.com/eci: "true"
    spec:
      containers:
        - image: fluidcloudnative/serving
          name: serving
          ports:
            - name: http1
              containerPort: 8080
          env:
            - name: TARGET
              value: "World"
          volumeMounts:
            - mountPath: /data
              name: data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: serverless-data
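After you deploy this manifest in the next step, you can optionally confirm where the pods were scheduled. This is a sketch that assumes the default virtual node setup; the exact node names vary by cluster:

# List the serving pods and the nodes they run on; ECI pods should be placed on a virtual node
kubectl get pods -l app=model-serving -o wide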
Deploy on ACS (Alibaba Cloud Container Compute Service)
ACS pods require ack-fluid v1.0.11 or later. Accessing cached Fluid data in ACS containers relies on advanced ACS pod features—submit a support ticket to enable this feature before deploying.
Add the alibabacloud.com/acs: "true" label to run the pod on ACS compute, and the alibabacloud.com/fluid-sidecar-target: acs label so that Fluid adapts the pod for the ACS environment automatically. The alibabacloud.com/compute-class and alibabacloud.com/compute-qos labels declare the ACS compute class and QoS class for the pod.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-serving
spec:
  selector:
    matchLabels:
      app: model-serving
  template:
    metadata:
      labels:
        app: model-serving
        alibabacloud.com/fluid-sidecar-target: acs
        alibabacloud.com/acs: "true"
        alibabacloud.com/compute-qos: default
        alibabacloud.com/compute-class: general-purpose
    spec:
      containers:
        - image: fluidcloudnative/serving
          name: serving
          ports:
            - name: http1
              containerPort: 8080
          env:
            - name: TARGET
              value: "World"
          volumeMounts:
            - mountPath: /data
              name: data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: serverless-data
Run the workflow
- Deploy the workflow:

  kubectl create -f workflow.yaml

- Check the container logs to confirm that cache acceleration is working:

  kubectl logs serverless-workflow-g5knn-3271897614

  Expected output:

  real    0m1.948s
  user    0m0.000s
  sys     0m0.668s

  The real time of 1.948 seconds shows that the file copy completed from cache. Without caching, the same operation takes 24.966 seconds, about 13 times slower. See Accelerate Argo workflows (no cache mode) for comparison.
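If your container image does not print this timing itself, you can reproduce the measurement manually. This is a rough sketch that assumes a running pod mounting the dataset at /data, a shell in the container that provides a time command, and the hypothetical destination path /tmp/data; substitute your own pod name:

# Time a copy of the cached dataset from the Fluid mount to local storage inside the pod
kubectl exec -it <pod-name> -- sh -c "time cp -r /data /tmp/data"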
Step 5: Clean up
After testing, delete the workflow and dataset to free resources.
- Delete the workflow:

  kubectl delete workflow serverless-workflow-g5knn

- Delete the dataset:

  kubectl delete dataset serverless-data
What's next
- Accelerate Argo workflows (no cache mode) — compare performance without caching
- Configure the access mode of a dataset — enable read/write mode for the Dataset
- Policy 2: Select proper cache media — choose between HDD, SSD, and MEM cache tiers