Container Service for Kubernetes:Accelerate Argo workflows

Last Updated: Mar 26, 2026

Fluid uses JindoRuntime to cache Object Storage Service (OSS) data locally within your ACK Pro cluster, so Argo workflow containers read from in-cluster memory instead of fetching from OSS on every access. In cache mode, the first workflow run prefetches data into a JindoFS memory cache. Subsequent runs read directly from that cache—cutting file-copy time from 24.966 seconds to 1.948 seconds (13x faster).

This guide walks you through setting up cache-mode acceleration for Argo workflows running as elastic container instances (ECI) or Alibaba Cloud Container Compute Service (ACS) pods.

Considerations

Before you start, be aware of the following constraints:

  • This feature is mutually exclusive with the elastic scheduling feature of ACK. See Configure priority-based resource scheduling.

  • ack-fluid does not support ContainerOS nodes. Use non-ContainerOS node pools.

  • If you have a previous open-source Fluid installation, uninstall it before deploying ack-fluid. The open-source and ACK versions cannot coexist.

  • ack-ai-pipeline is incompatible with Argo Workflows. Deselect ack-ai-pipeline when deploying the Cloud-native AI Suite.

  • ACS pods require ack-fluid v1.0.11 or later, and the ACS pod advanced features must be enabled via a support ticket before deployment.

Prerequisites

Before you begin, make sure that you have an ACK Pro cluster and that the ack-fluid component is deployed, as described below.

Deploy ack-fluid

If you have not installed the Cloud-native AI Suite, install it and enable Fluid under Data Access Acceleration. See Deploy Cloud-native AI Suite.

If the Cloud-native AI Suite is already installed, go to Cloud-native AI Component Set in the ACK console and deploy the ack-fluid component.
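
Once ack-fluid is deployed, the Fluid controllers typically run in the fluid-system namespace (Fluid's default). You can verify that they are up before continuing:

    kubectl get pods -n fluid-system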

Step 1: Upload the test dataset to OSS

Create a 2 GB test dataset and upload it to your OSS bucket. This guide uses the BERT wwm_uncased_L-24_H-1024_A-16 dataset as an example.

Upload the dataset using ossutil. See Install ossutil.
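
For example, assuming the dataset was extracted to a local directory named wwm_uncased_L-24_H-1024_A-16, a command along the following lines uploads it (the bucket and path are placeholders):

    # Recursively upload the extracted dataset directory to OSS.
    ossutil cp -r ./wwm_uncased_L-24_H-1024_A-16 oss://<oss_bucket>/<bucket_dir>/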

Step 2: Create a Dataset and JindoRuntime

The Dataset tells Fluid where your OSS data lives. The JindoRuntime manages a JindoFS cache cluster that stores that data in local memory. Together they make the data available to workflow pods as a standard Kubernetes PersistentVolumeClaim (PVC).

  1. Create secret.yaml with your OSS credentials:

    apiVersion: v1
    kind: Secret
    metadata:
      name: access-key
    stringData:
      fs.oss.accessKeyId: <your-access-key-id>
      fs.oss.accessKeySecret: <your-access-key-secret>
  2. Deploy the Secret:

    kubectl create -f secret.yaml
  3. Create dataset.yaml. Replace <oss_bucket>, <bucket_dir>, and <oss_endpoint> with your values:

    Important

    The default access mode is read-only. To use read/write mode, see Configure the access mode of a dataset.

    apiVersion: data.fluid.io/v1alpha1
    kind: Dataset
    metadata:
      name: serverless-data
    spec:
      mounts:
      - mountPoint: oss://<oss_bucket>/<bucket_dir>
        name: demo
        path: /
        options:
          fs.oss.endpoint: <oss_endpoint>
        encryptOptions:
          - name: fs.oss.accessKeyId
            valueFrom:
              secretKeyRef:
                name: access-key
                key: fs.oss.accessKeyId
          - name: fs.oss.accessKeySecret
            valueFrom:
              secretKeyRef:
                name: access-key
                key: fs.oss.accessKeySecret
    ---
    apiVersion: data.fluid.io/v1alpha1
    kind: JindoRuntime
    metadata:
      name: serverless-data
    spec:
      replicas: 1
      tieredstore:
        levels:
          - mediumtype: MEM
            volumeType: emptyDir
            path: /dev/shm
            quota: 5Gi
            high: "0.95"
            low: "0.7"

    Key parameters:

      • mountPoint: The OSS path to mount, in the format oss://<oss_bucket>/<bucket_dir>. Do not include the endpoint. Example: oss://mybucket/path/to/dir. Set path to / when using a single mount target.
      • fs.oss.endpoint: The public or private endpoint of your OSS bucket. Use the private endpoint for better security, and make sure that your ACK cluster is in the same region as your OSS bucket. Example private endpoint: oss-cn-hangzhou-internal.aliyuncs.com.
      • fs.oss.accessKeyId: The AccessKey ID used to access the bucket.
      • fs.oss.accessKeySecret: The AccessKey secret used to access the bucket.
      • replicas: The number of JindoFS worker nodes to create.
      • mediumtype: The cache storage medium: HDD, SSD, or MEM. See Policy 2: Select proper cache media.
      • volumeType: The volume type for the cache medium: emptyDir (recommended for memory or local system disks) or hostPath (for dedicated data disks). Default: hostPath. See Policy 2: Select proper cache media.
      • path: The cache storage path on the node. Only a single path is supported.
      • quota: The maximum cache size. For example, 5Gi limits the cache to 5 GiB.
      • high: The upper threshold for cache eviction, as a fraction of quota.
      • low: The lower threshold for cache eviction, as a fraction of quota.

  4. Deploy the Dataset and JindoRuntime:

    kubectl create -f dataset.yaml
  5. Verify the Dataset is bound:

    kubectl get dataset serverless-data

    Expected output:

    NAME              UFS TOTAL SIZE   CACHED   CACHE CAPACITY   CACHED PERCENTAGE   PHASE   AGE
    serverless-data   1.16GiB          0.00B    5.00GiB          0.0%                Bound   2m8s

    PHASE: Bound confirms the Dataset is ready.

  6. Verify the JindoRuntime is ready:

    kubectl get jindo serverless-data

    Expected output:

    NAME              MASTER PHASE   WORKER PHASE   FUSE PHASE   AGE
    serverless-data   Ready          Ready          Ready        2m51s

    FUSE PHASE: Ready confirms the JindoRuntime is running.
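
Fluid also creates a PersistentVolume and a PersistentVolumeClaim named after the Dataset in the same namespace; this PVC is what the workflow pods mount in Step 4. You can confirm that it exists and is bound:

    kubectl get pvc serverless-data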

(Optional) Step 3: Prefetch data

Prefetching loads OSS data into the JindoFS cache before your workflow runs, so the first workflow execution reads from cache rather than OSS. Skip this step if you don't need to optimize first-run latency.

  1. Create dataload.yaml:

    apiVersion: data.fluid.io/v1alpha1
    kind: DataLoad
    metadata:
      name: serverless-data-warmup
    spec:
      dataset:
        name: serverless-data
        namespace: default
      loadMetadata: true
  2. Start the data prefetch:

    kubectl create -f dataload.yaml
  3. Monitor prefetch progress:

    kubectl get dataload

    Wait until PHASE shows Complete:

    NAME                     DATASET           PHASE      AGE     DURATION
    serverless-data-warmup   serverless-data   Complete   2m49s   45s
  4. Confirm the cache is fully populated before proceeding:

    kubectl get dataset

    Expected output after prefetch completes:

    NAME              UFS TOTAL SIZE   CACHED    CACHE CAPACITY   CACHED PERCENTAGE   PHASE   AGE
    serverless-data   1.16GiB          1.16GiB   5.00GiB          100.0%              Bound   5m20s

    Proceed to the next step only after CACHED PERCENTAGE shows 100.0%.
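
If you only need part of the dataset warmed up, DataLoad also accepts a target list. A minimal sketch, assuming a hypothetical sub-directory /subdir under the mount:

    apiVersion: data.fluid.io/v1alpha1
    kind: DataLoad
    metadata:
      name: serverless-data-warmup-partial
    spec:
      dataset:
        name: serverless-data
        namespace: default
      target:
        - path: /subdir   # prefetch only this sub-path (placeholder)
          replicas: 1     # number of cache replicas to load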

Step 4: Deploy a workflow to access OSS data

Create workflow.yaml based on your compute target, ECI or ACS. Both examples mount the serverless-data PVC as a volume and use Fluid to serve cached OSS data to the container. The manifests use a Deployment for illustration; the sketch after the ECI example shows how the same settings carry over to an Argo Workflow.

Deploy on ECI (elastic container instances)

Add the alibabacloud.com/fluid-sidecar-target: eci label to the pod. Fluid automatically adapts the pod spec to run as an elastic container instance—no manual changes needed.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-serving
spec:
  selector:
    matchLabels:
      app: model-serving
  template:
    metadata:
      labels:
        app: model-serving
        alibabacloud.com/fluid-sidecar-target: eci
        alibabacloud.com/eci: "true"
    spec:
      containers:
        - image: fluidcloudnative/serving
          name: serving
          ports:
            - name: http1
              containerPort: 8080
          env:
            - name: TARGET
              value: "World"
          volumeMounts:
            - mountPath: /data
              name: data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: serverless-data
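
The Deployment above is a minimal serving example. For an actual Argo workflow, the same labels and PVC apply to every workflow pod through the Workflow resource's spec.podMetadata and spec.volumes fields. The following is a hypothetical sketch, assuming Argo Workflows is installed in the cluster; the entrypoint name, image, and copied paths are placeholders:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: serverless-workflow-
spec:
  entrypoint: copy-test
  podMetadata:
    labels:
      # Same labels as above, so Fluid adapts each workflow pod for ECI.
      alibabacloud.com/fluid-sidecar-target: eci
      alibabacloud.com/eci: "true"
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: serverless-data
  templates:
    - name: copy-test
      container:
        image: busybox
        command: [sh, -c]
        # Time a copy out of the cached volume, as in the expected output below.
        args: ["time cp -r /data /tmp/data"]
        volumeMounts:
          - mountPath: /data
            name: data

A workflow created this way receives a generated name such as serverless-workflow-g5knn, which is the form referenced by the log and cleanup commands later in this guide. For ACS, replace the two labels with the ACS label set shown in the next example.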

Deploy on ACS (Alibaba Cloud Container Compute Service)

Important

ACS pods require ack-fluid v1.0.11 or later. Accessing cached Fluid data in ACS containers relies on advanced ACS pod features—submit a support ticket to enable this feature before deploying.

Add the alibabacloud.com/fluid-sidecar-target: acs label to declare ACS compute resources. Fluid adapts the pod for the ACS environment automatically.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-serving
spec:
  selector:
    matchLabels:
      app: model-serving
  template:
    metadata:
      labels:
        app: model-serving
        alibabacloud.com/fluid-sidecar-target: acs
        alibabacloud.com/acs: "true"
        alibabacloud.com/compute-qos: default
        alibabacloud.com/compute-class: general-purpose
    spec:
      containers:
        - image: fluidcloudnative/serving
          name: serving
          ports:
            - name: http1
              containerPort: 8080
          env:
            - name: TARGET
              value: "World"
          volumeMounts:
            - mountPath: /data
              name: data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: serverless-data
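
After you deploy the manifest in the next section, you can confirm that its pods were created and scheduled onto the target compute (the label comes from the manifests above):

    kubectl get pods -l app=model-serving -o wide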

Run the workflow

  1. Deploy the workflow:

    kubectl create -f workflow.yaml
  2. Check the logs of the workflow pod to confirm cache acceleration is working. Replace the pod name with one from your own run:

    kubectl logs serverless-workflow-g5knn-3271897614

    Expected output:

    real    0m1.948s
    user    0m0.000s
    sys     0m0.668s

    The real time of 1.948 seconds shows the file copy completed from cache. Without caching, the same operation takes 24.966 seconds—13x slower. See Accelerate Argo workflows (no cache mode) for comparison.
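
Even if you skipped the optional prefetch in Step 3, reads populate the cache, so every run after the first is served from memory. You can check how much of the dataset is cached at any point:

    kubectl get dataset serverless-data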

Step 5: Clean up

After testing, delete the workflow and dataset to free resources.

  1. Delete the workflow. Replace the name with the name of your workflow:

    kubectl delete workflow serverless-workflow-g5knn
  2. Delete the dataset:

    kubectl delete dataset serverless-data
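
Depending on your Fluid version, deleting the Dataset may also remove the associated JindoRuntime and the PV/PVC that Fluid created. If the runtime remains, delete it explicitly:

    kubectl delete jindoruntime serverless-data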

What's next