
Container Service for Kubernetes:Accelerate data access for Argo jobs

Last Updated: Mar 26, 2026

In serverless (ECI) environments, pods are ephemeral and have no persistent workers to pre-warm a local cache. Fluid's cache-free mode addresses this by routing data access directly through JindoRuntime to Object Storage Service (OSS), without requiring dedicated master or worker nodes. This guide walks you through running an Argo Workflow on ACK virtual nodes with OSS data mounted via Fluid in cache-free mode.

Prerequisites

Before you begin, ensure that you have:

  - An ACK cluster with virtual nodes, so that pods can be scheduled to elastic container instances (ECI).
  - The Fluid component (ack-fluid) installed in the cluster.
  - Argo Workflows deployed in the cluster.
  - An OSS bucket and an AccessKey pair with permission to read it.
  - kubectl configured to connect to the cluster.

Limitations

This feature is mutually exclusive with the elastic scheduling feature of ACK. For details, see Configure priority-based resource scheduling.

Step 1: Upload the test dataset to the OSS bucket

  1. Download the test dataset (a single zip archive of approximately 1.2 GB).

  2. Upload it to your OSS bucket using ossutil. See Install ossutil.
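The two steps above can be sketched as a single shell snippet. This is a minimal sketch, assuming ossutil is already installed and configured (ossutil config), that your-bucket is replaced with your actual bucket name, and that the archive (the same file verified in Step 3) has been downloaded to the current directory:

```shell
# Hypothetical sketch: upload the downloaded archive with ossutil.
# "your-bucket" is a placeholder; the file name matches the archive
# whose checksum is verified in Step 3.
FILE=wwm_uncased_L-24_H-1024_A-16.zip
DEST="oss://your-bucket/${FILE}"
if command -v ossutil >/dev/null 2>&1; then
  ossutil cp "./${FILE}" "${DEST}"
else
  echo "ossutil not installed; would run: ossutil cp ./${FILE} ${DEST}"
fi
```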

Step 2: Create a Dataset and JindoRuntime

Deploy a Fluid Dataset and JindoRuntime configured for cache-free mode. This takes a few minutes.

  1. Create secret.yaml with your OSS credentials:

    apiVersion: v1
    kind: Secret
    metadata:
      name: access-key
    stringData:
      fs.oss.accessKeyId: ****
      fs.oss.accessKeySecret: ****
  2. Deploy the Secret:

    kubectl create -f secret.yaml
  3. Create resource.yaml with the Dataset and JindoRuntime definitions:

    apiVersion: data.fluid.io/v1alpha1
    kind: Dataset
    metadata:
      name: serverless-data
    spec:
      mounts:
      - mountPoint: oss://<your-bucket>/    # format: oss://<bucket-name>/<optional-path>
        name: demo
        path: /
        options:
          fs.oss.endpoint: oss-cn-shanghai.aliyuncs.com
        encryptOptions:
          - name: fs.oss.accessKeyId
            valueFrom:
              secretKeyRef:
                name: access-key
                key: fs.oss.accessKeyId
          - name: fs.oss.accessKeySecret
            valueFrom:
              secretKeyRef:
                name: access-key
                key: fs.oss.accessKeySecret
      accessModes:
        - ReadWriteMany
    ---
    apiVersion: data.fluid.io/v1alpha1
    kind: JindoRuntime
    metadata:
      name: serverless-data
    spec:
      master:
        disabled: true    # cache-free mode: no master node
      worker:
        disabled: true    # cache-free mode: no worker node

    Key parameters:

    - mountPoint: Path to the OSS bucket, in the format oss://<bucket>/<path>. Do not include the endpoint here; specify it separately under options.
    - fs.oss.endpoint: Public or private endpoint of the OSS bucket. You can use the private endpoint to enhance data security, but make sure that your ACK cluster is deployed in the same region as the bucket. For example, if your OSS bucket is in the China (Hangzhou) region, the public endpoint is oss-cn-hangzhou.aliyuncs.com and the private endpoint is oss-cn-hangzhou-internal.aliyuncs.com.
    - fs.oss.accessKeyId: AccessKey ID used to access the bucket.
    - fs.oss.accessKeySecret: AccessKey secret used to access the bucket.
    - accessModes: Access mode for the volume. Valid values: ReadWriteOnce, ReadOnlyMany, ReadWriteMany, ReadWriteOncePod. Default: ReadOnlyMany.
    - master.disabled / worker.disabled: Setting both to true enables cache-free mode. JindoRuntime forwards requests directly to OSS without caching data locally.
  4. Deploy the Dataset and JindoRuntime:

    kubectl create -f resource.yaml
  5. Verify the Dataset is bound:

    kubectl get dataset serverless-data

    Expected output:

    NAME              UFS TOTAL SIZE   CACHED   CACHE CAPACITY   CACHED PERCENTAGE   PHASE   AGE
    serverless-data                                                                  Bound   1d

    The Dataset is ready when PHASE shows Bound.

  6. Verify the JindoRuntime is ready:

    kubectl get jindo serverless-data

    Expected output:

    NAME              MASTER PHASE   WORKER PHASE   FUSE PHASE   AGE
    serverless-data                                 Ready        3m41s

    The JindoRuntime is ready when FUSE PHASE shows Ready.
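The two readiness checks above can also be scripted instead of read from the table output. The jsonpath field names in the comments (.status.phase on the Dataset, .status.fusePhase on the JindoRuntime) are assumptions inferred from the printed columns; confirm them with kubectl explain on your Fluid version. The sketch below stubs the query results so the control flow is clear:

```shell
# Sketch of scripting the readiness checks. In a live cluster, replace the
# stubbed values with the commented kubectl queries (field paths are
# assumptions; confirm with "kubectl explain dataset.status").
dataset_phase="Bound"   # $(kubectl get dataset serverless-data -o jsonpath='{.status.phase}')
fuse_phase="Ready"      # $(kubectl get jindoruntime serverless-data -o jsonpath='{.status.fusePhase}')
if [ "$dataset_phase" = "Bound" ] && [ "$fuse_phase" = "Ready" ]; then
  echo "dataset and runtime are ready"
else
  echo "not ready yet: dataset=$dataset_phase fuse=$fuse_phase"
fi
```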

Step 3: Create an Argo Workflow to access OSS data

  1. Create workflow.yaml:

    apiVersion: argoproj.io/v1alpha1
    kind: Workflow
    metadata:
      generateName: serverless-workflow-
    spec:
      entrypoint: serverless-workflow-example
      volumes:
      - name: datadir
        persistentVolumeClaim:
          claimName: serverless-data    # references the Dataset created in Step 2
    
      templates:
      - name: serverless-workflow-example
        steps:
        - - name: copy
            template: copy-files
        - - name: check
            template: check-files
    
      - name: copy-files
        metadata:
          labels:
            alibabacloud.com/fluid-sidecar-target: eci    # injects the Fluid sidecar into the ECI pod
            alibabacloud.com/eci: "true"                  # schedules this pod to an ECI virtual node
          annotations:
            k8s.aliyun.com/eci-use-specs: ecs.g7.4xlarge  # ECI instance type
        container:
          image: debian:buster
          command: [bash, -c]
          args: ["time cp -r /data/ /tmp"]
          volumeMounts:
          - name: datadir
            mountPath: /data
    
      - name: check-files
        metadata:
          labels:
            alibabacloud.com/fluid-sidecar-target: eci    # injects the Fluid sidecar into the ECI pod
            alibabacloud.com/eci: "true"                  # schedules this pod to an ECI virtual node
          annotations:
            k8s.aliyun.com/eci-use-specs: ecs.g7.4xlarge  # ECI instance type
        container:
          image: debian:buster
          command: [bash, -c]
          args: ["du -sh /data; md5sum /data/*"]
          volumeMounts:
          - name: datadir
            mountPath: /data
  2. Submit the Workflow:

    kubectl create -f workflow.yaml
  3. Check the copy time from the log of the copy-files step (the pod name below is an example; list your Workflow's pods to find the actual name):

    kubectl logs serverless-workflow-85sbr-4093682611

    Expected output:

    real    0m24.966s
    user    0m0.009s
    sys     0m0.677s

    The real value is the total copy time. Actual time varies with network latency and bandwidth. To reduce copy time through caching, see Use cache mode to accelerate data access for Argo jobs.

  4. Verify data integrity by comparing MD5 checksums.

    1. Get the MD5 value from the Fluid-mounted file:

      kubectl logs serverless-workflow-85sbr-1882013783

      Expected output:

      1.2G    /data
      871734851bf7d8d2d1193dc5f1f692e6  /data/wwm_uncased_L-24_H-1024_A-16.zip
    2. Get the MD5 value of your local copy:

      md5sum ./wwm_uncased_L-24_H-1024_A-16.zip

      Expected output:

      871734851bf7d8d2d1193dc5f1f692e6  ./wwm_uncased_L-24_H-1024_A-16.zip

    Matching checksums confirm that Fluid correctly served the data from OSS.
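The manual comparison above can be automated. A minimal sketch, using the checksum lines from the example output (in practice you would capture them from kubectl logs and md5sum):

```shell
# Compare the hash field (the text before the first space) of two md5sum
# lines. The values below are the example outputs from this step.
remote_line="871734851bf7d8d2d1193dc5f1f692e6  /data/wwm_uncased_L-24_H-1024_A-16.zip"
local_line="871734851bf7d8d2d1193dc5f1f692e6  ./wwm_uncased_L-24_H-1024_A-16.zip"
if [ "${remote_line%% *}" = "${local_line%% *}" ]; then
  echo "checksums match"
else
  echo "checksum mismatch" >&2
  exit 1
fi
```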

Step 4: Clean up

When you no longer need the environment, delete the Workflow, the Dataset, and the Secret created in Step 2. The Workflow name below is an example; use the name generated for your run.

kubectl delete workflow serverless-workflow-85sbr
kubectl delete dataset serverless-data
kubectl delete secret access-key

What's next