Container Service for Kubernetes: Accelerate Argo task data access

Last Updated: Jul 28, 2025

Fluid lets you use JindoRuntime to accelerate access to data in Object Storage Service (OSS) in serverless scenarios. Data access can be accelerated in cache mode or no-cache mode. This topic describes how to accelerate Argo task data access in no-cache mode.

Prerequisites

  • A Container Service for Kubernetes (ACK) Pro cluster that does not run ContainerOS is created, and the cluster runs Kubernetes 1.18 or later. For more information, see Create an ACK Pro cluster.

    Important

    The ack-fluid component is currently not supported on ContainerOS.

  • The Cloud-native AI Suite is installed and the ack-fluid component is deployed.

    Important

    If you have installed open source Fluid, you must uninstall it before you deploy the ack-fluid component.

  • You have connected to the Kubernetes cluster using kubectl. For more information, see Connect to a cluster using kubectl.

  • You have activated Object Storage Service (OSS) and created a bucket. For more information, see Activate OSS and Create a bucket in the console.

Limits

This feature is mutually exclusive with the elastic scheduling feature of ACK. For more information about the elastic scheduling feature of ACK, see Configure priority-based resource scheduling.

Step 1: Upload the test dataset to the OSS bucket

  1. Create a test dataset of 2 GB in size. This example uses the file wwm_uncased_L-24_H-1024_A-16.zip as the test dataset.

  2. Upload the test dataset to the OSS bucket that you created.

    You can use the ossutil tool provided by OSS to upload data. For more information, see Install ossutil.
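
If no suitable file is at hand, a placeholder dataset can be generated locally before the upload. The following is a minimal sketch, assuming a Bash shell in which head and /dev/urandom are available. The function name make_test_file and the file name testdata.bin are hypothetical, and the bucket name is the one that the Dataset in Step 2 references. The ossutil cp call is shown only as a comment because it needs ossutil to be installed and configured with your credentials.

```shell
# Create a file of random bytes to serve as a test dataset.
# Arguments: $1 = output path, $2 = size in MiB (2048 gives 2 GiB).
make_test_file() {
  local path="$1" size_mib="$2"
  head -c "$((size_mib * 1048576))" /dev/urandom > "$path"
}

# Example (not executed here):
#   make_test_file ./testdata.bin 2048
#   ossutil cp ./testdata.bin oss://large-model-sh/
```

Random bytes do not compress, so the size of the file in the bucket should match the size that you requested.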

Step 2: Create a dataset and a JindoRuntime

After you set up the ACK cluster and OSS bucket, you need to deploy the dataset and JindoRuntime. The deployment requires only a few minutes.

  1. Create a file named secret.yaml based on the following content.

    The file contains the fs.oss.accessKeyId and fs.oss.accessKeySecret that are used to access the OSS bucket.

    apiVersion: v1
    kind: Secret
    metadata:
      name: access-key
    stringData:
      fs.oss.accessKeyId: ****
      fs.oss.accessKeySecret: ****
  2. Run the following command to deploy the Secret:

    kubectl create -f secret.yaml
  3. Create a file named resource.yaml based on the following content.

    The YAML file stores the following information:

    • Dataset: specifies the dataset that is stored in a remote data store and the underlying file system (UFS) information.

    • JindoRuntime: enables JindoFS for data caching in the cluster.

    apiVersion: data.fluid.io/v1alpha1
    kind: Dataset
    metadata:
      name: serverless-data
    spec:
      mounts:
      - mountPoint: oss://large-model-sh/
        name: demo
        path: /
        options:
          fs.oss.endpoint: oss-cn-shanghai.aliyuncs.com
        encryptOptions:
          - name: fs.oss.accessKeyId
            valueFrom:
              secretKeyRef:
                name: access-key
                key: fs.oss.accessKeyId
          - name: fs.oss.accessKeySecret
            valueFrom:
              secretKeyRef:
                name: access-key
                key: fs.oss.accessKeySecret
      accessModes:
        - ReadWriteMany
    ---
    apiVersion: data.fluid.io/v1alpha1
    kind: JindoRuntime
    metadata:
      name: serverless-data
    spec:
      master:
        disabled: true
      worker:
        disabled: true

    The following list describes some of the parameters that are specified in the preceding code block:

    • mountPoint: the UFS path to mount, in the format oss://<oss_bucket>/<bucket_dir>. Do not include endpoint information in the path. <bucket_dir> is optional if you can directly access the bucket.

    • fs.oss.endpoint: the public or private endpoint of the OSS bucket. You can specify the private endpoint of the bucket to enhance data security. If you specify the private endpoint, make sure that your ACK cluster is deployed in the same region as the OSS bucket. For example, if your OSS bucket is created in the China (Hangzhou) region, the public endpoint of the bucket is oss-cn-hangzhou.aliyuncs.com and the private endpoint is oss-cn-hangzhou-internal.aliyuncs.com.

    • fs.oss.accessKeyId: the AccessKey ID that is used to access the bucket.

    • fs.oss.accessKeySecret: the AccessKey secret that is used to access the bucket.

    • accessModes: the access mode. Valid values: ReadWriteOnce, ReadOnlyMany, ReadWriteMany, and ReadWriteOncePod. Default value: ReadOnlyMany.

    • disabled: if you set this parameter to true for both the master and the worker, the no-cache mode is used.

  4. Run the following command to deploy the dataset and JindoRuntime:

    kubectl create -f resource.yaml
  5. Run the following command to check whether the dataset is deployed:

    kubectl get dataset serverless-data

    Expected output:

    NAME              UFS TOTAL SIZE   CACHED   CACHE CAPACITY   CACHED PERCENTAGE   PHASE   AGE
    serverless-data                                                                  Bound   1d

    Bound is displayed in the PHASE column of the output. This indicates that the dataset is deployed.

  6. Run the following command to check whether the JindoRuntime is deployed:

    kubectl get jindo serverless-data

    Expected output:

    NAME              MASTER PHASE   WORKER PHASE   FUSE PHASE   AGE
    serverless-data                                 Ready        3m41s

    Ready is displayed in the FUSE PHASE column of the output. This indicates that the JindoRuntime is deployed.
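
In a script, the two checks above can be turned into a polling loop instead of being read by eye. The following is a minimal sketch, assuming Bash. The function name wait_for_value is hypothetical, and the field paths .status.phase and .status.fusePhase are assumptions inferred from the PHASE and FUSE PHASE columns; confirm them with kubectl get ... -o yaml before you rely on them.

```shell
# Poll a command until it prints the expected value or the timeout expires.
# Arguments: $1 = expected value, $2 = timeout in seconds, rest = command to run.
wait_for_value() {
  local expected="$1" timeout="$2" elapsed=0 actual
  shift 2
  while :; do
    # A failing command is treated as "no value yet" rather than as an error.
    actual="$("$@" 2>/dev/null)" || actual=""
    if [ "$actual" = "$expected" ]; then
      return 0
    fi
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "timed out; last value: '${actual}'" >&2
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
}

# Example (not executed here; field paths are assumptions):
#   wait_for_value Bound 300 kubectl get dataset serverless-data -o jsonpath='{.status.phase}'
#   wait_for_value Ready 300 kubectl get jindo serverless-data -o jsonpath='{.status.fusePhase}'
```

Passing the command as arguments keeps the loop independent of kubectl, so it can be tried out with any command that prints a value.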

Step 3: Create an Argo task to access OSS

You can create an application container to use the JindoFS acceleration service or submit a machine learning job to use the related features. This topic uses an Argo task that accesses OSS as an example.

  1. Create a file named workflow.yaml with the following content.

    apiVersion: argoproj.io/v1alpha1
    kind: Workflow
    metadata:
      generateName: serverless-workflow-
    spec:
      entrypoint: serverless-workflow-example
      volumes:
      - name: datadir
        persistentVolumeClaim:
          claimName: serverless-data
    
      templates:
      - name: serverless-workflow-example
        steps:
        - - name: copy
            template: copy-files
        - - name: check
            template: check-files
    
      - name: copy-files
        metadata:
          labels:
            alibabacloud.com/fluid-sidecar-target: eci
            alibabacloud.com/eci: "true"
          annotations:
            k8s.aliyun.com/eci-use-specs: ecs.g7.4xlarge
        container:
          image: debian:buster
          command: [bash, -c]
          args: ["time cp -r /data/ /tmp"]
          volumeMounts:
          - name: datadir
            mountPath: /data
    
      - name: check-files
        metadata:
          labels:
            alibabacloud.com/fluid-sidecar-target: eci
            alibabacloud.com/eci: "true"
          annotations:
            k8s.aliyun.com/eci-use-specs: ecs.g7.4xlarge
        container:
          image: debian:buster
          command: [bash, -c]
          args: ["du -sh /data; md5sum /data/*"]
          volumeMounts:
          - name: datadir
            mountPath: /data
  2. Run the following command to create the workflow.

    kubectl create -f workflow.yaml
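  3. Run the following command to view the log of the copy step. The pod name comes from a sample run. Replace it with the name of the pod that your workflow creates.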

    kubectl logs serverless-workflow-85sbr-4093682611

    Expected output:

    real    0m24.966s
    user    0m0.009s
    sys     0m0.677s

    The output shows that the file copy time (real) is 0m24.966s. The copy time depends on network latency and bandwidth. To accelerate data access, see Use the cache mode to accelerate Argo task data access.

  4. Verify that the file read through Fluid is identical to the local file.

    1. Run the following command to view the MD5 checksum of the file read through Fluid.

      kubectl logs serverless-workflow-85sbr-1882013783

      Expected output:

      1.2G    /data
      871734851bf7d8d2d1193dc5f1f692e6  /data/wwm_uncased_L-24_H-1024_A-16.zip
    2. Run the following command to view the MD5 checksum of the local file.

      md5sum ./wwm_uncased_L-24_H-1024_A-16.zip

      Expected output:

      871734851bf7d8d2d1193dc5f1f692e6  ./wwm_uncased_L-24_H-1024_A-16.zip

      The MD5 checksums are identical, which indicates that the file read through Fluid is the same as the local file.
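
The comparison can also be done by a script rather than by reading the two outputs side by side. The following is a minimal sketch, assuming Bash and awk, and assuming that file paths contain no whitespace. The function names md5_for_path and same_digest are hypothetical, and <check-pod> stands for the name of the pod that ran the check step in your workflow.

```shell
# Print the MD5 digest recorded for a given path in md5sum-style text on stdin.
# Arguments: $1 = path exactly as it appears in the second column.
md5_for_path() {
  awk -v p="$1" '$2 == p { print $1 }'
}

# Succeed only if both digests are non-empty and equal.
same_digest() {
  [ -n "$1" ] && [ "$1" = "$2" ]
}

# Example (not executed here):
#   remote="$(kubectl logs <check-pod> | md5_for_path /data/wwm_uncased_L-24_H-1024_A-16.zip)"
#   local_sum="$(md5sum ./wwm_uncased_L-24_H-1024_A-16.zip | awk '{ print $1 }')"
#   same_digest "$remote" "$local_sum" && echo "checksums match"
```

The non-empty check matters: if the log contains no line for the file, both values would otherwise be empty strings and compare as equal.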

Step 4: Clean up the environment

When you no longer need the data access feature, clean up the environment to release resources. Perform the following operations:

  1. Run the following command to delete the workflow. Replace the workflow name with the name that was generated for your workflow.

    kubectl delete workflow serverless-workflow-85sbr
  2. Run the following command to delete the Dataset.

    kubectl delete dataset serverless-data
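
The two deletions can be wrapped in one function so that cleanup can be repeated safely. The following is a minimal sketch, assuming Bash. The function name cleanup_example and the KUBECTL override variable are hypothetical; --ignore-not-found is a standard kubectl delete flag that suppresses the error for resources that are already gone.

```shell
# Delete the example workflow and Dataset, ignoring resources that no longer exist.
# Arguments: $1 = workflow name (generated by Argo, so it differs per run).
# Set KUBECTL=echo to print the arguments of each command instead of running it.
cleanup_example() {
  local workflow="$1" kubectl_bin="${KUBECTL:-kubectl}"
  "$kubectl_bin" delete workflow "$workflow" --ignore-not-found
  "$kubectl_bin" delete dataset serverless-data --ignore-not-found
}

# Example (not executed here):
#   cleanup_example serverless-workflow-85sbr
```

The workflow is deleted first, in the same order as the manual steps, so that no pod still mounts the dataset when the Dataset is removed.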