Container Service for Kubernetes:Accelerate data access for Job applications

Last Updated:Mar 26, 2026

Kubernetes Jobs that read large datasets from OSS can be slow when the data is accessed through standard object storage APIs. Fluid uses JindoRuntime to mount OSS data directly as a POSIX file system, so Jobs read objects through ordinary file operations without deploying or managing a caching layer. This topic shows you how to set up JindoRuntime in no cache mode and run a Kubernetes Job against an OSS bucket.

Prerequisites

Before you begin, make sure you have:

Cluster and components:

  • An ACK Pro cluster running Kubernetes 1.18 or later on non-ContainerOS nodes. See Create an ACK Pro cluster.

    Important

    The ack-fluid component does not support ContainerOS nodes.

  • The ack-fluid component deployed. Choose one of the following:

    • If you have not yet installed the cloud-native AI suite: enable Fluid acceleration when you install the suite. See Deploy the cloud-native AI suite.

    • If you have already installed the cloud-native AI suite: go to the Cloud-native AI Suite page in the ACK console and deploy the ack-fluid component.

    Important

    If you have an existing open source Fluid installation, uninstall it before deploying ack-fluid.

  • Virtual nodes deployed in the ACK Pro cluster. See Schedule pods to Elastic Container Instances.
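
You can confirm that the ack-fluid components are running before you continue. This is a quick check, assuming the default fluid-system namespace that Fluid installs into:

kubectl get pods -n fluid-system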

Tools and access:

  • ossutil installed, for uploading the test dataset to OSS. See Install ossutil.

  • kubectl installed and configured to connect to the cluster.

Limits

No cache mode is mutually exclusive with the elastic scheduling feature of ACK. See Configure priority-based resource scheduling.

Step 1: Upload the test dataset to OSS

  1. Prepare a test dataset of 2 GB. This example uses the wwm_uncased_L-24_H-1024_A-16 BERT dataset.

  2. Upload the dataset to your OSS bucket using ossutil. See Install ossutil.
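
    For example, after you configure ossutil, commands like the following upload the dataset. This is a sketch: the bucket name large-model-sh matches the Dataset example in Step 2, and the local directory name is an assumption:

    # Configure ossutil with the bucket endpoint and your AccessKey pair (placeholders).
    ossutil config -e oss-cn-shanghai.aliyuncs.com -i <AccessKeyID> -k <AccessKeySecret>
    # Recursively upload the local dataset directory to the bucket root.
    ossutil cp -r wwm_uncased_L-24_H-1024_A-16/ oss://large-model-sh/ --update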

Step 2: Create a Dataset and JindoRuntime

The following steps create the Kubernetes Secret, Dataset, and JindoRuntime resources needed to mount the OSS bucket.

The deployment takes a few minutes.

Create a Secret for OSS credentials

  1. Create secret.yaml with the following content. Replace the **** placeholders with your AccessKey ID and AccessKey secret:

    apiVersion: v1
    kind: Secret
    metadata:
      name: access-key
    stringData:
      fs.oss.accessKeyId: ****
      fs.oss.accessKeySecret: ****
  2. Deploy the Secret:

    kubectl create -f secret.yaml
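
    Equivalently, you can create the same Secret imperatively instead of writing a YAML file. The AccessKey values here are placeholders:

    kubectl create secret generic access-key \
      --from-literal=fs.oss.accessKeyId=<AccessKeyID> \
      --from-literal=fs.oss.accessKeySecret=<AccessKeySecret>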

Create the Dataset and JindoRuntime

  1. Create resource.yaml with the following content:

    apiVersion: data.fluid.io/v1alpha1
    kind: Dataset
    metadata:
      name: serverless-data
    spec:
      mounts:
      # Replace large-model-sh with the name of your OSS bucket.
      - mountPoint: oss://large-model-sh/
        name: demo
        path: /
        options:
          # You can switch to the internal endpoint when the cluster and bucket share a region.
          fs.oss.endpoint: oss-cn-shanghai.aliyuncs.com
        encryptOptions:
          - name: fs.oss.accessKeyId
            valueFrom:
              secretKeyRef:
                name: access-key
                key: fs.oss.accessKeyId
          - name: fs.oss.accessKeySecret
            valueFrom:
              secretKeyRef:
                name: access-key
                key: fs.oss.accessKeySecret
      accessModes:
        - ReadWriteMany
    ---
    apiVersion: data.fluid.io/v1alpha1
    kind: JindoRuntime
    metadata:
      name: serverless-data
    spec:
      # Disabling both the master and the worker enables no cache mode:
      # data is read directly from OSS through the FUSE client.
      master:
        disabled: true
      worker:
        disabled: true

    Key parameters:

    • mountPoint: The path of the UFS to mount, in the format oss://<oss_bucket>/<bucket_dir>. Do not include the endpoint in this field. <bucket_dir> is optional if you access the root of the bucket.

    • fs.oss.endpoint: The public or internal endpoint of the OSS bucket. Specifying the internal endpoint improves data security, but requires the ACK cluster and the bucket to be in the same region. For example, for China (Hangzhou), the public endpoint is oss-cn-hangzhou.aliyuncs.com and the internal endpoint is oss-cn-hangzhou-internal.aliyuncs.com.

    • fs.oss.accessKeyId: The AccessKey ID used to access the bucket.

    • fs.oss.accessKeySecret: The AccessKey secret used to access the bucket.

    • accessModes: The access mode of the PVC. Valid values: ReadWriteOnce, ReadOnlyMany, ReadWriteMany, and ReadWriteOncePod. Default value: ReadOnlyMany.

    • master.disabled and worker.disabled: Set both to true to disable the master and worker components and enable no cache mode.
  2. Deploy the Dataset and JindoRuntime:

    kubectl create -f resource.yaml
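
    If the Dataset does not become ready within a few minutes, you can inspect its events for problems such as invalid credentials or an unreachable endpoint:

    kubectl describe dataset serverless-data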

Verify the deployment

  1. Check the Dataset status:

    kubectl get dataset serverless-data

    Expected output:

    NAME              UFS TOTAL SIZE   CACHED   CACHE CAPACITY   CACHED PERCENTAGE   PHASE   AGE
    serverless-data                                                                   Bound   1d

    The Dataset is ready when PHASE shows Bound. In no cache mode, the cache-related columns remain empty because no data is cached.

  2. Check the JindoRuntime status:

    kubectl get jindo serverless-data

    Expected output:

    NAME              MASTER PHASE   WORKER PHASE   FUSE PHASE   AGE
    serverless-data                                 Ready        3m41s

    The JindoRuntime is ready when FUSE PHASE shows Ready. MASTER PHASE and WORKER PHASE remain empty because both components are disabled in no cache mode.
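
  3. Confirm that the PersistentVolumeClaim exists. Fluid creates a PVC with the same name as the Dataset, and the Job in the next step mounts it:

    kubectl get pvc serverless-data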

Step 3: Run a Job to access OSS data

The Job mounts the serverless-data PVC at /data and copies all files to /tmp, measuring how long the transfer takes.

  1. Create job.yaml with the following content:

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: demo-app
    spec:
      template:
        metadata:
          labels:
            # This label tells Fluid to inject the FUSE sidecar for pods that run on ECI.
            alibabacloud.com/fluid-sidecar-target: eci
            # Schedule the pod to an Elastic Container Instance (virtual node).
            alibabacloud.com/eci: "true"
          annotations:
            # The instance specification used for the Elastic Container Instance.
            k8s.aliyun.com/eci-use-specs: ecs.g7.4xlarge
        spec:
          containers:
            - name: demo
              image: debian:buster
              command:
                - /bin/bash
              args:
                - -c
                - du -sh /data && time cp -r /data/ /tmp
              volumeMounts:
                - mountPath: /data
                  name: demo
          restartPolicy: Never
          volumes:
            - name: demo
              persistentVolumeClaim:
                claimName: serverless-data
      backoffLimit: 4
  2. Deploy the Job:

    kubectl create -f job.yaml
  3. View the container log to check the result. Replace demo-app--1-5fr74 with the name of the pod that your Job created (see the lookup commands after this list):

    kubectl logs demo-app--1-5fr74 -c demo

    Expected output:

    real    0m23.644s
    user    0m0.004s
    sys     0m1.036s

    The real value shows the total wall-clock time to copy the dataset. Actual duration varies with network latency and bandwidth.

    No cache mode reads data directly from OSS on every access. To reduce latency for repeated reads, use cache mode instead. See Accelerate Jobs in cache mode.
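
The pod name in the log command is generated by the Job. Because the Job controller adds the job-name label to its pods, you can look the name up, or read the log through the label selector directly:

# List the pods created by the Job.
kubectl get pods -l job-name=demo-app
# Or read the log through the label selector.
kubectl logs -l job-name=demo-app -c demo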

Step 4: Clean up

Delete the Job and Dataset after you finish testing:

kubectl delete job demo-app
kubectl delete dataset serverless-data
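
If the OSS credentials are not shared with other Datasets, you can also delete the Secret:

kubectl delete secret access-key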