
Container Service for Kubernetes: Use Fluid to accelerate access to OSS files from edge nodes

Last Updated: Mar 26, 2026

In edge computing, each OSS file access travels over the cloud-edge network, adding significant latency. Fluid caches OSS data in the memory of edge nodes so that repeated reads bypass the network entirely. This tutorial walks you through uploading a test dataset to OSS, creating a Dataset and a JindoRuntime on an edge node pool, deploying a test Pod, and verifying the caching effect: the read time for a 210 MiB file drops from 18 seconds to 48 milliseconds.

Prerequisites

Before you begin, make sure you have:

  • An ACK Edge cluster that contains an edge node pool, with the Fluid component installed in the cluster.
  • A kubectl client that is connected to the cluster.
  • Object Storage Service (OSS) activated, and an AccessKey pair that can read and write the test bucket.

Step 1: Upload data to OSS

  1. Download the test dataset to an Elastic Compute Service (ECS) instance:

    wget https://archive.apache.org/dist/spark/spark-3.0.1/spark-3.0.1-bin-hadoop2.7.tgz
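
    Optionally, verify the integrity of the download by computing its checksum and comparing it against the value published alongside the archive on archive.apache.org:

    sha512sum spark-3.0.1-bin-hadoop2.7.tgz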
  2. Upload the dataset to an OSS bucket.

    Important

    The following steps use an ECS instance running Alibaba Cloud Linux 3.2104 LTS 64-bit. For other operating systems, see ossutil and ossutil 1.0.

    1. Install ossutil.
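
      If ossutil is not configured yet, set the endpoint and AccessKey pair. A minimal example, assuming a bucket in the cn-hangzhou region (replace the endpoint and credentials with your own):

      ossutil config -e oss-cn-hangzhou.aliyuncs.com -i <AccessKey ID> -k <AccessKey Secret>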

    2. Create a bucket named examplebucket:

      ossutil mb oss://examplebucket

      Expected output:

      0.668238(s) elapsed
    3. Upload the test dataset to examplebucket:

      ossutil cp spark-3.0.1-bin-hadoop2.7.tgz oss://examplebucket
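
      To confirm the upload, list the bucket contents; the output should include spark-3.0.1-bin-hadoop2.7.tgz and its size:

      ossutil ls oss://examplebucket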

Step 2: Create a Dataset and a JindoRuntime

In an ACK Edge cluster, both edge node management and OSS access use the cloud-edge network. Deploy the Dataset and JindoRuntime to the same node pool as your workloads to keep data access within the node pool and to reserve bandwidth for the management channel.
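
If you do not know the target node pool ID, you can read it from the node labels; ACK Edge nodes carry the alibabacloud.com/nodepool-id label that the manifests below select on:

kubectl get nodes -L alibabacloud.com/nodepool-id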

  1. Create a file named mySecret.yaml with the following content. Replace xxx with the AccessKey ID and AccessKey secret used in Step 1.

    apiVersion: v1
    kind: Secret
    metadata:
      name: mysecret
    stringData:
      fs.oss.accessKeyId: xxx
      fs.oss.accessKeySecret: xxx
  2. Create the Secret. Storing the AccessKey pair in a Secret keeps it out of the Dataset manifest; note that Kubernetes only base64-encodes Secret data, so enable encryption at rest if you need stronger protection.

    kubectl create -f mySecret.yaml
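
    You can confirm that the Secret exists and holds both keys (the DATA column should show 2):

    kubectl get secret mysecret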
  3. Create a file named resource.yaml with the following content:

    apiVersion: data.fluid.io/v1alpha1
    kind: Dataset
    metadata:
      name: hadoop
    spec:
      nodeAffinity:
        required:
          nodeSelectorTerms:
            - matchExpressions:
              - key: alibabacloud.com/nodepool-id
                operator: In
                values:
                  - npxxxxxxxxxxxxxx
      mounts:
        - mountPoint: oss://<oss_bucket>/<bucket_dir>
          options:
            fs.oss.endpoint: <oss_endpoint>
          name: hadoop
          path: "/"
          encryptOptions:
            - name: fs.oss.accessKeyId
              valueFrom:
                secretKeyRef:
                  name: mysecret
                  key: fs.oss.accessKeyId
            - name: fs.oss.accessKeySecret
              valueFrom:
                secretKeyRef:
                  name: mysecret
                  key: fs.oss.accessKeySecret
    ---
    apiVersion: data.fluid.io/v1alpha1
    kind: JindoRuntime
    metadata:
      name: hadoop
    spec:
      nodeSelector:
        alibabacloud.com/nodepool-id: npxxxxxxxxxxxxxx
      replicas: 2
      tieredstore:
        levels:
          - mediumtype: MEM
            path: /dev/shm
            volumeType: emptyDir
            quota: 2Gi
            high: "0.99"
            low: "0.95"

    This template creates two resources:

    • A Dataset that specifies the OSS path to mount and references the Secret for credentials. The nodeAffinity field pins the Dataset to the target node pool.

    • A JindoRuntime that launches a JindoFS cluster for data caching. Set nodeSelector to the same node pool as the Dataset's nodeAffinity.

    Key parameters:

    • mountPoint: The OSS path to mount, in the format oss://<oss_bucket>/<bucket_dir>. It must point to a directory, not a file. The endpoint is specified separately.
    • fs.oss.endpoint: The public or private endpoint of the OSS bucket. See Regions and endpoints.
    • replicas: The number of workers in the JindoFS cluster.
    • mediumtype: The cache storage type. Valid values: HDD, SSD, MEM.
    • path: The local storage path for cache data. Required when mediumtype is MEM.
    • quota: The maximum cache size. Unit: GiB.
    • high: The high watermark of cache storage usage. When usage exceeds this ratio of the quota, cached data is evicted.
    • low: The low watermark of cache storage usage. Eviction stops once usage falls below this ratio.
  4. Create the Dataset and JindoRuntime:

    kubectl create -f resource.yaml
  5. Verify the Dataset is bound:

    kubectl get dataset hadoop

    Expected output:

    NAME     UFS TOTAL SIZE   CACHED   CACHE CAPACITY   CACHED PERCENTAGE   PHASE   AGE
    hadoop   210.00MiB        0.00B    4.00GiB          0.0%                Bound   1h
  6. Verify the JindoRuntime is ready:

    kubectl get jindoruntime hadoop

    Expected output:

    NAME     MASTER PHASE   WORKER PHASE   FUSE PHASE   AGE
    hadoop   Ready          Ready          Ready        4m45s
  7. Verify the persistent volume (PV) and persistent volume claim (PVC) are created:

    kubectl get pv,pvc

    Expected output:

    NAME                      CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM            STORAGECLASS   REASON   AGE
    persistentvolume/hadoop   100Gi      RWX            Retain           Bound    default/hadoop                           52m
    
    NAME                           STATUS   VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
    persistentvolumeclaim/hadoop   Bound    hadoop   100Gi      RWX                           52m

The Dataset and JindoRuntime are ready when all phases show Ready and the PVC status shows Bound.
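
If the Dataset does not reach the Bound phase or a JindoRuntime component is not Ready, inspect the resources and the JindoFS Pods for errors such as invalid credentials or an unreachable endpoint:

kubectl describe dataset hadoop
kubectl describe jindoruntime hadoop
kubectl get pods -o wide | grep hadoop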

Step 3: Test data access acceleration

Deploy a test Pod to the same node pool, read a file twice, and compare the access times to observe the JindoFS caching effect.

  1. Create a file named app.yaml with the following content:

    apiVersion: v1
    kind: Pod
    metadata:
      name: demo-app
    spec:
      nodeSelector:
        alibabacloud.com/nodepool-id: npxxxxxxxxxxxxxx
      containers:
        - name: demo
          image: anolis-registry.cn-zhangjiakou.cr.aliyuncs.com/openanolis/nginx:1.14.1-8.6
          volumeMounts:
            - mountPath: /data
              name: hadoop
      volumes:
        - name: hadoop
          persistentVolumeClaim:
            claimName: hadoop
    Note

    Set nodeSelector to the same node pool ID used in Step 2.

  2. Deploy the Pod:

    kubectl create -f app.yaml
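
    To confirm that the Pod landed on a node in the target node pool, check its node assignment:

    kubectl get pod demo-app -o wide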
  3. Verify the file is accessible and check its size:

    kubectl exec -it demo-app -- bash
    du -sh /data/spark-3.0.1-bin-hadoop2.7.tgz

    Expected output:

    210M    /data/spark-3.0.1-bin-hadoop2.7.tgz
  4. Measure the first read time. Because no data is cached yet, JindoFS fetches the file from OSS over the cloud-edge network—this read will be slow.

    time cp /data/spark-3.0.1-bin-hadoop2.7.tgz /dev/null

    Expected output:

    real    0m18.386s
    user    0m0.002s
    sys     0m0.105s
  5. Confirm that the file is now fully cached:

    kubectl get dataset hadoop

    Expected output:

    NAME     UFS TOTAL SIZE   CACHED      CACHE CAPACITY   CACHED PERCENTAGE   PHASE   AGE
    hadoop   210.00MiB        210.00MiB   4.00GiB          100.0%              Bound   1h

    The dataset shows 100% cached, meaning all data is stored in the JindoFS workers on the node pool.

  6. Recreate the Pod to clear the Linux page cache. This ensures the second read uses only the JindoFS cache, not the OS-level cache.

    kubectl delete -f app.yaml && kubectl create -f app.yaml
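
    Wait for the recreated Pod to become Ready before measuring again:

    kubectl wait --for=condition=Ready pod/demo-app --timeout=120s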
  7. Measure the second read time:

    kubectl exec -it demo-app -- bash
    time cp /data/spark-3.0.1-bin-hadoop2.7.tgz /dev/null

    Expected output:

    real    0m0.048s
    user    0m0.001s
    sys     0m0.046s

    The second read completes in 48 milliseconds—more than 300x faster—because JindoFS serves the data from local node memory instead of fetching it from OSS.
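
In this tutorial the cache is populated lazily by the first read. To warm the cache ahead of time instead, Fluid provides a DataLoad resource that preloads data into the JindoFS workers. A minimal sketch (the name hadoop-warmup is illustrative):

apiVersion: data.fluid.io/v1alpha1
kind: DataLoad
metadata:
  name: hadoop-warmup
spec:
  dataset:
    name: hadoop
    namespace: default

After you create this resource, kubectl get dataload hadoop-warmup shows the preload progress, and the Dataset's CACHED PERCENTAGE rises as data is loaded.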

(Optional) Clean up the environment

Delete the test Pod:

kubectl delete pod demo-app

Delete the Dataset and JindoRuntime:

kubectl delete -f resource.yaml
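
Optionally, delete the Secret and the test data in OSS if you no longer need them. Note that the bucket must be empty before it can be removed:

kubectl delete secret mysecret
ossutil rm oss://examplebucket/spark-3.0.1-bin-hadoop2.7.tgz
ossutil rm oss://examplebucket -b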