Container Service for Kubernetes: Accelerate OSS access from edge nodes with Fluid

Last Updated: Apr 03, 2026

Fluid is an open-source, Kubernetes-native, distributed dataset orchestration and acceleration engine. It is designed for data-intensive applications in cloud-native environments, such as big data and AI applications. In edge scenarios, Fluid can significantly accelerate file access from edge nodes to Object Storage Service (OSS). This topic describes how to use the Fluid data acceleration feature in an ACK Edge cluster.

Prerequisites

  • You have an ACK Edge cluster, version 1.18 or later. For more information, see Create an ACK Edge cluster.

  • You have created an edge node pool and added edge nodes to it. For more information, see Create an edge node pool and Add an edge node.

  • You have installed the cloud-native AI suite and deployed the ack-fluid component.

    Important

    If you have installed open source Fluid, uninstall it before you deploy the ack-fluid component.

    • If the cloud-native AI suite is not installed: enable Fluid data acceleration when you install the suite. For more information, see Deploy the AI suite console.

    • If the cloud-native AI suite is already installed: deploy ack-fluid on the Cloud-native AI Suite page of the ACK Console.

  • You have connected to your Kubernetes cluster by using kubectl. For more information, see Connect to an ACK cluster by using kubectl.

  • You have activated Object Storage Service (OSS). For more information, see Activate OSS.

Step 1: Prepare OSS data

  1. Run the following command to download the test data to an ECS instance.

    wget https://archive.apache.org/dist/spark/spark-3.0.1/spark-3.0.1-bin-hadoop2.7.tgz
  2. Upload the downloaded test data to an OSS bucket.

    Important

    The following steps use an ECS instance running Alibaba Cloud Linux 3.2104 LTS 64-bit as an example to show how to upload data to OSS. For other operating systems, see ossutil quick start and ossutil command reference 1.0.

    1. Install ossutil.

    2. Create a bucket named examplebucket.

      • Run the following command to create examplebucket.

        ossutil mb oss://examplebucket
      • The following output indicates that examplebucket has been created.

        0.668238(s) elapsed
    3. Upload the downloaded test data to the examplebucket bucket.

      ossutil cp spark-3.0.1-bin-hadoop2.7.tgz oss://examplebucket
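
Before uploading, it can help to confirm that the download finished cleanly. The following is a minimal sketch and not part of the original procedure: check_archive is a hypothetical helper name, and the sketch assumes that tar is available on the ECS instance.

```shell
# Hypothetical helper: succeeds only if the file can be listed as a
# gzip-compressed tar archive, which fails for truncated or corrupted downloads.
check_archive() {
  tar -tzf "$1" > /dev/null 2>&1
}

archive=spark-3.0.1-bin-hadoop2.7.tgz
if check_archive "$archive"; then
  echo "$archive looks intact"
else
  echo "$archive is missing or corrupted; download it again" >&2
fi
```

Listing the archive reads the whole file, so a partial download is caught before it reaches the bucket.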

Step 2: Create a Dataset and a JindoRuntime

  1. Before you create a Dataset, create a file named mySecret.yaml.

    apiVersion: v1
    kind: Secret
    metadata:
      name: mysecret
    stringData:
      fs.oss.accessKeyId: xxx
      fs.oss.accessKeySecret: xxx

    The fs.oss.accessKeyId and fs.oss.accessKeySecret parameters specify the AccessKey ID and AccessKey secret that are used to access the OSS bucket from Step 1. Replace xxx with your own values.

  2. Run the following command to create the Secret. Kubernetes stores the Secret data base64-encoded. Referencing the Secret keeps the AccessKey pair out of the Dataset definition, but base64 encoding by itself is not encryption.

    kubectl create -f mySecret.yaml
  3. Create a resource.yaml file with the following content. This file serves two purposes:

    • Create a Dataset, which describes the remote dataset and provides information about the underlying file system (UFS).

    • Create a JindoRuntime to launch a JindoFS cluster for data caching.

    apiVersion: data.fluid.io/v1alpha1
    kind: Dataset
    metadata:
      name: hadoop
    spec:
      nodeAffinity:
        required:
          nodeSelectorTerms:
            - matchExpressions:
              - key: alibabacloud.com/nodepool-id
                operator: In
                values:
                  - npxxxxxxxxxxxxxx
      mounts:
        - mountPoint: oss://<oss_bucket>/<bucket_dir>
          options:
            fs.oss.endpoint: <oss_endpoint>
          name: hadoop
          path: "/"
          encryptOptions:
            - name: fs.oss.accessKeyId
              valueFrom:
                secretKeyRef:
                  name: mysecret
                  key: fs.oss.accessKeyId
            - name: fs.oss.accessKeySecret
              valueFrom:
                secretKeyRef:
                  name: mysecret
                  key: fs.oss.accessKeySecret
    ---
    apiVersion: data.fluid.io/v1alpha1
    kind: JindoRuntime
    metadata:
      name: hadoop
    spec:
      nodeSelector:
        alibabacloud.com/nodepool-id: npxxxxxxxxxxxxxx
      replicas: 2
      tieredstore:
        levels:
          - mediumtype: MEM
            path: /dev/shm
            volumeType: emptyDir
            quota: 2Gi
            high: "0.99"
            low: "0.95"
    Note

    • In an ACK Edge cluster, you must use nodeAffinity and nodeSelector to deploy the Dataset and JindoRuntime to the same node pool. This ensures that nodes in the node pool can communicate.

    • Because both edge node management and OSS access require cloud-to-edge network communication, we recommend that you ensure sufficient network bandwidth to maintain the stability of the control channel.

    The following table describes the parameters.

    Parameter         Description
    mountPoint        oss://<oss_bucket>/<bucket_dir> specifies the path of the UFS to mount. This path must point to a directory, not a single file. Do not include the endpoint in this path.
    fs.oss.endpoint   The endpoint of the OSS bucket. You can use a public or internal endpoint. For more information, see Regions and endpoints.
    replicas          The number of workers in the JindoFS cluster.
    mediumtype        The cache medium type. JindoFS supports only one cache type at a time: HDD, SSD, or MEM.
    path              The storage path. Only a single path is supported. If mediumtype is MEM, you must specify a local path for files such as logs.
    quota             The maximum cache capacity, in GB.
    high              The high watermark for storage usage.
    low               The low watermark for storage usage.

  4. Run the following command to create the JindoRuntime and Dataset.

    kubectl create -f resource.yaml
  5. Run the following command to check the status of the Dataset.

    kubectl get dataset hadoop

    Expected output:

    NAME     UFS TOTAL SIZE   CACHED   CACHE CAPACITY   CACHED PERCENTAGE   PHASE   AGE
    hadoop   210MiB           0.00B    4.00GiB          0.0%                Bound   1h
  6. Run the following command to check the status of the JindoRuntime.

    kubectl get jindoruntime hadoop

    Expected output:

    NAME     MASTER PHASE   WORKER PHASE   FUSE PHASE   AGE
    hadoop   Ready          Ready          Ready        4m45s
  7. Run the following command to verify that the PersistentVolume (PV) and PersistentVolumeClaim (PVC) have been created.

    kubectl get pv,pvc

    Expected output:

    NAME                      CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM            STORAGECLASS   REASON   AGE
    persistentvolume/hadoop   100Gi      RWX            Retain           Bound    default/hadoop                           52m
    
    NAME                           STATUS   VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
    persistentvolumeclaim/hadoop   Bound    hadoop   100Gi      RWX                           52m

The preceding output shows that the Dataset and JindoRuntime have been created.
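
The CACHE CAPACITY of 4.00GiB in the Dataset output is consistent with the JindoRuntime definition in resource.yaml. The sketch below works through that arithmetic. It assumes that the total capacity is the per-worker quota multiplied by the number of workers, which is inferred from the example values rather than stated in this topic.

```shell
# Values copied from resource.yaml in this topic.
replicas=2     # spec.replicas of the JindoRuntime
quota_gib=2    # tieredstore quota (2Gi)

# Assumption: total cache capacity = quota per worker x number of workers.
total_gib=$((replicas * quota_gib))
echo "Expected CACHE CAPACITY: ${total_gib}.00GiB"
```

If you change replicas or quota, the same calculation gives a rough value to compare against the kubectl get dataset output.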

Step 3: Test data access acceleration

You can create an application container or submit a machine learning job to use the JindoFS acceleration service. This topic demonstrates the acceleration effect of JindoRuntime by using an application container to access the same data multiple times and comparing the access times.

  1. Create a file named app.yaml with the following content.

    apiVersion: v1
    kind: Pod
    metadata:
      name: demo-app
    spec:
      nodeSelector:
        alibabacloud.com/nodepool-id: npxxxxxxxxxxxxxx
      containers:
        - name: demo
          image: anolis-registry.cn-zhangjiakou.cr.aliyuncs.com/openanolis/nginx:1.14.1-8.6
          volumeMounts:
            - mountPath: /data
              name: hadoop
      volumes:
        - name: hadoop
          persistentVolumeClaim:
            claimName: hadoop
    Note

    In an ACK Edge cluster, you must use nodeSelector to deploy the test pod to the node pool specified in Step 2.

  2. Run the following command to create the application container.

    kubectl create -f app.yaml
  3. Open a shell in the pod and check the file size.

    kubectl exec -it demo-app -- bash
    du -sh /data/spark-3.0.1-bin-hadoop2.7.tgz

    Expected output:

    210M    /data/spark-3.0.1-bin-hadoop2.7.tgz
  4. Run the following command to time the file copy.

    time cp /data/spark-3.0.1-bin-hadoop2.7.tgz /dev/null

    Expected output:

    real    0m18.386s
    user    0m0.002s
    sys     0m0.105s

    The output shows that it took about 18 seconds to copy the file.

  5. Run the following command to check the cache status of the Dataset.

    kubectl get dataset hadoop

    Expected output:

    NAME     UFS TOTAL SIZE   CACHED      CACHE CAPACITY   CACHED PERCENTAGE   PHASE   AGE
    hadoop   210.00MiB        210.00MiB   4.00GiB          100.0%              Bound   1h

    The output shows that all 210 MiB of data has been cached locally.

  6. Run the following command to delete the previous application container and create a new one.

    Note

    This step prevents interference from other factors, such as the operating system's page cache.

    kubectl delete -f app.yaml && kubectl create -f app.yaml
  7. Run the following commands to time the file copy again.

    kubectl exec -it demo-app -- bash
    time cp /data/spark-3.0.1-bin-hadoop2.7.tgz /dev/null

    Expected output:

    real    0m0.048s
    user    0m0.001s
    sys     0m0.046s

    The output shows that copying the file now takes about 48 milliseconds, roughly 380 times faster than the initial copy, which took about 18 seconds.

    Note

    The second access is much faster because JindoFS has cached the file.
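
The speedup can be computed directly from the two real times shown in the expected outputs. This is a minimal sketch that uses the example timings from this topic. Your own measurements will differ depending on network bandwidth and node hardware.

```shell
# "real" times from the two copy runs in this topic, in seconds.
first_copy=18.386
second_copy=0.048

# awk handles the floating-point division; the result is rounded to an integer.
speedup=$(awk -v a="$first_copy" -v b="$second_copy" 'BEGIN { printf "%.0f", a / b }')
echo "Speedup: about ${speedup}x"
```

Substitute the values reported by your own time cp runs to evaluate the cache in your environment.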

(Optional) Clean up

When data acceleration is no longer needed, delete the pod, the Dataset, and the JindoRuntime.

Delete the pod:

    kubectl delete pod demo-app

Delete the Dataset and JindoRuntime:

    kubectl delete dataset hadoop