
Container Service for Kubernetes: Accelerate online applications

Last Updated: Mar 26, 2026

When online applications load large models or datasets from Object Storage Service (OSS) on every startup, cold-start latency can reach 27 seconds or more per pod. Fluid with JindoRuntime caches OSS data in cluster memory so repeated reads are served locally, reducing load time to under 3 seconds. This topic walks you through the end-to-end setup for elastic container instances (ECI) and Alibaba Cloud Container Compute Service (ACS) pods running in ACK Pro serverless environments.

Prerequisites

Before you begin, ensure that you have:

  • An ACK Pro cluster with the ack-fluid component installed. To access cached data from ACS pods, ack-fluid must be v1.0.11 or later (see the Important note in Step 4).

  • A kubectl client that is connected to the cluster.

  • An OSS bucket to store the test dataset, and the ossutil tool to upload it (see Step 1).

Limitations

This feature is mutually exclusive with the elastic scheduling feature of ACK. See Configure priority-based resource scheduling.

Step 1: Upload a test dataset to OSS

Download the dataset. This example uses the BERT wwm_uncased_L-24_H-1024_A-16 model, a test dataset of approximately 1.2 GB.

Upload the dataset to your OSS bucket by using the ossutil tool. See Install ossutil.
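For example, assuming the dataset archive is named wwm_uncased_L-24_H-1024_A-16.zip as in the verification step later in this topic, and using the same placeholders as the dataset.yaml below:

ossutil cp wwm_uncased_L-24_H-1024_A-16.zip oss://<oss_bucket>/<bucket_dir>/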

Step 2: Create a Dataset and JindoRuntime

Deploy the Dataset and JindoRuntime to bind your OSS data to the cluster. The deployment takes a few minutes.

Create the Secret.

Create a file named secret.yaml. The Secret stores the AccessKey ID and AccessKey secret used to access OSS.

apiVersion: v1
kind: Secret
metadata:
  name: access-key
stringData:
  fs.oss.accessKeyId: ****
  fs.oss.accessKeySecret: ****

Deploy the Secret.

kubectl create -f secret.yaml
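You can confirm that the Secret exists before referencing it from the Dataset:

kubectl get secret access-key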

Create the Dataset and JindoRuntime.

Create a file named dataset.yaml with the following content:

apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: serverless-data
spec:
  mounts:
  - mountPoint: oss://<oss_bucket>/<bucket_dir>
    name: demo
    path: /
    options:
      fs.oss.endpoint: <oss_endpoint>
    encryptOptions:
      - name: fs.oss.accessKeyId
        valueFrom:
          secretKeyRef:
            name: access-key
            key: fs.oss.accessKeyId
      - name: fs.oss.accessKeySecret
        valueFrom:
          secretKeyRef:
            name: access-key
            key: fs.oss.accessKeySecret
---
apiVersion: data.fluid.io/v1alpha1
kind: JindoRuntime
metadata:
  name: serverless-data
spec:
  replicas: 1
  tieredstore:
    levels:
      - mediumtype: MEM
        volumeType: emptyDir
        path: /dev/shm
        quota: 5Gi
        high: "0.95"
        low: "0.7"

The following table describes key parameters in the configuration.

Parameter Description
mountPoint The OSS path to mount, in the format oss://<oss_bucket>/<bucket_dir>. Do not include endpoint information. Example: oss://mybucket/path/to/dir. If you use a single mount target, set path to /.
fs.oss.endpoint The public or private endpoint of the OSS bucket. To use a private endpoint, make sure your ACK cluster is in the same region as the OSS bucket. Example: public endpoint oss-cn-hangzhou.aliyuncs.com, private endpoint oss-cn-hangzhou-internal.aliyuncs.com.
fs.oss.accessKeyId The AccessKey ID used to access the bucket.
fs.oss.accessKeySecret The AccessKey secret used to access the bucket.
replicas The number of workers to create in the JindoFS cluster.
mediumtype The cache medium type. Valid values: HDD, SSD, MEM. See Policy 2: Select proper cache media.
volumeType The volume type for the cache medium. Valid values: emptyDir and hostPath. Default value: hostPath. Use emptyDir for memory or local system disk caches to prevent residual data from remaining on the node. Use hostPath for local data disks, and set path to the mount path of the disk on the host. See Policy 2: Select proper cache media.
path The cache directory path. Only one path can be specified.
quota The maximum cache size. Example: 100Gi sets the limit to 100 GiB.
high The upper watermark of cache usage, as a fraction of the quota. When usage exceeds this value, cached data is evicted.
low The lower watermark of cache usage. Eviction stops once usage falls back below this value.
Important

The default dataset access mode is read-only. To use read/write mode, see Configure the access mode of a dataset.

Deploy the Dataset and JindoRuntime.

kubectl create -f dataset.yaml

Verify the Dataset.

kubectl get dataset serverless-data

Expected output:

NAME              UFS TOTAL SIZE   CACHED   CACHE CAPACITY   CACHED PERCENTAGE   PHASE   AGE
serverless-data   1.16GiB          0.00B    5.00GiB          0.0%                Bound   2m8s

PHASE: Bound confirms the Dataset is ready.

Verify the JindoRuntime.

kubectl get jindo serverless-data

Expected output:

NAME              MASTER PHASE   WORKER PHASE   FUSE PHASE   AGE
serverless-data   Ready          Ready          Ready        2m51s

FUSE PHASE: Ready confirms the JindoRuntime is ready.
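Fluid also creates a PersistentVolume and a PersistentVolumeClaim named after the Dataset; the Deployment in Step 4 mounts the cache through this PVC (claimName: serverless-data). You can confirm that the PVC exists:

kubectl get pvc serverless-data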

(Optional) Step 3: Prefetch data

Prefetching loads OSS data into the cache before any application requests it. If you skip this step, the first access to each file triggers a cache miss and incurs latency similar to no-cache mode (approximately 27 seconds in the example below). Prefetch when you need consistent low-latency access from the first request.

Create the DataLoad.

Create a file named dataload.yaml with the following content:

apiVersion: data.fluid.io/v1alpha1
kind: DataLoad
metadata:
  name: serverless-data-warmup
spec:
  dataset:
    name: serverless-data
    namespace: default
  loadMetadata: true

Deploy the DataLoad.

kubectl create -f dataload.yaml

Monitor prefetch progress.

kubectl get dataload

Expected output:

NAME                     DATASET           PHASE      AGE     DURATION
serverless-data-warmup   serverless-data   Complete   2m49s   45s

PHASE: Complete confirms prefetching finished. DURATION: 45s shows how long the process took.
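If the prefetch has not completed yet, you can watch its progress with the standard kubectl -w flag:

kubectl get dataload serverless-data-warmup -w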

Verify the cache.

kubectl get dataset

Expected output:

NAME              UFS TOTAL SIZE   CACHED    CACHE CAPACITY   CACHED PERCENTAGE   PHASE   AGE
serverless-data   1.16GiB          1.16GiB   5.00GiB          100.0%              Bound   5m20s

CACHED PERCENTAGE: 100.0% confirms all data is cached and ready for low-latency access.

Step 4: Deploy an application to access OSS data

Deploy a Kubernetes Deployment to test data access accelerated by JindoFS, or to run machine learning inference workloads against the cached data. The examples below cover two serverless compute options.

Create the serving.yaml file with the configuration for your compute target.

Deploy as an elastic container instance

Add the alibabacloud.com/fluid-sidecar-target: eci label to the pod to declare that it runs as an elastic container instance (ECI). When the pod is created, Fluid automatically converts it to an ECI-compatible format; no manual intervention is required.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-serving
spec:
  selector:
    matchLabels:
      app: model-serving
  template:
    metadata:
      labels:
        app: model-serving
        alibabacloud.com/fluid-sidecar-target: eci
        alibabacloud.com/eci: "true"
    spec:
      containers:
        - image: fluidcloudnative/serving
          name: serving
          ports:
            - name: http1
              containerPort: 8080
          env:
            - name: TARGET
              value: "World"
          volumeMounts:
            - mountPath: /data
              name: data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: serverless-data

Deploy as an Alibaba Cloud Container Compute Service pod

Important
  • Accessing cached Fluid data in Alibaba Cloud Container Compute Service (ACS) application containers requires ack-fluid v1.0.11 or later.

  • This relies on advanced ACS pod features. Submit a support ticket to enable this feature before proceeding.

Add the alibabacloud.com/fluid-sidecar-target: acs label to declare that the pod uses ACS compute resources. Fluid automatically adapts the pod to run in the ACS environment.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-serving
spec:
  selector:
    matchLabels:
      app: model-serving
  template:
    metadata:
      labels:
        app: model-serving
        alibabacloud.com/fluid-sidecar-target: acs
        alibabacloud.com/acs: "true"
        alibabacloud.com/compute-qos: default
        alibabacloud.com/compute-class: general-purpose
    spec:
      containers:
        - image: fluidcloudnative/serving
          name: serving
          ports:
            - name: http1
              containerPort: 8080
          env:
            - name: TARGET
              value: "World"
          volumeMounts:
            - mountPath: /data
              name: data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: serverless-data

Deploy the Deployment.

kubectl create -f serving.yaml

Verify data access.
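Pod names include a generated suffix, so yours will differ from the examples below. List the pods by using the app=model-serving label from serving.yaml:

kubectl get pods -l app=model-serving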

Log in to a running container:

kubectl exec -it model-serving-85b645b5d5-2trnf -c serving -- bash

Check the size of the cached file:

bash-4.4# du -sh /data/wwm_uncased_L-24_H-1024_A-16.zip

Expected output:

1.2G    /data/wwm_uncased_L-24_H-1024_A-16.zip

Check the model load time.

kubectl logs model-serving-85b9587c5b-9dpbc -c serving

Expected output:

Begin loading models at 18:18:25

real    0m2.142s
user    0m0.000s
sys    0m0.755s
Finish loading models at 18:18:27

The real field shows that loading the model took 2.142 seconds (0m2.142s) in cache mode. In the no-cache test described in the Accelerate online applications topic, the same load took 27.107 seconds (0m27.107s), about 13 times longer (27.107 / 2.142 ≈ 12.7).

Step 5: Clean up

Delete the Deployment and Dataset after testing to release cluster resources.

Delete the Deployment.

kubectl delete deployment model-serving

Delete the Dataset.

kubectl delete dataset serverless-data
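Because dataset.yaml in this example defines both the Dataset and the JindoRuntime, you can instead delete both objects in one step:

kubectl delete -f dataset.yaml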