
Container Service for Kubernetes: Accelerate online applications

Last Updated: Mar 26, 2026

When online applications load large models or datasets from Object Storage Service (OSS) on every startup, cold-start latency can reach 27 seconds or more per pod. Fluid with JindoRuntime caches OSS data in cluster memory so repeated reads are served locally, reducing load time to under 3 seconds. This topic walks you through the end-to-end setup for elastic container instances (ECI) and Alibaba Cloud Container Compute Service (ACS) pods running in ACK Pro serverless environments.

Prerequisites

Before you begin, ensure that you have:

  • An ACK Pro cluster with the ack-fluid component installed. To access cached data from ACS pods, ack-fluid must be v1.0.11 or later (see the Important note in Step 4).

  • A kubectl client that is connected to the cluster.

  • An OSS bucket to store the test dataset, and the ossutil tool to upload it (see Step 1).

Limitations

This feature is mutually exclusive with the elastic scheduling feature of ACK. See Configure priority-based resource scheduling.

Step 1: Upload a test dataset to OSS

Download the dataset. This example uses the BERT wwm_uncased_L-24_H-1024_A-16 model, a test dataset of approximately 1.2 GB.

Upload the dataset to your OSS bucket by using the ossutil tool. See Install ossutil.
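For example, assuming the dataset archive is named wwm_uncased_L-24_H-1024_A-16.zip as in the verification step later in this topic, and using the same placeholders as the dataset.yaml below:

ossutil cp wwm_uncased_L-24_H-1024_A-16.zip oss://<oss_bucket>/<bucket_dir>/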

Step 2: Create a Dataset and JindoRuntime

Deploy the Dataset and JindoRuntime to bind your OSS data to the cluster. The deployment takes a few minutes.

Create the Secret.

Create a file named secret.yaml. The Secret stores the AccessKey ID and AccessKey secret used to access OSS.

apiVersion: v1
kind: Secret
metadata:
  name: access-key
stringData:
  fs.oss.accessKeyId: ****
  fs.oss.accessKeySecret: ****

Deploy the Secret.

kubectl create -f secret.yaml
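You can confirm that the Secret exists before referencing it from the Dataset:

kubectl get secret access-key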

Create the Dataset and JindoRuntime.

Create a file named dataset.yaml with the following content:

apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: serverless-data
spec:
  mounts:
  - mountPoint: oss://<oss_bucket>/<bucket_dir>
    name: demo
    path: /
    options:
      fs.oss.endpoint: <oss_endpoint>
    encryptOptions:
      - name: fs.oss.accessKeyId
        valueFrom:
          secretKeyRef:
            name: access-key
            key: fs.oss.accessKeyId
      - name: fs.oss.accessKeySecret
        valueFrom:
          secretKeyRef:
            name: access-key
            key: fs.oss.accessKeySecret
---
apiVersion: data.fluid.io/v1alpha1
kind: JindoRuntime
metadata:
  name: serverless-data
spec:
  replicas: 1
  tieredstore:
    levels:
      - mediumtype: MEM
        volumeType: emptyDir
        path: /dev/shm
        quota: 5Gi
        high: "0.95"
        low: "0.7"

The following table describes key parameters in the configuration.

Parameter Description
mountPoint The OSS path to mount, in the format oss://<oss_bucket>/<bucket_dir>. Do not include endpoint information. Example: oss://mybucket/path/to/dir. If you use a single mount target, set path to /.
fs.oss.endpoint The public or private endpoint of the OSS bucket. To use a private endpoint, make sure your ACK cluster is in the same region as the OSS bucket. Example: public endpoint oss-cn-hangzhou.aliyuncs.com, private endpoint oss-cn-hangzhou-internal.aliyuncs.com.
fs.oss.accessKeyId The AccessKey ID used to access the bucket.
fs.oss.accessKeySecret The AccessKey secret used to access the bucket.
replicas The number of workers to create in the JindoFS cluster.
mediumtype The cache medium type. Valid values: HDD, SSD, MEM. See Policy 2: Select proper cache media.
volumeType The volume type for the cache medium. Valid values: emptyDir and hostPath. Default value: hostPath. Use emptyDir for memory or local system disk caches to prevent residual data from remaining on the node. Use hostPath for local data disks, and set path to the mount path of the disk on the host. See Policy 2: Select proper cache media.
path The cache directory path. Only one path can be specified.
quota The maximum cache size. Example: 100Gi sets the limit to 100 GiB.
high The upper watermark of cache usage, as a fraction of the quota. When usage exceeds this value, cached data is evicted.
low The lower watermark of cache usage. Eviction stops once usage falls back below this value.
Important

The default dataset access mode is read-only. To use read/write mode, see Configure the access mode of a dataset.

Deploy the Dataset and JindoRuntime.

kubectl create -f dataset.yaml

Verify the Dataset.

kubectl get dataset serverless-data

Expected output:

NAME              UFS TOTAL SIZE   CACHED   CACHE CAPACITY   CACHED PERCENTAGE   PHASE   AGE
serverless-data   1.16GiB          0.00B    5.00GiB          0.0%                Bound   2m8s

PHASE: Bound confirms the Dataset is ready.

Verify the JindoRuntime.

kubectl get jindo serverless-data

Expected output:

NAME              MASTER PHASE   WORKER PHASE   FUSE PHASE   AGE
serverless-data   Ready          Ready          Ready        2m51s

FUSE PHASE: Ready confirms the JindoRuntime is ready.
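Fluid also creates a PersistentVolume and a PersistentVolumeClaim named after the Dataset; the Deployment in Step 4 mounts the cache through this PVC (claimName: serverless-data). You can confirm that the PVC exists:

kubectl get pvc serverless-data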

(Optional) Step 3: Prefetch data

Prefetching loads OSS data into the cache before any application requests it. If you skip this step, the first access to each file triggers a cache miss and incurs latency similar to no-cache mode (approximately 27 seconds in the example below). Prefetch when you need consistent low-latency access from the first request.

Create the DataLoad.

Create a file named dataload.yaml with the following content:

apiVersion: data.fluid.io/v1alpha1
kind: DataLoad
metadata:
  name: serverless-data-warmup
spec:
  dataset:
    name: serverless-data
    namespace: default
  loadMetadata: true

Deploy the DataLoad.

kubectl create -f dataload.yaml

Monitor prefetch progress.

kubectl get dataload

Expected output:

NAME                     DATASET           PHASE      AGE     DURATION
serverless-data-warmup   serverless-data   Complete   2m49s   45s

PHASE: Complete confirms prefetching finished. DURATION: 45s shows how long the process took.
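If the prefetch has not completed yet, you can watch its progress with the standard kubectl -w flag:

kubectl get dataload serverless-data-warmup -w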

Verify the cache.

kubectl get dataset

Expected output:

NAME              UFS TOTAL SIZE   CACHED    CACHE CAPACITY   CACHED PERCENTAGE   PHASE   AGE
serverless-data   1.16GiB          1.16GiB   5.00GiB          100.0%              Bound   5m20s

CACHED PERCENTAGE: 100.0% confirms all data is cached and ready for low-latency access.

Step 4: Deploy an application to access OSS data

Deploy a Kubernetes Deployment to test data access accelerated by JindoFS, or to run machine learning inference workloads against the cached data. The examples below cover two serverless compute options.

Create the serving.yaml file with the configuration for your compute target.

Deploy as an elastic container instance

Add the alibabacloud.com/fluid-sidecar-target: eci label to the pod to declare that it runs as an elastic container instance (ECI). When the pod is created, Fluid automatically converts it to an ECI-compatible format; no manual intervention is required.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-serving
spec:
  selector:
    matchLabels:
      app: model-serving
  template:
    metadata:
      labels:
        app: model-serving
        alibabacloud.com/fluid-sidecar-target: eci
        alibabacloud.com/eci: "true"
    spec:
      containers:
        - image: fluidcloudnative/serving
          name: serving
          ports:
            - name: http1
              containerPort: 8080
          env:
            - name: TARGET
              value: "World"
          volumeMounts:
            - mountPath: /data
              name: data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: serverless-data

Deploy as an Alibaba Cloud Container Compute Service pod

Important
  • Accessing cached Fluid data in Alibaba Cloud Container Compute Service (ACS) application containers requires ack-fluid v1.0.11 or later.

  • This relies on advanced ACS pod features. Submit a support ticket to enable this feature before proceeding.

Add the alibabacloud.com/fluid-sidecar-target: acs label to declare that the pod uses ACS compute resources. Fluid automatically adapts the pod to run in the ACS environment.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-serving
spec:
  selector:
    matchLabels:
      app: model-serving
  template:
    metadata:
      labels:
        app: model-serving
        alibabacloud.com/fluid-sidecar-target: acs
        alibabacloud.com/acs: "true"
        alibabacloud.com/compute-qos: default
        alibabacloud.com/compute-class: general-purpose
    spec:
      containers:
        - image: fluidcloudnative/serving
          name: serving
          ports:
            - name: http1
              containerPort: 8080
          env:
            - name: TARGET
              value: "World"
          volumeMounts:
            - mountPath: /data
              name: data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: serverless-data

Deploy the Deployment.

kubectl create -f serving.yaml

Verify data access.
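Pod names include a generated suffix, so yours will differ from the examples below. List the pods by using the app=model-serving label from serving.yaml:

kubectl get pods -l app=model-serving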

Log in to a running container:

kubectl exec -it model-serving-85b645b5d5-2trnf -c serving -- bash

Check the size of the cached file:

bash-4.4# du -sh /data/wwm_uncased_L-24_H-1024_A-16.zip

Expected output:

1.2G    /data/wwm_uncased_L-24_H-1024_A-16.zip

Check the model load time.

kubectl logs model-serving-85b9587c5b-9dpbc -c serving

Expected output:

Begin loading models at 18:18:25

real    0m2.142s
user    0m0.000s
sys    0m0.755s
Finish loading models at 18:18:27

The real field shows that loading the model took 2.142 seconds (0m2.142s) in cache mode. In the no-cache test described in the Accelerate online applications topic, the same load took 27.107 seconds (0m27.107s), about 13 times longer (27.107 / 2.142 ≈ 12.7).

Step 5: Clean up

Delete the Deployment and Dataset after testing to release cluster resources.

Delete the Deployment.

kubectl delete deployment model-serving

Delete the Dataset.

kubectl delete dataset serverless-data
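Because dataset.yaml in this example defines both the Dataset and the JindoRuntime, you can instead delete both objects in one step:

kubectl delete -f dataset.yaml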