When online applications load large models or datasets from Object Storage Service (OSS) on every startup, cold-start latency can reach 27 seconds or more per pod. Fluid with JindoRuntime caches OSS data in cluster memory so repeated reads are served locally, reducing load time to under 3 seconds. This topic walks you through the end-to-end setup for elastic container instances (ECI) and Alibaba Cloud Container Compute Service (ACS) pods running in ACK Pro serverless environments.
Prerequisites
Before you begin, ensure that you have:
- An ACK Pro cluster running Kubernetes 1.18 or later. See Create an ACK Pro cluster.
- The cloud-native AI suite installed with the ack-fluid component deployed:
  - If you have not installed the suite, enable Fluid acceleration when installing it. See Deploy the cloud-native AI suite.
  - If you have already installed the suite, go to the Cloud-native AI Suite page in the ACK console and deploy the ack-fluid component.
  Important: If you have previously installed open source Fluid, uninstall it before deploying the ack-fluid component.
- Virtual nodes deployed in the ACK Pro cluster. See Schedule pods to elastic container instances through virtual nodes.
- A kubectl client connected to the ACK Pro cluster. See Connect to a cluster by using kubectl.
- OSS activated and a bucket created. See Activate OSS and Create buckets.
Limitations
This feature is mutually exclusive with the elastic scheduling feature of ACK. See Configure priority-based resource scheduling.
Step 1: Upload a test dataset to OSS
Download the dataset. This example uses a test dataset of about 2 GB: the BERT wwm_uncased_L-24_H-1024_A-16 model.
Upload the dataset to your OSS bucket by using the ossutil tool, as shown in the sketch below. See Install ossutil.
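A minimal upload sketch follows. It assumes that you have already downloaded the wwm_uncased_L-24_H-1024_A-16.zip archive to your working directory and that <oss_bucket> and <bucket_dir> are the bucket and directory that the Dataset in Step 2 will mount; adjust the paths to match your environment.
# Upload the model archive to the OSS path that the Dataset will mount.
ossutil cp wwm_uncased_L-24_H-1024_A-16.zip oss://<oss_bucket>/<bucket_dir>/
# List the objects to confirm that the upload succeeded.
ossutil ls oss://<oss_bucket>/<bucket_dir>/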
Step 2: Create a Dataset and JindoRuntime
Deploy the Dataset and JindoRuntime to bind your OSS data to the cluster. The deployment takes a few minutes.
Create the Secret.
Create a file named secret.yaml. The Secret stores the AccessKey ID and AccessKey secret used to access OSS.
apiVersion: v1
kind: Secret
metadata:
  name: access-key
stringData:
  fs.oss.accessKeyId: ****
  fs.oss.accessKeySecret: ****
Deploy the Secret.
kubectl create -f secret.yaml
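Optionally, confirm that the Secret exists before referencing it from the Dataset:
kubectl get secret access-key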
Create the Dataset and JindoRuntime.
Create a file named dataset.yaml with the following content:
apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: serverless-data
spec:
  mounts:
    - mountPoint: oss://<oss_bucket>/<bucket_dir>
      name: demo
      path: /
      options:
        fs.oss.endpoint: <oss_endpoint>
      encryptOptions:
        - name: fs.oss.accessKeyId
          valueFrom:
            secretKeyRef:
              name: access-key
              key: fs.oss.accessKeyId
        - name: fs.oss.accessKeySecret
          valueFrom:
            secretKeyRef:
              name: access-key
              key: fs.oss.accessKeySecret
---
apiVersion: data.fluid.io/v1alpha1
kind: JindoRuntime
metadata:
  name: serverless-data
spec:
  replicas: 1
  tieredstore:
    levels:
      - mediumtype: MEM
        volumeType: emptyDir
        path: /dev/shm
        quota: 5Gi
        high: "0.95"
        low: "0.7"
The following table describes key parameters in the configuration.
| Parameter | Description |
|---|---|
| mountPoint | The OSS path to mount, in the format oss://<oss_bucket>/<bucket_dir>. Do not include endpoint information. Example: oss://mybucket/path/to/dir. If you use a single mount target, set path to /. |
| fs.oss.endpoint | The public or private endpoint of the OSS bucket. To use a private endpoint, make sure that your ACK cluster is in the same region as the OSS bucket. Example: public endpoint oss-cn-hangzhou.aliyuncs.com, private endpoint oss-cn-hangzhou-internal.aliyuncs.com. |
| fs.oss.accessKeyId | The AccessKey ID used to access the bucket. |
| fs.oss.accessKeySecret | The AccessKey secret used to access the bucket. |
| replicas | The number of workers to create in the JindoFS cluster. |
| mediumtype | The cache medium type. Valid values: HDD, SSD, and MEM. See Policy 2: Select proper cache media. |
| volumeType | The volume type of the cache medium. Valid values: emptyDir and hostPath. Default: hostPath. Use emptyDir for memory or local system disk caches to prevent residual cache data on the node. Use hostPath for local data disks, and set path to the mount path of the disk on the host. See Policy 2: Select proper cache media. |
| path | The cache directory path. Only one path can be specified. |
| quota | The maximum cache size. Example: 100Gi sets the limit to 100 GiB. |
| high | The upper watermark of the storage usage, expressed as a ratio. |
| low | The lower watermark of the storage usage, expressed as a ratio. |
The default dataset access mode is read-only. To use read/write mode, see Configure the access mode of a dataset.
Deploy the Dataset and JindoRuntime.
kubectl create -f dataset.yaml
Verify the Dataset.
kubectl get dataset serverless-data
Expected output:
NAME UFS TOTAL SIZE CACHED CACHE CAPACITY CACHED PERCENTAGE PHASE AGE
serverless-data 1.16GiB 0.00B 5.00GiB 0.0% Bound 2m8s
PHASE: Bound confirms the Dataset is ready.
Verify the JindoRuntime.
kubectl get jindo serverless-data
Expected output:
NAME MASTER PHASE WORKER PHASE FUSE PHASE AGE
serverless-data Ready Ready Ready 2m51s
FUSE: Ready confirms the JindoRuntime is ready.
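When the Dataset is bound, Fluid also creates a PersistentVolumeClaim named after the Dataset; the application Deployment in Step 4 mounts it through claimName: serverless-data. You can optionally confirm that the PVC exists:
kubectl get pvc serverless-data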
(Optional) Step 3: Prefetch data
Prefetching loads OSS data into the cache before any application requests it. If you skip this step, the first access to each file triggers a cache miss and incurs latency similar to no-cache mode (approximately 27 seconds in the example below). Prefetch when you need consistent low-latency access from the first request.
Create the DataLoad.
Create a file named dataload.yaml with the following content:
apiVersion: data.fluid.io/v1alpha1
kind: DataLoad
metadata:
  name: serverless-data-warmup
spec:
  dataset:
    name: serverless-data
    namespace: default
  loadMetadata: true
Deploy the DataLoad.
kubectl create -f dataload.yaml
Monitor prefetch progress.
kubectl get dataload
Expected output:
NAME DATASET PHASE AGE DURATION
serverless-data-warmup serverless-data Complete 2m49s 45s
PHASE: Complete confirms prefetching finished. DURATION: 45s shows how long the process took.
Verify the cache.
kubectl get dataset
Expected output:
NAME UFS TOTAL SIZE CACHED CACHE CAPACITY CACHED PERCENTAGE PHASE AGE
serverless-data 1.16GiB 1.16GiB 5.00GiB 100.0% Bound 5m20s
CACHED PERCENTAGE: 100.0% confirms all data is cached and ready for low-latency access.
Step 4: Deploy an application to access OSS data
Deploy a Kubernetes Deployment to test data access accelerated by JindoFS or run machine learning inference workloads. The examples below cover two serverless compute options.
Create the serving.yaml file with the configuration for your compute target.
Deploy as an elastic container instance
Add the alibabacloud.com/fluid-sidecar-target: eci label to the pod to declare that it runs as an elastic container instance (ECI). When the pod is created, Fluid automatically converts it to an ECI-compatible format — no manual intervention required.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-serving
spec:
  selector:
    matchLabels:
      app: model-serving
  template:
    metadata:
      labels:
        app: model-serving
        alibabacloud.com/fluid-sidecar-target: eci
        alibabacloud.com/eci: "true"
    spec:
      containers:
        - image: fluidcloudnative/serving
          name: serving
          ports:
            - name: http1
              containerPort: 8080
          env:
            - name: TARGET
              value: "World"
          volumeMounts:
            - mountPath: /data
              name: data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: serverless-data
Deploy as an Alibaba Cloud Container Compute Service pod
- Accessing cached Fluid data in Alibaba Cloud Container Compute Service (ACS) application containers requires ack-fluid v1.0.11 or later.
- This relies on advanced ACS pod features. Submit a support ticket to enable this feature before proceeding.
Add the alibabacloud.com/fluid-sidecar-target: acs label to declare that the pod uses ACS compute resources. Fluid automatically adapts the pod to run in the ACS environment.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-serving
spec:
  selector:
    matchLabels:
      app: model-serving
  template:
    metadata:
      labels:
        app: model-serving
        alibabacloud.com/fluid-sidecar-target: acs
        alibabacloud.com/acs: "true"
        alibabacloud.com/compute-qos: default
        alibabacloud.com/compute-class: general-purpose
    spec:
      containers:
        - image: fluidcloudnative/serving
          name: serving
          ports:
            - name: http1
              containerPort: 8080
          env:
            - name: TARGET
              value: "World"
          volumeMounts:
            - mountPath: /data
              name: data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: serverless-data
Deploy the Deployment.
kubectl create -f serving.yaml
Verify data access.
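The pod name suffixes in the following commands are generated by the Deployment and will differ in your cluster. List the pods by the app: model-serving label and substitute your own pod name:
kubectl get pods -l app=model-serving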
Log in to a running container:
kubectl exec -it model-serving-85b645b5d5-2trnf -c serving -- bash
Check the size of the cached file:
bash-4.4# du -sh /data/wwm_uncased_L-24_H-1024_A-16.zip
Expected output:
1.2G /data/wwm_uncased_L-24_H-1024_A-16.zip
Check the model load time.
kubectl logs model-serving-85b9587c5b-9dpbc -c serving
Expected output:
Begin loading models at 18:18:25
real 0m2.142s
user 0m0.000s
sys 0m0.755s
Finish loading models at 18:18:27
The real field shows that copying the model file took 2.142 seconds (0m2.142s) in cache mode. In the Accelerate online applications topic, the same operation took 27.107 seconds (0m27.107s) in no-cache mode, about 12.7 times as long as in cache mode.
Step 5: Clean up
Delete the Deployment and Dataset after testing to release cluster resources.
Delete the Deployment.
kubectl delete deployment model-serving
Delete the Dataset.
kubectl delete dataset serverless-data
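Optionally, also delete the DataLoad and Secret created in this walkthrough:
kubectl delete dataload serverless-data-warmup
kubectl delete secret access-key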