Fluid uses JindoRuntime to cache Object Storage Service (OSS) data locally within your ACK Pro cluster, so Argo workflow containers read from in-cluster memory instead of fetching from OSS on every access. In cache mode, the first workflow run prefetches data into a JindoFS memory cache. Subsequent runs read directly from that cache, cutting file-copy time from 24.966 seconds to 1.948 seconds (about 13 times faster).
This guide walks you through setting up cache-mode acceleration for Argo workflows running as elastic container instances (ECI) or Alibaba Cloud Container Compute Service (ACS) pods.
Considerations
Before you start, be aware of the following constraints:
- This feature is mutually exclusive with the elastic scheduling feature of ACK. See Configure priority-based resource scheduling.
- ack-fluid does not support ContainerOS nodes. Use node pools that do not run ContainerOS.
- If you have a previous open-source Fluid installation, uninstall it before deploying ack-fluid. The open-source and ACK versions cannot coexist.
- ack-ai-pipeline is incompatible with Argo Workflows. Deselect ack-ai-pipeline when deploying the Cloud-native AI Suite.
- ACS pods require ack-fluid v1.0.11 or later, and the ACS pod advanced features must be enabled through a support ticket before deployment.
Prerequisites
Before you begin, ensure that you have:
- Argo Workflows installed via the Argo quick-start guide or the ack-workflow component. See Argo Workflows.
- Virtual nodes deployed in your ACK Pro cluster. See Schedule pods to elastic container instances through virtual nodes.
- An ACK Pro cluster that runs Kubernetes 1.18 or later on non-ContainerOS nodes. See Create an ACK Pro cluster.
- The ack-fluid component deployed (see Deploy ack-fluid below).
- kubectl connected to your ACK Pro cluster. See Connect to a cluster by using kubectl.
- An OSS bucket with data to accelerate. See Activate OSS and Create buckets.
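Before you continue, you can confirm that kubectl can reach the cluster and that the virtual nodes are registered. This is an optional check; the virtual node names depend on your setup (names like virtual-kubelet-* are typical but not guaranteed):

# List all nodes; virtual nodes used for ECI or ACS scheduling appear alongside regular nodes
kubectl get nodes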
Deploy ack-fluid
If you have not installed the Cloud-native AI Suite, install it and enable Fluid under Data Access Acceleration. See Deploy Cloud-native AI Suite.
If the Cloud-native AI Suite is already installed, go to Cloud-native AI Component Set in the ACK console and deploy the ack-fluid component.
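To confirm that the Fluid control plane is running before you continue, you can list its pods. This is a minimal check that assumes the default fluid-system namespace used by Fluid installations:

# All Fluid controller pods should be in the Running state
kubectl get pods -n fluid-system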
Step 1: Upload the test dataset to OSS
Create a 2 GB test dataset and upload it to your OSS bucket. This guide uses the BERT wwm_uncased_L-24_H-1024_A-16 dataset as an example.
Upload the dataset using ossutil. See Install ossutil.
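The following is a minimal upload sketch. It assumes ossutil is already configured with your credentials and that the dataset sits in a local directory named wwm_uncased_L-24_H-1024_A-16; adjust the path to match your environment:

# Recursively upload the local dataset directory to the OSS path referenced later in dataset.yaml
ossutil cp -r ./wwm_uncased_L-24_H-1024_A-16 oss://<oss_bucket>/<bucket_dir>/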
Step 2: Create a Dataset and JindoRuntime
The Dataset tells Fluid where your OSS data lives. The JindoRuntime manages a JindoFS cache cluster that stores that data in local memory. Together they make the data available to workflow pods as a standard Kubernetes PersistentVolumeClaim (PVC).
- Create secret.yaml with your OSS credentials:

  apiVersion: v1
  kind: Secret
  metadata:
    name: access-key
  stringData:
    fs.oss.accessKeyId: <your-access-key-id>
    fs.oss.accessKeySecret: <your-access-key-secret>

- Deploy the Secret:

  kubectl create -f secret.yaml

- Create dataset.yaml. Replace <oss_bucket>, <bucket_dir>, and <oss_endpoint> with your values.

  Important: The default access mode is read-only. To use read/write mode, see Configure the access mode of a dataset.

  apiVersion: data.fluid.io/v1alpha1
  kind: Dataset
  metadata:
    name: serverless-data
  spec:
    mounts:
      - mountPoint: oss://<oss_bucket>/<bucket_dir>
        name: demo
        path: /
        options:
          fs.oss.endpoint: <oss_endpoint>
        encryptOptions:
          - name: fs.oss.accessKeyId
            valueFrom:
              secretKeyRef:
                name: access-key
                key: fs.oss.accessKeyId
          - name: fs.oss.accessKeySecret
            valueFrom:
              secretKeyRef:
                name: access-key
                key: fs.oss.accessKeySecret
  ---
  apiVersion: data.fluid.io/v1alpha1
  kind: JindoRuntime
  metadata:
    name: serverless-data
  spec:
    replicas: 1
    tieredstore:
      levels:
        - mediumtype: MEM
          volumeType: emptyDir
          path: /dev/shm
          quota: 5Gi
          high: "0.95"
          low: "0.7"

  Key parameters:

  - mountPoint: The OSS path to mount, in the format oss://<oss_bucket>/<bucket_dir>. Do not include the endpoint. Example: oss://mybucket/path/to/dir. Set path to / when using a single mount target.
  - fs.oss.endpoint: The public or private endpoint of your OSS bucket. Use the private endpoint for better security, and make sure your ACK cluster is in the same region as your OSS bucket. Example private endpoint: oss-cn-hangzhou-internal.aliyuncs.com.
  - fs.oss.accessKeyId: The AccessKey ID used to access the bucket.
  - fs.oss.accessKeySecret: The AccessKey secret used to access the bucket.
  - replicas: The number of JindoFS worker nodes to create.
  - mediumtype: The cache storage medium: HDD, SSD, or MEM. See Policy 2: Select proper cache media.
  - volumeType: The volume type of the cache medium: emptyDir (recommended for memory or local system disks) or hostPath (for dedicated data disks). Default: hostPath. See Policy 2: Select proper cache media.
  - path: The cache storage path on the node. Only a single path is supported.
  - quota: The maximum cache size. Example: 5Gi limits the cache to 5 GiB.
  - high: The upper threshold for cache eviction, as a fraction of quota.
  - low: The lower threshold for cache eviction, as a fraction of quota.
- Deploy the Dataset and JindoRuntime:

  kubectl create -f dataset.yaml

- Verify that the Dataset is bound:

  kubectl get dataset serverless-data

  Expected output:

  NAME              UFS TOTAL SIZE   CACHED   CACHE CAPACITY   CACHED PERCENTAGE   PHASE   AGE
  serverless-data   1.16GiB          0.00B    5.00GiB          0.0%                Bound   2m8s

  PHASE: Bound confirms that the Dataset is ready.
- Verify that the JindoRuntime is ready:

  kubectl get jindo serverless-data

  Expected output:

  NAME              MASTER PHASE   WORKER PHASE   FUSE PHASE   AGE
  serverless-data   Ready          Ready          Ready        2m51s

  FUSE PHASE: Ready confirms that the JindoRuntime is running.
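Fluid exposes the bound Dataset to workloads as a PersistentVolume and a PersistentVolumeClaim named after the Dataset. As an optional sanity check before mounting the PVC in a workload (the capacity and storage class shown depend on your cluster), confirm that the claim exists and is bound:

# The PVC name matches the Dataset name and is referenced by claimName in the workload specs below
kubectl get pvc serverless-data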
(Optional) Step 3: Prefetch data
Prefetching loads OSS data into the JindoFS cache before your workflow runs, so the first workflow execution reads from cache rather than OSS. Skip this step if you don't need to optimize first-run latency.
- Create dataload.yaml:

  apiVersion: data.fluid.io/v1alpha1
  kind: DataLoad
  metadata:
    name: serverless-data-warmup
  spec:
    dataset:
      name: serverless-data
      namespace: default
    loadMetadata: true

- Start the data prefetch:

  kubectl create -f dataload.yaml

- Monitor prefetch progress:

  kubectl get dataload

  Wait until PHASE shows Complete:

  NAME                     DATASET           PHASE      AGE     DURATION
  serverless-data-warmup   serverless-data   Complete   2m49s   45s

- Confirm that the cache is fully populated before proceeding:

  kubectl get dataset

  Expected output after the prefetch completes:

  NAME              UFS TOTAL SIZE   CACHED    CACHE CAPACITY   CACHED PERCENTAGE   PHASE   AGE
  serverless-data   1.16GiB          1.16GiB   5.00GiB          100.0%              Bound   5m20s

  Proceed to the next step only after CACHED PERCENTAGE shows 100.0%.
Step 4: Deploy a workflow to access OSS data
Create workflow.yaml based on your compute target—ECI or ACS. Both examples mount serverless-data as a volume and use Fluid to serve cached OSS data to the container.
Deploy on ECI (elastic container instances)
Add the alibabacloud.com/eci: "true" label to schedule the pod to an elastic container instance, and the alibabacloud.com/fluid-sidecar-target: eci label so that Fluid automatically adapts the pod spec for ECI. No manual changes to the rest of the spec are needed.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-serving
spec:
  selector:
    matchLabels:
      app: model-serving
  template:
    metadata:
      labels:
        app: model-serving
        alibabacloud.com/fluid-sidecar-target: eci
        alibabacloud.com/eci: "true"
    spec:
      containers:
        - image: fluidcloudnative/serving
          name: serving
          ports:
            - name: http1
              containerPort: 8080
          env:
            - name: TARGET
              value: "World"
          volumeMounts:
            - mountPath: /data
              name: data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: serverless-data
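After you deploy this manifest in the next step, you can optionally confirm where the pods were scheduled. This is a sketch that assumes the default virtual node setup; the exact node names vary by cluster:

# List the serving pods and the nodes they run on; ECI pods should be placed on a virtual node
kubectl get pods -l app=model-serving -o wide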
Deploy on ACS (Alibaba Cloud Container Compute Service)
ACS pods require ack-fluid v1.0.11 or later. Accessing cached Fluid data in ACS containers relies on advanced ACS pod features—submit a support ticket to enable this feature before deploying.
Add the alibabacloud.com/acs: "true" label to run the pod on ACS compute, and the alibabacloud.com/fluid-sidecar-target: acs label so that Fluid adapts the pod for the ACS environment automatically. The alibabacloud.com/compute-class and alibabacloud.com/compute-qos labels declare the ACS compute class and QoS class for the pod.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-serving
spec:
  selector:
    matchLabels:
      app: model-serving
  template:
    metadata:
      labels:
        app: model-serving
        alibabacloud.com/fluid-sidecar-target: acs
        alibabacloud.com/acs: "true"
        alibabacloud.com/compute-qos: default
        alibabacloud.com/compute-class: general-purpose
    spec:
      containers:
        - image: fluidcloudnative/serving
          name: serving
          ports:
            - name: http1
              containerPort: 8080
          env:
            - name: TARGET
              value: "World"
          volumeMounts:
            - mountPath: /data
              name: data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: serverless-data
Run the workflow
- Deploy the workflow:

  kubectl create -f workflow.yaml

- Check the container logs to confirm that cache acceleration is working:

  kubectl logs serverless-workflow-g5knn-3271897614

  Expected output:

  real    0m1.948s
  user    0m0.000s
  sys     0m0.668s

  The real time of 1.948 seconds shows that the file copy completed from cache. Without caching, the same operation takes 24.966 seconds, about 13 times slower. See Accelerate Argo workflows (no cache mode) for comparison.
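If your container image does not print this timing itself, you can reproduce the measurement manually. This is a rough sketch that assumes a running pod mounting the dataset at /data, a shell in the container that provides a time command, and the hypothetical destination path /tmp/data; substitute your own pod name:

# Time a copy of the cached dataset from the Fluid mount to local storage inside the pod
kubectl exec -it <pod-name> -- sh -c "time cp -r /data /tmp/data"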
Step 5: Clean up
After testing, delete the workflow and dataset to free resources.
- Delete the workflow:

  kubectl delete workflow serverless-workflow-g5knn

- Delete the dataset:

  kubectl delete dataset serverless-data
What's next
- Accelerate Argo workflows (no cache mode) — compare performance without caching
- Configure the access mode of a dataset — enable read/write mode for the Dataset
- Policy 2: Select proper cache media — choose between HDD, SSD, and MEM cache tiers