In serverless (ECI) environments, pods are ephemeral and have no persistent workers to pre-warm a local cache. Fluid's cache-free mode addresses this by routing data access directly through JindoRuntime to Object Storage Service (OSS), without requiring dedicated master or worker nodes. This guide walks you through running an Argo Workflow on ACK virtual nodes with OSS data mounted via Fluid in cache-free mode.
Prerequisites
Before you begin, ensure that you have:

- Argo Workflows installed. Use either the open-source Argo quick-start or the ack-workflow component.
- ACK virtual nodes deployed. See Schedule pods to ECI.
- A Container Service for Kubernetes (ACK) Pro cluster running a non-ContainerOS node image with Kubernetes 1.18 or later. See Create an ACK Pro cluster.

  Important: ack-fluid is not supported on ContainerOS.

- The ack-fluid component deployed:
  - If the cloud-native AI suite is not installed: install the suite and deploy Fluid Data Acceleration. See Install the cloud-native AI suite.
  - If the cloud-native AI suite is already installed: go to the Cloud-native AI Suite page in the Container Service Management Console and deploy ack-fluid from there.

  Important: If open-source Fluid is already installed, uninstall it before deploying ack-fluid.

- A kubectl connection to the cluster. See Connect to a cluster using kubectl.
- Alibaba Cloud OSS activated and a bucket created. See Activate OSS and Create a bucket.
Limitations
This feature is mutually exclusive with the elastic scheduling feature of ACK. For details, see Configure priority-based resource scheduling.
Step 1: Upload the test dataset to the OSS bucket
- Download the test dataset (approximately 2 GB).
- Upload it to your OSS bucket using ossutil. See Install ossutil.
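The upload can be done with a single ossutil command. A sketch, assuming ossutil has already been configured with your credentials (`ossutil config`) and that the downloaded archive is the `wwm_uncased_L-24_H-1024_A-16.zip` file verified later in this guide:

```shell
# Copy the test dataset into the bucket root; replace <your-bucket> with
# your bucket name. The path must match the mountPoint used in Step 2.
ossutil cp ./wwm_uncased_L-24_H-1024_A-16.zip oss://<your-bucket>/
```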
Step 2: Create a Dataset and JindoRuntime
Deploy a Fluid Dataset and JindoRuntime configured for cache-free mode. This takes a few minutes.

- Create secret.yaml with your OSS credentials:

  ```yaml
  apiVersion: v1
  kind: Secret
  metadata:
    name: access-key
  stringData:
    fs.oss.accessKeyId: ****
    fs.oss.accessKeySecret: ****
  ```

- Deploy the Secret:

  ```shell
  kubectl create -f secret.yaml
  ```

- Create resource.yaml with the Dataset and JindoRuntime definitions:

  ```yaml
  apiVersion: data.fluid.io/v1alpha1
  kind: Dataset
  metadata:
    name: serverless-data
  spec:
    mounts:
      - mountPoint: oss://<your-bucket>/   # format: oss://<bucket-name>/<optional-path>
        name: demo
        path: /
        options:
          fs.oss.endpoint: oss-cn-shanghai.aliyuncs.com
        encryptOptions:
          - name: fs.oss.accessKeyId
            valueFrom:
              secretKeyRef:
                name: access-key
                key: fs.oss.accessKeyId
          - name: fs.oss.accessKeySecret
            valueFrom:
              secretKeyRef:
                name: access-key
                key: fs.oss.accessKeySecret
    accessModes:
      - ReadWriteMany
  ---
  apiVersion: data.fluid.io/v1alpha1
  kind: JindoRuntime
  metadata:
    name: serverless-data
  spec:
    master:
      disabled: true   # cache-free mode: no master node
    worker:
      disabled: true   # cache-free mode: no worker node
  ```

  Key parameters:

  | Parameter | Description |
  | --- | --- |
  | mountPoint | Path to the OSS bucket, in the format oss://<bucket>/<path>. Do not include the endpoint here; specify it separately under options. |
  | fs.oss.endpoint | Public or private endpoint of the OSS bucket. You can use the private endpoint to enhance data security, but make sure that your ACK cluster is deployed in the same region as OSS. For example, if your OSS bucket is in the China (Hangzhou) region, the public endpoint is oss-cn-hangzhou.aliyuncs.com and the private endpoint is oss-cn-hangzhou-internal.aliyuncs.com. |
  | fs.oss.accessKeyId | AccessKey ID used to access the bucket. |
  | fs.oss.accessKeySecret | AccessKey secret used to access the bucket. |
  | accessModes | Access mode for the volume. Valid values: ReadWriteOnce, ReadOnlyMany, ReadWriteMany, ReadWriteOncePod. Default: ReadOnlyMany. |
  | master.disabled / worker.disabled | Setting both to true enables cache-free mode. JindoRuntime forwards requests directly to OSS without caching data locally. |

- Deploy the Dataset and JindoRuntime:

  ```shell
  kubectl create -f resource.yaml
  ```

- Verify the Dataset is bound:

  ```shell
  kubectl get dataset serverless-data
  ```

  Expected output:

  ```
  NAME              UFS TOTAL SIZE   CACHED   CACHE CAPACITY   CACHED PERCENTAGE   PHASE   AGE
  serverless-data                                                                  Bound   1d
  ```

  The Dataset is ready when PHASE shows Bound.

- Verify the JindoRuntime is ready:

  ```shell
  kubectl get jindo serverless-data
  ```

  Expected output:

  ```
  NAME              MASTER PHASE   WORKER PHASE   FUSE PHASE   AGE
  serverless-data                  Ready          3m41s
  ```

  The JindoRuntime is ready when FUSE PHASE shows Ready.
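Instead of polling `kubectl get` by hand, the readiness checks can be scripted to block until both resources are ready. A sketch, assuming kubectl 1.23 or later (required for `--for=jsonpath` conditions) and the `phase`/`fusePhase` status field names used by Fluid; verify the field names with `kubectl get dataset serverless-data -o yaml` if the wait times out:

```shell
# Block until the Dataset is Bound and the JindoRuntime FUSE component is Ready.
kubectl wait --for=jsonpath='{.status.phase}'=Bound dataset/serverless-data --timeout=10m
kubectl wait --for=jsonpath='{.status.fusePhase}'=Ready jindoruntime/serverless-data --timeout=10m
```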
Step 3: Create an Argo Workflow to access OSS data
- Create workflow.yaml:

  ```yaml
  apiVersion: argoproj.io/v1alpha1
  kind: Workflow
  metadata:
    generateName: serverless-workflow-
  spec:
    entrypoint: serverless-workflow-example
    volumes:
      - name: datadir
        persistentVolumeClaim:
          claimName: serverless-data   # references the Dataset created in Step 2
    templates:
      - name: serverless-workflow-example
        steps:
          - - name: copy
              template: copy-files
          - - name: check
              template: check-files
      - name: copy-files
        metadata:
          labels:
            alibabacloud.com/fluid-sidecar-target: eci   # injects the Fluid sidecar into the ECI pod
            alibabacloud.com/eci: "true"                 # schedules this pod to an ECI virtual node
          annotations:
            k8s.aliyun.com/eci-use-specs: ecs.g7.4xlarge # ECI instance type
        container:
          image: debian:buster
          command: [bash, -c]
          args: ["time cp -r /data/ /tmp"]
          volumeMounts:
            - name: datadir
              mountPath: /data
      - name: check-files
        metadata:
          labels:
            alibabacloud.com/fluid-sidecar-target: eci   # injects the Fluid sidecar into the ECI pod
            alibabacloud.com/eci: "true"                 # schedules this pod to an ECI virtual node
          annotations:
            k8s.aliyun.com/eci-use-specs: ecs.g7.4xlarge # ECI instance type
        container:
          image: debian:buster
          command: [bash, -c]
          args: ["du -sh /data; md5sum /data/*"]
          volumeMounts:
            - name: datadir
              mountPath: /data
  ```

- Submit the Workflow:

  ```shell
  kubectl create -f workflow.yaml
  ```
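The pod names shown in the `kubectl logs` commands of this step (for example, `serverless-workflow-85sbr-4093682611`) are generated per run. To find the names for your run, you can list the Workflow's pods; a sketch relying on the `workflows.argoproj.io/workflow` label that Argo sets on the pods it creates:

```shell
# List pods belonging to Argo Workflows; the NAME column gives the
# pod names to pass to `kubectl logs`.
kubectl get pods -l workflows.argoproj.io/workflow
```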
- Check the copy time from the copy-files step log:

  ```shell
  kubectl logs serverless-workflow-85sbr-4093682611
  ```

  Expected output:

  ```
  real    0m24.966s
  user    0m0.009s
  sys     0m0.677s
  ```

  The real value is the total copy time. Actual time varies with network latency and bandwidth. To reduce copy time through caching, see Use cache mode to accelerate data access for Argo jobs.
- Verify data integrity by comparing MD5 checksums.

  - Get the MD5 value from the Fluid-mounted file:

    ```shell
    kubectl logs serverless-workflow-85sbr-1882013783
    ```

    Expected output:

    ```
    1.2G    /data
    871734851bf7d8d2d1193dc5f1f692e6  /data/wwm_uncased_L-24_H-1024_A-16.zip
    ```

  - Get the MD5 value of your local copy:

    ```shell
    md5sum ./wwm_uncased_L-24_H-1024_A-16.zip
    ```

    Expected output:

    ```
    871734851bf7d8d2d1193dc5f1f692e6  ./wwm_uncased_L-24_H-1024_A-16.zip
    ```

  Matching checksums confirm that Fluid correctly served the data from OSS.
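The manual comparison can also be scripted so a mismatch fails loudly. A minimal sketch, not part of the original guide; `compare_md5` is a hypothetical helper name, and the temporary file stands in for the downloaded archive:

```shell
#!/usr/bin/env bash
# compare_md5 EXPECTED FILE -> succeeds only when md5sum(FILE) == EXPECTED.
compare_md5() {
  local expected="$1" file="$2"
  local actual
  actual=$(md5sum "$file" | awk '{print $1}')
  [ "$expected" = "$actual" ]
}

# Self-contained usage example; in practice, pass the checksum printed in
# the check-files pod log and the path of your local copy.
tmp=$(mktemp)
printf 'sample data' > "$tmp"
expected=$(md5sum "$tmp" | awk '{print $1}')
if compare_md5 "$expected" "$tmp"; then
  echo "checksums match"
else
  echo "checksum mismatch"
fi
rm -f "$tmp"
```

Running the sketch as-is prints `checksums match`; replacing `expected` with a checksum from a different file makes it print `checksum mismatch`.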
Step 4: Clean up
When you no longer need the environment, delete the Workflow and Dataset:

```shell
kubectl delete workflow serverless-workflow-85sbr
kubectl delete dataset serverless-data
```
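Deleting the Dataset does not remove the credentials Secret created in Step 2; if you no longer need it, delete it as well:

```shell
# Remove the OSS credentials Secret created for the Dataset.
kubectl delete secret access-key
```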
What's next
- To reduce data access latency by caching OSS data on cluster nodes, see Use cache mode to accelerate data access for Argo jobs.