When multiple teams run AI or ML workloads in separate Kubernetes namespaces, each team creating its own cache wastes storage and slows down data access. Fluid lets you cache a dataset once in a source namespace and share that cache with any number of reference namespaces — no duplicate caches, no extra runtime overhead.
The setup uses two namespace roles:
- Source namespace (share): holds the Dataset and a cache runtime (JindoRuntime or JuiceFSRuntime). This is where the actual data cache lives.
- Reference namespace (ref): holds a reference Dataset that points to the source Dataset using a dataset:// mount point. Pods in this namespace read from the shared cache without running their own cache runtime.
How it works
Fluid uses ThinRuntime to link a Dataset in one namespace to a Dataset in another. When a pod in the reference namespace reads data, requests are routed to the cache runtime in the source namespace. No additional cache runtime is created in the reference namespace.
Prerequisites
Before you begin, ensure that you have:
- An ACK Pro cluster running Kubernetes 1.18 or later, with a non-ContainerOS node pool (the ack-fluid component does not support ContainerOS). For more information, see Create an ACK Pro cluster.
- The cloud-native AI suite installed with the ack-fluid component deployed:
  - If you have not installed the cloud-native AI suite, enable Fluid acceleration when you install it. For more information, see Deploy the cloud-native AI suite.
  - If the cloud-native AI suite is already installed, go to the Cloud-native AI Suite page in the ACK console and deploy the ack-fluid component.
  - If open-source Fluid is already installed, uninstall it before deploying the ack-fluid component.
- A kubectl client connected to your ACK Pro cluster. For more information, see Connect to a cluster by using kubectl.
Step 1: Upload the test dataset to OSS
- Download the test dataset (approximately 2 GB).
- Upload the dataset to your Object Storage Service (OSS) bucket using ossutil. For more information, see Install ossutil.
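As a rough sketch of the upload step, the function below wraps the ossutil cp command. It assumes that ossutil is already installed and configured with credentials, and that the binary is on your PATH as ossutil (some installations name it ossutil64). The helper name upload_dataset and its parameters are illustrative and not part of ossutil.

```shell
# Upload a local file to oss://<bucket>/<dir>/ with ossutil.
# upload_dataset is a hypothetical helper; arguments:
#   $1 local file, $2 OSS bucket name, $3 directory inside the bucket (may be empty)
upload_dataset() {
  local file="$1" bucket="$2" dir="$3"
  local target="oss://${bucket}/"
  # Strip one leading and one trailing slash so the target has no empty path segments.
  dir="${dir#/}"
  dir="${dir%/}"
  if [ -n "$dir" ]; then
    target="${target}${dir}/"
  fi
  # A target that ends with "/" keeps the original file name under that directory.
  ossutil cp "$file" "$target"
}
```

For the test file used later in this topic, the call would look like upload_dataset wwm_uncased_L-24_H-1024_A-16.zip <oss_bucket> <bucket_dir>, with the same bucket and directory that you reference in the Dataset mountPoint in Step 2.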
Step 2: Create a shared dataset and runtime
Create a namespace named share to hold the shared Dataset and runtime. Choose the runtime type that matches your storage setup.
JindoRuntime
- Create the share namespace:

    kubectl create ns share

- Create a Secret to store the AccessKey pair for your OSS bucket:

    kubectl apply -f - <<EOF
    apiVersion: v1
    kind: Secret
    metadata:
      name: dataset-secret
      namespace: share
    stringData:
      fs.oss.accessKeyId: <YourAccessKey ID>
      fs.oss.accessKeySecret: <YourAccessKey Secret>
    EOF

  Replace <YourAccessKey ID> and <YourAccessKey Secret> with your AccessKey ID and AccessKey secret. For more information, see Obtain an AccessKey pair.

- Create a file named shared-dataset.yaml with the following content:

    # Dataset: describes the data stored in OSS (the underlying file system, UFS).
    apiVersion: data.fluid.io/v1alpha1
    kind: Dataset
    metadata:
      name: shared-dataset
      namespace: share
    spec:
      mounts:
        - mountPoint: oss://<oss_bucket>/<bucket_dir> # Path to the data in your OSS bucket.
          options:
            fs.oss.endpoint: <oss_endpoint> # Endpoint of your OSS bucket.
          name: hadoop
          path: "/"
          encryptOptions:
            - name: fs.oss.accessKeyId
              valueFrom:
                secretKeyRef:
                  name: dataset-secret
                  key: fs.oss.accessKeyId
            - name: fs.oss.accessKeySecret
              valueFrom:
                secretKeyRef:
                  name: dataset-secret
                  key: fs.oss.accessKeySecret
    ---
    # JindoRuntime: enables JindoFS-based data caching in the cluster.
    apiVersion: data.fluid.io/v1alpha1
    kind: JindoRuntime
    metadata:
      name: shared-dataset
      namespace: share
    spec:
      replicas: 1
      tieredstore:
        levels:
          - mediumtype: MEM
            path: /dev/shm
            quota: 4Gi
            high: "0.95"
            low: "0.7"

  For more information about configuring a Dataset and JindoRuntime, see Use JindoFS to accelerate access to OSS.

- Apply the configuration:

    kubectl apply -f shared-dataset.yaml

  Expected output:

    dataset.data.fluid.io/shared-dataset created
    jindoruntime.data.fluid.io/shared-dataset created

- Wait a few minutes, then verify that the Dataset is bound and the JindoRuntime is ready:

    kubectl get dataset,jindoruntime -n share

  Expected output:

    NAME                                   UFS TOTAL SIZE   CACHED   CACHE CAPACITY   CACHED PERCENTAGE   PHASE   AGE
    dataset.data.fluid.io/shared-dataset   1.16GiB          0.00B    4.00GiB          0.0%                Bound   4m1s

    NAME                                        MASTER PHASE   WORKER PHASE   FUSE PHASE   AGE
    jindoruntime.data.fluid.io/shared-dataset   Ready          Ready          Ready        15m

  The Dataset is bound when its PHASE column shows Bound. The JindoRuntime is ready when its master, worker, and FUSE phases all show Ready.
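Instead of waiting a fixed few minutes, you can poll the Dataset until it is bound. The following is a minimal sketch that assumes the Dataset reports its state in the status.phase field, which is the value shown in the PHASE column above. The helper name wait_for_dataset_bound and its default retry values are illustrative. It works the same way for a Dataset backed by JuiceFSRuntime.

```shell
# Poll a Fluid Dataset until its status.phase is Bound or the attempts run out.
# wait_for_dataset_bound is a hypothetical helper; arguments:
#   $1 namespace, $2 Dataset name,
#   $3 maximum attempts (default 30), $4 seconds between attempts (default 10)
wait_for_dataset_bound() {
  local ns="$1" name="$2" attempts="${3:-30}" interval="${4:-10}"
  local i=0 phase=""
  while [ "$i" -lt "$attempts" ]; do
    # A missing or not-yet-bound Dataset yields an empty or different phase;
    # kubectl errors are ignored and the query is retried.
    phase="$(kubectl get dataset "$name" -n "$ns" -o jsonpath='{.status.phase}' 2>/dev/null || true)"
    if [ "$phase" = "Bound" ]; then
      echo "dataset ${ns}/${name} is Bound"
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  echo "dataset ${ns}/${name} not Bound after ${attempts} attempts (last phase: ${phase:-unknown})" >&2
  return 1
}
```

For the Dataset in this step, the call would be wait_for_dataset_bound share shared-dataset. A non-zero return value means the Dataset did not reach the Bound phase within the retry window, in which case kubectl describe dataset shared-dataset -n share is a reasonable next step.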
JuiceFSRuntime
- Create the share namespace:

    kubectl create ns share

- Create a Secret to store the credentials for your OSS bucket and JuiceFS volume:

    kubectl apply -f - <<EOF
    apiVersion: v1
    kind: Secret
    metadata:
      name: dataset-secret
      namespace: share
    type: Opaque
    stringData:
      token: <JUICEFS_VOLUME_TOKEN>
      access-key: <OSS_ACCESS_KEY>
      secret-key: <OSS_SECRET_KEY>
    EOF

  Replace <JUICEFS_VOLUME_TOKEN> with the token of your JuiceFS volume, and replace <OSS_ACCESS_KEY> and <OSS_SECRET_KEY> with your AccessKey ID and AccessKey secret. For more information, see Obtain an AccessKey pair.

- Create a file named shared-dataset.yaml with the following content:

    # Dataset: describes the data stored in OSS (the underlying file system, UFS).
    apiVersion: data.fluid.io/v1alpha1
    kind: Dataset
    metadata:
      name: shared-dataset
      namespace: share
    spec:
      accessModes: ["ReadOnlyMany"]
      sharedEncryptOptions:
        - name: access-key
          valueFrom:
            secretKeyRef:
              name: dataset-secret
              key: access-key
        - name: secret-key
          valueFrom:
            secretKeyRef:
              name: dataset-secret
              key: secret-key
        - name: token
          valueFrom:
            secretKeyRef:
              name: dataset-secret
              key: token
      mounts:
        - name: <JUICEFS_VOLUME_NAME>
          mountPoint: juicefs:/// # Mount point of the JuiceFS file system.
          options:
            bucket: https://<OSS_BUCKET_NAME>.oss-<REGION_ID>.aliyuncs.com # Example: https://mybucket.oss-cn-beijing-internal.aliyuncs.com
    ---
    # JuiceFSRuntime: enables JuiceFS-based data caching in the cluster.
    apiVersion: data.fluid.io/v1alpha1
    kind: JuiceFSRuntime
    metadata:
      name: shared-dataset
      namespace: share
    spec:
      replicas: 1
      tieredstore:
        levels:
          - mediumtype: MEM
            path: /dev/shm
            quota: 1Gi
            high: "0.95"
            low: "0.7"

- Apply the configuration:

    kubectl apply -f shared-dataset.yaml

  Expected output:

    dataset.data.fluid.io/shared-dataset created
    juicefsruntime.data.fluid.io/shared-dataset created

- Wait a few minutes, then verify that the Dataset is bound:

    kubectl get dataset,juicefsruntime -n share

  Expected output:

    NAME                                   UFS TOTAL SIZE   CACHED   CACHE CAPACITY   CACHED PERCENTAGE   PHASE   AGE
    dataset.data.fluid.io/shared-dataset   2.32GiB          0.00B    4.00GiB          0.0%                Bound   3d16h

    NAME                                          WORKER PHASE   FUSE PHASE   AGE
    juicefsruntime.data.fluid.io/shared-dataset                               3m50s

  The Dataset is bound when its PHASE column shows Bound.
Step 3: Create a reference dataset and a pod
- Create the ref namespace:

    kubectl create ns ref

- Create a file named ref-dataset.yaml with the following content:

    apiVersion: data.fluid.io/v1alpha1
    kind: Dataset
    metadata:
      name: ref-dataset
      namespace: ref
    spec:
      mounts:
        - mountPoint: dataset://share/shared-dataset

  The mountPoint value follows the format dataset://<namespace>/<dataset-name>:

  - dataset://: the protocol prefix, indicating that this Dataset references another Dataset.
  - share: the namespace where the source Dataset lives.
  - shared-dataset: the name of the source Dataset.

  Important: The mountPoint value must use the dataset:// protocol prefix. Any other format causes Dataset creation to fail, and the fields in the spec section have no effect.

- Apply the reference Dataset:

    kubectl apply -f ref-dataset.yaml

- Create a file named app.yaml with the following content. This creates a pod in the ref namespace that mounts the reference Dataset at /data.

    apiVersion: v1
    kind: Pod
    metadata:
      name: nginx
      namespace: ref
    spec:
      containers:
        - name: nginx
          image: anolis-registry.cn-zhangjiakou.cr.aliyuncs.com/openanolis/nginx:1.14.1-8.6
          command:
            - "bash"
            - "-c"
            - "sleep inf"
          volumeMounts:
            - mountPath: /data
              name: ref-data
      volumes:
        - name: ref-data
          persistentVolumeClaim:
            claimName: ref-dataset

- Deploy the pod:

    kubectl apply -f app.yaml

- Verify that the pod is running:

    kubectl get pods -n ref -o wide

  The pod is ready when its status shows Running.
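The reference Dataset in this step differs between deployments only in its own name and namespace and in the source namespace and Dataset name that make up the mountPoint. If several teams need their own reference namespace, a small generator can help avoid typos in the dataset:// value. The following is a sketch only. The helper name render_ref_dataset and its argument order are illustrative.

```shell
# Print a reference Dataset manifest that points at <source-namespace>/<source-dataset>.
# render_ref_dataset is a hypothetical helper; arguments:
#   $1 reference namespace, $2 reference Dataset name,
#   $3 source namespace,    $4 source Dataset name
render_ref_dataset() {
  local ref_ns="$1" ref_name="$2" src_ns="$3" src_name="$4"
  # The mountPoint must follow the format dataset://<namespace>/<dataset-name>.
  cat <<EOF
apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: ${ref_name}
  namespace: ${ref_ns}
spec:
  mounts:
    - mountPoint: dataset://${src_ns}/${src_name}
EOF
}
```

Running render_ref_dataset ref ref-dataset share shared-dataset prints the same manifest as ref-dataset.yaml above. You can pipe the output to kubectl apply -f - once the target namespace exists.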
Step 4: Test data sharing and caching
- Check the pods in both namespaces:

    kubectl get pods -n share
    kubectl get pods -n ref

  Expected output:

    # Pods in the share namespace
    NAME                                READY   STATUS    RESTARTS   AGE
    shared-dataset-jindofs-fuse-ftkb5   1/1     Running   0          44s
    shared-dataset-jindofs-master-0     1/1     Running   0          9m13s
    shared-dataset-jindofs-worker-0     1/1     Running   0          9m13s

    # Pods in the ref namespace
    NAME    READY   STATUS    RESTARTS   AGE
    nginx   1/1     Running   0          118s

  Three cache-related pods run in the share namespace. The ref namespace has only the nginx pod, and no cache runtime pods are created there.

- Log in to the nginx pod:

    kubectl exec nginx -n ref -it -- sh

- Test data sharing by querying the file in the /data directory:

    du -sh /data/wwm_uncased_L-24_H-1024_A-16.zip

  Expected output:

    1.3G    /data/wwm_uncased_L-24_H-1024_A-16.zip

  The nginx pod in the ref namespace can access the file exposed by the Dataset in the share namespace.

- Test data caching by reading the file twice:

  Note: The following latency values are for reference only. Actual results vary based on your environment.

    # First read: data is fetched from OSS and written to cache
    time cat /data/wwm_uncased_L-24_H-1024_A-16.zip > /dev/null
    real    0m1.166s
    user    0m0.007s
    sys     0m1.154s

    # Second read: data is served from cache
    time cat /data/wwm_uncased_L-24_H-1024_A-16.zip > /dev/null
    real    0m0.289s
    user    0m0.011s
    sys     0m0.274s

  The second read completes in 0.289 seconds compared to 1.166 seconds for the first read, confirming that the file is cached after the first access.
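To compare the two reads as a ratio rather than by eye, you can convert the real values reported by time into seconds and divide them. The following is a small sketch that assumes the <minutes>m<seconds>s format shown above and a non-zero second reading. The helper names real_to_seconds and speedup are illustrative.

```shell
# Convert a `time` reading such as 0m1.166s into seconds with three decimals.
real_to_seconds() {
  # Split on "m", drop the trailing "s", then add minutes * 60 to the seconds.
  echo "$1" | awk -F'm' '{ sub(/s$/, "", $2); printf "%.3f\n", $1 * 60 + $2 }'
}

# Print how many times faster the second read was than the first, to one decimal.
speedup() {
  local first second
  first="$(real_to_seconds "$1")"
  second="$(real_to_seconds "$2")"
  awk -v a="$first" -v b="$second" 'BEGIN { printf "%.1f\n", a / b }'
}
```

With the reference values from this step, speedup 0m1.166s 0m0.289s prints 4.0, meaning the cached read was roughly four times faster than the first read.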