Container Service for Kubernetes: Share a Dataset across namespaces

Last Updated: Mar 26, 2026

When multiple teams run AI or ML workloads in separate Kubernetes namespaces, each team creating its own cache wastes storage and slows down data access. Fluid lets you cache a dataset once in a source namespace and share that cache with any number of reference namespaces — no duplicate caches, no extra runtime overhead.

The setup uses two namespace roles:

  • Source namespace (share): holds the Dataset and a cache runtime (JindoRuntime or JuiceFSRuntime). This is where the actual data cache lives.

  • Reference namespace (ref): holds a reference Dataset that points to the source Dataset using a dataset:// mount point. Pods in this namespace read from the shared cache without running their own cache runtime.

How it works

Fluid uses ThinRuntime to link a Dataset in one namespace to a Dataset in another. When a pod in the reference namespace reads data, requests are routed to the cache runtime in the source namespace. No additional cache runtime is created in the reference namespace.
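The link between the two Datasets is expressed as a mount point in the form dataset://<namespace>/<dataset-name> (used in Step 3). As a rough illustration of how such a reference breaks down into a namespace and a Dataset name, the following sketch parses one in plain bash. The function name parse_dataset_ref is hypothetical and is not part of Fluid; it only mirrors the format described in this topic.

```shell
# Hypothetical helper: split a dataset:// reference into "<namespace> <dataset-name>".
# Returns a non-zero status if the reference does not match the documented format.
parse_dataset_ref() {
  local ref="$1"
  case "$ref" in
    dataset://*/*) ;;
    *) echo "not a dataset:// reference: $ref" >&2; return 1 ;;
  esac
  local rest="${ref#dataset://}"   # strip the protocol prefix
  local ns="${rest%%/*}"           # text before the first slash
  local name="${rest#*/}"          # text after the first slash
  # Reject empty parts and any extra path segments.
  if [ -z "$ns" ] || [ -z "$name" ] || [ "${name#*/}" != "$name" ]; then
    echo "expected dataset://<namespace>/<dataset-name>: $ref" >&2
    return 1
  fi
  printf '%s %s\n' "$ns" "$name"
}
```

For example, parse_dataset_ref dataset://share/shared-dataset prints "share shared-dataset", which are the source namespace and source Dataset used throughout this topic.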

Prerequisites

Before you begin, ensure that you have:

  • An ACK Pro cluster running Kubernetes 1.18 or later, with a non-ContainerOS node pool (the ack-fluid component does not support ContainerOS). For more information, see Create an ACK Pro cluster.

  • The cloud-native AI suite installed with the ack-fluid component deployed:

    • If you have not installed the cloud-native AI suite, enable Fluid acceleration when you install it. For more information, see Deploy the cloud-native AI suite.

    • If the cloud-native AI suite is already installed, go to the Cloud-native AI Suite page in the ACK console and deploy the ack-fluid component.

    • If open-source Fluid is already installed, uninstall it before deploying the ack-fluid component.

  • A kubectl client connected to your ACK Pro cluster. For more information, see Connect to a cluster by using kubectl.
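The prerequisites above can be spot-checked from the command line. The following is a minimal sketch, not an official check: the function names are hypothetical, and the CRD names are an assumption inferred from the resource names that appear in the kubectl output later in this topic (for example, dataset.data.fluid.io), using the usual plural naming for custom resource definitions.

```shell
# Hypothetical helper: succeed if version "$1" (for example v1.26.3-aliyun.1)
# is at least the major.minor version "$2" (for example 1.18).
k8s_version_at_least() {
  local have="${1#v}"
  local want="$2"
  local have_major="${have%%.*}"
  local rest="${have#*.}"
  local have_minor="${rest%%[!0-9]*}"   # keep only the leading digits of the minor part
  local want_major="${want%%.*}"
  local want_minor="${want#*.}"
  [ "$have_major" -gt "$want_major" ] && return 0
  [ "$have_major" -eq "$want_major" ] && [ "$have_minor" -ge "$want_minor" ]
}

# Hypothetical helper: report any Fluid CRD used in this topic that is not registered.
# The CRD names below are assumptions based on the resource names shown in this topic.
check_fluid_crds() {
  local crd missing=0
  for crd in datasets.data.fluid.io jindoruntimes.data.fluid.io juicefsruntimes.data.fluid.io; do
    if ! kubectl get crd "$crd" > /dev/null 2>&1; then
      echo "missing CRD: $crd"
      missing=1
    fi
  done
  return "$missing"
}
```

To use the version check, pass it the server version that kubectl version reports for your cluster together with 1.18. If check_fluid_crds reports a missing CRD, the ack-fluid component is probably not deployed yet.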

Step 1: Upload the test dataset to OSS

  1. Download the test dataset (approximately 2 GB).

  2. Upload the dataset to your Object Storage Service (OSS) bucket using ossutil. For more information, see Install ossutil.
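The upload in this step can be scripted with the ossutil cp command. The following is a sketch under stated assumptions: ossutil is already installed and configured with your credentials, the binary is named ossutil on your machine (some installations use a different name, such as ossutil64), and the function names oss_uri and upload_dataset are hypothetical.

```shell
# Hypothetical helper: build an oss:// destination URI from a bucket name and an
# optional directory, normalizing leading and trailing slashes.
oss_uri() {
  local bucket="$1" dir="$2"
  dir="${dir#/}"
  dir="${dir%/}"
  if [ -n "$dir" ]; then
    printf 'oss://%s/%s/\n' "$bucket" "$dir"
  else
    printf 'oss://%s/\n' "$bucket"
  fi
}

# Hypothetical helper: upload a local file to the bucket with "ossutil cp".
upload_dataset() {
  local file="$1" bucket="$2" dir="${3:-}"
  [ -f "$file" ] || { echo "file not found: $file" >&2; return 1; }
  ossutil cp "$file" "$(oss_uri "$bucket" "$dir")"
}
```

For example, upload_dataset wwm_uncased_L-24_H-1024_A-16.zip <oss_bucket> <bucket_dir> uploads the test file referenced in Step 4. The bucket and directory should match the mountPoint that you configure in Step 2.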

Step 2: Create a shared dataset and runtime

Create a namespace named share to hold the shared Dataset and runtime. Choose the runtime type that matches your storage setup.

JindoRuntime

  1. Create the share namespace:

    kubectl create ns share
  2. Create a Secret to store the AccessKey pair for your OSS bucket:

    kubectl apply -f - <<EOF
    apiVersion: v1
    kind: Secret
    metadata:
      name: dataset-secret
      namespace: share
    stringData:
      fs.oss.accessKeyId: <YourAccessKey ID>
      fs.oss.accessKeySecret: <YourAccessKey Secret>
    EOF

    Replace <YourAccessKey ID> and <YourAccessKey Secret> with your AccessKey ID and AccessKey secret. For more information, see Obtain an AccessKey pair.

  3. Create a file named shared-dataset.yaml with the following content:

    # Dataset: describes the data stored in OSS (the underlying file system, UFS).
    apiVersion: data.fluid.io/v1alpha1
    kind: Dataset
    metadata:
      name: shared-dataset
      namespace: share
    spec:
      mounts:
      - mountPoint: oss://<oss_bucket>/<bucket_dir> # Path to the data in your OSS bucket.
        options:
          fs.oss.endpoint: <oss_endpoint> # Endpoint of your OSS bucket.
        name: hadoop
        path: "/"
        encryptOptions:
          - name: fs.oss.accessKeyId
            valueFrom:
              secretKeyRef:
                name: dataset-secret
                key: fs.oss.accessKeyId
          - name: fs.oss.accessKeySecret
            valueFrom:
              secretKeyRef:
                name: dataset-secret
                key: fs.oss.accessKeySecret
    ---
    # JindoRuntime: enables JindoFS-based data caching in the cluster.
    apiVersion: data.fluid.io/v1alpha1
    kind: JindoRuntime
    metadata:
      name: shared-dataset
      namespace: share
    spec:
      replicas: 1
      tieredstore:
        levels:
          - mediumtype: MEM
            path: /dev/shm
            quota: 4Gi
            high: "0.95"
            low: "0.7"

    For more information about configuring a Dataset and JindoRuntime, see Use JindoFS to accelerate access to OSS.

  4. Apply the configuration:

    kubectl apply -f shared-dataset.yaml

    Expected output:

    dataset.data.fluid.io/shared-dataset created
    jindoruntime.data.fluid.io/shared-dataset created
  5. Wait a few minutes, then verify that the Dataset is bound and the JindoRuntime is ready:

    kubectl get dataset,jindoruntime -n share

    Expected output:

    NAME                                   UFS TOTAL SIZE   CACHED   CACHE CAPACITY   CACHED PERCENTAGE   PHASE   AGE
    dataset.data.fluid.io/shared-dataset   1.16GiB          0.00B    4.00GiB          0.0%                Bound   4m1s
    
    NAME                                        MASTER PHASE   WORKER PHASE   FUSE PHASE   AGE
    jindoruntime.data.fluid.io/shared-dataset   Ready          Ready          Ready        15m

    The Dataset is bound when its PHASE column shows Bound. The JindoRuntime is ready when the MASTER PHASE, WORKER PHASE, and FUSE PHASE columns all show Ready.
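Instead of waiting a fixed few minutes, you can poll until the Dataset reports Bound. The following sketch assumes that the PHASE column shown above is backed by the .status.phase field of the Dataset resource. The function name wait_for_dataset_bound and its default timeout and interval are hypothetical.

```shell
# Hypothetical helper: poll a Dataset until its phase is Bound or a timeout expires.
# Arguments: <name> <namespace> [timeout-seconds, default 300] [interval-seconds, default 5]
# Assumption: the PHASE column is read from .status.phase.
# The interval must be at least 1 second, otherwise the loop never times out.
wait_for_dataset_bound() {
  local name="$1" ns="$2" timeout="${3:-300}" interval="${4:-5}"
  local waited=0 phase=""
  while [ "$waited" -le "$timeout" ]; do
    phase=$(kubectl get dataset "$name" -n "$ns" -o jsonpath='{.status.phase}' 2>/dev/null)
    if [ "$phase" = "Bound" ]; then
      echo "dataset $ns/$name is Bound"
      return 0
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "timed out; last phase: ${phase:-unknown}" >&2
  return 1
}
```

For example, wait_for_dataset_bound shared-dataset share 600 polls the shared Dataset for up to ten minutes.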

JuiceFSRuntime

  1. Create the share namespace:

    kubectl create ns share
  2. Create a Secret to store the credentials for your OSS bucket and JuiceFS volume:

    kubectl apply -f - <<EOF
    apiVersion: v1
    kind: Secret
    metadata:
      name: dataset-secret
      namespace: share
    type: Opaque
    stringData:
      token: <JUICEFS_VOLUME_TOKEN>
      access-key: <OSS_ACCESS_KEY>
      secret-key: <OSS_SECRET_KEY>
    EOF

    Replace <JUICEFS_VOLUME_TOKEN> with the token of your JuiceFS volume, and replace <OSS_ACCESS_KEY> and <OSS_SECRET_KEY> with your AccessKey ID and AccessKey secret. For more information, see Obtain an AccessKey pair.

  3. Create a file named shared-dataset.yaml with the following content:

    # Dataset: describes the data stored in OSS (the underlying file system, UFS).
    apiVersion: data.fluid.io/v1alpha1
    kind: Dataset
    metadata:
      name: shared-dataset
      namespace: share
    spec:
      accessModes: ["ReadOnlyMany"]
      sharedEncryptOptions:
      - name: access-key
        valueFrom:
          secretKeyRef:
            name: dataset-secret
            key: access-key
      - name: secret-key
        valueFrom:
          secretKeyRef:
            name: dataset-secret
            key: secret-key
      - name: token
        valueFrom:
          secretKeyRef:
            name: dataset-secret
            key: token
      mounts:
      - name: <JUICEFS_VOLUME_NAME>
        mountPoint: juicefs:/// # Mount point of the JuiceFS file system.
        options:
          bucket: https://<OSS_BUCKET_NAME>.oss-<REGION_ID>.aliyuncs.com # Example: https://mybucket.oss-cn-beijing-internal.aliyuncs.com
    ---
    # JuiceFSRuntime: enables JuiceFS-based data caching in the cluster.
    apiVersion: data.fluid.io/v1alpha1
    kind: JuiceFSRuntime
    metadata:
      name: shared-dataset
      namespace: share
    spec:
      replicas: 1
      tieredstore:
        levels:
        - mediumtype: MEM
          path: /dev/shm
          quota: 1Gi
          high: "0.95"
          low: "0.7"
  4. Apply the configuration:

    kubectl apply -f shared-dataset.yaml

    Expected output:

    dataset.data.fluid.io/shared-dataset created
    juicefsruntime.data.fluid.io/shared-dataset created
  5. Wait a few minutes, then verify that the Dataset is bound:

    kubectl get dataset,juicefsruntime -n share

    Expected output:

    NAME                                   UFS TOTAL SIZE   CACHED   CACHE CAPACITY   CACHED PERCENTAGE   PHASE   AGE
    dataset.data.fluid.io/shared-dataset   2.32GiB          0.00B    4.00GiB          0.0%                Bound   3d16h
    
    NAME                                          WORKER PHASE   FUSE PHASE   AGE
    juicefsruntime.data.fluid.io/shared-dataset                               3m50s
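Both variants of shared-dataset.yaml contain placeholders, such as <oss_bucket>, <oss_endpoint>, or <JUICEFS_VOLUME_NAME>, that must be replaced before you apply the file. As a small safeguard, the following sketch lists any placeholder that is still present. The function name find_placeholders is hypothetical, and the pattern is an approximation that matches angle-bracketed names made of letters, digits, underscores, and spaces.

```shell
# Hypothetical helper: print every line of a file that still contains an
# angle-bracket placeholder such as <oss_bucket>. Exits with status 0 if at
# least one placeholder is found and 1 if none is found (grep semantics).
find_placeholders() {
  grep -nE '<[A-Za-z_][A-Za-z0-9_ ]*>' "$1"
}

# Possible usage before applying the file:
#   if find_placeholders shared-dataset.yaml; then
#     echo "Replace the placeholders listed above before applying." >&2
#   else
#     kubectl apply -f shared-dataset.yaml
#   fi
```

Because the helper follows grep exit codes, a successful exit means that the file is not ready to apply yet.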

Step 3: Create a reference dataset and a pod

  1. Create the ref namespace:

    kubectl create ns ref
  2. Create a file named ref-dataset.yaml with the following content:

    apiVersion: data.fluid.io/v1alpha1
    kind: Dataset
    metadata:
      name: ref-dataset
      namespace: ref
    spec:
      mounts:
      - mountPoint: dataset://share/shared-dataset

    The mountPoint value follows the format dataset://<namespace>/<dataset-name>:

    • dataset:// is the protocol prefix, indicating that this Dataset references another Dataset.

    • share is the namespace where the source Dataset lives.

    • shared-dataset is the name of the source Dataset.

    Important

    The mountPoint value must use the dataset:// protocol prefix. Any other format causes dataset creation to fail, and fields in the spec section have no effect.

  3. Apply the reference Dataset:

    kubectl apply -f ref-dataset.yaml
  4. Create a file named app.yaml with the following content. This creates a pod in the ref namespace that mounts the reference Dataset at /data.

    apiVersion: v1
    kind: Pod
    metadata:
      name: nginx
      namespace: ref
    spec:
      containers:
      - name: nginx
        image: anolis-registry.cn-zhangjiakou.cr.aliyuncs.com/openanolis/nginx:1.14.1-8.6
        command:
        - "bash"
        - "-c"
        - "sleep inf"
        volumeMounts:
        - mountPath: /data
          name: ref-data
      volumes:
      - name: ref-data
        persistentVolumeClaim:
          claimName: ref-dataset
  5. Deploy the pod:

    kubectl apply -f app.yaml
  6. Verify that the pod is running:

    kubectl get pods -n ref -o wide

    The pod is ready when its status shows Running.
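The checks in this step can also be scripted. The following sketch wraps two standard kubectl commands: kubectl wait, which blocks until the pod reports the Ready condition, and a jsonpath query for the phase of the persistent volume claim that the pod mounts (ref-dataset, as referenced by claimName in app.yaml). The function names are hypothetical.

```shell
# Hypothetical helper: block until the pod reports the Ready condition or the timeout expires.
# Arguments: <pod-name> <namespace> [timeout, default 120s]
wait_for_pod_ready() {
  local pod="$1" ns="$2" timeout="${3:-120s}"
  kubectl wait --for=condition=Ready "pod/$pod" -n "$ns" --timeout="$timeout"
}

# Hypothetical helper: print the phase of a persistent volume claim.
# Kubernetes reports "Bound" once a volume is bound to the claim.
pvc_phase() {
  kubectl get pvc "$1" -n "$2" -o jsonpath='{.status.phase}'
}
```

For example, run pvc_phase ref-dataset ref first. If it does not print Bound, the reference Dataset is likely not ready yet, and wait_for_pod_ready nginx ref can be expected to time out.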

Step 4: Test data sharing and caching

  1. Check the pods in both namespaces:

    kubectl get pods -n share
    kubectl get pods -n ref

    Expected output:

    # Pods in the share namespace
    NAME                                READY   STATUS    RESTARTS   AGE
    shared-dataset-jindofs-fuse-ftkb5   1/1     Running   0          44s
    shared-dataset-jindofs-master-0     1/1     Running   0          9m13s
    shared-dataset-jindofs-worker-0     1/1     Running   0          9m13s
    
    # Pods in the ref namespace
    NAME    READY   STATUS    RESTARTS   AGE
    nginx   1/1     Running   0          118s

    With JindoRuntime, three cache-related pods (master, worker, and FUSE) run in the share namespace. The ref namespace has only the nginx pod; no cache runtime pods are created there.

  2. Log in to the nginx pod:

    kubectl exec nginx -n ref -it -- sh
  3. Test data sharing by checking the size of the test file in the /data directory:

    du -sh /data/wwm_uncased_L-24_H-1024_A-16.zip

    Expected output:

    1.3G    /data/wwm_uncased_L-24_H-1024_A-16.zip

    The nginx pod in the ref namespace can access the file stored in the share namespace.

  4. Test data caching by reading the file twice:

    The following latency values are for reference only. Actual results vary based on your environment.

    # First read — data is fetched from OSS and written to cache
    time cat /data/wwm_uncased_L-24_H-1024_A-16.zip > /dev/null
    real    0m1.166s
    user    0m0.007s
    sys     0m1.154s
    
    # Second read — data is served from cache
    time cat /data/wwm_uncased_L-24_H-1024_A-16.zip > /dev/null
    real    0m0.289s
    user    0m0.011s
    sys     0m0.274s

    The second read completes in 0.289 seconds compared to 1.166 seconds for the first read, confirming that the file is cached after the first access.
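To express the two measurements as a single ratio, divide the first read time by the second. The following sketch does this with awk. The function name speedup is hypothetical, and the inputs are the "real" values from the sample output above.

```shell
# Hypothetical helper: print how many times faster the second read was than the first.
# Arguments: <first-read-seconds> <second-read-seconds>
speedup() {
  awk -v first="$1" -v second="$2" 'BEGIN {
    if (second <= 0) { exit 1 }      # avoid division by zero
    printf "%.1fx\n", first / second
  }'
}
```

With the sample values, speedup 1.166 0.289 prints 4.0x. Your own ratio depends on factors such as file size, network bandwidth to OSS, and the cache medium.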