Fluid's mutating webhook injects affinity rules into pod specs so that kube-scheduler places application pods on the nodes that hold cached data—or on nodes in the same zone or region as the cache when no cache-holding node is available. With Fluid, you can:
- Schedule pods to the node holding cached data (node affinity, weight 100)
- Schedule pods to any node in the same zone as the cache (zone affinity, weight 50)
- Schedule pods to any node in the same region as the cache (region affinity, weight 20)
- Force pods onto cache-holding nodes when data locality is critical (required affinity)
- Steer pods that do not use datasets away from cache-holding nodes to reduce resource contention
Limitations
- Supported only on ACK Pro clusters.
- Incompatible with Elastic Container Instance-based scheduling and priority-based resource scheduling.
- If `spec.affinity` or `spec.nodeSelector` is already set in a pod spec, Fluid skips affinity injection for that pod.
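For instance, a pod that already declares its own placement constraint keeps that constraint untouched. The sketch below is illustrative (the pod name and node label are hypothetical, not from Fluid):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app              # hypothetical pod name
  labels:
    fuse.serverful.fluid.io/inject: "true"
spec:
  # Because nodeSelector is already set, Fluid's webhook leaves this pod's
  # scheduling constraints unchanged and injects no cache affinity rules.
  nodeSelector:
    disktype: ssd           # hypothetical pre-existing node label
  containers:
  - name: app
    image: registry.openanolis.cn/openanolis/nginx:1.14.1-8.6
```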
Prerequisites
Before you begin, ensure that you have:
- An ACK Pro cluster running Kubernetes 1.18 or later. For more information, see Create an ACK Pro cluster.
- The cloud-native AI suite and ack-fluid 1.0.6 or later deployed in the cluster. For more information, see Deploy the cloud-native AI suite.
  Important: If you already have open-source Fluid installed, uninstall it before deploying the ack-fluid component.
- A kubectl client connected to the cluster. For more information, see Obtain the kubeconfig file of a cluster and use kubectl to connect to the cluster.
How it works
When a pod is created, Fluid's mutating webhook reads the pod's labels and the Dataset it references, then injects `nodeAffinity` rules into the pod spec before kube-scheduler evaluates placement. The injected rules use `preferredDuringSchedulingIgnoredDuringExecution` (soft affinity) by default. To force hard placement on cache-holding nodes, add the label `fluid.io/dataset.<dataset_name>.sched: required` to the pod.
The fallback order is: node → zone → region. If a pod cannot be placed on a node with cached data, kube-scheduler falls back to a node in the same zone, then the same region, based on the configured weights.
Pods that do not reference a dataset are handled by the PreferNodesWithoutCache plugin, which steers them away from nodes reserved for data caching.
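Taken together, the default policy translates into an injected soft affinity rule of roughly this shape. This is a sketch only: the dataset label key follows the `fluid.io/s-<namespace>-<dataset_name>` pattern used in the examples below, and the zone and region values are illustrative:

```yaml
# Sketch of the soft affinity Fluid injects by default (values are illustrative)
affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100                       # first choice: a node holding the cache
      preference:
        matchExpressions:
        - key: fluid.io/s-default-demo-dataset
          operator: In
          values: ["true"]
    - weight: 50                        # fallback: a node in the same zone
      preference:
        matchExpressions:
        - key: topology.kubernetes.io/zone
          operator: In
          values: ["cn-beijing-i"]      # illustrative zone ID
    - weight: 20                        # last resort: a node in the same region
      preference:
        matchExpressions:
        - key: topology.kubernetes.io/region
          operator: In
          values: ["cn-beijing"]        # illustrative region ID
```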
Configure the scheduling policy
Default configuration
The scheduling policy is stored in the `webhook-plugins` ConfigMap in the `fluid-system` namespace. To inspect it:

```bash
kubectl get cm -n fluid-system webhook-plugins -oyaml
```
Expected output:
```yaml
apiVersion: v1
data:
  pluginsProfile: |
    pluginConfig:
    - args: |
        preferred:
        # fluid.io/node: built-in, name cannot be changed. Schedules pods to the node holding cached data.
        - name: fluid.io/node
          weight: 100
        # topology.kubernetes.io/zone: schedules pods to nodes in the same zone as the cache. Adjust key for your cluster.
        - name: topology.kubernetes.io/zone
          weight: 50
        # topology.kubernetes.io/region: schedules pods to nodes in the same region as the cache. Adjust key for your cluster.
        - name: topology.kubernetes.io/region
          weight: 20
        # required: applies when a pod carries the label fluid.io/dataset.{dataset name}.sched=required
        required:
        - fluid.io/node
      name: NodeAffinityWithCache
    plugins:
      serverful:
        withDataset:
        - RequireNodeWithFuse
        - NodeAffinityWithCache
        - MountPropagationInjector
        withoutDataset:
        - PreferNodesWithoutCache
      serverless:
        withDataset:
        - FuseSidecar
        withoutDataset: []
```
The `preferred` list controls soft affinity weights. The `required` list controls which label must match when a pod opts into hard scheduling. The `fluid.io/node` name in both sections cannot be changed.
Custom configuration
ACK clusters may use different node labels to represent topology. To add, remove, or replace topology keys:
- Edit the ConfigMap:

  ```bash
  kubectl edit -n fluid-system cm webhook-plugins
  ```

- Modify the `preferred` list. Two common scenarios are described below.

  Example: Ignore node-level cache affinity

  Comment out `fluid.io/node` to drop the node-level placement preference. Fluid still prefers zone and region affinity.

  ```yaml
  preferred:
  # - name: fluid.io/node    # commented out: node affinity disabled
  #   weight: 100
  - name: topology.kubernetes.io/zone
    weight: 50
  - name: topology.kubernetes.io/region
    weight: 20
  ```

  Example: Add node pool affinity

  Insert a custom topology key between node-level and zone-level affinity to prefer scheduling within the same node pool.

  ```yaml
  preferred:
  - name: fluid.io/node
    weight: 100
  - name: alibabacloud.com/nodepool-id   # custom topology key
    weight: 80
  - name: topology.kubernetes.io/zone
    weight: 50
  - name: topology.kubernetes.io/region
    weight: 20
  ```
- Restart the Fluid webhook to apply the changes:

  ```bash
  kubectl rollout restart deployment -n fluid-system fluid-webhook
  ```
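To confirm the webhook picked up the new configuration, you can wait for the rollout to complete before creating new pods:

```bash
# Blocks until the fluid-webhook deployment finishes rolling out
kubectl rollout status deployment -n fluid-system fluid-webhook
```

Pods created before the rollout finishes may still receive affinity rules generated from the old policy.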
Examples
Example 1: Preferred node affinity (soft scheduling)
This example schedules a pod to the node that holds the cached data. If no cache-holding node is schedulable, kube-scheduler falls back to zone and region affinity based on the configured weights.
- Create a Secret with your OSS credentials:

  ```yaml
  apiVersion: v1
  kind: Secret
  metadata:
    name: mysecret
  stringData:
    fs.oss.accessKeyId: <ACCESS_KEY_ID>
    fs.oss.accessKeySecret: <ACCESS_KEY_SECRET>
  ```
- Create a Dataset and a JindoRuntime:

  Important: This example uses JindoRuntime. To use other cache runtimes, see Use EFC to accelerate access to NAS or CPFS. For JindoFS and Object Storage Service (OSS) acceleration, see Use JindoFS to accelerate access to OSS.

  ```yaml
  apiVersion: data.fluid.io/v1alpha1
  kind: Dataset
  metadata:
    name: demo-dataset
  spec:
    mounts:
    - mountPoint: oss://<oss_bucket>/<bucket_dir>
      options:
        fs.oss.endpoint: <oss_endpoint>
      name: hadoop
      path: "/"
      encryptOptions:
      - name: fs.oss.accessKeyId
        valueFrom:
          secretKeyRef:
            name: mysecret
            key: fs.oss.accessKeyId
      - name: fs.oss.accessKeySecret
        valueFrom:
          secretKeyRef:
            name: mysecret
            key: fs.oss.accessKeySecret
  ---
  apiVersion: data.fluid.io/v1alpha1
  kind: JindoRuntime
  metadata:
    name: demo-dataset
  spec:
    replicas: 2
    tieredstore:
      levels:
      - mediumtype: MEM
        path: /dev/shm
        quota: 10G
        high: "0.99"
        low: "0.8"
  ```
- Create the application pod with affinity injection enabled:

  ```yaml
  apiVersion: v1
  kind: Pod
  metadata:
    name: nginx
    labels:
      fuse.serverful.fluid.io/inject: "true"   # enables Fluid affinity injection
  spec:
    containers:
    - name: nginx
      image: registry.openanolis.cn/openanolis/nginx:1.14.1-8.6
      volumeMounts:
      - mountPath: /data
        name: data-vol
    volumes:
    - name: data-vol
      persistentVolumeClaim:
        claimName: demo-dataset   # PVC auto-created by Fluid, named after the Dataset
  ```
- Verify that Fluid injected the affinity rule:

  ```bash
  kubectl get pod nginx -oyaml
  ```

  The pod spec should contain a `preferredDuringSchedulingIgnoredDuringExecution` rule targeting the cache-holding node:

  ```yaml
  spec:
    affinity:
      nodeAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
        - preference:
            matchExpressions:
            - key: fluid.io/s-default-demo-dataset
              operator: In
              values:
              - "true"
          weight: 100
  ```
- Confirm the pod was scheduled to a cache-holding node:

  ```bash
  kubectl get pod nginx -o custom-columns=NAME:metadata.name,NODE:.spec.nodeName
  ```

  The node shown should be one of the JindoRuntime worker nodes where data is cached.
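You can also list the cache-holding nodes directly by selecting on the dataset label that appears in the injected rule. This assumes the Dataset is named demo-dataset and lives in the default namespace, matching the `fluid.io/s-<namespace>-<dataset_name>` pattern:

```bash
# Nodes carrying cache for the demo-dataset Dataset in the default namespace
kubectl get nodes -l fluid.io/s-default-demo-dataset=true
```

The node reported for the nginx pod should appear in this list.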
Example 2: Preferred zone affinity (soft scheduling)
This example schedules pods to any node in the zone where cached data resides. Both node-level and zone-level affinity rules are injected, so kube-scheduler first tries cache-holding nodes, then other nodes in the same zone.
To enable zone affinity, pin the Dataset and JindoRuntime master to a specific zone.
- Create a Secret (same as Example 1):

  ```yaml
  apiVersion: v1
  kind: Secret
  metadata:
    name: mysecret
  stringData:
    fs.oss.accessKeyId: <ACCESS_KEY_ID>
    fs.oss.accessKeySecret: <ACCESS_KEY_SECRET>
  ```
- Create a Dataset and a JindoRuntime, both pinned to the target zone:

  ```yaml
  apiVersion: data.fluid.io/v1alpha1
  kind: Dataset
  metadata:
    name: demo-dataset
  spec:
    nodeAffinity:
      required:
        nodeSelectorTerms:
        - matchExpressions:
          - key: topology.kubernetes.io/zone
            operator: In
            values:
            - "<ZONE_ID>"   # e.g., cn-beijing-i
    mounts:
    - mountPoint: oss://<oss_bucket>/<bucket_dir>
      options:
        fs.oss.endpoint: <oss_endpoint>
      name: hadoop
      path: "/"
      encryptOptions:
      - name: fs.oss.accessKeyId
        valueFrom:
          secretKeyRef:
            name: mysecret
            key: fs.oss.accessKeyId
      - name: fs.oss.accessKeySecret
        valueFrom:
          secretKeyRef:
            name: mysecret
            key: fs.oss.accessKeySecret
  ---
  apiVersion: data.fluid.io/v1alpha1
  kind: JindoRuntime
  metadata:
    name: demo-dataset
  spec:
    replicas: 2
    master:
      nodeSelector:
        topology.kubernetes.io/zone: <ZONE_ID>   # e.g., cn-beijing-i
    tieredstore:
      levels:
      - mediumtype: MEM
        path: /dev/shm
        quota: 10G
        high: "0.99"
        low: "0.8"
  ```

  The `nodeAffinity.required.nodeSelectorTerms` constraint on the Dataset tells Fluid which zone the cache lives in. Fluid reads this to generate the zone-level affinity rule injected into application pods.
- Create the application pod:

  ```yaml
  apiVersion: v1
  kind: Pod
  metadata:
    name: nginx
    labels:
      fuse.serverful.fluid.io/inject: "true"
  spec:
    containers:
    - name: nginx
      image: registry.openanolis.cn/openanolis/nginx:1.14.1-8.6
      volumeMounts:
      - mountPath: /data
        name: data-vol
    volumes:
    - name: data-vol
      persistentVolumeClaim:
        claimName: demo-dataset
  ```
- Verify the injected affinity rules:

  ```bash
  kubectl get pod nginx -oyaml
  ```

  Both a node-level and a zone-level affinity rule should appear:

  ```yaml
  spec:
    affinity:
      nodeAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
        - preference:
            matchExpressions:
            - key: fluid.io/s-default-demo-dataset
              operator: In
              values:
              - "true"
          weight: 100
        - preference:
            matchExpressions:
            - key: topology.kubernetes.io/zone
              operator: In
              values:
              - <ZONE_ID>   # e.g., cn-beijing-i
          weight: 50
  ```
- Confirm the pod was scheduled to a node in the target zone:

  ```bash
  kubectl get pod nginx -o custom-columns=NAME:metadata.name,NODE:.spec.nodeName
  kubectl get node <node_name> --show-labels | grep topology.kubernetes.io/zone
  ```

  The node should have the `topology.kubernetes.io/zone=<ZONE_ID>` label.
Example 3: Required node affinity (hard scheduling)
This example forces the pod onto a node that holds cached data. If no cache-holding node is schedulable, the pod stays pending. Use this when data locality is non-negotiable—for example, in latency-sensitive training jobs.
- Create a Secret (same as Example 1):

  ```yaml
  apiVersion: v1
  kind: Secret
  metadata:
    name: mysecret
  stringData:
    fs.oss.accessKeyId: <ACCESS_KEY_ID>
    fs.oss.accessKeySecret: <ACCESS_KEY_SECRET>
  ```
- Create a Dataset and a JindoRuntime (same as Example 1):

  ```yaml
  apiVersion: data.fluid.io/v1alpha1
  kind: Dataset
  metadata:
    name: demo-dataset
  spec:
    mounts:
    - mountPoint: oss://<oss_bucket>/<bucket_dir>
      options:
        fs.oss.endpoint: <oss_endpoint>
      name: hadoop
      path: "/"
      encryptOptions:
      - name: fs.oss.accessKeyId
        valueFrom:
          secretKeyRef:
            name: mysecret
            key: fs.oss.accessKeyId
      - name: fs.oss.accessKeySecret
        valueFrom:
          secretKeyRef:
            name: mysecret
            key: fs.oss.accessKeySecret
  ---
  apiVersion: data.fluid.io/v1alpha1
  kind: JindoRuntime
  metadata:
    name: demo-dataset
  spec:
    replicas: 2
    tieredstore:
      levels:
      - mediumtype: MEM
        path: /dev/shm
        quota: 10G
        high: "0.99"
        low: "0.8"
  ```
- Create the application pod with the required-scheduling label:

  ```yaml
  apiVersion: v1
  kind: Pod
  metadata:
    name: nginx
    labels:
      fuse.serverful.fluid.io/inject: "true"
      fluid.io/dataset.demo-dataset.sched: required   # forces hard affinity for demo-dataset
  spec:
    containers:
    - name: nginx
      image: registry.openanolis.cn/openanolis/nginx:1.14.1-8.6
      volumeMounts:
      - mountPath: /data
        name: data-vol
    volumes:
    - name: data-vol
      persistentVolumeClaim:
        claimName: demo-dataset
  ```

  The label format is `fluid.io/dataset.<dataset_name>.sched: required`. Replace `<dataset_name>` with the name of your Dataset.
- Verify the injected affinity rule:

  ```bash
  kubectl get pod nginx -oyaml
  ```

  A `requiredDuringSchedulingIgnoredDuringExecution` rule should appear, blocking the pod from scheduling to any node that does not hold the cached data:

  ```yaml
  spec:
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: fluid.io/s-default-demo-dataset
              operator: In
              values:
              - "true"
  ```
- Confirm the pod was scheduled to a cache-holding node:

  ```bash
  kubectl get pod nginx -o custom-columns=NAME:metadata.name,NODE:.spec.nodeName
  ```

  The node shown should be one of the JindoRuntime worker nodes. If no such node is available, the pod remains in `Pending` state until a cache-holding node becomes schedulable.
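If the pod stays `Pending`, the scheduler's reasoning is visible in the pod's events. A FailedScheduling event mentioning node affinity indicates that no cache-holding node is currently schedulable (the exact event wording varies by Kubernetes version):

```bash
kubectl describe pod nginx
# Look for a FailedScheduling event in the Events section, for example one
# stating that the available nodes didn't match the pod's node affinity/selector.
```

In that case, check that the JindoRuntime workers are running and that their nodes have capacity for the pod.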