Container Service for Kubernetes:Use CPFS for Lingjun Dynamically Provisioned Volumes

Last Updated: Mar 26, 2026

Dynamic volume provisioning automates on-demand storage for CPFS for Lingjun, eliminating manual persistent volume (PV) management. Because CPFS for Lingjun supports concurrent reads and writes across multiple pods, it is well suited for AI training and data analytics workloads that share code, configuration files, and intermediate computation results.

Limitations

Review these constraints before you begin. Violating them leads to mount failures or unrecoverable cluster state.

  • Same hpn-zone required for VSC mounting: The node running the pod must be in the same hpn-zone as the CPFS for Lingjun file system instance.

  • Node initialization: A Lingjun node must be associated with a CPFS for Lingjun file system during initialization. If this step was skipped, CSI mounting fails.

  • One file system per pod: Do not mount multiple volumes from the same CPFS for Lingjun file system in a single pod — for example, multiple PVs created by a StorageClass containing the same bmcpfsId. The native protocol does not support mounting the same file system instance multiple times within a single pod, even to different subdirectories.

  • Drain before taking a node offline: Before taking a Lingjun node offline due to failure, drain all pods from it. Skipping this step leaves behind unrecoverable pod resources and causes inconsistent cluster metadata.
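
The drain in the last item maps to a standard kubectl workflow; a sketch, with the node name as a placeholder:

```shell
# Cordon the node and evict all pods from it
# (DaemonSet pods are skipped; emptyDir data is discarded)
kubectl drain <lingjun-node-name> --ignore-daemonsets --delete-emptydir-data

# If the node returns to service later, allow scheduling again
kubectl uncordon <lingjun-node-name>
```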

Prerequisites

Before you begin, make sure you have:

  • A cluster running version 1.26 or later. To upgrade, see Manually upgrade a cluster.

  • Nodes running Alibaba Cloud Linux 3.

  • The following storage components installed and meeting the minimum version requirements. Go to the Add-ons page to check versions, install, or upgrade components.

    Component                                               Minimum version
    CSI add-on (csi-plugin and csi-provisioner)             v1.33.1
    cnfs-nas-daemon add-on                                  0.1.2
    bmcpfs-csi (bmcpfs-csi-controller and bmcpfs-csi-node)  1.35.1

Configure cnfs-nas-daemon resources

The cnfs-nas-daemon add-on manages Elastic File Client (EFC) processes. It consumes significant resources and directly affects storage performance. Set its resource configuration on the Add-ons page using these guidelines:

  • CPU: Allocate 0.5 core per 1 Gb/s of bandwidth, plus 1 extra core for metadata management. Example: For a node with a 100 Gb/s NIC, set the CPU request to 100 × 0.5 + 1 = 51 cores.

  • Memory: Set the memory request to 15% of the node's total memory. CPFS for Lingjun uses FUSE, so data caching and file metadata both consume memory.

After adjusting the configuration, scale resources up or down based on actual workload.
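
The sizing rules above can be sketched as shell arithmetic. The NIC bandwidth and node memory values below are example inputs; substitute your node's actual specifications:

```shell
# Size cnfs-nas-daemon resource requests from node specs
NIC_GBPS=100        # NIC bandwidth in Gb/s (example value)
NODE_MEM_GIB=1024   # total node memory in GiB (example value)

# CPU: 0.5 core per Gb/s of bandwidth, plus 1 core for metadata management
CPU_CORES=$(( NIC_GBPS / 2 + 1 ))

# Memory: 15% of the node's total memory
MEM_GIB=$(( NODE_MEM_GIB * 15 / 100 ))

echo "cpu=${CPU_CORES} memory=${MEM_GIB}Gi"
```

For the 100 Gb/s NIC example in the text, this prints a CPU request of 51 cores, matching the 100 × 0.5 + 1 calculation above.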

Important

The cnfs-nas-daemon DaemonSet uses the OnDelete update strategy by default. After changing CPU or memory settings on the Add-ons page, manually delete the existing cnfs-nas-daemon pod on each node to trigger a rebuild and apply the new settings. Perform this during off-peak hours.

  • Nodes without hot upgrade support: Restarting the cnfs-nas-daemon pod interrupts access to mounted volumes. Application pods using those volumes fail and require manual deletion. After deletion, they restart and recover automatically.

  • Nodes with hot upgrade support: Application pods recover automatically after the cnfs-nas-daemon pod restarts. A node supports hot upgrades when all three conditions are met: kernel version 5.10.134-18 or later, bmcpfs-csi-controller and bmcpfs-csi-plugin versions 1.35.1 or later, and cnfs-nas-daemon version 0.1.9-compatible.1 or later.
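
Because the update strategy is OnDelete, the rebuild can be triggered one node at a time. A sketch, assuming the component is installed in kube-system (the pod name is a placeholder):

```shell
# List cnfs-nas-daemon pods and the nodes they run on
kubectl get pods -n kube-system -o wide | grep cnfs-nas-daemon

# Delete one pod; with OnDelete, the DaemonSet recreates it
# with the new CPU/memory settings
kubectl delete pod <cnfs-nas-daemon-pod-name> -n kube-system

# Verify the replacement pod is Running before repeating on the next node
kubectl get pods -n kube-system -o wide | grep cnfs-nas-daemon
```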

Step 1: Create a CPFS file system

  1. Create a CPFS for Lingjun file system. See Create a CPFS for Lingjun file system. Record the file system ID.

  2. (Optional) If you want to mount on non-Lingjun nodes, create a VPC mount target in the same VPC as your cluster nodes, and record the mount target domain name. The format is cpfs-*-vpc-*.<Region>.cpfs.aliyuncs.com.

    If all pods schedule to Lingjun nodes, VSC (Virtual Storage Controller) mounting is used by default and this step is not required.

Also review Limits for CPFS for Lingjun before proceeding.

Step 2: Create a StorageClass

Create a StorageClass object to define the storage template for dynamic provisioning.

  1. Create sc.yaml with the following content:

    • bmcpfsId (required): CPFS for Lingjun file system ID, in the format bmcpfs-xxxxxxxxx or cpfs-xxxxxxxxx.
    • path (optional): Subdirectory within the file system. When specified, the volume is created under {path}/{volumeName}/; when omitted, under /{volumeName}/.
    • allowVolumeExpansion (optional): Reserved parameter. The current version does not support dynamic expansion.
    • reclaimPolicy (optional): Delete (default) automatically deletes the fileset in the backend file system when you delete the PVC. Retain keeps the fileset when you delete the PVC; you must clean up manually. Use Retain in production environments.

    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: alicloud-bmcpfs-test
    provisioner: bmcpfsplugin.csi.alibabacloud.com
    parameters:
      # Required: CPFS for Lingjun file system ID (format: bmcpfs-xxxxxxxxx or cpfs-xxxxxxxxx)
      bmcpfsId: bmcpfs-29000z8xz3lf5nj*****
      # Optional: subdirectory within the file system; volume creates under {path}/{volumeName}/
      # path: "/shared"
    # Reserved parameter — current version does not support dynamic expansion
    allowVolumeExpansion: true
    # Delete (default): automatically removes the fileset when the PVC is deleted
    # Retain (recommended for production): keeps the fileset; you must clean up manually
    reclaimPolicy: Delete
  2. Apply the StorageClass:

    kubectl apply -f sc.yaml

    The expected output is:

    storageclass.storage.k8s.io/alicloud-bmcpfs-test created
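
For production, where Retain is recommended, the same StorageClass with only the reclaim policy changed might look like this (the metadata name is illustrative):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: alicloud-bmcpfs-retain   # illustrative name
provisioner: bmcpfsplugin.csi.alibabacloud.com
parameters:
  bmcpfsId: bmcpfs-29000z8xz3lf5nj*****
# Retain: the backend fileset survives PVC deletion; clean up manually
reclaimPolicy: Retain
```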

Step 3: Create a PVC

Applications request storage through a persistent volume claim (PVC), which references the StorageClass as a provisioning template.

  1. Create pvc.yaml with the following content:

    • accessModes: Only ReadWriteMany is supported, allowing multiple pods to mount and read/write simultaneously.
    • storage: Requested storage capacity. Supports units such as Gi and Ti.
    • volumeMode: Only Filesystem is supported.
    • storageClassName: The StorageClass to use. Specifying this field triggers dynamic volume creation.

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: bmcpfs-vsc
      namespace: default
    spec:
      accessModes:
        - ReadWriteMany  # Supports concurrent reads and writes across multiple pods
      resources:
        requests:
          storage: 10Ti  # Supports large-capacity storage (Ti level)
      volumeMode: Filesystem  # Only Filesystem is supported
      storageClassName: alicloud-bmcpfs-test  # Must match the StorageClass created in Step 2
  2. Apply the PVC:

    kubectl apply -f pvc.yaml
  3. Verify the PVC is bound:

    kubectl get pvc bmcpfs-vsc -n default

    The expected output is:

    NAME         STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS           AGE
    bmcpfs-vsc   Bound    pvc-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx   10Ti       RWX            alicloud-bmcpfs-test   30s

    When STATUS is Bound, the system has automatically created the corresponding PV. To confirm provisioning succeeded, run:

    kubectl describe pvc bmcpfs-vsc -n default

    In the Events section, look for a Provisioning succeeded message.

Step 4: Deploy a workload and mount the PVC

After the PVC is bound, deploy a workload that mounts the volume.

  1. Create deploy.yaml with the following content:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: cpfs-shared-example
    spec:
      replicas: 3  # Three replicas verify that shared storage works across multiple pods
      selector:
        matchLabels:
          app: cpfs-shared-app
      template:
        metadata:
          labels:
            app: cpfs-shared-app
        spec:
          tolerations:
            - key: node-role.alibabacloud.com/lingjun
              operator: Exists
              effect: NoSchedule
          # Optional: to pin all pods to a specific node, uncomment and set the node name
          # nodeName: cn-hangzhou.10.XX.XX.226
          containers:
          - name: app-container
            image: anolis-registry.cn-zhangjiakou.cr.aliyuncs.com/openanolis/nginx:1.14.1-8.6
            volumeMounts:
              - name: pvc-cpfs
                mountPath: /data  # Shared volume mounted at /data inside the container
            lifecycle:
              postStart:
                exec:
                  command:
                    - /bin/sh
                    - -c
                    - >
                      echo "Data written by $(hostname)" > /data/$(hostname).txt &&
                      echo "Deployment is running, check shared data in /data." &&
                      sleep 3600
          volumes:
            - name: pvc-cpfs
              persistentVolumeClaim:
                claimName: bmcpfs-vsc  # References the PVC created in Step 3
  2. Apply the Deployment:

    kubectl apply -f deploy.yaml

    The expected output is:

    deployment.apps/cpfs-shared-example created
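
To confirm the three replicas actually share the volume, list /data from any one pod. Assuming the postStart hooks have run, each replica's file should be visible from a single pod (a sketch):

```shell
# Wait for all replicas to become ready
kubectl rollout status deployment/cpfs-shared-example

# List the shared directory from one pod; expect one <hostname>.txt
# file per replica, since every pod wrote to the same volume
kubectl exec deploy/cpfs-shared-example -- ls /data
```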

Clean up resources

To avoid unexpected costs and ensure data safety, delete resources in the following order.

  1. Delete workloads — Stop all applications using the relevant PVCs. This unmounts the volumes.

    kubectl delete deployment <your-deployment-name>
  2. Delete PVCs — The outcome depends on the reclaimPolicy of the StorageClass:

    • Retain (recommended): After deleting the PVC, the CPFS for Lingjun fileset and its data remain intact. Proceed to the next step to clean up the PV.

    • Delete: Deleting the PVC permanently deletes its bound PV and the backend fileset. This operation is irreversible.

    kubectl delete pvc <your-pvc-name>
  3. Delete PVs (only when reclaimPolicy is Retain) — After you delete the PVC, the PV transitions to Released status. Delete the PV to remove the resource definition from Kubernetes. This does not affect backend data.

    kubectl delete pv <your-pv-name>
  4. (Optional) Delete the StorageClass — If you no longer need this storage configuration, delete the StorageClass. This does not affect already-created volumes.

    kubectl delete sc <your-sc-name>
  5. Delete the CPFS for Lingjun file system — This permanently deletes all data on the file system, including data retained by the Retain policy. Confirm the file system has no remaining dependencies before proceeding. See Delete a file system.

Troubleshooting

PVC stays in Pending status

A PVC stuck in Pending means dynamic provisioning failed. Start with the PVC events — they usually identify the cause directly.

kubectl describe pvc <your-pvc-name> -n <your-namespace>

Check the Events section for warning messages. Common causes:

  • StorageClass not found: The storageClassName field is incorrect, or the StorageClass does not exist.

  • provisioning failed or failed to create fileset: There is an issue interacting with the backend storage. Continue with the steps below.

If the events point to a configuration issue, inspect the StorageClass and verify the CSI driver is registered:

# Check the StorageClass configuration
kubectl get storageclass <your-sc-name> -o yaml

# Verify the CSI driver is registered
kubectl get csidriver bmcpfsplugin.csi.alibabacloud.com

Confirm that:

  • The provisioner field matches bmcpfsplugin.csi.alibabacloud.com.

  • The bmcpfsId parameter is correctly set and the file system ID exists.

  • If get csidriver returns no output or an error, the driver is not installed. Install bmcpfs-csi-controller, bmcpfs-csi-node, and cnfs-nas-daemon from the cluster's Add-ons page.

Pod stays in ContainerCreating or MountVolume.SetUp failed

This error means the pod was scheduled to a node but the volume mount failed. Follow these steps to isolate the cause.

  1. Check pod events for the specific failure:

    kubectl describe pod <pod-name> -n <your-namespace>

    In the Events section, look for Warning messages such as FailedMount or MountVolume.SetUp failed.

  2. Confirm the PVC is bound. Pods can only mount successfully bound volumes.

    kubectl get pvc <your-pvc-name>

    The STATUS must be Bound. If it is Pending, see PVC stays in Pending status.

  3. If the PVC is bound, check the node-side CSI plugin logs for the lowest-level error:

    kubectl get pods -n kube-system -l app.kubernetes.io/name=bmcpfs-csi-driver \
      --field-selector spec.nodeName=<nodeName> \
      -o name | xargs kubectl logs -n kube-system -c csi-plugin

    These logs contain detailed error messages, including network connectivity issues, mount target permission errors, and underlying I/O errors.