
Container Service for Kubernetes:Use dynamically provisioned volumes for CPFS for Lingjun

Last Updated: Jan 22, 2026

The dynamic volume mechanism automates on-demand storage for CPFS for Lingjun and eliminates the need to manually manage persistent volumes (PVs). This method supports parallel read and write operations by multiple applications and is ideal for scenarios such as AI training and data analytics. You can use it to efficiently share data, such as code, configuration files, and intermediate computing results.

Preparations

  • Make sure that you are familiar with the limits of CPFS for Lingjun.

  • Make sure that your cluster meets the following requirements:

    • Cluster version: 1.26 or later. To upgrade the cluster, see Manually upgrade an ACK cluster.

    • Node operating system: Alibaba Cloud Linux 3.

    • The following storage components are installed and meet the version requirements.

      On the Component Management page of the cluster, you can check component versions and install or upgrade components.
      • CSI components (csi-plugin and csi-provisioner): v1.33.1 or later. For more information about how to upgrade, see Manage CSI components.

      • cnfs-nas-daemon component: 0.1.2 or later.


        cnfs-nas-daemon manages EFC processes. It has high resource consumption and directly affects storage performance. You can adjust its resource configuration on the Component Management page. The recommended strategy is as follows:

        • CPU: The CPU request is related to the total bandwidth of the node. The calculation rule is to allocate 0.5 cores for every 1 Gb/s of bandwidth and add an extra 1 core for metadata management. You can adjust the CPU configuration based on this rule.

          For example, for a node with a 100 Gb/s network interface controller (NIC), the recommended CPU request is 100 × 0.5 + 1 = 51 cores.
        • Memory: CPFS for Lingjun is accessed through Filesystem in Userspace (FUSE). Its data read/write cache and file metadata both consume memory. You can set the memory request to 15% of the total memory of the node.

        After you adjust the configuration, you can dynamically scale resources based on the actual workload.

        Important
        • How updates take effect: The default update policy for the cnfs-nas-daemon DaemonSet is OnDelete. After you adjust its CPU or Memory on the Component Management page, you must manually delete the original cnfs-nas-daemon pod on the node. This action triggers a rebuild and applies the new configuration.

          To ensure business stability, we recommend that you perform this operation during off-peak hours.

        • Operation risks: Deleting or restarting the cnfs-nas-daemon pod temporarily interrupts the CPFS mount service on the node.

          • Nodes that do not support hot upgrades for mount targets①: On these nodes, the interruption is a hard stop that causes the application pod to run abnormally. You must manually delete the application pod and wait for it to restart to recover.

          • Nodes that support hot upgrades①: The application pod can automatically recover after the cnfs-nas-daemon pod restarts.

          ①: Nodes that meet the following conditions support hot upgrades:

          • The node system kernel is 5.10.134-18 or later.

          • The versions of bmcpfs-csi-controller and bmcpfs-csi-plugin are 1.35.1 or later.

          • The version of cnfs-nas-daemon is 0.1.9-compatible.1 or later.

      • bmcpfs-csi component: 1.35.1 or later.

        This includes bmcpfs-csi-controller (a control plane component managed by ACK) and bmcpfs-csi-node (a node-side component deployed as a DaemonSet in the cluster).
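The CPU and memory sizing rules for cnfs-nas-daemon described above can be expressed as a small calculation. The following sketch assumes you supply the node's NIC bandwidth (in Gb/s) and total memory (in GiB) by hand; the variable names are illustrative, not part of any component configuration:

```shell
# Recommended cnfs-nas-daemon resource requests, per the sizing rules:
#   CPU: 0.5 cores per Gb/s of NIC bandwidth, plus 1 core for metadata management
#   Memory: 15% of the node's total memory
nic_gbps=100        # e.g. a node with a 100 Gb/s NIC
node_mem_gib=1024   # e.g. a node with 1 TiB of memory

cpu_request=$(( nic_gbps / 2 + 1 ))            # 100 x 0.5 + 1 = 51 cores
mem_request_gib=$(( node_mem_gib * 15 / 100 )) # 15% of 1024 GiB = 153 GiB

echo "cpu=${cpu_request} memoryGiB=${mem_request_gib}"
```

For the example node, this prints cpu=51 memoryGiB=153, matching the 51-core recommendation above.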

Notes

  • When you use a VSC mount, the node where the pod runs must be in the same hpn-zone as the CPFS for Lingjun file system instance.

  • During initialization, a Lingjun node must be associated with a CPFS for Lingjun instance. Otherwise, the instance cannot be mounted using CSI.

  • Before you take a faulty Lingjun node offline, you must first drain its pods. Otherwise, the cluster metadata may become inconsistent, and leftover pod resources cannot be reclaimed.

  • Mounting multiple persistent volumes from the same CPFS for Lingjun file system in a single pod is not supported. This applies to multiple PVs created by a StorageClass that has the same bmcpfsId. Because of native protocol limitations, unexpected behavior occurs if the same pod tries to mount the same file system instance multiple times, even to different subdirectories.

Step 1: Create a CPFS file system

  1. Create a CPFS for Lingjun file system and record the file system ID. For more information, see Create a CPFS for Lingjun file system.

  2. (Optional) To mount from a non-Lingjun node, create a VPC mount target in the same VPC as the cluster nodes and record the mount target domain name. The domain name uses the format cpfs-***-vpc-***.<Region>.cpfs.aliyuncs.com.

    If the pod is scheduled to a Lingjun node, it uses a VSC mount by default. In this case, this step is not required.

Step 2: Create a StorageClass

Create a StorageClass object to use as a storage template.

  1. Create a file named sc.yaml with the following content.

    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: alicloud-bmcpfs-test
    provisioner: bmcpfsplugin.csi.alibabacloud.com
    parameters:
      # CPFS for Lingjun file system ID
      bmcpfsId: bmcpfs-29000z8xz3lf5nj*****  
      # Specify a subdirectory within the file system
      # path: "/shared"  
    # Allow subsequent volume expansion
    allowVolumeExpansion: true  
    # Delete (automatic cleanup) or Retain (keep data)
    reclaimPolicy: Delete  

    Parameter descriptions:

    • bmcpfsId (required): The ID of the BMCPFS file system, such as bmcpfs-xxxxxxxxx or cpfs-xxxxxxxxx.

    • path (optional): A subdirectory within the file system.

      • If specified, the volume is created in the {path}/{volumeName}/ path.

      • If not specified, the volume is created in the /{volumeName}/ path.

    • allowVolumeExpansion (optional): Specifies whether to allow automatic expansion through a PVC later. The current version does not support dynamic expansion; this is a reserved parameter.

    • reclaimPolicy (optional):

      • Delete (default): When the PVC is deleted, the fileset in the backend file system is automatically deleted.

      • Retain: When the PVC is deleted, the fileset in the backend file system is retained, and you must clean it up manually. This policy is recommended for production environments.

  2. Create the StorageClass.

    kubectl apply -f sc.yaml

Step 3: Create a PVC

The application requests a persistent volume using a PVC and references the StorageClass as a configuration template.

  1. Create a file named pvc.yaml with the following content.

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: bmcpfs-vsc
      namespace: default
    spec:
      accessModes:
        # CPFS for Lingjun volumes support simultaneous read and write operations by multiple pods
        - ReadWriteMany  
      resources:
        requests:
          # Supports large-capacity storage (TiB level)
          storage: 10Ti  
      # Only Filesystem is supported
      volumeMode: Filesystem
      # Specify the previously created StorageClass
      storageClassName: alicloud-bmcpfs-test

    Parameter descriptions:

    All the following parameters are required.

    • accessModes: Only ReadWriteMany is supported, which means multiple pods can mount the volume and perform read/write operations at the same time.

    • storage: The requested storage capacity. Units such as Gi and Ti are supported.

    • volumeMode: Only Filesystem is supported.

    • storageClassName: Specifies the StorageClass to use. This triggers the dynamic creation of the persistent volume.

  2. Create the PVC.

    kubectl apply -f pvc.yaml
  3. You can run the following commands to check the PVC status.

    • Run kubectl get pvc bmcpfs-vsc -n default to view the PVC status. If the value of STATUS is Bound, the system has automatically created a corresponding PV.

    • Run kubectl describe pvc bmcpfs-vsc -n default and check the Events section for the Provisioning succeeded message.
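In scripts, you can also block until the PVC is bound instead of polling manually. This is a sketch that assumes kubectl 1.23 or later, which supports jsonpath-based wait conditions:

```shell
# Wait up to two minutes for pvc/bmcpfs-vsc to reach the Bound state
kubectl wait --for=jsonpath='{.status.phase}'=Bound pvc/bmcpfs-vsc -n default --timeout=120s
```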

Step 4: Create a workload and mount the PVC

After the PVC is created, you can deploy a sample workload and mount the PV that is bound to the PVC to the application.

  1. Create a file named deploy.yaml with the following content.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: cpfs-shared-example
    spec:
      # Create 3 replicas to verify shared storage across multiple pods
      replicas: 3
      selector:
        matchLabels:
          app: cpfs-shared-app
      template:
        metadata:
          labels:
            app: cpfs-shared-app
        spec:
          # Ensure the pod can be scheduled to a Lingjun node
          tolerations:
            - key: node-role.alibabacloud.com/lingjun
              operator: Exists
              effect: NoSchedule
          # Optional: To schedule all pods to a specific node, uncomment this line and modify the node name
          # nodeName: cn-hangzhou.10.XX.XX.226
          containers:
          - name: app-container
            image: anolis-registry.cn-zhangjiakou.cr.aliyuncs.com/openanolis/nginx:1.14.1-8.6
            volumeMounts:
              - name: pvc-cpfs
                # Mount the shared storage volume to the /data directory inside the container
                mountPath: /data
            # Simple postStart hook to verify data writing and sharing:
            # after the container starts, it writes a file named after its hostname to the shared directory
            lifecycle:
              postStart:
                exec:
                  command:
                    - /bin/sh
                    - -c
                    - >
                      echo "Data written by $(hostname)" > /data/$(hostname).txt
          volumes:
            - name: pvc-cpfs
              persistentVolumeClaim:
                # Reference the previously created PVC
                claimName: bmcpfs-vsc
  2. Create the deployment.

    kubectl apply -f deploy.yaml
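To confirm that the replicas actually share one volume, you can list the shared directory from any pod in the Deployment. Each replica wrote a file named after its own hostname, so with three running replicas the listing should show three .txt files. This sketch assumes the labels and paths from the example manifest above:

```shell
# Pick one pod from the Deployment and list the shared /data directory
POD=$(kubectl get pods -l app=cpfs-shared-app -o jsonpath='{.items[0].metadata.name}')
kubectl exec "$POD" -- ls /data
```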

Resource release guide

To avoid unexpected charges and ensure data security, follow this process to release unused resources.

  1. Delete the workload

    • Operation: Delete all applications that use the relevant PVC, such as deployments and StatefulSets. This action stops the applications and unmounts the persistent volumes.

    • Example command: kubectl delete deployment <your-deployment-name>

  2. Delete the PVC

    • Operation: Delete the PVC that is associated with the application. How the backend data is handled depends on the reclaim policy (reclaimPolicy) of the StorageClass.

      • Retain (Recommended): After the PVC is deleted, the fileset and data on the backend CPFS for Lingjun are retained.

      • Delete: After the PVC is deleted, its bound PV and the fileset on the backend CPFS for Lingjun are permanently deleted. This operation is irreversible. Use this policy with caution.

    • Example command: kubectl delete pvc <your-pvc-name>

  3. Delete the PV (if the reclaim policy is Retain)

    • Operation: If the reclaim policy is Retain, the PV enters the Released state after the PVC is deleted. You must then manually delete the PV. This operation removes only the resource definition in Kubernetes and does not affect the backend data.

    • Example command: kubectl delete pv <your-pv-name>

  4. Delete the StorageClass (Optional)

    • Operation: If you no longer need this storage class, you can delete the StorageClass. This operation does not affect volumes that are already created.

    • Example command: kubectl delete sc <your-sc-name>

  5. Delete the CPFS for Lingjun backend file system

    • Operation: This operation permanently deletes all data on the file system, including data that is retained by the Retain policy. This data cannot be recovered. Before you proceed, make sure that no services depend on this file system. For more information, see Delete a file system.
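For reference, the Kubernetes-side cleanup in steps 1 through 4 can be run as one ordered sequence. The resource names below are the ones used in this topic's examples; substitute your own where they differ, and remember that step 5 (deleting the file system itself) is performed in the console, not with kubectl:

```shell
# Release resources in dependency order: workload -> PVC -> (PV) -> StorageClass
kubectl delete deployment cpfs-shared-example
kubectl delete pvc bmcpfs-vsc -n default
# Only needed when reclaimPolicy is Retain and the PV is left in the Released state:
# kubectl delete pv <your-pv-name>
kubectl delete sc alicloud-bmcpfs-test
```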

FAQ

Why is my PVC stuck in the Pending state?

A PVC in the Pending state usually indicates that the creation (provisioning) of the dynamically provisioned volume has failed. You can follow these steps to troubleshoot the issue.

  1. Check the PVC events. The events usually indicate the reason for the failure.

    kubectl describe pvc <your-pvc-name> -n <your-namespace>

    Look for alert information in the Events section. Common reasons include the following:

    • StorageClass not found: The storageClassName field is incorrect, or the corresponding StorageClass does not exist.

    • provisioning failed or failed to create fileset: A problem occurred when the system interacted with the backend storage. You can proceed with the next steps.

  2. Check the StorageClass and CSI driver configurations

    If the event log indicates a configuration problem or does not show a clear error, you can check the StorageClass configuration and the status of the CSI driver.

    # 1. Check the YAML configuration of the StorageClass
    kubectl get storageclass <your-sc-name> -o yaml
    
    # 2. Check if the CSI driver is registered in the cluster
    kubectl get csidriver bmcpfsplugin.csi.alibabacloud.com

    Confirm the following:

    • StorageClass configuration: The provisioner field is correct, and the bmcpfsId parameter is correctly set to an existing file system ID.

    • bmcpfs-csi status: If the get csidriver command returns an error or no output, the driver is not installed correctly. On the Component Management page of the cluster, you can install the bmcpfs-csi-controller, bmcpfs-csi-node, and cnfs-nas-daemon components.

How do I troubleshoot a pod that is stuck in the ContainerCreating state or shows a MountVolume.Setup failed error in its events?

This error indicates that the pod has been scheduled to a node but failed to mount the persistent volume on that node. You can follow this troubleshooting process.

  1. Check pod events to identify the cause

    You can view the pod's event logs using the describe pod command.

    kubectl describe pod <pod-name> -n <your-namespace>

    Pay close attention to Warning messages in the Events section, such as FailedMount or MountVolume.Setup failed.

  2. Check the mount prerequisites

    Confirm that the PVC status is Bound because pods can only mount bound volumes.

    kubectl get pvc <your-pvc-name>

    The STATUS of the PVC must be Bound. A Pending status indicates a problem with the volume creation process. For more information, see Why is my PVC stuck in the Pending state?.

  3. Check the detailed logs of the node's CSI plugin

    If the PVC is Bound and the pod is on the correct node, you can further check the mount operation that is performed by the node-side csi-plugin component.

    # View the logs of the csi-plugin on the pod's node to find the root cause of the failure
    kubectl get pods -n kube-system -l app.kubernetes.io/name=bmcpfs-csi-driver \
      --field-selector spec.nodeName=<nodeName> -o name \
      | xargs kubectl logs -n kube-system -c csi-plugin

    These logs contain the lowest-level error messages, such as network connectivity issues from the node to the storage backend, mount target permission problems, or underlying I/O errors.
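When these logs are verbose, filtering them by the name of the PV that failed to mount can speed up the search. This sketch reuses the placeholders from earlier in this FAQ:

```shell
# Resolve the PV name bound to the PVC, then filter the csi-plugin logs for it
PV_NAME=$(kubectl get pvc <your-pvc-name> -n <your-namespace> -o jsonpath='{.spec.volumeName}')
kubectl get pods -n kube-system -l app.kubernetes.io/name=bmcpfs-csi-driver \
  --field-selector spec.nodeName=<nodeName> -o name \
  | xargs -I{} kubectl logs -n kube-system -c csi-plugin {} \
  | grep "$PV_NAME"
```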