The dynamic volume mechanism automates on-demand storage for CPFS for Lingjun, which eliminates the need to manually manage persistent volumes (PVs). This method supports parallel read and write operations for multiple applications and is ideal for scenarios such as AI training and data analytics. You can use it to efficiently share data, such as code, configuration files, and intermediate computing results.
Preparations
You are familiar with the limits of CPFS for Lingjun.
Make sure that your cluster meets the following requirements:
Cluster version: 1.26 or later. To upgrade the cluster, see Manually upgrade an ACK cluster.
Node operating system: Alibaba Cloud Linux 3.
The following storage components are installed and meet the version requirements.
On the Component Management page of the cluster, you can check component versions and install or upgrade components.
CSI components (csi-plugin and csi-provisioner): v1.33.1 or later. For more information about how to upgrade, see Manage CSI components.
cnfs-nas-daemon component: 0.1.2 or later.
bmcpfs-csi component: 1.35.1 or later. This component consists of bmcpfs-csi-controller (a control plane component managed by ACK) and bmcpfs-csi-node (a node-side component deployed as a DaemonSet in the cluster).
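Once you have read the installed versions from the Component Management page, you can compare them against the minimums above with a small script. The following is a minimal sketch, assuming plain dotted version strings and GNU sort (for the -V option). version_ge is a hypothetical helper and is not part of any ACK tooling.

```shell
# version_ge INSTALLED REQUIRED
# Succeeds when INSTALLED is the same as or newer than REQUIRED.
# A leading "v" (as in v1.33.1) is ignored. Relies on GNU `sort -V`.
version_ge() {
  local installed="${1#v}"
  local required="${2#v}"
  local lowest
  # After a version sort, the first line is the older of the two versions.
  lowest="$(printf '%s\n%s\n' "$installed" "$required" | sort -V | head -n 1)"
  [ "$lowest" = "$required" ]
}

# Example: replace the first argument with the version shown in the console.
# version_ge "v1.33.1" "v1.33.1" && echo "csi-plugin meets the requirement"
# version_ge "1.35.1"  "1.35.1"  && echo "bmcpfs-csi meets the requirement"
```

The helper only compares strings. How you obtain the installed version (console, Helm, or image tag) is up to you.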
Notes
When you use a VSC mount, the node where the pod runs must be in the same hpn-zone as the CPFS for Lingjun file system instance.
During initialization, a Lingjun node must be associated with a CPFS for Lingjun instance. Otherwise, the instance cannot be mounted using CSI.
Before you take a faulty Lingjun node offline, you must first drain the pods. Otherwise, the cluster metadata becomes inconsistent, and the pod resources are left behind and cannot be reclaimed.
Mounting multiple persistent volumes from the same CPFS for Lingjun file system in a single pod is not supported. This applies to multiple PVs created by a StorageClass that has the same bmcpfsId. Because of native protocol limitations, unexpected behavior occurs if the same pod tries to mount the same file system instance multiple times, even to different subdirectories.
Step 1: Create a CPFS file system
Create a CPFS for Lingjun file system and record the file system ID. For more information, see Create a CPFS for Lingjun file system.
(Optional) To mount from a non-Lingjun node, create a VPC mount target in the same VPC as the cluster nodes and record the mount target domain name. The domain name uses the format cpfs-***-vpc-***.<Region>.cpfs.aliyuncs.com.
If the pod is scheduled to a Lingjun node, it uses a VSC mount by default. In this case, this step is not required.
Step 2: Create a StorageClass
Create a StorageClass object to use as a storage template.
Create a file named sc.yaml with the following content.

    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: alicloud-bmcpfs-test
    provisioner: bmcpfsplugin.csi.alibabacloud.com
    parameters:
      # CPFS for Lingjun file system ID
      bmcpfsId: bmcpfs-29000z8xz3lf5nj*****
      # Specify a subdirectory within the file system
      # path: "/shared"
    # Allow subsequent volume expansion
    allowVolumeExpansion: true
    # Delete (automatic cleanup) or Retain (keep data)
    reclaimPolicy: Delete

Parameter descriptions:
| Parameter | Required | Description |
| --- | --- | --- |
| bmcpfsId | Yes | The ID of the BMCPFS file system, such as bmcpfs-xxxxxxxxx or cpfs-xxxxxxxxx. |
| path | No | A subdirectory within the file system. If specified, the volume is created in the {path}/{volumeName}/ path. If not specified, the volume is created in the /{volumeName}/ path. |
| allowVolumeExpansion | No | Specifies whether to allow automatic expansion through a PVC later. The current version does not support dynamic expansion. This is a reserved parameter. |
| reclaimPolicy | No | Delete (default): When the PVC is deleted, the fileset in the backend file system is automatically deleted. Retain: When the PVC is deleted, the fileset in the backend file system is retained. You must clean it up manually. This policy is recommended for production environments. |
Create the StorageClass.
kubectl apply -f sc.yaml
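To make the effect of the optional path parameter concrete, the sketch below prints the directory that the parameter table describes for a given volume name. It only mirrors the {path}/{volumeName}/ rule from the table. volume_dir is a hypothetical helper, the trailing-slash handling is an assumption, and the driver's actual path normalization may differ.

```shell
# volume_dir VOLUME_NAME [PATH]
# Prints the directory in which the volume would be created according to
# the parameter table: {path}/{volumeName}/ or, without path, /{volumeName}/.
volume_dir() {
  local volume_name="$1"
  local base="${2:-}"
  # Assumption: one trailing slash is ignored, so "/shared/" equals "/shared".
  base="${base%/}"
  printf '%s/%s/\n' "$base" "$volume_name"
}

# Examples (pvc-123 is a placeholder volume name):
# volume_dir pvc-123 /shared
# volume_dir pvc-123
```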
Step 3: Create a PVC
The application requests a persistent volume using a PVC and references the StorageClass as a configuration template.
Create a file named pvc.yaml with the following content.

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: bmcpfs-vsc
      namespace: default
    spec:
      accessModes:
        # CPFS for Lingjun volumes support simultaneous read and write operations by multiple pods
        - ReadWriteMany
      resources:
        requests:
          # Supports large-capacity storage (TiB level)
          storage: 10Ti
      # Only Filesystem is supported
      volumeMode: Filesystem
      # Specify the previously created StorageClass
      storageClassName: alicloud-bmcpfs-test

Parameter descriptions:
All the following parameters are required.
| Parameter | Description |
| --- | --- |
| accessModes | Only ReadWriteMany is supported. This means multiple pods can mount the volume and perform read/write operations at the same time. |
| storage | The requested storage capacity. Units such as Gi and Ti are supported. |
| volumeMode | Only Filesystem is supported. |
| storageClassName | Specifies the StorageClass to use. This triggers the dynamic creation of the persistent volume. |

Create the PVC.
kubectl apply -f pvc.yaml
You can run the following commands to check the PVC status.
Run kubectl get pvc bmcpfs-vsc -n default to view the PVC status. If the value of STATUS is Bound, the system has automatically created a corresponding PV.
Run kubectl describe pvc bmcpfs-vsc -n default and check the Events section for the Provisioning succeeded message.
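If you script this step, you can poll the PVC phase instead of reading the table output by eye. The following is a minimal sketch that assumes kubectl is already configured for the cluster. wait_pvc_bound is a hypothetical helper, and the 5-second interval and 120-second default timeout are arbitrary choices.

```shell
# wait_pvc_bound NAME [NAMESPACE] [TIMEOUT_SECONDS]
# Polls the PVC until .status.phase is Bound or the timeout expires.
wait_pvc_bound() {
  local name="$1" ns="${2:-default}" timeout="${3:-120}"
  local waited=0 phase=""
  while [ "$waited" -lt "$timeout" ]; do
    # jsonpath prints only the phase, for example Pending or Bound.
    phase="$(kubectl get pvc "$name" -n "$ns" -o jsonpath='{.status.phase}' 2>/dev/null)"
    if [ "$phase" = "Bound" ]; then
      echo "PVC $ns/$name is Bound"
      return 0
    fi
    sleep 5
    waited=$((waited + 5))
  done
  echo "PVC $ns/$name is still '${phase:-unknown}' after ${timeout}s" >&2
  return 1
}

# Example:
# wait_pvc_bound bmcpfs-vsc default 300
```

If the function times out, the PVC is most likely stuck in the Pending state, and the events shown by kubectl describe pvc are the next place to look.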
Step 4: Create a workload and mount the PVC
After the PVC is created, you can deploy a sample workload and mount the PV that is bound to the PVC to the application.
Create a file named deploy.yaml with the following content.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: cpfs-shared-example
    spec:
      # Create 3 replicas to verify shared storage across multiple pods
      replicas: 3
      selector:
        matchLabels:
          app: cpfs-shared-app
      template:
        metadata:
          labels:
            app: cpfs-shared-app
        spec:
          # Ensure the pod can be scheduled to a Lingjun node
          tolerations:
            - key: node-role.alibabacloud.com/lingjun
              operator: Exists
              effect: NoSchedule
          # Optional: To schedule all pods to a specific node, uncomment this line and modify the node name
          # nodeName: cn-hangzhou.10.XX.XX.226
          containers:
            - name: app-container
              image: anolis-registry.cn-zhangjiakou.cr.aliyuncs.com/openanolis/nginx:1.14.1-8.6
              volumeMounts:
                - name: pvc-cpfs
                  # Mount the shared storage volume to the /data directory inside the container
                  mountPath: /data
              # Simple lifecycle command to verify data writing and sharing
              # After the pod starts, it writes a file containing its hostname to the shared directory
              # The hook must return promptly: the container is not reported as Running
              # until postStart completes, so the hook does not sleep
              lifecycle:
                postStart:
                  exec:
                    command:
                      - /bin/sh
                      - -c
                      - >
                        echo "Data written by $(hostname)" > /data/$(hostname).txt &&
                        echo "Deployment is running, check shared data in /data."
          volumes:
            - name: pvc-cpfs
              persistentVolumeClaim:
                # Reference the previously created PVC
                claimName: bmcpfs-vsc

Create the deployment.
kubectl apply -f deploy.yaml
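After the pods report Running, you can confirm that the replicas share one directory by listing /data from each of them. The sketch below assumes the app=cpfs-shared-app label and the /data mount path from deploy.yaml. list_shared_files is a hypothetical helper.

```shell
# list_shared_files [NAMESPACE]
# Runs `ls /data` in every pod of the sample Deployment so you can compare
# the listings. With a shared volume, each pod should show the files that
# the other replicas wrote.
list_shared_files() {
  local ns="${1:-default}" pod
  # -o name prints one "pod/<name>" entry per line.
  for pod in $(kubectl get pods -n "$ns" -l app=cpfs-shared-app -o name); do
    echo "== $pod"
    kubectl exec -n "$ns" "$pod" -- ls /data
  done
}

# Example:
# list_shared_files default
```

If the listings differ between pods, check that all replicas reference the same PVC.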
Resource release guide
To avoid unexpected charges and ensure data security, follow this process to release unused resources.
Delete the workload
Operation: Delete all applications that use the relevant PVC, such as deployments and StatefulSets. This action stops the applications and unmounts the persistent volumes.
Example command:
kubectl delete deployment <your-deployment-name>
Delete the PVC
Operation: Delete the PVC that is associated with the application. How the backend data is handled depends on the reclaim policy (reclaimPolicy) of the StorageClass.
Retain (Recommended): After the PVC is deleted, the fileset and data on the backend CPFS for Lingjun are retained.
Delete: After the PVC is deleted, its bound PV and the fileset on the backend CPFS for Lingjun are permanently deleted. This operation is irreversible. Use this policy with caution.
Example command:
kubectl delete pvc <your-pvc-name>
Delete the PV (if the reclaim policy is Retain)
Operation: If the reclaim policy is Retain, the PV enters the Released state after the PVC is deleted. You must then manually delete the PV. This operation removes only the resource definition in Kubernetes and does not affect the backend data.
Example command:
kubectl delete pv <your-pv-name>
Delete the StorageClass (Optional)
Operation: If you no longer need this storage class, you can delete the StorageClass. This operation does not affect volumes that are already created.
Example command:
kubectl delete sc <your-sc-name>
Delete the CPFS for Lingjun backend file system
Operation: This operation permanently deletes all data on the file system, including data that is retained by the Retain policy. This data cannot be recovered. Before you proceed, make sure that no services depend on this file system. For more information, see Delete a file system.
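Because deleting a PVC under the Delete policy is irreversible, it can help to confirm the effective reclaim policy of the bound PV before you run kubectl delete pvc. The following is a minimal sketch that assumes kubectl is configured for the cluster. pvc_reclaim_policy is a hypothetical helper.

```shell
# pvc_reclaim_policy PVC_NAME [NAMESPACE]
# Prints the reclaim policy (Retain or Delete) of the PV bound to the PVC.
pvc_reclaim_policy() {
  local pvc="$1" ns="${2:-default}" pv
  # .spec.volumeName is empty while the PVC is not bound to a PV.
  pv="$(kubectl get pvc "$pvc" -n "$ns" -o jsonpath='{.spec.volumeName}')"
  if [ -z "$pv" ]; then
    echo "PVC $ns/$pvc is not bound to a PV" >&2
    return 1
  fi
  kubectl get pv "$pv" -o jsonpath='{.spec.persistentVolumeReclaimPolicy}'
  echo
}

# Example: only delete the PVC automatically when the data will be kept.
# [ "$(pvc_reclaim_policy bmcpfs-vsc default)" = "Retain" ] && kubectl delete pvc bmcpfs-vsc -n default
```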
FAQ
Why is my PVC stuck in the Pending state?
A PVC in the Pending state usually indicates that the creation (provisioning) of the dynamically provisioned volume has failed. You can follow these steps to troubleshoot the issue.
Check the PVC events. The events usually indicate the reason for the failure.
kubectl describe pvc <your-pvc-name> -n <your-namespace>
Look for warning messages in the Events section. Common reasons include the following:
StorageClass not found: The storageClassName field is incorrect, or the corresponding StorageClass does not exist.
provisioning failed or failed to create fileset: A problem occurred when the system interacted with the backend storage. You can proceed with the next steps.
Check the StorageClass and CSI driver configurations
If the event log indicates a configuration problem or does not show a clear error, you can check the StorageClass configuration and the status of the CSI driver.

    # 1. Check the YAML configuration of the StorageClass
    kubectl get storageclass <your-sc-name> -o yaml
    # 2. Check if the CSI driver is registered in the cluster
    kubectl get csidriver bmcpfsplugin.csi.alibabacloud.com

Confirm the following:
StorageClass configuration: The provisioner field is correct, and the bmcpfsId parameter is correctly set to an existing file system ID.
bmcpfs-csi status: If the get csidriver command returns an error or no output, the driver is not installed correctly. On the Component Management page of the cluster, you can install the bmcpfs-csi-controller, bmcpfs-csi-node, and cnfs-nas-daemon components.
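When a namespace has many events, you can narrow the output to the warnings recorded for one PVC instead of reading the full describe output. The sketch below uses the standard event field selectors. pvc_warning_events is a hypothetical helper.

```shell
# pvc_warning_events PVC_NAME [NAMESPACE]
# Lists only Warning events whose involved object is the given PVC,
# oldest first.
pvc_warning_events() {
  local pvc="$1" ns="${2:-default}"
  kubectl get events -n "$ns" \
    --field-selector "involvedObject.kind=PersistentVolumeClaim,involvedObject.name=$pvc,type=Warning" \
    --sort-by=.lastTimestamp
}

# Example:
# pvc_warning_events bmcpfs-vsc default
```

Events expire after a retention period that depends on the cluster configuration, so an empty result does not prove that provisioning succeeded.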
How do I troubleshoot a pod that is stuck in the ContainerCreating state or shows a MountVolume.SetUp failed error in its events?
This error indicates that the pod has been scheduled to a node but failed to mount the persistent volume on that node. You can follow this troubleshooting process.
Check pod events to identify the cause
You can view the pod's events using the describe pod command.
kubectl describe pod <pod-name> -n <your-namespace>
Pay close attention to Warning messages in the Events section, such as FailedMount or MountVolume.SetUp failed.
Check the mount prerequisites
Confirm that the PVC status is Bound because pods can only mount bound volumes.
kubectl get pvc <your-pvc-name>
The STATUS of the PVC must be Bound. A Pending status indicates a problem with the volume creation process. For more information, see Why is my PVC stuck in the Pending state?.
Check the detailed logs of the node's CSI plugin
If the PVC is Bound and the pod is on the correct node, you can further check the mount operation that is performed by the node-side csi-plugin component.

    # View the logs of the csi-plugin on the pod's node to find the root cause of the failure
    kubectl get pods -n kube-system -l app.kubernetes.io/name=bmcpfs-csi-driver --field-selector spec.nodeName=<nodeName> -o name | xargs kubectl logs -n kube-system -c csi-plugin

These logs contain the lowest-level error messages, such as network connectivity issues from the node to the storage backend, mount target permission problems, or underlying I/O errors.
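The log command needs the name of the node on which the failing pod was scheduled. One way to look it up is shown below, assuming kubectl is configured for the cluster. pod_node is a hypothetical helper.

```shell
# pod_node POD_NAME [NAMESPACE]
# Prints the name of the node that the pod is scheduled on.
pod_node() {
  local pod="$1" ns="${2:-default}"
  kubectl get pod "$pod" -n "$ns" -o jsonpath='{.spec.nodeName}'
  echo
}

# Example: store the result and use it as the <nodeName> value.
# nodeName="$(pod_node <pod-name> <your-namespace>)"
```

An empty result means that the pod has not been scheduled yet, in which case the problem lies in scheduling and not in the mount.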