If a pod becomes abnormal while mounting or using storage volumes, you can check the status and events of the pod and PVC, as well as the CSI storage components, to identify the cause and resolve the issue. This topic describes the diagnostic procedure for storage issues and common storage problems.

1. Check if the pod abnormality is caused by storage issues
Confirm whether the pod cannot start due to storage issues by checking pod or PVC events.
Check the events of the abnormal pod.
kubectl describe pod <pod-name>

If the Events section contains events indicating a storage issue (for example, a FailedScheduling event whose Message shows that scheduling failed because no node matches the volume), refer to the following sections for further troubleshooting.

Events:
  Type     Reason            Age    From               Message
  ----     ------            ----   ----               -------
  Warning  FailedScheduling  4m37s  default-scheduler  0/1 nodes are available: 1 node(s) had volume node affinity conflict. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.

If the Events section indicates that the volume was attached successfully (for example, a SuccessfulAttachVolume event) but the pod still fails to start (for example, CrashLoopBackOff), this is not a storage issue. Troubleshoot other causes based on the events or submit a ticket.

Events:
  Type    Reason                  Age  From                     Message
  ----    ------                  ---- ----                     -------
  Normal  Scheduled               97s  default-scheduler        Successfully assigned default/disk-test-0 to cn-shanghai.192.168.5.2
  Normal  SuccessfulAttachVolume  97s  attachdetach-controller  AttachVolume.Attach succeeded for volume "d-uf6b8s2l5ypf48*****"
If no storage-related events are found in the pod events, you can check all events.
kubectl get events

If there are events indicating a storage issue (for example, a FailedBinding event showing that the PVC failed to bind to a PV), refer to the following sections for further troubleshooting.

LAST SEEN  TYPE    REASON                OBJECT                                                  MESSAGE
2m56s      Normal  FailedBinding         persistentvolumeclaim/data-my-release-mariadb-0         no persistent volumes available for this claim and no storage class is set
41s        Normal  ExternalProvisioning  persistentvolumeclaim/pvc-nas-dynamic-create-subpath8   waiting for a volume to be created, either by external provisioner "nasplugin.csi.alibabacloud.com" or manually created by system administrator
3m31s      Normal  Provisioning          persistentvolumeclaim/pvc-nas-dynamic-create-subpath8   External provisioner is provisioning volume for claim "default/pvc-nas-dynamic-create-subpath8"

If there are no storage-related events, troubleshoot other causes based on the events or submit a ticket.
2. Check if the storage component is working properly
If your cluster still uses the deprecated FlexVolume component, migrate to the CSI component as soon as possible. For more information, see Migrate from FlexVolume to CSI.
Check whether the CSI storage component is working properly.
kubectl get pod -n kube-system | grep csi

The following is an example of the returned result. If a pod is not in the Running state, run kubectl describe pod <pod-name> -n kube-system to check why the container exited and to view the pod's events.

Note: The CSI storage components are csi-plugin and csi-provisioner. csi-provisioner is installed as the managed version by default; managed components are maintained by Alibaba Cloud, and their pods are not visible in the cluster.

NAME               READY   STATUS    RESTARTS   AGE
csi-plugin-bpz28   4/4     Running   0          3d
csi-plugin-h2tdg   4/4     Running   0          3d
csi-plugin-qpnm4   4/4     Running   0          3d
csi-plugin-wczgm   4/4     Running   0          3d

Check whether the CSI storage component is the latest version.
kubectl get ds csi-plugin -n kube-system -o yaml | grep image

Confirm the image version in the image field of the returned information, as shown in the following example:

image: registry-cn-shanghai-vpc.ack.aliyuncs.com/acs/csi-plugin:v1.33.1-67e8986-aliyun

If the storage component is not the latest version, upgrade csi-plugin and csi-provisioner. For information about the latest versions of the storage components, see csi-plugin.
Note: You can also find the csi-plugin and csi-provisioner components on the Components page in the console to confirm their versions and upgrade them.
Check the YAML of the PV, PVC, and StorageClass to confirm that the driver configuration (the driver or provisioner field) uses the CSI storage component and matches the storage component installed in the current cluster.
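For reference, the following is a minimal sketch of where these fields live, assuming the NAS CSI driver named in the event example above (the resource names are hypothetical; substitute the driver your cluster actually uses):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: example-nas-sc                         # hypothetical name
provisioner: nasplugin.csi.alibabacloud.com    # must be the CSI driver installed in the cluster
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-nas-pv                         # hypothetical name
spec:
  capacity:
    storage: 20Gi
  accessModes:
    - ReadWriteMany
  csi:
    driver: nasplugin.csi.alibabacloud.com     # must match the driver used by the cluster's storage component
    volumeHandle: example-nas-pv               # placeholder; driver-specific volume identifier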
3. Check if the PVC is in Bound state
Check the PVC status.
kubectl get pvc

If the PVC is not in the Bound state, refer to the following methods for troubleshooting and resolution.
Cause
Static: The PVC and PV fail to match. For example, the selector in the PVC does not match the labels of the PV, the StorageClass names are inconsistent, or the PV is in the Released state.
Dynamic: Issues with the csi-provisioner component.
Solution
Static: Check whether the relevant YAML is written correctly; a sketch of the fields that must match appears after this list. For more information, see the following topics:
Note: If the PV is in the Released state, it cannot be reused. Obtain the storage resource information from the PV and create a new PV.
Dynamic: Run the following command to check the PVC events:

kubectl describe pvc <pvc-name> -n <namespace>

Handle the issue according to the event message. For more information, see the following topics:
If there is no relevant event information, submit a ticket.
If you are using disk volumes, the ECS API call that creates the disk might have failed. See ECS Error Center for troubleshooting. If the issue persists, submit a ticket.
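For the static case, the following is a minimal sketch of a PV/PVC pair whose fields must agree for binding to succeed (all names are hypothetical; storageClassName, accessModes, capacity, and the selector labels are the usual matching points):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-pv                 # hypothetical
  labels:
    alicloud-pvname: example-pv    # label targeted by the PVC selector below
spec:
  capacity:
    storage: 20Gi                  # must be >= the PVC's requested storage
  accessModes:
    - ReadWriteMany                # must include the mode the PVC requests
  storageClassName: example-sc     # must equal the PVC's storageClassName
  csi:
    driver: nasplugin.csi.alibabacloud.com
    volumeHandle: example-pv       # placeholder volume identifier
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-pvc                # hypothetical
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: example-sc
  resources:
    requests:
      storage: 20Gi
  selector:
    matchLabels:
      alicloud-pvname: example-pv  # must match the PV's labels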
4. Check if the pod is in Running state
Check the pod status.
kubectl get pod

If the PVC is in the Bound state but the pod is not Running, refer to the following methods for troubleshooting and resolution based on the volume type.
Disk volumes
Important: When using disk volumes, ensure that the ECS instance type of the node to which the pod is scheduled supports mounting that disk type, and that the pod and the disk are in the same region and zone. For the matching relationship between disk types and ECS instance types, see Overview of instance families.
Cause
No node is available for scheduling.
An error occurs when the system mounts the disk.
The ECS instance does not support the specified disk type.
Solution
Schedule the pod to another node, for example by pinning it to the zone where the disk resides (see the sketch after this list). For more information, see Schedule application pods to the specified node.
Run the following command to view the pod events:

kubectl describe pod <pod-name>

Handle the issue according to the event message. For more information, see FAQ about disk volumes.
If there is no relevant event information, submit a ticket.
If the ECS instance does not support the specified disk type, select a disk type that is supported by the ECS instance. For more information, see Overview of instance families.
For other ECS API errors, see ErrorCode.
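A minimal sketch of pinning a pod to the disk's zone via a nodeSelector (the names and zone value are hypothetical; use the zone shown in the disk's details):

apiVersion: v1
kind: Pod
metadata:
  name: disk-test                                # hypothetical
spec:
  nodeSelector:
    topology.kubernetes.io/zone: cn-shanghai-b   # hypothetical zone; must match the disk's zone
  containers:
    - name: app
      image: nginx                               # placeholder image
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: disk-pvc                      # hypothetical PVC name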
NAS volumes
Important:
The node and the NAS file system must be in the same VPC. If they are not in the same VPC, use Cloud Enterprise Network to connect them.
NAS supports cross-zone mounting.
The mount directory for an Extreme NAS file system must start with /share.
Cause
fsGroup is set in the pod's securityContext and the volume contains many files, so the recursive permission change (chmod/chown) during mounting is slow.
Port 2049 is blocked in the security group rules.
The NAS file system and node are deployed in different VPCs.
Solution
Check whether fsGroup is configured in the pod's securityContext. If it is, remove it (see the sketch after this list), restart the pod, and try to mount again.
Check whether port 2049 is blocked for the node that hosts the pod. If it is, open the port and try again. For more information, see Add a security group rule.
If the NAS file system and node are deployed in different VPCs, use Cloud Enterprise Network to connect them.
For other issues, run the following command to view the pod events:

kubectl describe pod <pod-name>

Handle the issue according to the event message. For more information, see FAQ about NAS volumes.
If there is no relevant event information, submit a ticket.
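A minimal sketch of the securityContext field in question (pod and PVC names are hypothetical). Deleting the fsGroup line skips the recursive permission change on mount; if you must keep fsGroup, Kubernetes 1.20 and later also support fsGroupChangePolicy: OnRootMismatch to limit that cost:

apiVersion: v1
kind: Pod
metadata:
  name: nas-test                            # hypothetical
spec:
  securityContext:
    fsGroup: 1000                           # remove this line to avoid the slow recursive chmod/chown
    # fsGroupChangePolicy: OnRootMismatch   # alternative: only change permissions when the root dir mismatches
  containers:
    - name: app
      image: nginx                          # placeholder image
      volumeMounts:
        - name: nas
          mountPath: /data
  volumes:
    - name: nas
      persistentVolumeClaim:
        claimName: nas-pvc                  # hypothetical PVC name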
OSS volumes
Important:
When mounting OSS to a node, you must provide the AccessKey information in the PV; it can be supplied through a Secret (see the sketch at the end of this section).
When using OSS across regions, change the bucket URL to the public endpoint. Within the same region, the internal endpoint is recommended.
Cause
fsGroup is set in the pod's securityContext and the bucket contains many files, so the recursive permission change during mounting is slow.
The OSS bucket and the node are in different regions while the internal endpoint of the bucket is used, so the node cannot reach the bucket endpoint.
Solution
Check whether fsGroup is configured in the pod's securityContext. If it is, remove it, restart the pod, and try to mount again.
Check whether the OSS bucket and the node are in different regions while the internal endpoint is used. If so, change to the public endpoint.
For other issues, run the following command to view the pod events:

kubectl describe pod <pod-name>

Handle the issue according to the event message. For more information, see FAQ about ossfs 1.0 volumes or FAQ about ossfs 2.0 volumes.
If there is no relevant event information, submit a ticket.
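A minimal sketch of an OSS PV that takes its AccessKey from a Secret, assuming the ossplugin.csi.alibabacloud.com driver and the akId/akSecret key names commonly used with it (all names here are hypothetical; check your driver's documentation for the exact keys):

apiVersion: v1
kind: Secret
metadata:
  name: oss-secret                    # hypothetical
  namespace: default
stringData:
  akId: <your-access-key-id>          # placeholder
  akSecret: <your-access-key-secret>  # placeholder
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: oss-pv                        # hypothetical
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadOnlyMany
  csi:
    driver: ossplugin.csi.alibabacloud.com
    volumeHandle: oss-pv
    nodePublishSecretRef:             # the Secret that supplies the AccessKey pair
      name: oss-secret
      namespace: default
    volumeAttributes:
      bucket: example-bucket                # hypothetical bucket
      url: oss-cn-shanghai.aliyuncs.com     # public endpoint; use the internal endpoint within the same region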
FAQ
When creating or mounting a storage volume, PVC prompts no volume plugin matched
Issue
When creating or mounting a storage volume, PVC prompts Unable to attach or mount volumes: unmounted volumes=[xxx], unattached volumes=[xxx]: failed to get Plugin from volumeSpec for volume "xxx" err=no volume plugin matched.
Cause
The volume plug-in does not match the YAML template. As a result, the system cannot find the corresponding volume plug-in when creating or mounting a volume.
Solution
Check whether the volume plug-in exists in the cluster.
If the volume plug-in is not installed, install the plug-in. For more information, see Manage components.
If the volume plug-in is already installed, check whether the volume plug-in matches the YAML templates of the PV and PVC and whether the YAML templates meet the following requirements:
The CSI plug-in is deployed by following the steps as required. For more information, see Storage CSI.
The FlexVolume plug-in is deployed by following the steps as required. For more information, see Storage FlexVolume.
Important: FlexVolume is deprecated. We recommend that you migrate to the CSI component as soon as possible. For more information, see Migrate from FlexVolume to CSI.
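To compare the drivers actually registered in the cluster with the driver named in the PV, you can run the following standard kubectl commands (<pv-name> is a placeholder):

kubectl get csidriver
kubectl get pv <pv-name> -o jsonpath='{.spec.csi.driver}'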
Pod's Event prompts 0/x nodes are available: x pod has unbound immediate PersistentVolumeClaims
Issue
The pod fails to start, and the pod event shows:
0/x nodes are available: x pod has unbound immediate PersistentVolumeClaims. preemption: 0/x nodes are available: x Preemption is not helpful for scheduling

Cause
The PVC used by the pod references a custom StorageClass that does not exist in the cluster.
Solution
Check whether the StorageClass referenced by the pod's PVC exists. If it does not, recreate the StorageClass.
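A quick way to check with standard kubectl (<pvc-name> is a placeholder):

kubectl get pvc <pvc-name> -o jsonpath='{.spec.storageClassName}'
kubectl get storageclass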
PV is in Released state and cannot be bound by recreating PVC
Issue
After a PVC is accidentally deleted, the PV is in Released state and cannot be bound by recreating the PVC.
Cause
If the persistentVolumeReclaimPolicy of the PV is Retain, the PV changes to the Released state when its PVC is accidentally deleted, and a Released PV cannot be bound by a new PVC.
Solution
Delete the pv.spec.claimRef field from the PV, and then rebind it by using the static volume method. The PV then changes to the Bound state.
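One way to remove the field in place is a standard kubectl JSON patch (<pv-name> is a placeholder):

kubectl patch pv <pv-name> --type json -p '[{"op": "remove", "path": "/spec/claimRef"}]'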
PV is in Lost state and cannot be bound by recreating PVC
Issue
After creating PVC and PV, the PV is in Lost state and cannot be bound to the PVC.
Cause
The PVC name referenced by claimRef in the PV does not exist, causing the PV state to be Lost.
Solution
Delete the pv.spec.claimRef field from the PV, and then rebind it by using the static volume method. The PV then changes to the Bound state.
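To confirm which PVC the PV points at before removing the field, you can run the following standard kubectl command (<pv-name> is a placeholder); the same kubectl patch shown in the previous question then removes the stale reference:

kubectl get pv <pv-name> -o jsonpath='{.spec.claimRef.namespace}/{.spec.claimRef.name}'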
Will changes to StorageClass affect existing storage?
If the YAML files of the PVC and PV do not change, changes to the StorageClass do not affect existing storage. For example, modifying the allowVolumeExpansion field of a StorageClass takes effect only after you modify the requested storage capacity of the PVC; if the PVC's YAML does not change, the field does not affect the existing configuration.
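For example, a sketch of the two changes involved, using standard kubectl patches (resource names are placeholders):

# Allow expansion on the StorageClass; by itself this changes nothing for existing volumes
kubectl patch storageclass <sc-name> -p '{"allowVolumeExpansion": true}'
# The change takes effect only once a PVC's requested capacity is raised
kubectl patch pvc <pvc-name> -p '{"spec": {"resources": {"requests": {"storage": "40Gi"}}}}'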