Container Service for Kubernetes: Storage troubleshooting

Last Updated: Jun 25, 2025

If a pod becomes abnormal while mounting or using a storage volume, you can check the status and events of the pod and PVC, together with the CSI storage component, to identify the cause and resolve the issue. This topic describes the diagnostic procedure for storage issues and common storage problems.

Procedure

1. Check if the pod abnormality is caused by storage issues

Check the pod or PVC events to confirm whether the pod fails to start because of a storage issue.

  1. Check the events of the abnormal pod.

    kubectl describe pod <pod-name>
    • If Events contains events that indicate a storage issue (for example, the FailedScheduling event below, whose Message shows that scheduling failed because of a volume and node affinity conflict), refer to the following sections for further troubleshooting.

      Events:
        Type     Reason            Age    From               Message
        ----     ------            ----   ----               -------
        Warning  FailedScheduling  4m37s  default-scheduler  0/1 nodes are available: 1 node(s) had volume node affinity conflict. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.,
    • If Events shows that the volume was attached successfully (for example, the SuccessfulAttachVolume event below) but the pod still fails to start (for example, it is in the CrashLoopBackOff state), the problem is not caused by storage. Troubleshoot other issues based on Events or submit a ticket.

      Events:
        Type    Reason                  Age   From                     Message
        ----    ------                  ----  ----                     -------
        Normal  Scheduled               97s   default-scheduler        Successfully assigned default/disk-test-0 to cn-shanghai.192.168.5.2
        Normal  SuccessfulAttachVolume  97s   attachdetach-controller  AttachVolume.Attach succeeded for volume "d-uf6b8s2l5ypf48*****"
  2. If no storage-related events are found in the pod events, you can check all events.

    kubectl get events
    • If there are events that indicate a storage issue (for example, the FailedBinding event below, which indicates that the PVC failed to bind to a PV), refer to the following sections for further troubleshooting.

      LAST SEEN   TYPE      REASON                 OBJECT                                                  MESSAGE
      2m56s       Normal    FailedBinding          persistentvolumeclaim/data-my-release-mariadb-0         no persistent volumes available for this claim and no storage class is set
      41s         Normal    ExternalProvisioning   persistentvolumeclaim/pvc-nas-dynamic-create-subpath8   waiting for a volume to be created, either by external provisioner "nasplugin.csi.alibabacloud.com" or manually created by system administrator
      3m31s       Normal    Provisioning           persistentvolumeclaim/pvc-nas-dynamic-create-subpath8   External provisioner is provisioning volume for claim "default/pvc-nas-dynamic-create-subpath8"
    • If there are no storage-related events, please troubleshoot other issues based on Events or submit a ticket.
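
    To narrow down the output, you can also filter the event list to a single PVC. This is a minimal example; the namespace and PVC name are placeholders for your own objects:

      kubectl get events -n <namespace> --field-selector involvedObject.kind=PersistentVolumeClaim,involvedObject.name=<pvc-name>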

2. Check if the storage component is working properly

Note

FlexVolume is deprecated. If your cluster still uses the FlexVolume component, migrate to the CSI component as soon as possible. For more information, see Migrate from FlexVolume to CSI.

  1. Check whether the CSI storage component is working properly.

    kubectl get pod -n kube-system | grep csi

    The following is an example of the returned result. If the pod status is not Running, you can run kubectl describe pods <pod-name> -n kube-system to check the specific reason for container exit and the pod's events.

    Note

    CSI storage components include csi-plugin and csi-provisioner. By default, csi-provisioner is installed as a managed component. Managed components are maintained by Alibaba Cloud, so their pods are not visible in the cluster.

    NAME                     READY   STATUS        RESTARTS   AGE
    csi-plugin-bpz28         4/4     Running       0          3d
    csi-plugin-h2tdg         4/4     Running       0          3d
    csi-plugin-qpnm4         4/4     Running       0          3d
    csi-plugin-wczgm         4/4     Running       0          3d
  2. Check whether the CSI storage component is the latest version.

    kubectl get ds csi-plugin -n kube-system -o yaml | grep image

    You can confirm the image version in the image field of the returned information, as shown in the following example:

    image: registry-cn-shanghai-vpc.ack.aliyuncs.com/acs/csi-plugin:v1.33.1-67e8986-aliyun

    If the storage component is not the latest version, please upgrade csi-plugin and csi-provisioner. For information about the latest version of storage components, see csi-plugin.

    Note

    You can also find the csi-plugin and csi-provisioner components on the Components page in the console to confirm version information and upgrade components.

  3. Check the YAML of the PV, PVC, and StorageClass to confirm that the driver configuration (the driver or provisioner field) uses the CSI storage component and is consistent with the storage component installed in the current cluster.
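
    For example, you can print the fields directly and compare them with the CSI components installed in the cluster. The StorageClass and PV names below are placeholders:

    # Dynamic volumes: the provisioner of the StorageClass should be a CSI driver, such as diskplugin.csi.alibabacloud.com
    kubectl get storageclass <sc-name> -o jsonpath='{.provisioner}'

    # Static volumes: the PV should define spec.csi.driver instead of a FlexVolume configuration
    kubectl get pv <pv-name> -o jsonpath='{.spec.csi.driver}'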

3. Check if the PVC is in Bound state

  1. Check the PVC status.

    kubectl get pvc
  2. If the PVC is not in Bound state, refer to the following methods for troubleshooting and resolution.

    Cause

    • Static volumes: The PVC and PV do not satisfy the binding conditions. For example, the selector of the PVC does not match the labels of the PV, their StorageClass names are inconsistent, the PV capacity is smaller than the PVC request, or the PV is in the Released state.

    • Dynamic volumes: The csi-provisioner component is not working properly.

    Solution
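
    For the static case, the following is a minimal sketch of a PV and PVC pair that satisfies the binding conditions. The names, labels, and NAS mount target are hypothetical; binding succeeds only when the storageClassName, selector and labels, access modes, and capacity are compatible. For the dynamic case, check the csi-provisioner component as described in the previous section.

      apiVersion: v1
      kind: PersistentVolume
      metadata:
        name: static-pv                      # hypothetical PV name
        labels:
          alicloud-pvname: static-pv         # label matched by the PVC selector
      spec:
        capacity:
          storage: 20Gi                      # must be >= the PVC request
        accessModes:
          - ReadWriteMany
        storageClassName: nas                # must equal the PVC's storageClassName
        csi:
          driver: nasplugin.csi.alibabacloud.com
          volumeHandle: static-pv
          volumeAttributes:
            server: "xxxx.cn-shanghai.nas.aliyuncs.com"   # hypothetical mount target
            path: "/share"
      ---
      apiVersion: v1
      kind: PersistentVolumeClaim
      metadata:
        name: static-pvc                     # hypothetical PVC name
      spec:
        accessModes:
          - ReadWriteMany
        storageClassName: nas
        resources:
          requests:
            storage: 20Gi
        selector:
          matchLabels:
            alicloud-pvname: static-pv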

4. Check if the pod is in Running state

  1. Check the pod status.

    kubectl get pod
  2. If the PVC is in Bound state but the pod is not in Running state, please refer to the following methods for troubleshooting and resolution based on the storage volume type.

    Disk volumes

    Important

    When using disk volumes, make sure that the ECS instance type of the node to which the pod is scheduled supports the disk category, and that the pod and the disk are in the same region and zone. For information about the matching relationship between disk categories and ECS instance types, see Overview of instance families.

    Cause

    • No node is available for scheduling.

    • An error occurs when the system mounts the disk.

    • The ECS instance does not support the specified disk type.

    Solution
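
    As a starting point, when the pod event reports a volume node affinity conflict, you can compare the zone recorded in the PV with the zone labels of your nodes. The PV name below is a placeholder:

      # Show the zone constraint recorded in the disk PV
      kubectl get pv <pv-name> -o yaml | grep -A 8 nodeAffinity

      # List the zone label of each node to check whether any node matches
      kubectl get nodes -L topology.kubernetes.io/zone

    For dynamically provisioned disks, setting volumeBindingMode: WaitForFirstConsumer in the StorageClass delays disk creation until the pod is scheduled, which helps avoid zone mismatches.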

    NAS volumes

    Important
    • The node and NAS must be in the same VPC. If they are not in the same VPC, use Cloud Enterprise Network to connect them.

    • NAS supports cross-zone mounting.

    • The mount directory for Extreme NAS must start with /share.

    Cause

    • fsGroup is set when mounting NAS and the volume contains a large number of files, so the recursive chmod performed during mounting is slow.

    • Port 2049 is blocked in the security group rules.

    • The NAS file system and node are deployed in different VPCs.

    Solution

    • Check whether fsGroup is configured. If it is, remove it, restart the pod, and try to mount the volume again.

    • Check whether port 2049 of the node that hosts the pod is blocked. If yes, unblock the port and try again. For more information, see Add a security group rule.

    • If the NAS file system and node are deployed in different VPCs, use Cloud Enterprise Network to connect them.

    • For other issues, run kubectl describe pods <pod-name> to view the pod events.
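
    The following commands illustrate the first two checks. The pod name and mount target domain are placeholders:

      # Check whether the pod sets fsGroup in its securityContext
      kubectl get pod <pod-name> -o jsonpath='{.spec.securityContext.fsGroup}'

      # From a node in the same VPC, verify that the NAS mount target is reachable on port 2049
      nc -zv <nas-mount-target-domain> 2049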

    OSS volumes

    Important
    • When you mount OSS on a node, you must provide AccessKey information in the PV. The AccessKey can be referenced from a Secret.

    • When you access OSS across regions, set the bucket URL to the public endpoint. Within the same region, we recommend using the internal endpoint.

    Cause

    • fsGroup is set when mounting OSS and the bucket contains a large number of files, so the recursive chmod performed during mounting is slow.

    • The OSS bucket and node are created in different regions and the internal endpoint of the OSS bucket is used. As a result, the node fails to connect to the bucket endpoint.

    Solution

    • Check whether fsGroup is configured. If it is, remove it, restart the pod, and try to mount the volume again.

    • Check whether the OSS bucket and node are created in different regions and whether the internal endpoint is used. If yes, change to the public endpoint.

    • For other issues, run kubectl describe pods <pod-name> to view the pod events.
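
    The following is a minimal sketch of an OSS PV that provides the AccessKey through a Secret, as described in the notes above. The Secret keys akId and akSecret follow the convention of the CSI OSS plug-in; the names, bucket, and endpoint are hypothetical:

      apiVersion: v1
      kind: Secret
      metadata:
        name: oss-secret                     # hypothetical Secret name
        namespace: default
      stringData:
        akId: "<your AccessKey ID>"
        akSecret: "<your AccessKey Secret>"
      ---
      apiVersion: v1
      kind: PersistentVolume
      metadata:
        name: oss-pv                         # hypothetical PV name
      spec:
        capacity:
          storage: 20Gi
        accessModes:
          - ReadOnlyMany
        persistentVolumeReclaimPolicy: Retain
        csi:
          driver: ossplugin.csi.alibabacloud.com
          volumeHandle: oss-pv
          nodePublishSecretRef:
            name: oss-secret
            namespace: default
          volumeAttributes:
            bucket: "example-bucket"                         # hypothetical bucket
            url: "oss-cn-shanghai-internal.aliyuncs.com"     # internal endpoint for same-region access
            path: "/"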

FAQ

When creating or mounting a storage volume, PVC prompts no volume plugin matched

Issue

When creating or mounting a storage volume, PVC prompts Unable to attach or mount volumes: unmounted volumes=[xxx], unattached volumes=[xxx]: failed to get Plugin from volumeSpec for volume "xxx" err=no volume plugin matched.

Cause

The volume plug-in does not match the YAML template. As a result, the system cannot find the corresponding volume plug-in when creating or mounting a volume.

Solution

Check whether the volume plug-in exists in the cluster.

  • If the volume plug-in is not installed, install the plug-in. For more information, see Manage components.

  • If the volume plug-in is already installed, check whether the volume plug-in matches the YAML templates of the PV and PVC and whether the YAML templates meet the following requirements:

    • The CSI plug-in is deployed by following the steps as required. For more information, see Storage CSI.

    • The FlexVolume plug-in is deployed by following the steps as required. For more information, see Storage FlexVolume.

      Important

      Since FlexVolume is deprecated, it is recommended to migrate to the CSI component as soon as possible. For more information, see Migrate from FlexVolume to CSI.
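
As a quick check, you can list the storage plug-in pods in the kube-system namespace. The pod name prefixes below are the defaults used by ACK; adjust them if your cluster uses different names:

  kubectl get pods -n kube-system | grep -E 'csi-plugin|flexvolume'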

Pod's Event prompts 0/x nodes are available: x pod has unbound immediate PersistentVolumeClaims

Issue

The pod fails to start, and the pod event shows:

0/x nodes are available: x pod has unbound immediate PersistentVolumeClaims. preemption: 0/x nodes are available: x Preemption is not helpful for scheduling

Cause

The custom StorageClass referenced by the pod's PVC does not exist, so the PVC cannot be bound.

Solution

Check whether the StorageClass referenced by the pod's PVC exists. If it does not exist, recreate the StorageClass.
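
For example, you can compare the StorageClass referenced by the PVC with the StorageClasses that exist in the cluster. The PVC name is a placeholder:

  # Print the StorageClass that the PVC references
  kubectl get pvc <pvc-name> -o jsonpath='{.spec.storageClassName}'

  # List the StorageClasses that exist in the cluster
  kubectl get storageclass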

PV is in Released state and cannot be bound by recreating PVC

Issue

After a PVC is accidentally deleted, the PV is in Released state and cannot be bound by recreating the PVC.

Cause

If the reclaim policy (persistentVolumeReclaimPolicy) of the PV is Retain, the PV changes to the Released state when the PVC is accidentally deleted.

Solution

You need to delete the pv.spec.claimRef field in the current PV, and then rebind using the static volume method. This will change the PV to Bound state.
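
One way to clear the claim reference is to edit the PV or remove the field with a JSON patch; the PV name below is a placeholder. After the field is removed, recreate the PVC using the static volume method so that it binds to this PV:

  kubectl patch pv <pv-name> --type json -p '[{"op":"remove","path":"/spec/claimRef"}]'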

PV is in Lost state and cannot be bound by recreating PVC

Issue

After creating PVC and PV, the PV is in Lost state and cannot be bound to the PVC.

Cause

The PVC name referenced by claimRef in the PV does not exist, causing the PV state to be Lost.

Solution

You need to delete the pv.spec.claimRef field in the current PV, and then rebind using the static volume method. This will change the PV to Bound state.

Will changes to StorageClass affect existing storage?

If the YAML of the PVC and PV does not change, changes to the StorageClass do not affect existing volumes. For example, modifying the allowVolumeExpansion field of a StorageClass (displayed as ALLOWVOLUMEEXPANSION in kubectl output) takes effect only after you increase the storage request (capacity) of the PVC. If the YAML of the PVC does not change, the field does not affect the existing configuration.
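
For example, assuming a StorageClass that already has allowVolumeExpansion set to true, the change takes effect for an existing volume only when the PVC's capacity request is increased. The names and the 40Gi size below are placeholders:

  # Confirm the ALLOWVOLUMEEXPANSION column of the StorageClass
  kubectl get storageclass <sc-name>

  # Trigger expansion of an existing volume by increasing the PVC's storage request
  kubectl patch pvc <pvc-name> -p '{"spec":{"resources":{"requests":{"storage":"40Gi"}}}}'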