This topic describes the diagnostic procedure for storage and how to troubleshoot storage exceptions.

Diagnostic procedure

Procedure
  1. Run the following command to view the pod event. Check whether the pod fails to be launched due to a storage issue.
    kubectl describe pods <pod-name>
    If the pod events show that the volume is already mounted to the pod, the pod fails to be launched due to other issues, such as CrashLoopBackOff. To resolve the issue, Submit a ticket.
  2. Run the following command to check whether the Container Storage Interface (CSI) plug-in works as expected:
    kubectl get pod -n kube-system | grep csi

    Expected output:

    NAME                       READY   STATUS             RESTARTS   AGE
    csi-plugin-***             4/4     Running            0          23d
    csi-provisioner-***        7/7     Running            0          14d
    Note If the status of the pod is not Running, run the kubectl describe pods <pod-name> -n kube-system command to view the pod event and identify why the containers exited.
  3. Run the following command to check whether the version of the CSI plug-in is up to date:
    kubectl get ds csi-plugin -n kube-system -o yaml | grep image

    Expected output:

    image: registry.cn-****.aliyuncs.com/acs/csi-plugin:v*****-aliyun

    For more information about the latest CSI version, see csi-plugin and csi-provisioner. If your cluster uses an earlier CSI version, update the plug-in to the latest version. For more information, see Manage system components. For more information about how to troubleshoot volume plug-in update failures, see Troubleshoot component update failures.

  4. Troubleshoot the pod pending issue.
  5. Troubleshoot the issue that the status of the persistent volume claim (PVC) is not Bound.
  6. If the issue persists, Submit a ticket.

Troubleshoot component update failures

If you fail to update the csi-provisioner or csi-plugin component, perform the following steps to troubleshoot the issue.

csi-provisioner

  • By default, the csi-provisioner component is deployed by using a Deployment that creates two pods. The two pods repel each other and therefore cannot be scheduled to the same node. If you fail to update the component, check whether the cluster has only one schedulable node.
  • For version 1.14 or earlier, the csi-provisioner component is deployed by using a StatefulSet. If the csi-provisioner component in your cluster is deployed by using a StatefulSet, you can run the kubectl delete sts csi-provisioner command to delete the current csi-provisioner component. Then, log on to the Container Service for Kubernetes (ACK) console and re-install the csi-provisioner component. For more information, see Manage system components.
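The check and cleanup described above can be sketched as follows. The kube-system namespace is an assumption based on the expected output in the diagnostic procedure; verify it in your cluster:

```shell
# Check whether csi-provisioner is deployed as a StatefulSet or a Deployment.
kubectl get sts,deploy -n kube-system | grep csi-provisioner

# If a StatefulSet is listed, delete it before re-installing from the console.
kubectl delete sts csi-provisioner -n kube-system
```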

csi-plugin

  • Check whether the cluster contains nodes that are in the NotReady state. If NotReady nodes exist, ACK fails to update the DaemonSet that is used to deploy the csi-plugin component.
  • If you fail to update the csi-plugin component but all plug-ins work as expected, the update timed out. If a timeout error occurs when the component center updates the csi-plugin component, the component center automatically rolls back the update. To resolve this issue, Submit a ticket.
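A quick way to check for NotReady nodes, which block the DaemonSet rollout:

```shell
# Print the names of all nodes whose STATUS column is not exactly "Ready"
# (this also catches states such as NotReady or Ready,SchedulingDisabled).
kubectl get nodes --no-headers | awk '$2 != "Ready" {print $1}'
```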

Disk troubleshooting

Note
  • To mount a disk to a node, make sure that the node and disk are created in the same region and zone. Otherwise, the disk cannot be mounted to the node.
  • The types of disks supported by different types of Elastic Compute Service (ECS) instances vary. For more information, see Overview of instance families.
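To verify the region and zone constraint, you can compare the zone label of the node with the zone recorded on the disk PV. The label key and resource names below are assumptions; adjust them to your cluster:

```shell
# Zone of the node (the well-known topology label key is an assumption).
kubectl get node <node-name> \
  -o jsonpath='{.metadata.labels.topology\.kubernetes\.io/zone}'; echo

# Zone information recorded on the PV of the disk.
kubectl get pv <pv-name> -o yaml | grep -i zone
```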

The status of the pod is not Running

Problem:

The status of the PVC is Bound but the status of the pod is not Running.

Cause:

  • No node is available for scheduling.
  • An error occurs when the system mounts the disk.
  • The ECS instance does not support the specified disk type.

Solution:

  • Run the kubectl describe pods <pod-name> command to view the pod event and locate the cause.
  • If the pod cannot be scheduled, check whether schedulable nodes are available in the zone of the disk.
  • If the disk fails to be attached, check whether the ECS instance type of the node supports the specified disk category. For more information, see Overview of instance families.

The status of the PVC is not Bound

Problem:

The status of the PVC is not Bound and the status of the pod is not Running.

Cause:

  • Static: The selectors of the PVC and persistent volume (PV) fail to match. Therefore, the PV and PVC cannot be associated. For example, the selector configuration of the PVC is different from that of the PV, the PVC and PV specify different StorageClass names, or the PV is in the Released state.
  • Dynamic: The csi-provisioner component fails to create the disk.

Solution:

  • Static: Check the relevant YAML content. For more information, see Mount a statically provisioned disk volume by using kubectl.
    Note If the status of the PV is Released, the PV cannot be reused. You need to create a new PV to use the disk.
  • Dynamic: Run the kubectl describe pvc <pvc-name> -n <namespace> command to view the PVC event.
  • If an error occurs when you call the ECS API to create a disk, refer to ErrorCode and troubleshoot the issue. If the issue persists, Submit a ticket.

NAS troubleshooting

Note
  • To mount a NAS file system to a node, make sure that the node and NAS file system are deployed in the same virtual private cloud (VPC). If the node and NAS file system are deployed in different VPCs, use Cloud Enterprise Network (CEN) to connect them.
  • You can mount a NAS file system to a node that is deployed in a zone different from the NAS file system.
  • The path to which an Extreme NAS file system or CPFS 2.0 file system is mounted must start with /share.

The status of the pod is not Running

Problem:

The status of the PVC is Bound but the status of the pod is not Running.

Cause:

  • An fsGroup is specified when the NAS file system is mounted. kubelet must then run chmod or chown on every file in the volume, which is slow when the volume contains a large number of files.
  • Port 2049 is blocked in the security group rules.
  • The NAS file system and node are deployed in different VPCs.

Solution:

  • Check whether an fsGroup is configured. If yes, delete the fsGroup setting, restart the pod, and mount the NAS file system again.
  • Check whether port 2049 of the node that hosts the pod is blocked. If yes, unblock the port and try again. For more information, see Add a security group rule.
  • If the NAS file system and node are deployed in different VPCs, use CEN to connect them.
  • For other causes, run the kubectl describe pods <pod-name> command to view the pod event.
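The first two checks can be sketched as follows. The mount target is a placeholder; run the port probe from the node that hosts the pod:

```shell
# Check whether the pod sets an fsGroup (empty output means it is not set).
kubectl get pod <pod-name> -n <namespace> \
  -o jsonpath='{.spec.securityContext.fsGroup}'; echo

# Probe TCP port 2049 on the NAS mount target.
timeout 3 bash -c 'cat < /dev/null > /dev/tcp/<mount-target>/2049' \
  && echo "port 2049 open" || echo "port 2049 blocked"
```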

The status of the PVC is not Bound

Problem:

The status of the PVC is not Bound and the status of the pod is not Running.

Cause:

  • Static: The selectors of the PVC and PV fail to match. Therefore, the PV and PVC cannot be associated. For example, the selector configuration of the PVC is different from that of the PV, the PVC and PV specify different StorageClass names, or the PV is in the Released state.
  • Dynamic: The csi-provisioner component fails to mount the NAS file system.

Solution:

  • Static: Check the relevant YAML content. For more information, see Mount a statically provisioned NAS volume by using kubectl.
    Note If the status of the PV is Released, the PV cannot be reused. Create a new PV that uses the NAS file system.
  • Dynamic: Run the kubectl describe pvc <pvc-name> -n <namespace> command to view the PVC event.

OSS troubleshooting

Note
  • When you mount an OSS bucket to a node, you need to specify the AccessKey pair in the PV. You can store the AccessKey pair in a Secret.
  • If the OSS bucket and node are created in different regions, set Bucket URL to the public endpoint of the OSS bucket. If the OSS bucket and node are created in the same region, we recommend that you use the private endpoint of the OSS bucket.
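A sketch of storing the AccessKey pair in a Secret. The Secret name and the key names akId and akSecret are assumptions; match them to the Secret reference and key names that your PV actually uses:

```shell
# Create a Secret that holds the AccessKey pair (Secret and key names are
# assumptions; align them with the PV configuration).
kubectl create secret generic oss-secret -n <namespace> \
  --from-literal=akId='<AccessKey-ID>' \
  --from-literal=akSecret='<AccessKey-Secret>'
```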

The status of the pod is not Running

Problem:

The status of the PVC is Bound but the status of the pod is not Running.

Cause:

  • An fsGroup is specified when the OSS bucket is mounted. kubelet must then run chmod or chown on every file in the volume, which is slow when the volume contains a large number of files.
  • The OSS bucket and node are created in different regions and the private endpoint of the OSS bucket is used. As a result, the node fails to connect to the bucket endpoint.

Solution:

  • Check whether an fsGroup is configured. If yes, delete the fsGroup setting, restart the pod, and mount the OSS bucket again.
  • Check whether the OSS bucket and node are created in the same region. If they are created in different regions, check whether the private endpoint of the OSS bucket is used. If yes, change to the public endpoint of the OSS bucket.
  • For other causes, run the kubectl describe pods <pod-name> command to view the pod event.
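To test endpoint reachability from the node, you can probe TCP port 443 of the configured bucket endpoint. The bucket and endpoint names are placeholders:

```shell
# Probe HTTPS connectivity to the bucket endpoint.
timeout 5 bash -c 'cat < /dev/null > /dev/tcp/<bucket>.<endpoint>/443' \
  && echo "endpoint reachable" || echo "endpoint unreachable"
```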

The status of the PVC is not Bound

Problem:

The status of the PVC is not Bound and the status of the pod is not Running.

Cause:

  • Static: The selectors of the PVC and PV fail to match. Therefore, the PV and PVC cannot be associated. For example, the selector configuration of the PVC is different from that of the PV, the PVC and PV specify different StorageClass names, or the PV is in the Released state.
  • Dynamic: The csi-provisioner component fails to mount the OSS bucket.

Solution:

  • Static: Check the relevant YAML content of the PV and PVC.
    Note If the status of the PV is Released, the PV cannot be reused. Create a new PV that uses the OSS bucket.
  • Dynamic: Run the kubectl describe pvc <pvc-name> -n <namespace> command to view the PVC event.