
Container Service for Kubernetes: FAQ about CSI

Last Updated: May 26, 2023

This topic describes how to troubleshoot common issues related to storage and provides answers to some frequently asked questions about disk volumes and Apsara File Storage NAS (NAS) volumes.

This FAQ covers the following types of issues:

  • Common issues
  • FAQ about disk volumes: FAQ about creating disks, FAQ about mounting disks, FAQ about unmounting disks, and FAQ about resizing disks (for example, why does the system fail to dynamically expand a disk and generate the "Waiting for user to (re-)start a pod to finish file system resize of volume on node" PVC event?)
  • FAQ about NAS volumes
  • FAQ about OSS volumes
  • FAQ about volume plug-ins
  • FAQ about cloud-native storage
  • FAQ about migrating from FlexVolume to CSI
  • Other storage issues

Common issues

Perform the following operations to view the log of a specified volume plug-in. This helps you locate issues. A consolidated diagnostic script is provided after this list.

  1. Run the following command to check whether events related to persistent volume claims (PVCs) or pods are generated:

    kubectl get events

    Expected output:

    LAST SEEN   TYPE      REASON                 OBJECT                                                  MESSAGE
    2m56s       Normal    FailedBinding          persistentvolumeclaim/data-my-release-mariadb-0         no persistent volumes available for this claim and no storage class is set
    41s         Normal    ExternalProvisioning   persistentvolumeclaim/pvc-nas-dynamic-create-subpath8   waiting for a volume to be created, either by external provisioner "nasplugin.csi.alibabacloud.com" or manually created by system administrator
    3m31s       Normal    Provisioning           persistentvolumeclaim/pvc-nas-dynamic-create-subpath8   External provisioner is provisioning volume for claim "default/pvc-nas-dynamic-create-subpath8"
  2. Check whether the FlexVolume or CSI plug-in is deployed in the cluster.

    • Run the following command to check whether the FlexVolume plug-in is deployed in the cluster:

      kubectl get pod -n kube-system | grep flexvolume

      Expected output:

      NAME                      READY   STATUS             RESTARTS   AGE
      flexvolume-***            4/4     Running            0          23d
    • Run the following command to check whether the CSI plug-in is deployed in the cluster:

      kubectl get pod -n kube-system | grep csi

      Expected output:

      NAME                       READY   STATUS             RESTARTS   AGE
      csi-plugin-***             4/4     Running            0          23d
      csi-provisioner-***        7/7     Running            0          14d
  3. Check whether the volume template matches the volume plug-in that is used in the cluster. The supported volume plug-ins are FlexVolume and CSI.

    If this is the first time you mount volumes in the cluster, check whether the driver specified in the persistent volume (PV) and StorageClass is a CSI driver or a FlexVolume driver. The name of the driver that you specified must be the same as the type of the volume plug-in that is deployed in the cluster.

  4. Check whether the volume plug-in is updated to the latest version.

    • Run the following command to query the image version of the FlexVolume plug-in:

      kubectl get ds flexvolume -n kube-system -o yaml | grep image

      Expected output:

      image: registry.cn-hangzhou.aliyuncs.com/acs/flexvolume:v1.14.8.109-649dc5a-aliyun

      For more information about the FlexVolume plug-in, see FlexVolume.

    • Run the following command to query the image version of the CSI plug-in:

      kubectl get ds csi-plugin -n kube-system -o yaml | grep image

      Expected output:

      image: registry.cn-hangzhou.aliyuncs.com/acs/csi-plugin:v1.18.8.45-1c5d2cd1-aliyun

      For more information about the CSI plug-in, see csi-plugin and csi-provisioner.

  5. View logs.

    • If a PVC of disk type is in the Pending state, the related PV is not created. You must check the log of the Provisioner plug-in.

      • If the FlexVolume plug-in is deployed in the cluster, run the following command to print the log of alicloud-disk-controller:

        podid=$(kubectl get pod -n kube-system | grep alicloud-disk-controller | awk '{print $1}')
        kubectl logs $podid -n kube-system
      • If the CSI plug-in is deployed in the cluster, run the following command to print the log of csi-provisioner:

        for podid in $(kubectl get pod -n kube-system | grep csi-provisioner | awk '{print $1}'); do
          kubectl logs $podid -n kube-system -c csi-provisioner
        done
        Note

        Two pods are created to run csi-provisioner. The kubectl get pod command therefore returns two pod names, and the loop above prints the log of the csi-provisioner container in each pod.

    • If a mounting error occurs when the system starts a pod, you must check the log of FlexVolume or csi-plugin.

      • If the FlexVolume plug-in is deployed in the cluster, run the following command to print the log of FlexVolume:

        kubectl get pod <pod-name> -o wide

        Log on to the Elastic Compute Service (ECS) instance where the pod runs and check the FlexVolume log files at /var/log/alicloud/flexvolume_**.log.

      • If the CSI plug-in is deployed in the cluster, run the following command to print the log of csi-plugin:

        nodeID=$(kubectl get pod <pod-name> -o wide | awk 'NR>1 {print $7}')
        podID=$(kubectl get pods -n kube-system -o wide -l app=csi-plugin | grep $nodeID | awk '{print $1}')
        kubectl logs $podID -n kube-system
    • View the log of kubelet.

      Run the following command to query the node on which the pod runs:

      kubectl get pod <pod-name> -o wide | awk 'NR>1 {print $7}'

      Log on to the node and check the /var/log/messages log file.
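
The preceding checks can be combined into a short diagnostic script. The following is a minimal sketch that assumes a CSI-based cluster and the default plug-in names in the kube-system namespace; adjust the grep patterns if your cluster uses FlexVolume.

    #!/bin/bash
    # Show recent events that may explain PVC or pod failures.
    kubectl get events

    # Identify the deployed volume plug-in pods and the csi-plugin image version.
    kubectl get pod -n kube-system | grep -E 'flexvolume|csi-plugin|csi-provisioner'
    kubectl get ds csi-plugin -n kube-system -o yaml | grep image

    # Print the log of the csi-provisioner container in each csi-provisioner pod.
    for podid in $(kubectl get pod -n kube-system | grep csi-provisioner | awk '{print $1}'); do
      kubectl logs $podid -n kube-system -c csi-provisioner
    done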

Quick recovery

If you fail to mount volumes to most of the pods on a node, you can schedule the pods to other nodes. For more information, see Schedule pods to specific nodes.
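
For example, you can cordon the faulty node so that no new pods are scheduled to it, and then delete the affected pods so that their controllers recreate them on other nodes. The following is a minimal sketch; <node-name> and <pod-name> are placeholders.

    # Prevent new pods from being scheduled to the faulty node.
    kubectl cordon <node-name>

    # Delete an affected pod so that its controller recreates it on another node.
    kubectl delete pod <pod-name>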

csi-plugin update failures

csi-plugin is deployed as a DaemonSet. If the cluster contains nodes that are in the NotReady state or csi-plugin pods that are in a state other than Running, ACK fails to update csi-plugin. You need to manually fix these nodes and pods and perform the update again. For more information, see Install and update the CSI plug-in.
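
The following sketch shows a quick way to find the nodes and pods that block the update; it assumes the default csi-plugin pod names.

    # List nodes whose status is not Ready.
    kubectl get nodes --no-headers | awk '$2 != "Ready"'

    # List csi-plugin pods that are not in the Running state.
    kubectl get pod -n kube-system -o wide | grep csi-plugin | grep -v Running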

What do I do if the csi-provisioner update fails because the number of nodes in the cluster does not meet the requirements of the update precheck?

Issue

The csi-provisioner plug-in fails to pass the precheck because the number of nodes in the cluster does not meet the requirement.

Cause

To ensure the high availability of csi-provisioner, csi-provisioner runs in a primary pod and a secondary pod. The primary and secondary pods are scheduled to different nodes. If your cluster has only one node, you cannot update csi-provisioner.

Solution

Create a file named csi-provisioner.yaml to manually update csi-provisioner, as sketched below.
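
The following sketch assumes that csi-provisioner runs as a Deployment named csi-provisioner in the kube-system namespace and uses the current configuration as the starting point for the manual update.

    # Export the current Deployment into csi-provisioner.yaml.
    kubectl get deployment csi-provisioner -n kube-system -o yaml > csi-provisioner.yaml

    # Edit csi-provisioner.yaml as needed. For example, update the image tag, or
    # relax the pod anti-affinity on a single-node cluster. Then apply the file.
    kubectl apply -f csi-provisioner.yaml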

What do I do if the csi-provisioner update fails due to StorageClasses attribute changes?

Issue

csi-provisioner fails to pass the precheck because the attributes of StorageClasses do not meet the requirements.

Cause

The attributes of the default StorageClasses are modified. You have deleted and recreated StorageClasses that have the same names as the default StorageClasses. The attributes of the default StorageClasses must not be changed. Otherwise, csi-provisioner may fail to be updated.

Solution

Delete the following default StorageClasses: alicloud-disk-essd, alicloud-disk-available, alicloud-disk-efficiency, alicloud-disk-ssd, and alicloud-disk-topology. The deletion operation does not affect the applications that are running in the cluster. Then, reinstall csi-provisioner. After csi-provisioner is installed, the preceding default StorageClasses are automatically recreated.
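
For example, the default StorageClasses can be deleted in one command:

    # Delete the modified default StorageClasses. Running applications are not affected.
    kubectl delete storageclass alicloud-disk-essd alicloud-disk-available alicloud-disk-efficiency alicloud-disk-ssd alicloud-disk-topology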

Important

If you want to create custom StorageClasses, use names that are different from the names of the preceding default StorageClasses.

What do I do if the "failed to renew lease xxx timed out waiting for the condition" error is displayed in the log of csi-provisioner?

Issue

After you run the kubectl logs csi-provisioner-xxxx -n kube-system command to query the log of csi-provisioner, the failed to renew lease xxx timed out waiting for the condition error appears in the log.

Cause

Multiple replicated pods are provisioned for csi-provisioner to implement high availability. Kubernetes uses Leases to perform leader election among the replicated pods of a component. During the election, csi-provisioner accesses the Kubernetes API server of the cluster to request a specified Lease. The replicated pod that acquires the Lease becomes the leader that provides services in the cluster. This issue occurs because csi-provisioner cannot access the Kubernetes API server of the cluster.

Solution

Check whether the cluster network and the Kubernetes API server of the cluster are working as expected. If the issue persists, submit a ticket.
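
To see which replicated pod currently holds the Lease, you can inspect the Lease objects. The following is a sketch that assumes the Leases are stored in the kube-system namespace; <lease-name> is a placeholder.

    # List the Leases used for leader election and check their current holders.
    kubectl get lease -n kube-system | grep csi
    kubectl describe lease <lease-name> -n kube-system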

OOM issues caused by volume plug-ins

csi-provisioner is a centralized volume plug-in. Its sidecar containers cache information about pods, PVs, and PVCs. As the cluster grows, these caches grow with it, and out of memory (OOM) errors may occur. If an OOM error occurs, modify the resource limits based on the size of the cluster, as described in the following steps and in the command-line sketch after them.

  1. Log on to the ACK console and click Clusters in the left-side navigation pane.

  2. On the Clusters page, click the name of a cluster and choose Operations > Add-ons in the left-side navigation pane.

  3. On the Add-ons page, click the icon in the lower-right part of the csi-provisioner card and click View in YAML.

  4. Modify the resource limits in the YAML file based on the size of the cluster.

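Alternatively, the limits can be modified from the command line. The following is a sketch with illustrative values; the Deployment name, container name, and suitable limits depend on your cluster.

    # Raise the memory limit of the csi-provisioner container to 4 GiB.
    kubectl -n kube-system set resources deployment csi-provisioner -c csi-provisioner --limits=memory=4Gi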

Why does the system prompt no volume plugin matched for the PVC when I create or mount a volume?

Issue

The system prompts Unable to attach or mount volumes: unmounted volumes=[xxx], unattached volumes=[xxx]: failed to get Plugin from volumeSpec for volume "xxx" err=no volume plugin matched for the PVC when you create or mount a volume.

Cause

The volume template (the PV, PVC, or StorageClass) does not match the volume plug-in that is deployed in the cluster. As a result, the system cannot find a matching volume plug-in when it creates or mounts the volume.

Solution

Check whether the volume plug-in exists in the cluster. A verification sketch is provided after the following list.

  • If the volume plug-in is not installed, install the plug-in. For more information, see Manage components.

  • If the volume plug-in is already installed, check whether the YAML templates of the PV and PVC meet the following requirements:

    • The CSI plug-in is deployed by following the steps as required. For more information, see CSI overview.

    • The FlexVolume plug-in is deployed by following the steps as required. For more information, see FlexVolume overview.

      Important

      FlexVolume is deprecated. If the version of your ACK cluster is earlier than 1.18, we recommend that you migrate from FlexVolume to CSI. For more information, see Migrate from FlexVolume to CSI.
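
The following sketch shows how to verify that a matching CSI driver is registered and which driver a PV references; <pv-name> is a placeholder.

    # List the CSI drivers that are registered in the cluster.
    kubectl get csidriver

    # Print the driver that the PV references. The driver must match the volume
    # plug-in that is deployed in the cluster.
    kubectl get pv <pv-name> -o jsonpath='{.spec.csi.driver}'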

What do I do if a large volume of traffic is recorded in the monitoring data of the csi-plugin pod?

Issue

A large volume of traffic is recorded in the monitoring data of the csi-plugin pod.

Cause

csi-plugin is responsible for mounting NAS volumes on nodes. If a NAS volume is mounted to a pod on a node, requests from the pod to the NAS volume pass through the network namespace in which csi-plugin runs and are counted by cluster monitoring. As a result, a large volume of traffic is recorded in the monitoring data of the csi-plugin pod.

Solution

You do not need to fix this issue. The traffic that flows through csi-plugin is not actually doubled and does not consume additional network bandwidth.

Why does the system generate the 0/x nodes are available: x pod has unbound immediate PersistentVolumeClaims event for a pod?

Issue

The system generates the 0/x nodes are available: x pod has unbound immediate PersistentVolumeClaims. preemption: 0/x nodes are available: x Preemption is not helpful for scheduling event for the pod.

Cause

The custom StorageClass that is referenced by the PVC of the pod does not exist. As a result, the PVC cannot be bound and the pod cannot be scheduled.

Solution

If the pod uses a dynamically provisioned volume, find the custom StorageClass that is referenced by the pod. If the StorageClass does not exist, create one.
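
For example, you can check which StorageClass the PVC references and whether that StorageClass exists; <pvc-name> is a placeholder.

    # Print the StorageClass that the PVC references.
    kubectl get pvc <pvc-name> -o jsonpath='{.spec.storageClassName}'

    # Check whether that StorageClass exists in the cluster.
    kubectl get storageclass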

What do I do if the PV is in the Released state and cannot be bound to the recreated PVC?

Issue

You accidentally deleted the PVC. The PV is in the Released state and cannot be bound to the PVC that you recreated.

Cause

If the reclaim policy (persistentVolumeReclaimPolicy) of the PV is Retain, the status of the PV changes to Released after you delete the PVC, and the PV cannot be automatically bound to a new PVC.

Solution

You need to delete the pv.spec.claimRef field of the PV and then bind the PV to the PVC as a statically provisioned volume. This way, the status of the PV changes to Bound.
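
For example, the claimRef field can be removed with a JSON patch; <pv-name> is a placeholder.

    # Remove spec.claimRef so that the PV becomes Available and can be bound again.
    kubectl patch pv <pv-name> --type json -p '[{"op": "remove", "path": "/spec/claimRef"}]'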

FAQ about migrating from FlexVolume to CSI

In earlier ACK versions, FlexVolume was used as the volume plug-in. FlexVolume is deprecated in later versions. If the version of your ACK cluster is earlier than 1.18, we recommend that you migrate from FlexVolume to CSI. For more information, see Migrate from FlexVolume to CSI.

Other storage issues

To avoid storage issues such as spelling errors in mountOptions, a PVC that references a StorageClass that does not exist, or a mount target domain name that does not exist, we recommend that you use Container Network File System (CNFS) volumes. For more information about CNFS, see CNFS overview.