
Container Service for Kubernetes: FAQ about CSI

Last Updated: Mar 05, 2024

This topic describes how to troubleshoot common issues related to storage and provides answers to some frequently asked questions about disk volumes, Apsara File Storage NAS (NAS) volumes, and Object Storage Service (OSS) volumes.

The following list shows the types of issues that are covered and the FAQ for each type:

• Common issues
• FAQ about disk volumes
  • FAQ about creating disks
  • FAQ about mounting disks
  • FAQ about unmounting disks
  • FAQ about resizing disks. For example: Why does the system fail to dynamically expand a disk and generate the "Waiting for user to (re-)start a pod to finish file system resize of volume on node" PVC event?
  • FAQ about using disks. For example: Why does the system prompt input/output error when an application performs read and write operations on the mount directory of a disk volume?
• FAQ about NAS volumes
• FAQ about OSS volumes
  • FAQ about mounting OSS volumes
  • FAQ about using OSS volumes
• FAQ about detection failures in the ACK console
• FAQ about volume plug-ins
  • FAQ about cloud-native storage
  • FAQ about migrating from FlexVolume to CSI
  • Other storage issues

        Common issues

        Perform the following steps to view the log of a volume plug-in and identify issues.

        1. Run the following command to check whether events related to persistent volume claims (PVCs) or pods are generated:

          kubectl get events

          Expected output:

          LAST SEEN   TYPE      REASON                 OBJECT                                                  MESSAGE
          2m56s       Normal    FailedBinding          persistentvolumeclaim/data-my-release-mariadb-0         no persistent volumes available for this claim and no storage class is set
          41s         Normal    ExternalProvisioning   persistentvolumeclaim/pvc-nas-dynamic-create-subpath8   waiting for a volume to be created, either by external provisioner "nasplugin.csi.alibabacloud.com" or manually created by system administrator
          3m31s       Normal    Provisioning           persistentvolumeclaim/pvc-nas-dynamic-create-subpath8   External provisioner is provisioning volume for claim "default/pvc-nas-dynamic-create-subpath8"
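
          If the cluster generates a large number of events, you can filter the output by object. A minimal sketch that reuses the PVC name from the sample output above (replace it with your own PVC name):

            kubectl get events --field-selector involvedObject.name=pvc-nas-dynamic-create-subpath8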
        2. Check whether the FlexVolume or CSI plug-in is deployed in the cluster.

          • Run the following command to check whether the FlexVolume plug-in is deployed in the cluster:

            kubectl get pod -n kube-system | grep flexvolume

            Expected output:

            NAME                      READY   STATUS             RESTARTS   AGE
            flexvolume-***            4/4     Running            0          23d
          • Run the following command to check whether the CSI plug-in is deployed in the cluster:

            kubectl get pod -n kube-system | grep csi

            Expected output:

            NAME                       READY   STATUS             RESTARTS   AGE
            csi-plugin-***             4/4     Running            0          23d
            csi-provisioner-***        7/7     Running            0          14d
        3. Check whether the volume template matches the template of the volume plug-in used in the cluster. The supported volume plug-ins are FlexVolume and CSI.

          If this is the first time you mount volumes in the cluster, check whether the driver specified in the persistent volume (PV) and StorageClass is a CSI driver or a FlexVolume driver. The name of the driver that you specified must be the same as the type of the volume plug-in that is deployed in the cluster.
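
          A minimal sketch to check which driver each PV references. The custom-columns paths are standard Kubernetes PV fields: a PV that uses CSI sets .spec.csi.driver, and a PV that uses FlexVolume sets .spec.flexVolume.driver:

            kubectl get pv -o custom-columns='NAME:.metadata.name,CSI:.spec.csi.driver,FLEXVOLUME:.spec.flexVolume.driver'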

        4. Check whether the volume plug-in is updated to the latest version.

          • Run the following command to query the image version of the FlexVolume plug-in:

            kubectl get ds flexvolume -n kube-system -oyaml | grep image

            Expected output:

            image: registry.cn-hangzhou.aliyuncs.com/acs/flexvolume:v1.14.8.109-649dc5a-aliyun

            For more information about FlexVolume, see FlexVolume (Deprecated).

          • Run the following command to query the image version of the CSI plug-in:

            kubectl get ds csi-plugin -n kube-system -oyaml | grep image

            Expected output:

            image: registry.cn-hangzhou.aliyuncs.com/acs/csi-plugin:v1.18.8.45-1c5d2cd1-aliyun

            For more information about the CSI plug-in, see csi-plugin and csi-provisioner.

        5. View logs.

          • If a PVC of the disk type is in the Pending state, the related PV is not created. You must check the log of the Provisioner plug-in.

            • If the FlexVolume plug-in is deployed in the cluster, run the following commands to print the log of alicloud-disk-controller:

              podid=$(kubectl get pod -n kube-system | grep alicloud-disk-controller | awk '{print $1}')
              kubectl logs $podid -n kube-system
            • If the CSI plug-in is deployed in the cluster, run the following commands to print the log of csi-provisioner:

              podid=$(kubectl get pod -n kube-system | grep csi-provisioner | awk '{print $1}')
              kubectl logs $podid -n kube-system -c csi-provisioner
              Note

              Two pods are created to run csi-provisioner. In this case, the kubectl get pod -n kube-system | grep csi-provisioner | awk '{print $1}' command returns two pod IDs. Run the kubectl logs <pod-ID> -n kube-system -c csi-provisioner command separately for each pod.
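
              For example, the following loop reuses the same filter and prints the log of each csi-provisioner pod in turn:

              for pod in $(kubectl get pod -n kube-system | grep csi-provisioner | awk '{print $1}'); do
                echo "==== $pod ===="
                kubectl logs "$pod" -n kube-system -c csi-provisioner
              done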

          • If a mounting error occurs when the system starts a pod, you must check the log of FlexVolume or csi-plugin.

            • If the FlexVolume plug-in is deployed in the cluster, run the following command to query the node on which the pod runs:

              kubectl get pod <pod-name> -owide

              Log on to the Elastic Compute Service (ECS) instance that hosts the pod and check the FlexVolume log files that match /var/log/alicloud/flexvolume_**.log.

            • If the CSI plug-in is deployed in the cluster, run the following commands to print the log of csi-plugin:

              nodeID=$(kubectl get pod <pod-name> -owide | awk 'NR>1 {print $7}')
              podID=$(kubectl get pod -n kube-system -owide -l app=csi-plugin | grep $nodeID | awk '{print $1}')
              kubectl logs $podID -n kube-system
          • View the log of kubelet.

            Run the following command to query the node on which the pod runs:

            kubectl get pod <pod-name> -owide | awk 'NR>1 {print $7}'

            Log on to the node and check the log files, such as /var/log/messages.
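
            On nodes that use systemd, you can also query the kubelet log directly. A minimal sketch that filters the entries of the last hour by volume name (the pvc- filter is a placeholder; replace it with your own PV name):

              journalctl -u kubelet --since "1 hour ago" | grep -i pvc-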

        Quick recovery

        If you fail to mount volumes to most of the pods on a node, you can schedule the pods to other nodes. For more information, see Schedule pods to specific nodes.
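
        One way to move the pods is to cordon the node so that no new pods are scheduled to it, and then delete the affected pods so that their workload controllers recreate them on other nodes. A minimal sketch (replace <node-name>, <pod-name>, and <namespace> with your own values):

          kubectl cordon <node-name>                      # stop scheduling new pods to this node
          kubectl delete pod <pod-name> -n <namespace>    # the workload controller recreates the pod on another node
          kubectl uncordon <node-name>                    # restore scheduling after the node is fixed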

        csi-plugin update failures

        csi-plugin is deployed as a DaemonSet. If the cluster contains nodes that are in the NotReady state, or csi-plugin pods that are in a state other than Running, Container Service for Kubernetes (ACK) fails to update csi-plugin. You need to manually fix the nodes and perform the update again. For more information, see Manage the CSI plug-in.
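
        A minimal sketch to find the nodes and csi-plugin pods that block the update (app=csi-plugin is the label that is also used elsewhere in this topic):

          kubectl get nodes | grep -v ' Ready'                                          # nodes that are not Ready
          kubectl get pod -n kube-system -l app=csi-plugin -o wide | grep -v Running    # csi-plugin pods that are not Running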

        csi-plugin startup failures

        Issue

        csi-provisioner and csi-plugin fail to be started. The main container logs of csi-plugin and csi-provisioner report the 403 - Forbidden error.

        Cause

        Security hardening is enabled for the metadata servers on nodes. The metadata cannot be accessed because CSI does not support security hardening.

        Solution

        Submit a ticket to contact the ECS team for technical support.

        What do I do if the csi-provisioner update fails because the number of nodes in the cluster does not meet the requirements of the update precheck?

        Issues

        1. The csi-provisioner plug-in fails to pass the precheck because the number of nodes in the cluster does not meet the requirement.

        2. The csi-provisioner plug-in passes the precheck and can be updated. However, the csi-provisioner pod crashes and the following 403 Forbidden error is found in the log:

          time="2023-08-05T13:54:00+08:00" level=info msg="Use node id : <?xml version=\"1.0\" encoding=\"iso-8859-1\"?>\n<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\"\n         \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">\n<html xmlns=\"http://www.w3.org/1999/xhtml\" xml:lang=\"en\" lang=\"en\">\n <head>\n  <title>403 - Forbidden</title>\n </head>\n <body>\n  <h1>403 - Forbidden</h1>\n </body>\n</html>\n"

        Cause

        Cause for issue 1:

        To ensure the high availability of csi-provisioner, csi-provisioner runs in a primary pod and a secondary pod. The primary and secondary pods are scheduled to different nodes. If your cluster has only one node, you cannot update csi-provisioner.

        Cause for issue 2:

        The security hardening mode is enabled for the node where csi-provisioner resides. This mode prevents access to the metadata server on the node.

        Solutions

        Solution for issue 1:

        Update csi-provisioner. For more information, see Manage the CSI plug-in.

        Solution for issue 2:

        Disable the security hardening mode on the node to allow CSI to access the metadata of the node.
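
        The security hardening mode is a setting of the metadata server of the ECS instance that hosts the node. A hedged sketch that uses the Alibaba Cloud CLI to allow non-token-based access to the metadata server again (the region ID and the instance ID i-xxxxxx are placeholders; verify the operation against the ECS documentation before you run it):

          aliyun ecs ModifyInstanceMetadataOptions --RegionId <region-id> --InstanceId i-xxxxxx --HttpTokens optional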

        What do I do if the csi-provisioner update fails due to StorageClasses attribute changes?

        Issue

        csi-provisioner fails the precheck because the attributes of StorageClasses do not meet the requirements.

        Cause

        The attributes of the default StorageClasses were modified, or StorageClasses that have the same names as the default StorageClasses were deleted and recreated with different attributes. The attributes of the default StorageClasses must not be changed. Otherwise, csi-provisioner may fail to be updated.

        Solution

        Delete the following default StorageClasses: alicloud-disk-essd, alicloud-disk-available, alicloud-disk-efficiency, alicloud-disk-ssd, and alicloud-disk-topology. The deletion operation does not affect the applications in the cluster. Then, reinstall csi-provisioner. After csi-provisioner is reinstalled, the preceding default StorageClasses are automatically recreated.

        Important

        If you want to create custom StorageClasses, use names that are different from the names of the preceding default StorageClasses.

        Do StorageClass changes affect existing volumes?

        StorageClass changes do not affect existing volumes unless the YAML files of the related PVCs or PVs are modified. For example, after you enable the allowVolumeExpansion setting in a StorageClass, the new setting takes effect for an existing volume only when you increase the requested storage capacity (spec.resources.requests.storage) in the YAML file of the PVC.
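
        A minimal sketch that triggers a resize by patching the requested capacity of a PVC (the PVC name disk-pvc and the 40Gi target size are hypothetical):

          kubectl patch pvc disk-pvc -p '{"spec":{"resources":{"requests":{"storage":"40Gi"}}}}'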

        What do I do if the "failed to renew lease xxx timed out waiting for the condition" error is displayed in the log of csi-provisioner?

        Issue

        After you run the kubectl logs csi-provisioner-xxxx -nkube-system command to query the log of csi-provisioner, the failed to renew lease xxx timed out waiting for the condition error appears in the log.

        Cause

        Multiple replicated pods are provisioned for csi-provisioner to implement high availability. Kubernetes uses Leases to perform a leader election among the replicated pods of a component. During the election, csi-provisioner accesses the Kubernetes API server of the cluster to request the specified Lease. The replicated pod that acquires the Lease becomes the leader and provides services in the cluster. This issue occurs because csi-provisioner cannot access the Kubernetes API server of the cluster.

        Solution

        Check whether the cluster network and Kubernetes API server of the cluster are in the normal state. If the issue persists, submit a ticket.
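
        A minimal sketch to inspect the leader-election Lease objects and the health of the Kubernetes API server (the exact Lease name that csi-provisioner uses depends on the driver, so the grep filter below is only an assumption):

          kubectl get lease -n kube-system | grep -i csi    # leader-election Leases in kube-system
          kubectl get --raw '/readyz?verbose'               # health checks of the API server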

        OOM issues caused by volume plug-ins

        csi-provisioner is a centralized volume plug-in. Its sidecar containers cache information about the pods, PVs, and PVCs in the cluster. As the size of the cluster grows, the memory that these caches consume increases and out of memory (OOM) errors may occur. When an OOM error occurs, you need to modify the resource limits of csi-provisioner based on the size of the cluster.

        1. Log on to the ACK console. In the left-side navigation pane, click Clusters.

        2. On the Clusters page, click the name of the cluster that you want to manage and choose Operations > Add-ons in the left-side navigation pane.

        3. On the Add-ons page, find the csi-provisioner component, click the icon in the lower-right part of the component card, and then click View in YAML.

        4. Modify the resource limits in the YAML file based on the size of the cluster.

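        If you prefer kubectl over the console, a hedged sketch that raises the memory limit of the csi-provisioner container (the 4Gi value and the container name are assumptions; ACK may reconcile changes that are made outside the console, so prefer the procedure above):

          kubectl -n kube-system set resources deployment csi-provisioner -c csi-provisioner --limits=memory=4Gi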

        Why does the system prompt no volume plugin matched for the PVC when I create or mount a volume?

        Issue

        The system prompts Unable to attach or mount volumes: unmounted volumes=[xxx], unattached volumes=[xxx]: failed to get Plugin from volumeSpec for volume "xxx" err=no volume plugin matched for the PVC when you create or mount a volume.

        Cause

        The volume plug-in does not match the YAML template. As a result, the system cannot find the corresponding volume plug-in when creating or mounting a volume.

        Solution

        Check whether the volume plug-in exists in the cluster.

        • If the volume plug-in is not installed, install the plug-in. For more information, see Manage components.

        • If the volume plug-in is already installed, check whether the volume plug-in matches the YAML templates of the PV and PVC and whether the YAML templates meet the following requirements:

          • The CSI plug-in is deployed by following the steps as required. For more information, see CSI overview.

          • The FlexVolume plug-in is deployed by following the steps as required. For more information, see FlexVolume overview.

            Important

            FlexVolume is deprecated. If the version of your ACK cluster is earlier than 1.18, we recommend that you migrate from FlexVolume to CSI. For more information, see Migrate from FlexVolume to CSI.
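
        A minimal sketch to compare the driver that a PV references with the CSI drivers that are registered in the cluster (replace <pv-name> with your own PV name):

          kubectl get csidriver                                       # CSI drivers registered in the cluster
          kubectl get pv <pv-name> -o jsonpath='{.spec.csi.driver}'   # driver referenced by the PV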

        What do I do if a large volume of traffic is recorded in the monitoring data of the csi-plugin pod?

        Issue

        A large volume of traffic is recorded in the monitoring data of the csi-plugin pod.

        Cause

        csi-plugin is responsible for mounting NAS volumes to nodes. If a NAS volume is mounted to a pod on a node, requests from the pod to the NAS volume pass through the network namespace of the csi-plugin pod, and the monitoring system counts this traffic toward csi-plugin. As a result, a large volume of traffic is recorded in the monitoring data of the csi-plugin pod.

        Solution

        You do not need to fix this issue. The traffic is only attributed to csi-plugin in the monitoring data. It is not duplicated and does not consume additional network bandwidth.

        Why does the system generate the 0/x nodes are available: x pod has unbound immediate PersistentVolumeClaims event for a pod?

        Issue

        The system generates the 0/x nodes are available: x pod has unbound immediate PersistentVolumeClaims. preemption: 0/x nodes are available: x Preemption is not helpful for scheduling event for a pod.

        Cause

        The custom StorageClass that the PVC of the pod references does not exist in the cluster.

        Solution

        If the pod uses a dynamically provisioned volume, find the custom StorageClass that is referenced by the pod. If the StorageClass does not exist, create one.
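
        A minimal sketch to find the StorageClass that the PVC references and check whether it exists (replace <pvc-name> with your own PVC name):

          kubectl get pvc <pvc-name> -o jsonpath='{.spec.storageClassName}'
          kubectl get storageclass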

        What do I do if the PV is in the Released state and cannot be bound to the recreated PVC?

        Issue

        You accidentally deleted the PVC. The PV is in the Released state and cannot be bound to the PVC that you recreated.

        Cause

        If the reclaim policy (persistentVolumeReclaimPolicy) of the PV is Retain, the status of the PV changes to Released after you delete the PVC.

        Solution

        You need to delete the pv.spec.claimRef field for the PV and then bind the PV to the PVC as a statically provisioned volume. This way, the status of the PV changes to Bound.
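
        A minimal sketch that clears the claimRef field so that the PV becomes available for binding again (replace <pv-name> with your own PV name):

          kubectl patch pv <pv-name> --type json -p '[{"op": "remove", "path": "/spec/claimRef"}]'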

        What do I do if the PV is in the Lost state and cannot be bound to the recreated PVC?

        Issue

        After the PVC and PV are created, the PV remains in the Lost state and cannot be bound to the PVC.

        Cause

        The PVC name that is specified in the claimRef field of the PV does not exist. As a result, the status of the PV changes to Lost.

        Solution

        You need to delete the pv.spec.claimRef field for the PV and then bind the PV to the PVC as a statically provisioned volume. This way, the status of the PV changes to Bound.

        FAQ about migrating from FlexVolume to CSI

        In earlier ACK versions, FlexVolume is used as the volume plug-in. FlexVolume is deprecated in later versions. If the version of your ACK cluster is earlier than 1.18, we recommend that you migrate from FlexVolume to CSI. For more information, see Migrate from FlexVolume to CSI.

        Other StorageClass issues

        If the mountOptions parameter of a StorageClass contains spelling errors, the StorageClass referenced by a PVC does not exist, or the domain name of a mount target does not exist, volumes may fail to be mounted. In these cases, we recommend that you use Container Network File System (CNFS) volumes. For more information about CNFS, see CNFS overview.

        Can multiple applications in a cluster use the same volume?

        How do I change the configurations of the StorageClasses automatically created for a disk?

        You cannot modify the StorageClasses that are automatically created.

        After csi-provisioner is installed, StorageClasses such as alicloud-disk-topology-alltype are automatically created in the cluster. Do not modify these StorageClasses. For more information about the StorageClasses of disks, see StorageClass. If you need to modify the configurations of a StorageClass, such as the volume type, performance, and reclaim policy, you can create a new StorageClass. The number of StorageClasses that you can create is unlimited. For more information, see Create a StorageClass.
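
        A minimal sketch that creates a custom StorageClass for disks instead of modifying a default one. The name my-custom-disk-sc is hypothetical, diskplugin.csi.alibabacloud.com is the CSI driver for disks, and the parameters shown are examples that you should adjust to your own requirements:

          cat <<EOF | kubectl apply -f -
          apiVersion: storage.k8s.io/v1
          kind: StorageClass
          metadata:
            name: my-custom-disk-sc    # hypothetical name; must differ from the default StorageClasses
          provisioner: diskplugin.csi.alibabacloud.com
          parameters:
            type: cloud_essd           # example disk type
          reclaimPolicy: Retain        # example reclaim policy
          allowVolumeExpansion: true
          EOF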