Container Service for Kubernetes: Cluster check items and suggestions on how to fix cluster issues

Last Updated: Oct 27, 2023

Container Service for Kubernetes (ACK) supports cluster update check, cluster migration check, component check, and node pool check. This topic describes cluster check items and provides suggestions on how to fix cluster issues.

Cluster check items

Cluster update check

New Kubernetes versions may change the runtime, deprecate certain Kubernetes APIs, and introduce new features, so cluster updates carry risk. To help you update your cluster smoothly, ACK provides the cluster update check feature. A precheck is automatically triggered before a cluster is updated, and the cluster is updated only if it passes the precheck.

Cluster update check consists of the following checks:

  • Cluster resource check: checks cloud resources related to ACK clusters, such as Server Load Balancer (SLB) instances, Elastic Compute Service (ECS) instances, and virtual private clouds (VPCs).

  • Cluster component check: checks the configurations of ACK clusters, components, and applications. For example, the system checks whether the component versions meet the requirement or whether the applications are using deprecated APIs.

  • Cluster configuration check: checks configurations related to the nodes in ACK clusters. To perform the cluster configuration check, the system needs to create a pod on each node to collect information.

The cluster update check items vary based on the type, runtime, and version of the cluster. The check items in the following table are for reference only. The check items displayed in the console take precedence.

| Type | Check item | Description |
| --- | --- | --- |
| Cluster resources | APIServer SLB | Checks whether the SLB instance exists. |
| Cluster resources | APIServer SLB | Checks whether the status of the SLB instance is normal. |
| Cluster resources | APIServer SLB | Checks whether the configurations of the SLB listeners are valid, including the listener ports and protocol. |
| Cluster resources | APIServer SLB | Checks whether the configurations of the SLB backend server groups are valid. |
| Cluster resources | APIServer SLB | Checks whether the configuration of SLB access control is valid. If no access control is configured, this check item displays Normal. |
| Cluster resources | VPC | Checks whether the VPC exists. |
| Cluster resources | VPC | Checks whether the status of the VPC is normal. |
| Cluster resources | vSwitch | Checks whether the vSwitch exists. |
| Cluster resources | vSwitch | Checks whether the status of the vSwitch is normal. |
| Cluster resources | vSwitch | Checks whether the vSwitch can provide no less than two idle IP addresses. |
| Cluster resources | ECS | Checks whether the ECS instance exists. |
| Cluster resources | ECS | Checks whether the status of the ECS instance is normal. |
| Cluster resources | ECS | Checks whether the security group of the ECS instance is normal. |
| Cluster resources | ECS | Checks whether the ECS instance has expired. |
| Cluster resources | ECS | Checks whether the instance type of the ECS instance meets the requirement. |
| Cluster resources | ECS | Checks whether the status of the Cloud Assistant client is normal. |
| Cluster components | Kube Proxy Master | Checks whether the component exists. |
| Cluster components | Kube Proxy Worker | Checks whether the component exists. |
| Cluster components | API Service | Checks whether unavailable API Services exist. |
| Cluster components | Cluster instances | Checks whether the number of master instances in the cluster is three or five. |
| Cluster components | Cluster components | Checks whether the version of Terway meets the requirement. |
| Cluster components | Cluster components | Checks whether the version of CoreDNS meets the requirement. |
| Cluster components | Cluster components | Checks whether the version of the cloud controller manager meets the requirement. |
| Cluster components | Cluster components | Checks whether the version of the NGINX Ingress controller meets the requirement. |
| Cluster components | Cluster components | Checks whether the version of ACK Virtual Node meets the requirement. |
| Cluster components | Cluster components | Checks whether the version of the metric server meets the requirement. |
| Cluster components | Nodes | Checks whether the node IP address exists. |
| Cluster components | Nodes | Checks whether the node is schedulable. |
| Cluster components | Nodes | Checks whether the node is ready. |
| Cluster components | Nodes | Checks whether the operating system of the node can be updated. |
| Cluster components | Nodes | Checks whether the number of available pods on the node is greater than two. |
| Cluster components | Deprecated APIs | Checks whether the cluster uses deprecated APIs. |
| Cluster configurations | iptables configurations | Checks whether the iptables configurations are valid. |
| Cluster configurations | Operating systems | Checks whether the operating system can be updated. |
| Cluster configurations | yum | Checks whether Yum is normal. |
| Cluster configurations | Disks | Checks whether the file system of the node is normal. |
| Cluster configurations | Disks | Checks whether the free disk space of the node exceeds 5% of the total disk space. |
| Cluster configurations | Swap | Checks whether the node has Swap enabled. |
| Cluster configurations | NTP | Checks whether the NTP of the node is normal. |
| Cluster configurations | Systemd | Checks whether the Systemd version of the node is later than systemd-219-67. |
| Cluster configurations | Kubelet | Checks whether the kubelet configuration meets the requirement. |
| Cluster configurations | Container runtime | Checks whether the Docker runtime or Containerd runtime is normal. |
| Cluster configurations | Kernel configuration | Checks whether the kernel configuration of the node is normal. |
| Cluster configurations | Manifest configuration | Checks whether the manifest file meets the requirement. |
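
If you want to spot-check a node by hand before you start an update, the disk and Swap items in the preceding table can be approximated with standard Linux tools. The following is a minimal sketch, not the logic that ACK runs: the function names disk_ok and swap_enabled are hypothetical, and the 5% threshold is taken from the table.

```shell
# Sketch only: approximates two node-level check items by hand.
# disk_ok and swap_enabled are illustrative names, not ACK tools.

# Succeed if available space strictly exceeds 5% of total space.
# $1 = total size, $2 = available size (same unit for both).
# Cross-multiplication avoids rounding errors from integer division.
disk_ok() {
  [ $(( $2 * 100 )) -gt $(( $1 * 5 )) ]
}

# Succeed if the /proc/swaps-style input on stdin lists a swap device.
# The first line of /proc/swaps is a header, so more than one line
# means that at least one swap area is active.
swap_enabled() {
  awk 'END { exit !(NR > 1) }'
}

# Example usage on a node (not executed here):
#   set -- $(df -Pk / | awk 'NR == 2 { print $2, $4 }')
#   disk_ok "$1" "$2" && echo "disk ok" || echo "low disk space"
#   swap_enabled < /proc/swaps && echo "swap is on" || echo "swap is off"
```

The sketch only reports the state of the node. Refer to the check report in the console for the result that the precheck actually uses.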

Cluster migration check

A precheck is automatically triggered before a cluster is migrated. The cluster is migrated only if the cluster passes the precheck. Cluster migration check is suitable for the following scenarios:

  • Migrate from an ACK dedicated cluster to an ACK Pro cluster.

  • Migrate from an ACK Basic cluster to an ACK Pro cluster.

Cluster migration check consists of the following checks:

  • Cluster resource check: checks cloud resources related to ACK clusters, such as SLB instances, ECS instances, and VPCs.

  • Cluster component check: checks the configurations of the components in ACK clusters. For example, the system checks whether unavailable API Services exist.

  • Cluster configuration check: checks configurations related to the nodes in ACK clusters. To perform the cluster configuration check, the system needs to create a pod on each node to collect information.

  • Use of components: After you migrate from an ACK dedicated cluster to an ACK Pro cluster, some components are managed by ACK. Therefore, the system checks whether these components are normal before the migration.

The cluster migration check items vary based on the type, runtime, and version of the cluster. The check items in the following table are for reference only. The check items displayed in the console take precedence.

| Type | Check item | Description |
| --- | --- | --- |
| Cluster resources | APIServer SLB | Checks whether the SLB instance exists. |
| Cluster resources | APIServer SLB | Checks whether the status of the SLB instance is normal. |
| Cluster resources | APIServer SLB | Checks whether the configurations of the SLB listeners are valid, including the listener ports and protocol. |
| Cluster resources | APIServer SLB | Checks whether the configurations of the SLB backend server groups are valid. |
| Cluster resources | APIServer SLB | Checks whether the configuration of SLB access control is normal. If no access control is configured, this check item displays Normal. |
| Cluster resources | VPC | Checks whether the VPC exists. |
| Cluster resources | VPC | Checks whether the status of the VPC is normal. |
| Cluster resources | vSwitch | Checks whether the vSwitch exists. |
| Cluster resources | vSwitch | Checks whether the status of the vSwitch is normal. |
| Cluster resources | vSwitch | Checks whether the vSwitch can provide no less than two idle IP addresses. |
| Cluster resources | ECS | Checks whether the ECS instance exists. |
| Cluster resources | ECS | Checks whether the status of the ECS instance is normal. |
| Cluster resources | ECS | Checks whether the security group of the ECS instance is normal. |
| Cluster resources | ECS | Checks whether the status of the Cloud Assistant client is normal. |
| Cluster components | Kube Proxy Master | Checks whether the component exists. |
| Cluster components | Kube Proxy Worker | Checks whether the component exists. |
| Cluster components | API Service | Checks whether unavailable API Services exist. |
| Cluster components | Cluster instances | Checks whether the number of master instances in the cluster is three or five. |
| Cluster components | Nodes | Checks whether the node IP address exists. |
| Cluster components | Nodes | Checks whether the node is schedulable. |
| Cluster components | Nodes | Checks whether the node is ready. |
| Cluster components | Nodes | Checks whether the operating system of the node can be updated. |
| Cluster components | Nodes | Checks whether the number of available pods on the node is greater than two. |
| Cluster configurations | Operating systems | Checks whether the operating system can be updated. |
| Cluster configurations | yum | Checks whether Yum is normal. |
| Use of components | Cloud Controller Manager | Checks whether the cloud controller manager is normal. |

Component check

Component check is suitable for component update scenarios. A precheck is automatically triggered before a component is updated. The component is updated only if the cluster passes the precheck.

The component check items vary based on the type, runtime, and version of the cluster. The check items in the following table are for reference only. The check items displayed in the console take precedence.

| Type | Check item | Description |
| --- | --- | --- |
| cloud-controller-manager | Addon_CCM | Checks whether the update causes SLB changes. |
| cloud-controller-manager | Component_Block_Version | Checks whether the cloud controller manager can be updated. |
| csi-plugin | DaemonSet_Annotation | Checks whether the annotations of the DaemonSet meet the requirement. |
| csi-plugin | Csi_Driver_Attributes | Checks whether the CSI driver attribute meets the requirement. |
| csi-plugin | Node_Status_Ready | Checks whether the node is ready. |
| csi-provisioner | Stateful_Set_Exist | Checks whether the resource is a StatefulSet. |
| csi-provisioner | Deployment_Annotation | Checks whether the annotations of the Deployment meet the requirement. |
| csi-provisioner | Storage_Class_Attributes | Checks whether the StorageClass attribute meets the requirement. |
| csi-provisioner | Csi_Provisioner_Node_Count | Checks whether the number of ready nodes is equal to or greater than two. |
| terway-eniip | Systemd | Checks whether the Systemd version of the node is later than systemd-219-67. |
| nginx-ingress-controller | Deployment_Healthy | Checks whether the NGINX Ingress Deployment is healthy. |
| nginx-ingress-controller | Deployment_Not_Under_HPA | Checks whether a horizontal pod autoscaler (HPA) is configured for the Deployment. |
| nginx-ingress-controller | Deployment_Not_Modified | Checks whether the Deployment is changed. |
| nginx-ingress-controller | Nginx_Ingress_Pod_Error_Log | Checks whether NGINX error logs are generated. |
| nginx-ingress-controller | LoadBalancer_Service_Healthy | Checks whether the NGINX Services are healthy. |
| nginx-ingress-controller | Nginx_Ingress_Configuration | Checks whether incompatible configurations exist in Ingresses. |
| aliyun-acr-credential-helper | RamRole_Exist | Checks whether the component is assigned the AliyunCSManagedAcrRole role. |
| ack-cost-exporter | RamRole_Exist | Checks whether the component is assigned the AliyunCSManagedCostRole role. |
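
The Csi_Provisioner_Node_Count item in the preceding table requires at least two ready nodes. You can estimate this number yourself from the output of kubectl get nodes. The following is a minimal sketch under stated assumptions: count_ready_nodes is a hypothetical helper, and it counts cordoned nodes (Ready,SchedulingDisabled) as ready, which may differ from how the precheck counts them.

```shell
# Sketch only: count nodes whose STATUS column includes "Ready".
# Input on stdin: output of `kubectl get nodes --no-headers`.
# STATUS can be a comma-separated list such as "Ready,SchedulingDisabled",
# so each element is compared exactly to avoid matching "NotReady".
count_ready_nodes() {
  awk '{
    n = split($2, s, ",")
    for (i = 1; i <= n; i++) {
      if (s[i] == "Ready") { c++; break }
    }
  } END { print c + 0 }'
}

# Example usage (not executed here):
#   kubectl get nodes --no-headers | count_ready_nodes
```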

Node pool check

Node pool check is suitable for node pool update scenarios. A precheck is automatically triggered before a node pool is updated. The node pool is updated only if the node pool passes the precheck.

Node pool check consists of the following checks:

  • Cluster resource check: checks cloud resources related to ACK clusters, such as SLB instances and VPCs.

  • Cluster component check: checks the configurations of ACK clusters, nodes, and applications.

  • Cluster configuration check: checks configurations related to the nodes in ACK clusters. To perform the cluster configuration check, the system needs to create a pod on each node to collect information.

The node pool check items vary based on the type, runtime, and version of the cluster. The check items in the following table are for reference only. The check items displayed in the console take precedence.

| Type | Check item | Description |
| --- | --- | --- |
| Cluster resources | APIServer SLB | Checks whether the SLB instance exists. |
| Cluster resources | APIServer SLB | Checks whether the status of the SLB instance is normal. |
| Cluster resources | APIServer SLB | Checks whether the configurations of the SLB listeners are valid, including the listener ports and protocol. |
| Cluster resources | APIServer SLB | Checks whether the configurations of the SLB backend server groups are valid. |
| Cluster resources | APIServer SLB | Checks whether the configuration of SLB access control is valid. If no access control is configured, this check item displays Normal. |
| Cluster resources | VPC | Checks whether the VPC exists. |
| Cluster resources | VPC | Checks whether the status of the VPC is normal. |
| Cluster resources | vSwitch | Checks whether the vSwitch exists. |
| Cluster resources | vSwitch | Checks whether the status of the vSwitch is normal. |
| Cluster resources | vSwitch | Checks whether the vSwitch can provide no less than two idle IP addresses. |
| Cluster components | API Service | Checks whether unavailable API Services exist. |
| Cluster components | Cluster instances | Checks whether the number of master instances in the cluster is three or five. |
| Cluster components | Nodes | Checks whether the node is ready. |
| Cluster components | Nodes | Checks whether the number of available pods on the node is greater than two. |
| Cluster components | HostPath | Checks whether pods that use hostPath exist on the node. |
| Cluster configurations | iptables configurations | Checks whether the iptables configurations are valid. |
| Cluster configurations | Operating systems | Checks whether the operating system can be updated. |
| Cluster configurations | yum | Checks whether Yum is normal. |
| Cluster configurations | Disks | Checks whether the file system of the node is normal. |
| Cluster configurations | Free disk space on nodes | Checks whether the free disk space of the node exceeds 5% of the total disk space. |
| Cluster configurations | Swap | Checks whether the node has Swap enabled. |
| Cluster configurations | NTP | Checks whether the NTP of the node is normal. |
| Cluster configurations | Systemd | Checks whether the Systemd version of the node is later than systemd-219-67. |
| Cluster configurations | Kubelet | Checks whether the kubelet configuration meets the requirement. |
| Cluster configurations | Container runtime | Checks whether the Docker runtime or Containerd runtime is normal. |
| Cluster configurations | Kernel configuration | Checks whether the kernel configuration of the node is normal. |
| Cluster configurations | Manifest configuration | Checks whether the manifest file meets the requirement. |
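
The "available pods" item in the preceding table can be estimated as the allocatable pod count of a node minus the pods that currently occupy it. This topic does not describe how the precheck computes the value, so treat the following as a rough sketch only. The helper name enough_pod_slots and the NODE variable are hypothetical.

```shell
# Sketch only: succeed if more than two pod slots remain on a node.
# $1 = allocatable pod count of the node, $2 = pods currently on the node.
enough_pod_slots() {
  [ $(( $1 - $2 )) -gt 2 ]
}

# Example usage (not executed here; NODE is a placeholder for a node name):
#   cap=$(kubectl get node "$NODE" -o jsonpath='{.status.allocatable.pods}')
#   used=$(kubectl get pods --all-namespaces --no-headers \
#     --field-selector "spec.nodeName=$NODE,status.phase!=Succeeded,status.phase!=Failed" \
#     | wc -l)
#   enough_pod_slots "$cap" "$used" && echo "ok" || echo "node is nearly full"
```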

Suggestions on how to fix cluster issues

Issue

Suggestion

Role Aliyun_ARMS_CMonitor_Role missing

Grant the cluster permissions on Managed Service for Prometheus. For more information about how to manually grant permissions on Application Real-Time Monitoring Service (ARMS) and Tracing Analysis, see Enable Kubernetes Monitoring for a Kubernetes cluster.

Outdated Systemd version

Update Systemd.
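
To see whether a node is affected before you update Systemd, you can compare the installed version with the systemd-219-67 threshold named in the check items. The following is a minimal sketch that assumes an RPM-based node and GNU sort (for the -V option). The function name version_later_than is hypothetical, and the precheck may compare versions differently.

```shell
# Sketch only: succeed if version $1 is strictly later than version $2.
# Uses GNU `sort -V` (version sort) to order the two strings.
version_later_than() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Example usage on an RPM-based node (not executed here):
#   v=$(rpm -q --qf '%{VERSION}-%{RELEASE}' systemd)   # e.g. 219-78.el7_9.7
#   v=${v%%.*}                                         # drop suffix: 219-78
#   version_later_than "$v" "219-67" && echo "ok" || echo "systemd is too old"
```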

Outdated component version

Update the component. For more information, see Manage components.

Yum timeout

Run the following command to check whether Yum times out. The default timeout period is 10 seconds.

    time if type yum &>/dev/null; then yum list yum; fi
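
Instead of reading the elapsed time manually, you can enforce the 10-second limit directly. The following is a minimal sketch that relies on the coreutils timeout command, which exits with status 124 when the limit is reached. The wrapper name check_within is hypothetical and is not part of ACK.

```shell
# Sketch only: run a command with a time limit and report the outcome.
# $1 = limit in seconds, remaining arguments = command to run.
check_within() {
  limit=$1
  shift
  status=0
  timeout "$limit" "$@" >/dev/null 2>&1 || status=$?
  if [ "$status" -eq 124 ]; then
    # coreutils `timeout` returns 124 when the command was cut off
    echo "timed out after ${limit}s"
  elif [ "$status" -eq 0 ]; then
    echo "ok"
  else
    echo "failed with status $status"
  fi
}

# Example usage on a node (not executed here):
#   check_within 10 yum list yum
```

A "timed out" result suggests a slow or unreachable repository, whereas a "failed" result points to a problem with Yum itself.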

Unavailable API Services

  1. Run the following command to check for unavailable API Services:

    kubectl -n kube-system get apiservices | grep -i false

  2. Confirm the purpose of each unavailable API Service. If an API Service is no longer needed, run the following command to delete it. Replace ${your-abnormal-apiservice-name} with the name of the API Service.

    Important

    Proceed with caution because cluster exceptions may occur if you delete API Services that are still in use. If you cannot determine the purpose of the API Service, submit a ticket.

    kubectl -n kube-system delete apiservices ${your-abnormal-apiservice-name}
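
Before you delete anything, it can help to list only the names of the unavailable API Services and then look up which Service backs each one. The following is a minimal sketch: unavailable_apiservices is a hypothetical helper that filters the default column output of kubectl get apiservices (NAME, SERVICE, AVAILABLE, AGE).

```shell
# Sketch only: print the names of API Services that are not available.
# Input on stdin: output of `kubectl get apiservices --no-headers`.
# The AVAILABLE column reads "True" or "False (Reason)", so the third
# whitespace-separated field starts with "False" for unavailable entries.
unavailable_apiservices() {
  awk '$3 ~ /^False/ { print $1 }'
}

# Example usage (not executed here):
#   kubectl get apiservices --no-headers | unavailable_apiservices
#   # Show the backing Service of one entry (NAME is a placeholder):
#   kubectl get apiservice "$NAME" \
#     -o jsonpath='{.spec.service.namespace}/{.spec.service.name}'
```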

Pods using hostPath

When the system updates a node by replacing its system disk, data loss may occur if pods on the node use hostPath volumes to mount host directories into containers. Check the directories that are mounted by these pods. If hostPath is not used, or if the mounted directories hold no data that must be retained, you can proceed with the update. The check result is for reference only.
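
To review the mounted directories, you can list the pods on a node together with their hostPath paths. The following is a minimal sketch that uses the kubectl jsonpath output format. The helper name pods_with_hostpath and the NODE variable are hypothetical.

```shell
# Sketch only: keep the pods that mount at least one hostPath volume.
# Input on stdin: one line per pod, "namespace/name<TAB>paths", where
# the paths field is empty for pods without hostPath volumes.
pods_with_hostpath() {
  awk -F '\t' '$2 != "" { print $1 ": " $2 }'
}

# Example usage (not executed here; NODE is a placeholder for a node name):
#   kubectl get pods --all-namespaces --field-selector "spec.nodeName=$NODE" \
#     -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name}{"\t"}{.spec.volumes[*].hostPath.path}{"\n"}{end}' \
#     | pods_with_hostpath
```

Paths that hold only logs or sockets are usually safe to lose, but you need to judge this for each workload.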

Use of deprecated APIs

Identify the resource that uses the deprecated APIs and take actions accordingly. For more information, see Deprecated APIs.

Deprecated APIs

If your cluster runs Kubernetes 1.20 or later, the precheck checks whether deprecated APIs are used in your cluster. You can view the deprecated APIs that are used by the cluster in the check report.

For example, before you update your cluster from Kubernetes 1.20 to Kubernetes 1.22, the system checks whether deprecated APIs are used in your cluster by scanning the audit logs that were generated the previous day.

  • The precheck result is for reference only. You can proceed with the update even if your cluster runs Kubernetes 1.20 and uses deprecated APIs.

  • If you continue to use the deprecated APIs in Kubernetes 1.22, potential security risks may exist.
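
In addition to the check report, upstream Kubernetes 1.19 and later expose the apiserver_requested_deprecated_apis metric on the API server, which records the deprecated APIs that have been requested. The following is a minimal sketch for reading it. This is not the audit log scan that the precheck performs, the helper names are hypothetical, and access to the /metrics endpoint depends on your permissions in the cluster.

```shell
# Sketch only: summarize apiserver_requested_deprecated_apis series.

# Extract the value of one label ($2) from a Prometheus text line ($1).
# The label name must follow "{" or ",", so "resource" does not match
# inside "subresource".
label_value() {
  printf '%s\n' "$1" | sed -n "s/.*[{,]$2=\"\\([^\"]*\\)\".*/\\1/p"
}

# Read metrics text on stdin and print one line per deprecated API:
#   group/version resource (removed in release)
list_deprecated_apis() {
  grep '^apiserver_requested_deprecated_apis{' | while IFS= read -r line; do
    printf '%s/%s %s (removed in %s)\n' \
      "$(label_value "$line" group)" \
      "$(label_value "$line" version)" \
      "$(label_value "$line" resource)" \
      "$(label_value "$line" removed_release)"
  done
}

# Example usage (not executed here):
#   kubectl get --raw /metrics | list_deprecated_apis
```

The metric covers requests received since the API server started, so its output can differ from the report, which is based on the audit logs of the previous day.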

The following table describes the types of deprecated APIs. Before you update a cluster that uses deprecated APIs, we recommend that you refer to the Type column of the following table and perform operations that correspond to the type of deprecated API used by the cluster.

| Type | Suggestion | Example |
| --- | --- | --- |
| core | Key Kubernetes components. ACK automatically updates key Kubernetes components, so you do not need to update them. Information about these components is not displayed on the precheck page. | apiserver, scheduler, and kube-controller-manager |
| ack | ACK components. These components require manual updates. You can update them based on the instructions on the Add-ons page of the ACK console. Note: You can log on to the ACK console and choose Operations > Add-ons to update components. The deprecated APIs are no longer displayed the day after the components are updated. Note: The CoreDNS component may use deprecated APIs in clusters that run Kubernetes 1.24 and later. If the check report includes CoreDNS, see Why does CoreDNS use deprecated APIs? and proceed accordingly. Note: Deprecated APIs in the precheck result are for reference only. You can proceed with the update even if your cluster uses deprecated APIs. After your cluster is updated, it uses the new APIs. To avoid security risks, we recommend that you do not use deprecated APIs after you update your cluster. | metrics-server, nginx-ingress-controller, and coredns |
| opensource | Open source components. Some open source components are listed in the ACK console. You can decide whether to update them. These components can only be updated manually. Other open source components may be classified into the unknown type. Note: Deprecated APIs in the precheck result are for reference only. You can proceed with the update even if your cluster uses deprecated APIs. Update the components based on your business requirements. | rancher and elasticsearch-operator |
| unknown | Unknown sources. Deprecated API usage that does not belong to the preceding types is considered to come from unknown sources and is listed in the ACK console. You can decide whether to update the components. These components can only be updated manually. Note: Deprecated APIs in the precheck result are for reference only. You can proceed with the update even if your cluster uses deprecated APIs. Update the components based on your business requirements. | kubectl, agent, Go-http-client, and okhttp |

Perform the following operations to view the information about a deprecated API:

  1. On the Upgrade Cluster page, click Precheck and then click View Details.

  2. On the Report page, click the Cluster Components tab, and then click the Troubleshoot tab.

  3. Click the button next to Deprecated Kubernetes APIs.

  4. In the dialog box that appears, click Deprecated Kubernetes APIs and click the link below.

  5. On the Deprecated Kubernetes APIs page, you can view the information about the deprecated APIs.