
Container Service for Kubernetes: Update an ACK cluster

Last Updated: Aug 15, 2023

You can update the Kubernetes version of a Container Service for Kubernetes (ACK) cluster, or update only the control plane or node pools of an ACK cluster. This topic describes how to perform a precheck before you update the Kubernetes version of an ACK cluster, the update procedure and precautions for Kubernetes version updates, and how to update only the control plane or node pools of an ACK cluster.

Precautions

ACK ensures the stability of ACK clusters that run the latest three Kubernetes major versions. The stability of ACK clusters that run earlier Kubernetes versions is not ensured, and you may fail to update these clusters to the latest version. However, these issues do not adversely affect the applications deployed in these clusters. We recommend that you update your ACK clusters at the earliest opportunity. For more information, see Support for Kubernetes versions.

Do not add nodes, remove nodes, or perform other operations on the cluster during the update process. Applications that run in an ACK cluster are not interrupted during the update, but requests from applications to the API server may be temporarily interrupted. We recommend that you perform the update during off-peak hours.

Kubernetes version descriptions

You can update the Kubernetes version of an ACK cluster only to the next minor version that ACK supports. For example, if the Kubernetes version of an ACK cluster is 1.18 and you want to update it to 1.24, you must manually update the version to 1.20, then 1.22, and then 1.24 in sequence.


Kubernetes 1.24

The Docker runtime is no longer supported by clusters that run this Kubernetes version. Before you update to Kubernetes 1.24, we recommend that you change the container runtime from Docker to containerd for nodes by updating node pools. For more information, see Change the container runtime from Docker to containerd. For ACK dedicated clusters, the container runtime is automatically changed from Docker to containerd during the Kubernetes version update.

All containers on master nodes are recreated during a Kubernetes version update. Back up the data in these containers before you start a Kubernetes version update.

Kubernetes 1.22

In Kubernetes 1.22.10 and later, kube-proxy no longer listens on NodePort Service ports. If the node port range specified by the ServiceNodePortRange parameter of the API server overlaps with the range specified by the net.ipv4.ip_local_port_range kernel parameter of nodes, TCP connections may occasionally fail. In addition, health checks for your applications may fail and application errors may occur.

After you update the Kubernetes version of an ACK cluster to 1.22.10 or later, make sure that the two ranges do not overlap (see the check commands below). For more information, see How do I configure a proper node port range? and the related Kubernetes community PR.
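
The following is a minimal check, assuming that you can open a shell on a cluster node and, for the API server flag, on a master node of an ACK dedicated cluster. For ACK managed clusters, check the configured node port range in the console instead.

    # Ephemeral port range that the kernel uses for outbound connections.
    sysctl net.ipv4.ip_local_port_range

    # NodePort range configured on the API server (default: 30000-32767).
    # Run this command on a master node of an ACK dedicated cluster.
    ps aux | grep kube-apiserver | grep -o 'service-node-port-range=[^ ]*'

Make sure that the two ranges do not overlap.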

Kubernetes 1.20

  • The selfLink field is not supported by clusters that run this Kubernetes version. If both FlexVolume and alicloud-nas-controller are deployed in your cluster, you must update the image version of alicloud-nas-controller to 1.14.8.17-7b898e5-aliyun or later before you update the Kubernetes version of your cluster to 1.20. FlexVolume is deprecated. We recommend that you upgrade from FlexVolume to Container Storage Interface (CSI). For more information, see Upgrade from FlexVolume to CSI.

  • The issue that the timeout period of exec probes does not take effect is fixed in Kubernetes 1.20. If you configured exec probes for pods in a cluster that runs a Kubernetes version earlier than 1.20, make sure that each exec probe can terminate within the specified timeout period before you update Kubernetes to 1.20 or later. Otherwise, the pods may be recreated after the Kubernetes version update is completed. The sketch after this list shows how to find the configured exec probes and their timeouts.
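
    The following is a minimal sketch that lists pods with exec liveness probes and their configured timeouts. It assumes that kubectl is configured and that jq is installed. Repeat the query for readinessProbe and startupProbe if you use them. The default timeoutSeconds is 1.

      kubectl get pods -A -o json | jq -r '
        .items[] | . as $p
        | $p.spec.containers[]
        | select(.livenessProbe.exec != null)
        | "\($p.metadata.namespace)/\($p.metadata.name) container=\(.name) timeoutSeconds=\(.livenessProbe.timeoutSeconds // 1)"'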

Kubernetes 1.16 to 1.18

  • Before you update the Kubernetes version of an ACK cluster, read Update notes for CSI block volumes if applications in the cluster use disk volumes whose type is Block Volume.

  • After you update the Kubernetes version of an ACK cluster to 1.18, ACK automatically configures resource reservation. Workloads on cluster nodes may be evicted when the resource usage of the nodes is high and resource reservation is not configured. For more information about resource reservation, see Resource reservation policy.

Kubernetes 1.14

Before you update an ACK cluster to Kubernetes 1.14, make sure that the IP address of the Server Load Balancer (SLB) instance exposed by a LoadBalancer Service is accessible. For more information, see What Can I Do if the Cluster Cannot Access the IP Address of the SLB Instance Exposed by the LoadBalancer Service.
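
A quick way to check is sketched below. It assumes a LoadBalancer Service named my-service (a hypothetical name; substitute your own) that listens on port 80.

    # Query the SLB IP address that the Service exposes.
    SLB_IP=$(kubectl get service my-service -o jsonpath='{.status.loadBalancer.ingress[0].ip}')

    # Check whether the IP address responds on the Service port.
    curl -s -o /dev/null -w '%{http_code}\n' "http://$SLB_IP:80"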

Affected features

The following table describes the features that may be affected by cluster updates.


FlexVolume

Object Storage Service (OSS) volumes that are mounted by using FlexVolume 1.11.2.5 or later are remounted during a cluster update. After the update is completed, you must recreate the pods that use OSS volumes.

FlexVolume is deprecated. We recommend that you upgrade from FlexVolume to CSI. For more information, see Upgrade from FlexVolume to CSI.

Auto scaling

If auto scaling is enabled for cluster nodes, the cluster automatically updates Cluster Autoscaler to the latest version after the cluster is updated. This ensures that the auto scaling feature can function as expected. For more information, see Auto scaling of nodes.

Resource reservation

If the resource usage of the cluster is excessively high, the system may fail to schedule the pods that are evicted during the node update, which can cause the update to fail. We recommend that you reserve resources on cluster nodes: do not use more than 50% of the CPU resources or more than 70% of the memory resources.

kubectl

After the update is completed, we recommend that you update kubectl on your on-premises machine. For more information about how to install kubectl, see Install kubectl.

If you do not update kubectl, the kubectl version may be incompatible with the API server version. As a result, the error message "invalid object doesn't have additional properties" may appear.
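
You can compare the client and server versions with the following command. kubectl is supported within one minor version (older or newer) of the API server.

    kubectl version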

Custom configurations

The following custom configurations may affect cluster updates.


Network

To update a cluster, you need to use Yum to download the required software packages. If you have modified the network configuration of the nodes, make sure that Yum can function as expected on the nodes. You can run the yum makecache command to check whether Yum functions as expected (see the check at the end of this section).

OS image

To update a cluster, you need to use Yum to download the required software packages. If you use a custom OS image, make sure that Yum can function as expected on the nodes. You can run the yum makecache command to check whether Yum functions as expected (see the check at the end of this section).

Custom images are not strictly validated by ACK. If you specify a custom OS image, we cannot guarantee a successful node update.

Others

If you have modified the configurations of the ACK cluster by using the CLI, such as enabling swap partitions or changing the kubelet configuration, the update may fail or your custom configurations may be lost.
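
The following is a minimal check, assuming that you can open a shell on a node. A non-zero exit code indicates that Yum cannot reach its repositories.

    yum makecache && echo "Yum works as expected" || echo "Yum failed. Check the network configuration of the node."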

How ACK clusters are updated

The cluster update process consists of the precheck, control plane update, and node update. The system runs a precheck before updating an ACK cluster. The cluster can be updated only after it passes the precheck. After the cluster passes the precheck, you can update the control plane and nodes at the same time or update only the control plane or node pools of the cluster. For more information, see Update an ACK cluster, or update only the control plane or node pools of an ACK cluster. This section describes how to update the control plane and nodes.

Control plane update

ACK managed clusters and serverless Kubernetes (ASK) clusters

  1. The control plane and managed components, including kube-apiserver, kube-controller-manager, and kube-scheduler, are updated.

  2. Kubernetes components, such as kube-proxy, are updated.

ACK dedicated clusters

  1. Optional. The etcd and container runtime on the master nodes are updated in sequence.

  2. The system updates master nodes one at a time and displays the ID of the master node that is being updated.

  3. Master components, including kube-apiserver, kube-controller-manager, and kube-scheduler, are updated.

  4. The kubelet on master nodes is updated.

  5. Kubernetes components, such as kube-proxy, are updated after all master nodes are updated.

Node pool update

Cluster nodes are updated based on the batch update policy. The batch update policy specifies the following rules:

  • Node pools are updated one by one.

  • The nodes in a node pool are updated in batches. The first batch includes one node, and the number of nodes in subsequent batches increases in powers of two. The batch update policy still applies after you resume a paused update.

  • The system updates at most 10 nodes in each batch (see the illustration after this list).
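
The following sketch illustrates the resulting batch sizes for a hypothetical node pool of 30 nodes: 1, 2, 4, 8, 10, and then the remaining 5 nodes.

    # Illustration only: batch sizes start at 1, double with each batch,
    # and are capped at 10 nodes per batch.
    remaining=30
    batch=1
    while [ "$remaining" -gt 0 ]; do
        size=$(( batch < 10 ? batch : 10 ))
        size=$(( size < remaining ? size : remaining ))
        echo "Update $size nodes in this batch"
        remaining=$(( remaining - size ))
        batch=$(( batch * 2 ))
    done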

Update an ACK cluster, or update only the control plane or node pools of an ACK cluster

If the console shows that the Kubernetes version of your cluster can be updated to the latest version or that the Kubernetes version is outdated, applications that are deployed in the cluster are not adversely affected. However, we recommend that you update your cluster during off-peak hours at the earliest opportunity to prevent the security and stability risks that arise from an outdated Kubernetes version.

Before you update your cluster, the system performs a precheck on cluster resources, components, and configurations to ensure that your cluster meets the requirements. If a cluster fails the precheck, the cluster can still run as expected and the cluster status does not change. After your cluster passes the precheck, you can update the entire cluster, or update only the control plane or the node pools. ACK allows you to customize cluster updates. For example, you can update the control plane before you update the node pools of an ACK cluster.

Update the cluster

  1. Log on to the ACK console. In the left-side navigation pane, click Clusters.

  2. On the Clusters page, find the cluster that you want to upgrade and choose More > Operations > Upgrade Cluster in the Actions column.

  3. On the Upgrade Cluster page, select Control Planes and All Node Pools for Update Mode in the Update Items section and set the Maximum Number of Nodes to Repair per Batch parameter in the Batch Update Policy section. Then, click Precheck.

    After the precheck is completed, click Details to view the report.

    • If the result is normal in the report, the cluster passes the precheck and you can update the cluster.

    • If the result is abnormal in the report, the cluster can still run as expected and the cluster status does not change. Click the Troubleshoot tab and follow the suggestions displayed on the page to fix the issues. For more information, see Cluster check items and suggestions on how to fix cluster issues.

      Note

      If your cluster runs Kubernetes 1.20 or later, the precheck checks whether deprecated APIs are used in your cluster. The precheck result is for reference only and does not determine whether the cluster is updatable. For more information, see Deprecated APIs.

  4. After the cluster passes the precheck, click Start Update.

    You can view the update progress on the Upgrade Cluster page. You can perform the following operations:

    • Pause and resume the update: If you want to pause the update, click Pause. After you pause the update, you can click Continue to resume the update.

      Do not perform operations on the cluster while the update is paused. In addition, we recommend that you resume and complete the update at the earliest opportunity. If the update is paused for more than seven days, the system automatically terminates the update process, and the events and log data that are generated during the update are deleted.

    • Cancel the update: After the update is paused, you can click Cancel to cancel the update. In the message that appears, click Confirm to cancel the update. After you cancel the update, the system completes updating the nodes in the current batch and skips the nodes that have not been updated. After the update is canceled, you cannot roll back the nodes that have already been updated.

      Note
      • If errors occur during the update, the system pauses the update and displays the cause of the failure in the lower part of the page. Follow the on-screen suggestions to fix the error.

      • Do not modify the resources in the kube-upgrade namespace during the update process unless an error occurs.

      • If the update fails, the update is paused. You must troubleshoot the error and delete the failed pods in the kube-upgrade namespace. You can restart the update after the error is fixed.

    After the update is completed, you can go to the Clusters page and check the Kubernetes version of your cluster to verify that the control plane components are updated. In the left-side navigation pane of the Clusters page, choose Nodes > Nodes and check the kubelet version to verify that the cluster nodes are updated.
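
    You can also verify the node versions from the command line. The VERSION column in the output shows the kubelet version of each node.

      kubectl get nodes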

Update only the control plane

  1. Log on to the ACK console. In the left-side navigation pane, click Clusters.

  2. On the Clusters page, find the cluster that you want to upgrade and choose More > Operations > Upgrade Cluster in the Actions column.

  3. On the Upgrade Cluster page, select Control Planes Only for Update Mode in the Update Items section and set the Maximum Number of Nodes to Repair per Batch parameter in the Batch Update Policy section. Then, click Precheck.

    After the precheck is completed, click Details to view the report.

    • If the result is normal in the report, the cluster passes the precheck and you can update the cluster.

    • If the result is abnormal in the report, the cluster can still run as expected and the cluster status does not change. Click the Troubleshoot tab and follow the suggestions displayed on the page to fix the issues. For more information, see Cluster check items and suggestions on how to fix cluster issues.

      Note

      If your cluster runs Kubernetes 1.20 or later, the precheck checks whether deprecated APIs are used in your cluster. The precheck result is for reference only and does not determine whether the cluster is updatable. For more information, see Deprecated APIs.

  4. After the cluster passes the precheck, click Start Update.

    You can view the update progress on the Upgrade Cluster page. After the update is completed, you can go to the Clusters page and check the Kubernetes version of your cluster to verify that the control plane components are updated.

Update only the node pools

Before you update the node pools of an ACK cluster, make sure that the control plane of the cluster is updated.

  1. Log on to the ACK console. In the left-side navigation pane, click Clusters.

  2. On the Clusters page, click the name of the cluster that you want to manage and choose Nodes > Node Pools in the left-side navigation pane.

  3. Find the node pool that you want to update and choose More > Upgrade in the Actions column. In the Update Items section, select Kubelet Update and set other parameters. Then, click Precheck.

    If the cluster fails to pass the precheck, follow the suggestions displayed on the page to fix the issues.

  4. After the cluster passes the precheck, click Start Update.

    To update all node pools in the cluster, you can repeat the preceding steps on each node pool. For more information about node pool updates, see Node pool updates.

    After the update is completed, you can go to the Clusters page, choose Nodes > Nodes in the left-side navigation pane, and check the kubelet version to verify that the cluster nodes are updated.

Troubleshoot cluster update failures

What do I do if the master node update of an ACK dedicated cluster times out?

Cause

The self-signed server certificate of the admission webhook component does not contain the Subject Alternative Name field. As a result, the master component fails to start up.

Solution

Run the following commands on a node on which kubectl is configured to check whether the self-signed server certificate of the admission webhook contains the Subject Alternative Name field.

  1. Run the following command to query the admission webhooks in the cluster:

    kubectl get mutatingwebhookconfigurations

    Expected output:

    NAME                                      WEBHOOKS   AGE
    ack-node-local-dns-admission-controller   1          27h
  2. Run the following command to query the Service configured for an admission webhook:

    kubectl get mutatingwebhookconfigurations ack-node-local-dns-admission-controller -oyaml | grep service -A 5

    Expected output:

        service:
          name: ack-node-local-dns-admission-controller
          namespace: kube-system
          path: /inject
          port: 443
      failurePolicy: Ignore
  3. Run the following command to query the cluster IP address of the Service:

    kubectl -n kube-system get service ack-node-local-dns-admission-controller

    Expected output:

    NAME                                      TYPE        CLUSTER-IP        EXTERNAL-IP   PORT(S)   AGE
    ack-node-local-dns-admission-controller   ClusterIP   192.168.XX.XX   <none>        443/TCP   27h
  4. Run the following command to access the admission webhook through the cluster IP address to obtain the certificate, and check whether the certificate contains the Subject Alternative Name field:

    openssl s_client -connect 192.168.XX.XX:443 -showcerts </dev/null 2>/dev/null|openssl x509 -noout -text
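
    To filter the output for the Subject Alternative Name field directly, you can append a grep. Empty output means that the certificate does not contain the field.

    openssl s_client -connect 192.168.XX.XX:443 -showcerts </dev/null 2>/dev/null | openssl x509 -noout -text | grep -A1 'Subject Alternative Name'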

What do I do if the "the aliyun service is not running on the instance" error message is returned when I update an ACK dedicated cluster?

Cause

The Cloud Assistant agent is unavailable.

Solution

If the Cloud Assistant agent is unavailable, the update command cannot be delivered to the cluster nodes. Restart the Cloud Assistant agent and then update the cluster again. For more information, see Start, stop, or uninstall the Cloud Assistant agent.

What do I do if the PLEG module of nodes is unhealthy?

Cause

The containers or the container runtime on the nodes does not respond.

Solution

Restart the nodes and initiate the update again.
