
Container Service for Kubernetes: Update an ACK cluster

Last Updated: Aug 15, 2023

You can update the Kubernetes version of a Container Service for Kubernetes (ACK) cluster, or update only the control plane or node pools of an ACK cluster. This topic describes how to perform a precheck before you update the Kubernetes version of an ACK cluster, the update procedure and precautions for Kubernetes version updates, and how to update only the control plane or node pools of an ACK cluster.

Precautions

ACK ensures the stability of ACK clusters that run the latest three Kubernetes major versions. The stability of ACK clusters that run earlier Kubernetes versions is not ensured, and you may fail to update these clusters to the latest version. However, these issues do not adversely affect the applications deployed in these clusters. We recommend that you update your ACK clusters at the earliest opportunity. For more information, see Support for Kubernetes versions.

Do not add nodes, remove nodes, or perform other operations on the cluster during the update process. Applications that run in an ACK cluster are not interrupted during the update, but requests from applications to the API server may be temporarily interrupted. We recommend that you perform the update during off-peak hours.

Kubernetes version descriptions

You can update the Kubernetes version of an ACK cluster only to the next minor version that ACK supports. For example, if the Kubernetes version of an ACK cluster is 1.18 and you want to update it to 1.24, you must manually update the version to 1.20, then 1.22, and then 1.24 in sequence.


Kubernetes 1.24

The Docker runtime is no longer supported by clusters that run this Kubernetes version. Before you update to Kubernetes 1.24, we recommend that you change the container runtime from Docker to containerd for nodes by updating node pools. For more information, see Change the container runtime from Docker to containerd. For ACK dedicated clusters, the container runtime is automatically changed from Docker to containerd during the Kubernetes version update.

All containers on master nodes are recreated during a Kubernetes version update. Back up the data in these containers before you start a Kubernetes version update.

Kubernetes 1.22

In Kubernetes 1.22.10 and later, kube-proxy no longer listens on NodePort Service ports. If the node port range specified by the ServiceNodePortRange parameter of the API server overlaps with the range specified by the net.ipv4.ip_local_port_range kernel parameter of nodes, TCP connections may occasionally fail. In addition, health checks for your applications may fail and application errors may occur.

After you update the Kubernetes version of an ACK cluster to 1.22.10 or later, make sure that the two ranges do not overlap (see the check commands below). For more information, see How do I configure a proper node port range? and the related Kubernetes community PR.
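
The following is a minimal check, assuming that you can open a shell on a cluster node and, for the API server flag, on a master node of an ACK dedicated cluster. For ACK managed clusters, check the configured node port range in the console instead.

    # Ephemeral port range that the kernel uses for outbound connections.
    sysctl net.ipv4.ip_local_port_range

    # NodePort range configured on the API server (default: 30000-32767).
    # Run this command on a master node of an ACK dedicated cluster.
    ps aux | grep kube-apiserver | grep -o 'service-node-port-range=[^ ]*'

Make sure that the two ranges do not overlap.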

Kubernetes 1.20

  • The selfLink field is not supported by clusters that run this Kubernetes version. If both FlexVolume and alicloud-nas-controller are deployed in your cluster, you must update the image version of alicloud-nas-controller to 1.14.8.17-7b898e5-aliyun or later before you update the Kubernetes version of your cluster to 1.20. FlexVolume is deprecated. We recommend that you upgrade from FlexVolume to Container Storage Interface (CSI). For more information, see Upgrade from FlexVolume to CSI.

  • The issue that the timeout period of exec probes does not take effect is fixed in Kubernetes 1.20. If you configured exec probes for pods in a cluster that runs a Kubernetes version earlier than 1.20, make sure that each exec probe can terminate within the specified timeout period before you update Kubernetes to 1.20 or later. Otherwise, the pods may be recreated after the Kubernetes version update is completed. The sketch after this list shows how to find the configured exec probes and their timeouts.
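
    The following is a minimal sketch that lists pods with exec liveness probes and their configured timeouts. It assumes that kubectl is configured and that jq is installed. Repeat the query for readinessProbe and startupProbe if you use them. The default timeoutSeconds is 1.

      kubectl get pods -A -o json | jq -r '
        .items[] | . as $p
        | $p.spec.containers[]
        | select(.livenessProbe.exec != null)
        | "\($p.metadata.namespace)/\($p.metadata.name) container=\(.name) timeoutSeconds=\(.livenessProbe.timeoutSeconds // 1)"'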

Kubernetes 1.16 to 1.18

  • Before you update the Kubernetes version of an ACK cluster, read Update notes for CSI block volumes if applications in the cluster use disk volumes whose type is Block Volume.

  • After you update the Kubernetes version of an ACK cluster to 1.18, ACK automatically configures resource reservation. Workloads on cluster nodes may be evicted when the resource usage of the nodes is high and resource reservation is not configured. For more information about resource reservation, see Resource reservation policy.

Kubernetes 1.14

Before you update an ACK cluster to Kubernetes 1.14, make sure that the IP address of the Server Load Balancer (SLB) instance exposed by a LoadBalancer Service is accessible. For more information, see What Can I Do if the Cluster Cannot Access the IP Address of the SLB Instance Exposed by the LoadBalancer Service.
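
A quick way to check is sketched below. It assumes a LoadBalancer Service named my-service (a hypothetical name; substitute your own) that listens on port 80.

    # Query the SLB IP address that the Service exposes.
    SLB_IP=$(kubectl get service my-service -o jsonpath='{.status.loadBalancer.ingress[0].ip}')

    # Check whether the IP address responds on the Service port.
    curl -s -o /dev/null -w '%{http_code}\n' "http://$SLB_IP:80"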

Affected features

The following table describes the features that may be affected by cluster updates.


FlexVolume

Object Storage Service (OSS) volumes that are mounted by using FlexVolume 1.11.2.5 or later are remounted during a cluster update. After the update is completed, you must recreate the pods that use OSS volumes.

FlexVolume is deprecated. We recommend that you upgrade from FlexVolume to CSI. For more information, see Upgrade from FlexVolume to CSI.

Auto scaling

If auto scaling is enabled for cluster nodes, the cluster automatically updates Cluster Autoscaler to the latest version after the cluster is updated. This ensures that the auto scaling feature can function as expected. For more information, see Auto scaling of nodes.

Resource reservation

If the resource usage of the cluster is excessively high, the system may fail to schedule the pods that are evicted during the node update, which can cause the update to fail. We recommend that you reserve resources on cluster nodes: do not use more than 50% of the CPU resources or more than 70% of the memory resources.

kubectl

After the update is completed, we recommend that you update kubectl on your on-premises machine. For more information about how to install kubectl, see Install kubectl.

If you do not update kubectl, the kubectl version may be incompatible with the API server version. As a result, the error message "invalid object doesn't have additional properties" may appear.
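
You can compare the client and server versions with the following command. kubectl is supported within one minor version (older or newer) of the API server.

    kubectl version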

Custom configurations

The following custom configurations may affect cluster updates.


Network

To update a cluster, you need to use Yum to download the required software packages. If you have modified the network configuration of the nodes, make sure that Yum can function as expected on the nodes. You can run the yum makecache command to check whether Yum functions as expected (see the check at the end of this section).

OS image

To update a cluster, you need to use Yum to download the required software packages. If you use a custom OS image, make sure that Yum can function as expected on the nodes. You can run the yum makecache command to check whether Yum functions as expected (see the check at the end of this section).

Custom images are not strictly validated by ACK. If you specify a custom OS image, we cannot guarantee a successful node update.

Others

If you have modified the configurations of the ACK cluster by using the CLI, such as enabling swap partitions or changing the kubelet configuration, the update may fail or your custom configurations may be lost.
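
The following is a minimal check, assuming that you can open a shell on a node. A non-zero exit code indicates that Yum cannot reach its repositories.

    yum makecache && echo "Yum works as expected" || echo "Yum failed. Check the network configuration of the node."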

How ACK clusters are updated

The cluster update process consists of the precheck, control plane update, and node update. The system runs a precheck before updating an ACK cluster. The cluster can be updated only after it passes the precheck. After the cluster passes the precheck, you can update the control plane and nodes at the same time or update only the control plane or node pools of the cluster. For more information, see Update an ACK cluster, or update only the control plane or node pools of an ACK cluster. This section describes how to update the control plane and nodes.

Control plane update

ACK managed clusters and serverless Kubernetes (ASK) clusters

  1. The control plane and managed components, including kube-apiserver, kube-controller-manager, and kube-scheduler, are updated.

  2. Kubernetes components, such as kube-proxy, are updated.

ACK dedicated clusters

  1. Optional. The etcd and container runtime on the master nodes are updated in sequence.

  2. The system updates master nodes one at a time and displays the ID of the master node that is being updated.

  3. Master components, including kube-apiserver, kube-controller-manager, and kube-scheduler, are updated.

  4. The kubelet on master nodes is updated.

  5. Kubernetes components, such as kube-proxy, are updated after all master nodes are updated.

Node pool update

Cluster nodes are updated based on the batch update policy. The batch update policy specifies the following rules:

  • Node pools are updated one by one.

  • The nodes in a node pool are updated in batches. The first batch includes one node, and the number of nodes in subsequent batches increases in powers of two. The batch update policy still applies after you resume a paused update.

  • The system updates at most 10 nodes in each batch (see the illustration after this list).
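
The following sketch illustrates the resulting batch sizes for a hypothetical node pool of 30 nodes: 1, 2, 4, 8, 10, and then the remaining 5 nodes.

    # Illustration only: batch sizes start at 1, double with each batch,
    # and are capped at 10 nodes per batch.
    remaining=30
    batch=1
    while [ "$remaining" -gt 0 ]; do
        size=$(( batch < 10 ? batch : 10 ))
        size=$(( size < remaining ? size : remaining ))
        echo "Update $size nodes in this batch"
        remaining=$(( remaining - size ))
        batch=$(( batch * 2 ))
    done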

Update an ACK cluster, or update only the control plane or node pools of an ACK cluster

If the console shows that the Kubernetes version of your cluster can be updated to the latest version or that the Kubernetes version is outdated, applications that are deployed in the cluster are not adversely affected. However, we recommend that you update your cluster during off-peak hours at the earliest opportunity to prevent the security and stability risks that arise from an outdated Kubernetes version.

Before you update your cluster, the system performs a precheck on cluster resources, components, and configurations to ensure that your cluster meets the requirements. If a cluster fails the precheck, the cluster can still run as expected and the cluster status does not change. After your cluster passes the precheck, you can update the entire cluster, or update only the control plane or the node pools. ACK allows you to customize cluster updates. For example, you can update the control plane before you update the node pools of an ACK cluster.

Update the cluster

  1. Log on to the ACK console. In the left-side navigation pane, click Clusters.

  2. On the Clusters page, find the cluster that you want to upgrade and choose More > Operations > Upgrade Cluster in the Actions column.

  3. On the Upgrade Cluster page, select Control Planes and All Node Pools for Update Mode in the Update Items section and set the Maximum Number of Nodes to Repair per Batch parameter in the Batch Update Policy section. Then, click Precheck.

    After the precheck is completed, click Details to view the report.

    • If the result is normal in the report, the cluster passes the precheck and you can update the cluster.

    • If the result is abnormal in the report, the cluster can still run as expected and the cluster status does not change. Click the Troubleshoot tab and follow the suggestions displayed on the page to fix the issues. For more information, see Cluster check items and suggestions on how to fix cluster issues.

      Note

      If your cluster runs Kubernetes 1.20 or later, the precheck checks whether deprecated APIs are used in your cluster. The precheck result is for reference only and does not determine whether the cluster is updatable. For more information, see Deprecated APIs.

  4. After the cluster passes the precheck, click Start Update.

    You can view the update progress on the Upgrade Cluster page. You can perform the following operations:

    • Pause and resume the update: If you want to pause the update, click Pause. After you pause the update, you can click Continue to resume the update.

      Do not perform operations on the cluster while the update is paused. In addition, we recommend that you resume and complete the update at the earliest opportunity. If the update is paused for more than seven days, the system automatically terminates the update process, and the events and log data that are generated during the update are deleted.

    • Cancel the update: After the update is paused, you can click Cancel to cancel the update. In the message that appears, click Confirm to cancel the update. After you cancel the update, the system completes updating the nodes in the current batch and skips the nodes that have not been updated. After the update is canceled, you cannot roll back the nodes that have already been updated.

      Note
      • If errors occur during the update, the system pauses the update and displays the cause of the failure in the lower part of the page. Follow the on-screen suggestions to fix the error.

      • Do not modify the resources in the kube-upgrade namespace during the update process unless an error occurs.

      • If the update fails, the update is paused. You must troubleshoot the error and delete the failed pods in the kube-upgrade namespace. You can restart the update after the error is fixed.

    After the update is completed, you can go to the Clusters page and check the Kubernetes version of your cluster to verify that the control plane components are updated. In the left-side navigation pane of the Clusters page, choose Nodes > Nodes and check the kubelet version to verify that the cluster nodes are updated.
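
    You can also verify the node versions from the command line. The VERSION column in the output shows the kubelet version of each node.

      kubectl get nodes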

Update only the control plane

  1. Log on to the ACK console. In the left-side navigation pane, click Clusters.

  2. On the Clusters page, find the cluster that you want to upgrade and choose More > Operations > Upgrade Cluster in the Actions column.

  3. On the Upgrade Cluster page, select Control Planes Only for Update Mode in the Update Items section and set the Maximum Number of Nodes to Repair per Batch parameter in the Batch Update Policy section. Then, click Precheck.

    After the precheck is completed, click Details to view the report.

    • If the result is normal in the report, the cluster passes the precheck and you can update the cluster.

    • If the result is abnormal in the report, the cluster can still run as expected and the cluster status does not change. Click the Troubleshoot tab and follow the suggestions displayed on the page to fix the issues. For more information, see Cluster check items and suggestions on how to fix cluster issues.

      Note

      If your cluster runs Kubernetes 1.20 or later, the precheck checks whether deprecated APIs are used in your cluster. The precheck result is for reference only and does not determine whether the cluster is updatable. For more information, see Deprecated APIs.

  4. After the cluster passes the precheck, click Start Update.

    You can view the update progress on the Upgrade Cluster page. After the update is completed, you can go to the Clusters page and check the Kubernetes version of your cluster to verify that the control plane components are updated.

Update only the node pools

Before you update the node pools of an ACK cluster, make sure that the control plane of the cluster is updated.

  1. Log on to the ACK console. In the left-side navigation pane, click Clusters.

  2. On the Clusters page, click the name of the cluster that you want to manage and choose Nodes > Node Pools in the left-side navigation pane.

  3. Find the node pool that you want to update and choose More > Upgrade in the Actions column. In the Update Items section, select Kubelet Update and set other parameters. Then, click Precheck.

    If the cluster fails to pass the precheck, follow the suggestions displayed on the page to fix the issues.

  4. After the cluster passes the precheck, click Start Update.

    To update all node pools in the cluster, you can repeat the preceding steps on each node pool. For more information about node pool updates, see Node pool updates.

    After the update is completed, you can go to the Clusters page, choose Nodes > Nodes in the left-side navigation pane, and check the kubelet version to verify that the cluster nodes are updated.

Troubleshoot cluster update failures

What do I do if the master node update of an ACK dedicated cluster times out?

Cause

The self-signed server certificate of the admission webhook component does not contain the Subject Alternative Name field. As a result, the master component fails to start up.

Solution

Run the following commands on a node on which kubectl is configured to check whether the self-signed server certificate of the admission webhook contains the Subject Alternative Name field.

  1. Run the following command to query the admission webhooks in the cluster:

    kubectl get mutatingwebhookconfigurations

    Expected output:

    NAME                                      WEBHOOKS   AGE
    ack-node-local-dns-admission-controller   1          27h
  2. Run the following command to query the Service configured for an admission webhook:

    kubectl get mutatingwebhookconfigurations ack-node-local-dns-admission-controller -oyaml | grep service -A 5

    Expected output:

        service:
          name: ack-node-local-dns-admission-controller
          namespace: kube-system
          path: /inject
          port: 443
      failurePolicy: Ignore
  3. Run the following command to query the cluster IP address of the Service:

    kubectl -n kube-system get service ack-node-local-dns-admission-controller

    Expected output:

    NAME                                      TYPE        CLUSTER-IP        EXTERNAL-IP   PORT(S)   AGE
    ack-node-local-dns-admission-controller   ClusterIP   192.168.XX.XX   <none>        443/TCP   27h
  4. Run the following command to access the admission webhook through the cluster IP address to obtain the certificate, and check whether the certificate contains the Subject Alternative Name field:

    openssl s_client -connect 192.168.XX.XX:443 -showcerts </dev/null 2>/dev/null|openssl x509 -noout -text
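
    To filter the output for the Subject Alternative Name field directly, you can append a grep. Empty output means that the certificate does not contain the field.

    openssl s_client -connect 192.168.XX.XX:443 -showcerts </dev/null 2>/dev/null | openssl x509 -noout -text | grep -A1 'Subject Alternative Name'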

What do I do if the "the aliyun service is not running on the instance" error message is returned when I update an ACK dedicated cluster?

Cause

The Cloud Assistant agent is unavailable.

Solution

If the Cloud Assistant agent is unavailable, the update command cannot be delivered to the cluster nodes. Restart the Cloud Assistant agent and then update the cluster again. For more information, see Start, stop, or uninstall the Cloud Assistant agent.

What do I do if the PLEG module of nodes is unhealthy?

Cause

The containers or the container runtime on the nodes does not respond.

Solution

Restart the nodes and initiate the update again.
