Container Service for Kubernetes:Use ContainerOS to quickly scale out nodes

Last Updated: Oct 12, 2023

ContainerOS is an operating system that Alibaba Cloud provides for containerized development. It is fully compatible with Kubernetes and accelerates operating system startups and image pulling to improve the efficiency of node scale-outs in Container Service for Kubernetes (ACK). This topic describes how to use ContainerOS to quickly scale out nodes.

Prerequisites

  • ContainerOS is specified as the operating system of your managed node pool. For more information, see Configure a managed node pool to run ContainerOS.
  • If this is the first time you create a managed node pool that runs ContainerOS, make sure that the following components in your ACK cluster are updated to the latest version so that cluster nodes can be quickly scaled out:
    • The network component of the cluster: Terway or Flannel
    • The default volume component of the cluster: csi-plugin

    Go to the cluster information page, choose Operations > Add-ons, and check whether the components are updated to the latest version. If Upgrade is displayed in the lower-right part of a component's card, click Upgrade to update the component.
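You can also check the deployed component versions from the command line. The following is a minimal sketch that assumes kubectl access to the cluster and the default component names (for example, terway-eniip, kube-flannel-ds, and csi-plugin); the actual DaemonSet names may differ in your cluster.

```bash
# List the network and CSI DaemonSets in kube-system together with the images
# they run. The -o wide output includes an IMAGES column that shows the
# deployed component versions.
kubectl -n kube-system get daemonsets -o wide | grep -E 'terway|flannel|csi-plugin'
```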

Precautions

To accelerate operating system startups, ContainerOS pre-installs container images so that less time is required to pull them. When you use ContainerOS, do not manually update the ACK components, including csi-plugin and Terway or Flannel. Otherwise, the pre-installed image version may differ from the version of the deployed component, and operating system startup will be slowed down.

Note Container images are layered. Because of this, updating the pre-installed images on ContainerOS requires less time and is more flexible than pulling full images. We recommend that you update the corresponding components in advance to improve your experience when you scale out nodes.
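To see which component images are already present on a ContainerOS node, you can list the local images on the node. This is a hedged example that assumes you can open a shell on the node (for example, over SSH) and that crictl is available on the node.

```bash
# List the container images that are pre-installed on the node and filter for
# the cluster management components mentioned above.
crictl images | grep -E 'terway|flannel|csi'
```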

Advantages of ContainerOS

  • Operating system startup speed: ContainerOS simplifies the operating system startup procedure to accelerate operating system startups. ContainerOS is an operating system for virtual machines in the cloud and relies on only a small number of hardware drivers. The required kernel driver modules are built into the kernel, the initial RAM file system (initramfs) is dropped, and the udev rules are simplified, which greatly speeds up operating system startups. For example, Alibaba Cloud Linux 3 requires more than 1 minute for the initial boot on an Elastic Compute Service (ECS) instance of the ecs.g7.large type, whereas ContainerOS requires only two seconds.

  • Image pulling speed: After ECS nodes start up, ACK needs to pull the container images of specific components to complete basic tasks. ContainerOS pre-installs the container images of the components that are required for cluster management, which reduces the time spent pulling images during node startups.

    For example, if your cluster uses Terway, a node can change to the Ready state only after the Terway pod is ready. High network latency can significantly increase the amount of time required to pull the Terway image. To avoid this issue, ContainerOS pre-installs the container image of Terway in the operating system, which allows ACK to obtain the image from the local disk instead of pulling it over the network.

  • Node elasticity: ContainerOS is integrated with the management capabilities of ACK to improve the elasticity of nodes.

    The following figure shows the P90 duration of scaling out an empty ACK node pool. The duration starts when the scale-out request is submitted and ends when 90% of the nodes are in the Ready state. Compared with the CentOS and Alibaba Cloud Linux 2 custom image solutions, ContainerOS provides significantly faster scale-outs.

P90 time consumption statistics
Important The statistics in this example are theoretical values. The actual values may vary depending on your services and environment.

Procedure

If you want to start a large number of nodes, you can manually configure the Kubernetes controller manager, the Kubernetes scheduler, and the API server to accelerate node scale-outs. For example, use this method if you want to scale out more than 100 ECS nodes at a time.
Note Some APIs support a maximum of 100 connections by default. Therefore, no additional configuration is required if you start fewer than 100 ECS nodes at a time.

Configure traffic throttling for the Kubernetes controller manager

  1. Log on to the ACK console. In the left-side navigation pane, click Clusters.

  2. On the Clusters page, click the name of the cluster that you want to manage and choose Operations > Add-ons in the left-side navigation pane.

  3. On the Core Components tab of the Add-ons page, find Kube Controller Manager and click Configuration in the lower-right part of the card.
  4. In the dialog box that appears, set kubeAPIQPS to 800 and kubeAPIBurst to 1000, configure the other parameters based on your business requirements, and then click OK.
    Note Based on our test results, we recommend the preceding settings. If you have other requirements, you can modify the settings accordingly. The upstream equivalents of these parameters are shown in the sketch after these steps.
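For reference, kubeAPIQPS and kubeAPIBurst correspond to the client-side throttling flags of the upstream kube-controller-manager. The following sketch shows the upstream equivalents for a self-managed control plane; in an ACK managed cluster, use the console configuration described above instead.

```bash
# Upstream kube-controller-manager flags that correspond to kubeAPIQPS and
# kubeAPIBurst (other flags omitted for brevity).
kube-controller-manager \
  --kube-api-qps=800 \
  --kube-api-burst=1000
```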

Configure traffic throttling for the Kubernetes scheduler

  1. Log on to the ACK console. In the left-side navigation pane, click Clusters.

  2. On the Clusters page, click the name of the cluster that you want to manage and choose Operations > Add-ons in the left-side navigation pane.

  3. On the Core Components tab of the Add-ons page, find Kube Scheduler and click Configuration in the lower-right part of the card.
  4. In the dialog box that appears, set connectionQPS to 800 and connectionBurst to 1000, configure the other parameters based on your business requirements, and then click OK.
    Note Based on our test results, we recommend the preceding settings. If you have other requirements, you can modify the settings accordingly. The upstream equivalents of these parameters are shown in the sketch after these steps.
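For reference, connectionQPS and connectionBurst correspond to the client connection settings of the upstream kube-scheduler. The following is a minimal sketch for a self-managed control plane; the apiVersion of KubeSchedulerConfiguration depends on your Kubernetes version, and in an ACK managed cluster you should use the console configuration described above instead.

```bash
# Write an upstream KubeSchedulerConfiguration with the equivalent client
# throttling settings and start kube-scheduler with it (self-managed clusters).
cat <<'EOF' > kube-scheduler-config.yaml
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
clientConnection:
  qps: 800
  burst: 1000
EOF
kube-scheduler --config=kube-scheduler-config.yaml
```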

Modify the number of replicas for the API server

The number of API server replicas is dynamically adjusted based on the load. When a large number of nodes are scaled out at a time, the API server needs time to increase its replicas, and the nodes therefore require more time to reach the Ready state. To accelerate node scale-outs, you can Submit a ticket to adjust the number of API server replicas in advance.
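During a large scale-out, you can track progress by checking how many nodes have reached the Ready state. The following is a simple sketch that assumes kubectl access to the cluster.

```bash
# Count the nodes whose STATUS column is exactly "Ready".
kubectl get nodes --no-headers | awk '$2 == "Ready"' | wc -l

# Alternatively, block until all nodes report the Ready condition.
kubectl wait --for=condition=Ready nodes --all --timeout=30m
```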

References

ContainerOS overview