
Container Service for Kubernetes: Auto scaling overview

Last Updated: Apr 18, 2024

Auto scaling dynamically adjusts computing resources to meet your business requirements and helps you manage resources in a more cost-effective way. This topic introduces auto scaling and the related components.

Background information

Auto scaling is widely used in Container Service for Kubernetes (ACK) clusters. Typical scenarios include online workload scaling, large-scale computing and training, GPU-accelerated deep learning, inference and training based on shared GPU resources, and periodic workload scheduling. Auto scaling provides elasticity at the following layers:

  • Workload scaling (scheduling layer elasticity): scales workloads to adjust resource scheduling. For example, the Horizontal Pod Autoscaler (HPA) works at the scheduling layer: it adjusts the number of pods of an application, which in turn changes the amount of resources that the workload occupies.

  • Node scaling (resource layer elasticity): scales out nodes when the existing cluster capacity cannot fulfill the scheduling needs of the cluster.

The components for workload scaling and node scaling can be used separately or in combination. If you use the workload scaling components without node scaling, workloads can be scaled only within the existing resource capacity of the cluster.
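To make the scheduling layer concrete, the following is a minimal sketch that creates an HPA by using the open source Kubernetes Python client. The Deployment name web, the default namespace, the replica bounds, and the 70% CPU target are illustrative assumptions, not values from this topic.

```python
# Minimal sketch: create an HPA (scheduling layer elasticity) with the open source
# Kubernetes Python client. The Deployment name "web", the "default" namespace, and
# the 70% CPU target are illustrative assumptions.
from kubernetes import client, config

config.load_kube_config()  # use load_incluster_config() when running inside the cluster

hpa = client.V1HorizontalPodAutoscaler(
    metadata=client.V1ObjectMeta(name="web-hpa", namespace="default"),
    spec=client.V1HorizontalPodAutoscalerSpec(
        scale_target_ref=client.V1CrossVersionObjectReference(
            api_version="apps/v1", kind="Deployment", name="web"
        ),
        min_replicas=2,   # lower bound on the number of pods
        max_replicas=10,  # upper bound on the number of pods
        target_cpu_utilization_percentage=70,  # scale out when average CPU usage exceeds 70%
    ),
)

client.AutoscalingV1Api().create_namespaced_horizontal_pod_autoscaler(
    namespace="default", body=hpa
)
```

If the node scaling components described later are also enabled, pods added by the HPA that cannot be scheduled on the existing nodes can in turn trigger node scale-out.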

Scaling components for ACK clusters

Workload scaling components

HPA
  • Description: A built-in Kubernetes component. The HPA is used for online applications.
  • Use scenario: Online businesses.
  • Limit: The HPA scales workloads that are deployed as Deployments or StatefulSets.
  • References: Implement horizontal pod autoscaling

VPA (alpha)
  • Description: An open source component. The Vertical Pod Autoscaler (VPA) is used for monolithic applications.
  • Use scenario: Monolithic applications.
  • Limit: The VPA is intended for applications that cannot be horizontally scaled. It is typically used when pods are recovered from anomalies.
  • References: Vertical pod auto scaling

CronHPA
  • Description: An open source component provided by ACK. CronHPA is used for applications whose resource usage changes periodically.
  • Use scenario: Workloads whose resource usage changes periodically.
  • Limit: CronHPA scales workloads that are deployed as Deployments or StatefulSets. CronHPA is compatible with the HPA, so you can use the two components in combination.
  • References: Implement CronHPA

UnitedDeployment
  • Description: A component provided by ACK. UnitedDeployment is used in scenarios that require fine-grained scaling, for example, when you want to distribute a workload across different zones.
  • Use scenario: Scenarios that require fine-grained scaling.
  • Limit: UnitedDeployment is applicable to online workloads that require fine-grained scaling. For example, you can deploy the pods of a Deployment on elastic container instances when Elastic Compute Service (ECS) instances are insufficient.
  • References: Use the UnitedDeployment controller in ACK clusters

Node scaling components

cluster-autoscaler
  • Description: An open source component provided by Kubernetes to scale the nodes in a cluster. In ACK, cluster-autoscaler is integrated with auto scaling to provide more elastic and cost-effective scaling services. It adds nodes when pods cannot be scheduled because cluster resources are insufficient (see the sketch after these components).
  • Use scenario: Online workloads, deep learning tasks, and large-scale computing tasks.
  • Resource delivery efficiency: The time required to add 100 nodes to a cluster:
      • Standard mode: 120 seconds.
      • Swift mode: 60 seconds.
      • Standard mode with images that support quick boot (Qboot): 90 seconds.
      • Swift mode with images that support Qboot: 45 seconds.
    For more information about images that support Qboot, see Overview.
  • References: Enable node auto scaling

virtual-node
  • Description: An open source component provided by ACK. virtual-node provides the runtime for serverless applications. Developers do not need to manage node resources and only need to create, manage, and pay for pods based on actual usage.
  • Use scenario: Traffic spikes, continuous integration and continuous delivery (CI/CD), and big data computing.
  • Resource delivery efficiency: The time required to create 1,000 pods in a cluster:
      • Image caching disabled: 30 seconds.
      • Image caching enabled: 15 seconds.
  • References: Scale out elastic container instances

GOATScaler
  • Description: GOATScaler (instant scaling) is an event-driven node autoscaler. It is compatible with the semantics and behavior of node pools that have auto scaling enabled.
  • Use scenario: All types of applications. GOATScaler can be enabled seamlessly.
  • Resource delivery efficiency: The time required to add 100 nodes to a cluster:
      • ContainerOS: 45 seconds.
      • Standard mode: 103 seconds.
      • Swift mode: N/A.
  • References: Enable node instant scaling
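The node scaling components above act on pods that stay in the Pending state because no existing node has enough capacity for them. The following sketch, again based on the open source Kubernetes Python client, only lists such pods so that you can see the signal that node scaling reacts to; it does not call any ACK-specific API.

```python
# Sketch: list Pending pods, the condition that node scaling components such as
# cluster-autoscaler react to. Uses only the open source Kubernetes Python client.
from kubernetes import client, config

config.load_kube_config()

v1 = client.CoreV1Api()
pending = v1.list_pod_for_all_namespaces(field_selector="status.phase=Pending")

for pod in pending.items:
    # Pods that remain unschedulable because of insufficient resources are the
    # trigger for adding nodes at the resource layer.
    print(pod.metadata.namespace, pod.metadata.name)
```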

Logs of auto scaling activities

For more information about how to collect logs of auto scaling activities, see Collect log files of system components.
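In addition to collecting the log files of the components, you can review scaling activity through Kubernetes events. The following sketch uses the open source Kubernetes Python client to print events recorded for HorizontalPodAutoscaler objects in a hypothetical default namespace; it is a supplementary way to inspect scaling activity, not a replacement for the log collection described in the linked topic.

```python
# Sketch: print recent events for HPA objects in the "default" namespace (an
# illustrative choice) to review scaling activity locally.
from kubernetes import client, config

config.load_kube_config()

v1 = client.CoreV1Api()
events = v1.list_namespaced_event(
    namespace="default",
    field_selector="involvedObject.kind=HorizontalPodAutoscaler",
)

for event in events.items:
    print(event.last_timestamp, event.reason, event.message)
```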