Enhancing Self-created Kubernetes with Cloud Elasticity to Cope with Traffic Bursts

This article focuses on one use scenario of ACK One registered clusters: cloud elasticity.

By Yu Zhuang

Container technology, represented by Kubernetes, has revolutionized the application delivery model and is rapidly becoming a standardized API for data centers worldwide.

In application architecture design, stability, uninterrupted user access, high availability, and elasticity are persistent goals. Multi-cluster architecture naturally possesses these capabilities. However, it is through the unified and standardized API of Kubernetes that the true value of multi-cluster and hybrid cloud capabilities is realized.

In the previous article, Simplifying Kubernetes Multi-cluster Management with the Right Approach, we covered the application scenarios, architecture, and security hardening of registered clusters in Distributed Cloud Container Platform for Kubernetes (ACK One). We also showed how the observability features of Alibaba Cloud Container Service for Kubernetes (ACK) apply to both cloud-based and self-built Kubernetes clusters, enabling unified operations and maintenance (O&M) of Kubernetes clusters.

In this article, we focus on another important use scenario of ACK One registered clusters - cloud elasticity.

Typical Application Scenarios and Benefits of Cloud Elasticity

The cloud elasticity of ACK One registered clusters is beneficial in the following scenarios:

1. Rapid business growth: Kubernetes clusters deployed in on-premises IDCs often face limitations in scaling due to limited computing resources. Additionally, the procurement and deployment of computing resources can be time-consuming, making it difficult to accommodate the rapid growth of business traffic.

2. Periodic or sudden business growth: The fixed amount of computing resources in on-premises IDCs cannot effectively handle periodic business peaks or sudden surges in traffic.

The elasticity of computing resources provides a fundamental solution to these scenarios by dynamically scaling resources based on changes in business traffic. This ensures that business requirements are met while optimizing costs.

The following figure shows the architecture for cloud elasticity in ACK One registered clusters.


Through ACK One registered clusters, Kubernetes clusters in on-premises IDCs can elastically scale out the node pool of Alibaba Cloud ECS instances. This takes advantage of the exceptional elasticity provided by Alibaba Cloud Container Service, enabling businesses to scale out and effectively handle increased traffic, while also scaling in to reduce costs. Specifically for AI scenarios, ACK One registered clusters enable the connection of GPU-accelerated machines from the cloud to Kubernetes clusters in on-premises IDCs.

Best Practices for Adding Alibaba Cloud GPU Computing Power to an On-premises IDC Kubernetes Cluster

1. Create an ACK One Registered Cluster

Visit the Register Cluster page in the ACK One console. In this walkthrough, the registered cluster ACKOneRegisterCluster1 has already been created and connected to a Kubernetes cluster in an on-premises IDC. For details, see Simplifying Kubernetes Multi-cluster Management with the Right Approach.

The registered clusters page in the ACK One console:


After the connection, you can view the on-premises IDC Kubernetes cluster in the ACK One console. Currently, there is only one master node.


2. Create a GPU Node Pool and Manually Scale Out One GPU Node

Create the node pool GPU-P100 in the registered cluster and add a GPU-accelerated machine on the cloud to the Kubernetes cluster in the IDC.


Run kubectl in the IDC Kubernetes cluster to view the node information.

kubectl get node
NAME                           STATUS   ROLES    AGE     VERSION
cn-zhangjiakou.172.16.217.xx   Ready    <none>   5m35s   v1.20.9    // GPU-accelerated machine on the cloud
iz8vb1xtnuu0ne6b58hvx0z        Ready    master   20h     v1.20.9    // IDC machine

kubectl describe node cn-zhangjiakou.172.16.217.xx
Name:               cn-zhangjiakou.172.16.217.xx
Roles:              <none>
Labels:             aliyun.accelerator/nvidia_count=1             //nvidia labels
                    aliyun.accelerator/nvidia_mem=16280MiB        //nvidia labels
                    aliyun.accelerator/nvidia_name=Tesla-P100-PCIE-16GB  //nvidia labels
Capacity:
  cpu:                4
  ephemeral-storage:  123722704Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             30568556Ki
  nvidia.com/gpu:     1              //nvidia gpu
  pods:               110
Allocatable:
  cpu:                4
  ephemeral-storage:  114022843818
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             30466156Ki
  nvidia.com/gpu:     1              //nvidia gpu
  pods:               110
System Info:
  OS Image:                   Alibaba Cloud Linux (Aliyun Linux) 2.1903 LTS (Hunting Beagle)
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  docker://19.3.13
  Kubelet Version:            v1.20.9
  Kube-Proxy Version:         v1.20.9
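
When a cluster has many nodes, reading the describe output node by node becomes tedious. The following is a minimal Python sketch of picking out the GPU nodes programmatically. It assumes the input has the shape of the items list returned by kubectl get nodes -o json, trimmed here to the fields it reads. The sample values mirror the output above, and the function name gpu_nodes is illustrative only.

```python
# Sample data shaped like the "items" list of `kubectl get nodes -o json`,
# trimmed to the fields used below. Values mirror the output shown above.
nodes = [
    {
        "metadata": {
            "name": "cn-zhangjiakou.172.16.217.xx",
            "labels": {
                "aliyun.accelerator/nvidia_count": "1",
                "aliyun.accelerator/nvidia_mem": "16280MiB",
                "aliyun.accelerator/nvidia_name": "Tesla-P100-PCIE-16GB",
            },
        },
        "status": {"allocatable": {"cpu": "4", "nvidia.com/gpu": "1"}},
    },
    {
        # The IDC master node; it advertises no nvidia.com/gpu resource.
        "metadata": {"name": "iz8vb1xtnuu0ne6b58hvx0z", "labels": {}},
        "status": {"allocatable": {}},
    },
]


def gpu_nodes(items):
    """Return (node name, GPU model, allocatable GPU count) for each GPU node."""
    result = []
    for node in items:
        allocatable = node.get("status", {}).get("allocatable", {})
        gpus = int(allocatable.get("nvidia.com/gpu", "0"))
        if gpus > 0:
            labels = node["metadata"].get("labels", {})
            model = labels.get("aliyun.accelerator/nvidia_name", "unknown")
            result.append((node["metadata"]["name"], model, gpus))
    return result


for name, model, count in gpu_nodes(nodes):
    print(f"{name}: {count} x {model}")
```

In practice, the nodes list would come from parsing the JSON printed by kubectl rather than from a hard-coded sample.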

3. Run the GPU Task Test

Submit the GPU test task to the Kubernetes cluster in the IDC. The test passed.

> cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  restartPolicy: Never
  containers:
    - name: cuda-container
      image: acr-multiple-clusters-registry.cn-hangzhou.cr.aliyuncs.com/ack-multiple-clusters/cuda10.2-vectoradd
      resources:
        limits:
          nvidia.com/gpu: 1 # requesting 1 GPU
EOF

> kubectl logs gpu-pod
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory

Multi-level Elastic Scheduling Policy - Priority-based Resource Scheduling

Alibaba Cloud provides priority-based resource scheduling to meet the elasticity requirements in pod scheduling. When you deploy or scale out an application, you can customize a resource policy to determine the order in which pods are scheduled to different types of node resources. When you scale in the application, pods are deleted from nodes in reverse order.

Following the demonstration above, you can use the ACK One registered cluster to create node pools from ECS resources on the cloud and add them to the IDC cluster. You can label the node pools or nodes and use nodeAffinity or nodeSelector to decide whether pods run on the on-premises IDC nodes or on the ECS nodes on the cloud. This approach requires modifying the configuration of each application pod. If the production system runs a large number of applications, you must write and maintain scheduling rules for every custom scheduling scenario, for example, scheduling a GPU training task that requires a specific CUDA version to a particular GPU ECS instance on the cloud.
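
To make the per-pod configuration change concrete, here is a minimal Python sketch that adds a nodeSelector to a pod manifest. The label example.com/location=cloud and the helper pin_to_nodes are hypothetical and not part of ACK One. The sketch assumes the cloud node pool was labeled that way beforehand.

```python
import json

# Hypothetical label: assumes the cloud ECS node pool was labeled like this.
CLOUD_NODE_LABEL = {"example.com/location": "cloud"}


def pin_to_nodes(pod_manifest, node_label):
    """Return a copy of the manifest with spec.nodeSelector set to node_label."""
    pinned = json.loads(json.dumps(pod_manifest))  # deep copy of plain JSON data
    pinned.setdefault("spec", {})["nodeSelector"] = dict(node_label)
    return pinned


gpu_pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "gpu-pod"},
    "spec": {
        "restartPolicy": "Never",
        "containers": [
            {
                "name": "cuda-container",
                "image": "acr-multiple-clusters-registry.cn-hangzhou.cr.aliyuncs.com/ack-multiple-clusters/cuda10.2-vectoradd",
                "resources": {"limits": {"nvidia.com/gpu": 1}},
            }
        ],
    },
}

pinned = pin_to_nodes(gpu_pod, CLOUD_NODE_LABEL)
# kubectl accepts JSON manifests, so this output can be piped to `kubectl apply -f -`.
print(json.dumps(pinned, indent=2))
```

Every workload that should run on the cloud needs the same kind of change, which is the maintenance burden described above.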

To simplify the use of ECS resources on the cloud by the IDC Kubernetes clusters, registered ACK One clusters provide the multi-level elastic scheduling feature. By installing ack-co-scheduler components, you can define ResourcePolicy CR objects to utilize this multi-level elastic scheduling feature.

The ResourcePolicy CR is a namespaced resource. The important parameters are as follows:

• selector: selects the pods in the same namespace that the ResourcePolicy applies to, based on pod labels (for example, key1=value1).
• strategy: the scheduling policy. Currently, only prefer is supported.
• units: the schedulable units. During a scale-out activity, pods are scheduled to the units in descending order of priority, where units listed first have higher priority. During a scale-in activity, pods are deleted in ascending order of priority, that is, from the lowest-priority unit first.
• resource: the type of elastic resources. Currently, IDC, ECS, and ECI are supported.
• nodeSelector: uses node labels to identify the nodes in a scheduling unit. This parameter only applies to ECS resources.
• max: the maximum number of pods that can be deployed by using the resources.
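
As a quick illustration of how these parameters fit together, the following Python sketch checks a ResourcePolicy expressed as a plain dictionary. The rules are only a reading of the parameter list above, not the validation that ack-co-scheduler actually performs. The function name validate_resource_policy and the requirement that selector be non-empty are assumptions made for this example.

```python
ALLOWED_STRATEGIES = {"prefer"}
ALLOWED_RESOURCES = {"idc", "ecs", "eci"}


def validate_resource_policy(policy):
    """Return a list of problems found in a ResourcePolicy given as a dict."""
    problems = []
    spec = policy.get("spec", {})

    selector = spec.get("selector")
    if not isinstance(selector, dict) or not selector:
        # Assumption for this sketch: at least one label must be selected.
        problems.append("spec.selector must contain at least one label")

    if spec.get("strategy") not in ALLOWED_STRATEGIES:
        problems.append("spec.strategy must be 'prefer'")

    units = spec.get("units") or []
    if not units:
        problems.append("spec.units must list at least one unit")

    for i, unit in enumerate(units):
        resource = unit.get("resource")
        if resource not in ALLOWED_RESOURCES:
            problems.append(f"units[{i}].resource must be one of idc, ecs, eci")
        if "nodeSelector" in unit and resource != "ecs":
            problems.append(f"units[{i}].nodeSelector only applies to ecs")
        if "max" in unit:
            limit = unit["max"]
            if isinstance(limit, bool) or not isinstance(limit, int) or limit < 1:
                problems.append(f"units[{i}].max must be a positive integer")
    return problems


policy = {
    "apiVersion": "scheduling.alibabacloud.com/v1alpha1",
    "kind": "ResourcePolicy",
    "metadata": {"name": "cost-balance-policy"},
    "spec": {
        "selector": {"app": "nginx"},
        "strategy": "prefer",
        "units": [{"resource": "idc"}, {"resource": "ecs"}],
    },
}

# An empty list means the sketch found no problems.
print(validate_resource_policy(policy))
```

A check like this could run before applying a policy to catch obvious mistakes, such as a nodeSelector placed on an idc unit.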

ResourcePolicy supports the following scenarios:

Scenario 1: Use Cluster Resources in IDCs First and Then the ECS Resources on the Cloud

apiVersion: scheduling.alibabacloud.com/v1alpha1
kind: ResourcePolicy
metadata:
  name: cost-balance-policy
spec:
  selector:
    app: nginx           # Select an application pod.
  strategy: prefer
  units:
  - resource: idc        # Prioritize the node resources in IDCs.
  - resource: ecs        # When IDC node resources are insufficient, use ECS resources on the cloud. You can use nodeSelector to select nodes.

Scenario 2: Use IDC resources and ECS resources Simultaneously

apiVersion: scheduling.alibabacloud.com/v1alpha1
kind: ResourcePolicy
metadata:
  name: load-balance-policy
spec:
  selector:
    app: nginx
  strategy: prefer
  units:
  - resource: idc
    max: 2             # Start a maximum of two application instances in the IDC nodes.
  - resource: ecs
    max: 4             # Start a maximum of four application instances in the ECS node pool.
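
The difference between the two scenarios can be illustrated with a small simulation. This Python sketch is a simplified model of the prefer strategy, not the ack-co-scheduler implementation. The capacity field is a hypothetical stand-in for how many pods the nodes of a unit can actually hold, and the functions scale_out and scale_in are illustrative names.

```python
def scale_out(units, replicas):
    """Place replicas on units in list order; return (placement, unplaced)."""
    placement = []
    remaining = replicas
    for unit in units:
        room = unit["capacity"]  # hypothetical: pods this unit's nodes can hold
        if unit.get("max") is not None:
            room = min(room, unit["max"])  # max caps the pods placed on this unit
        used = min(room, remaining)
        placement.append(used)
        remaining -= used
    return placement, remaining


def scale_in(placement, count):
    """Remove count pods, starting from the last (lowest-priority) unit."""
    result = list(placement)
    for i in range(len(result) - 1, -1, -1):
        removed = min(result[i], count)
        result[i] -= removed
        count -= removed
    return result


# Scenario 1: IDC first, ECS only when the IDC nodes are full.
scenario_1 = [
    {"resource": "idc", "capacity": 3},
    {"resource": "ecs", "capacity": 10},
]
# Scenario 2: at most 2 pods in the IDC and at most 4 pods on ECS.
scenario_2 = [
    {"resource": "idc", "capacity": 3, "max": 2},
    {"resource": "ecs", "capacity": 10, "max": 4},
]

print(scale_out(scenario_1, 5))  # ([3, 2], 0)
print(scale_out(scenario_2, 5))  # ([2, 3], 0)
print(scale_in([2, 3], 2))       # [2, 1]
```

In this model, requesting more replicas than the caps allow (for example, seven in Scenario 2) leaves one replica unplaced. How the real scheduler treats such pods is not covered here.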


In the demonstration, we added the Alibaba Cloud GPU P100 machine to the Kubernetes cluster in the IDC to enhance the GPU computing power of the IDC.

Through an ACK One registered cluster, you can:

  1. Select various ECS instance types and specifications on Alibaba Cloud, including X86, ARM, and GPU.
  2. Manually scale out or scale in the number of ECS instances.
  3. Configure automatic scaling of the number of ECS instances.
  4. Utilize multi-level elastic scheduling to prioritize resources in the IDC. If the IDC resources are insufficient, the ECS node pool in the cloud will automatically expand to handle sudden bursts of business traffic.


