If you have a self-managed Kubernetes cluster and want to dynamically adjust the number of nodes in the cluster based on actual workloads, you can use Cluster Autoscaler (CA) together with Alibaba Cloud Auto Scaling to ensure efficient resource usage and service stability.
This topic describes how to manually deploy CA to enable automatic node scaling in self-managed Kubernetes clusters. Alternatively, you can connect your self-managed Kubernetes cluster to a registered ACK One cluster and use the auto scaling capability of ACK One to automatically scale nodes.
For more information about ACK One, see ACK One overview.
Working principle
CA is a component in Kubernetes that automatically adjusts the number of nodes in a cluster. CA continuously monitors the status of pods. If CA detects pods in the Pending state due to resource shortage, CA triggers Auto Scaling to increase the number of nodes. The following figure shows how this process works.
When CA detects that the resource usage of specific nodes consistently falls below the predefined threshold and the pods on these nodes can be moved to other nodes, CA evicts the pods to alternative nodes and then triggers Auto Scaling to decrease the number of nodes. The following figure shows how this process works.
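For example, you can use the following generic kubectl commands (they are not part of CA itself) to view the pods that cannot be scheduled and the reason for the scheduling failure; <pod-name> and <namespace> are placeholders:

# List pods that are stuck in the Pending state across all namespaces.
kubectl get pods --all-namespaces --field-selector=status.phase=Pending
# Show the scheduling events of a specific pending pod.
kubectl describe pod <pod-name> -n <namespace>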
For more information about CA, see Cluster Autoscaling.
Preparations
Make sure that you complete the following preparations:
A self-managed Kubernetes cluster whose version is V1.9.3 or later is created.
Important: In this topic, Elastic Compute Service (ECS) instances are used to create a Kubernetes cluster. If you want to use on-premises machines in data centers or instances from different cloud service providers, refer to the official documentation of Alibaba Cloud VPN Gateway or Smart Access Gateway to ensure network connectivity.
A Resource Access Management (RAM) user is created.
When CA attempts to access Alibaba Cloud Auto Scaling, access credentials are required to verify user identities and access permissions. Create a RAM user for CA and grant the RAM user the Auto Scaling access permissions.
For information about how to create a RAM user and enable API access for the RAM user, see Create a RAM user.
For information about how to attach a custom policy to a RAM user, see Grant permissions to a RAM user. In this topic, the following custom policy is used:
{ "Version": "1", "Statement": [ { "Action": [ "ess:Describe*", "ess:CreateScalingRule", "ess:ModifyScalingGroup", "ess:RemoveInstances", "ess:ExecuteScalingRule", "ess:ModifyScalingRule", "ess:DeleteScalingRule", "ess:DetachInstances", "ecs:DescribeInstanceTypes" ], "Resource": [ "*" ], "Effect": "Allow" } ] }An AccessKey pair that consists of an AccessKey ID and AccessKey secret is created. You must save the AccessKey pair for subsequent use. For more information about how to create an AccessKey pair, see Create an AccessKey pair.
Procedure
(Optional) Step 1: Create a Cluster Autoscaler image
You can build a custom Cluster Autoscaler image by using the source code. Then, you can use the image to deploy Cluster Autoscaler in your self-managed Kubernetes cluster.
Alternatively, you can skip this step and use the following cluster-autoscaler image provided by Alibaba Cloud: ess-cluster-autoscaler-registry.cn-hangzhou.cr.aliyuncs.com/ess-cluster-autoscaler/cluster-autoscaler:v1.7.
Download the source code from GitHub.
mkdir -p $GOPATH/src/github.com/kubernetes
cd $GOPATH/src/github.com/kubernetes
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler

Create an image.
# Compile the code.
cd cluster-autoscaler && make build-arch-amd64
# Build an image.
docker build -t cluster-autoscaler:v1.0 -f Dockerfile.amd64 .
# Add tags. Replace <Domain name of your image repository> with the domain name of your image repository.
docker tag cluster-autoscaler:v1.0 <Domain name of your image repository>/cluster-autoscaler:v1.0
# Upload the image.
docker push <Domain name of your image repository>/cluster-autoscaler:v1.0
Step 2: Create and configure a scaling group
Create a scaling group.
Log on to the Auto Scaling console.
In the top navigation bar, select the region where you activated Auto Scaling. In the left-side navigation pane, click Scaling Groups. On the Scaling Groups page, click Create.
On the Create by Form tab, configure parameters as prompted and click Create. For more information about how to create scaling groups, see Create scaling groups. The following table describes the parameter settings used in this example.
Parameter | Description | Example |
Scaling Group Name | Enter a name for the scaling group. | K8s-Node-Scaling-Group |
Type | Select ECS. This value specifies that the scaling group contains ECS instances. | ECS |
Instance Configuration Source | Do not specify a template that is used to automatically create instances. After you create the scaling group, continue to create a scaling configuration. | Create from Scratch |
Minimum Number of Instances | In this example, this parameter is set to 0. A value of 0 specifies that the scaling group contains no ECS instances at the beginning. | 0 |
Maximum Number of Instances | In this example, this parameter is set to 5. A value of 5 specifies that the scaling group contains at most five ECS instances. | 5 |
VPC | Select an existing virtual private cloud (VPC) for the ECS instances in the scaling group. | vpc-test****-001 |
vSwitch | Select multiple vSwitches from different zones to improve the success rate of scale-out operations. | vsw-test**** |
Important: After you create the scaling group, record the zones and the scaling group ID for subsequent use.
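If you need to look up the scaling group ID again later, one option (a sketch, assuming that the Alibaba Cloud CLI is installed and configured) is to query the scaling group by name; the region ID and scaling group name below are example values:

# Query the scaling group by name to obtain its ID (example region and name).
aliyun ess DescribeScalingGroups --RegionId cn-hangzhou --ScalingGroupName.1 K8s-Node-Scaling-Group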
Create a scaling configuration for the scaling group.
Find the scaling group that you created and click Details in the Actions column to go to the scaling group details page.
On the Instance Configuration Sources tab, click Scaling Configurations. Then, click Create Scaling Configuration to go to the Create Scaling Configuration page.
For more information about how to create a scaling configuration, see Create a scaling configuration of the ECS type. The following table describes the parameter settings used in this example.
Parameter | Description | Example |
Scaling Configuration Name | Enter a name for the scaling configuration. | K8s-Scaling-Node-Config |
Billing Method | Select a billing method based on your business requirements. | Pay-as-you-go |
Instance Configuration Mode | Select an instance configuration mode based on your business requirements. | Specify Instance Type |
Select Instance Type | Select one or more instance types based on your business requirements. Warning: Arm-based enterprise-level computing instance families are not supported. For more information about instance families, see Overview of instance families. | ecs.g6a.large |
Select Image | Select an image based on your business requirements. | Alibaba Cloud Linux |
Configure the network and security group.
Security Group: Select an existing security group. Make sure that the security group allows access to your self-managed Kubernetes cluster.
Assign Public IPv4 Address: If the API Server component of your self-managed Kubernetes cluster uses a public IP address, select this check box to enable Internet access for the ECS instances in the scaling group.
Warning: In addition, make sure that port 6443 is open for the API Server component.
Configure the Instance User Data parameter. In the Instance User Data text box, enter the following script to initialize Kubernetes worker nodes and add the worker nodes to your self-managed Kubernetes cluster:
Important: Replace <<YOUR_MASTER_NODE_IP>> with the IP address of your Kubernetes master node.
#!/bin/bash
# Disable the firewall.
systemctl stop firewalld
systemctl disable firewalld

# Disable SELinux.
sed -i 's/enforcing/disabled/' /etc/selinux/config  # Permanent
setenforce 0  # Temporary

# Disable swap.
swapoff -a  # Temporary
sed -ri 's/.*swap.*/#&/' /etc/fstab  # Permanent

# Pass IPv4 bridged traffic to the iptables chains.
cat > /etc/sysctl.d/k8s.conf << EOF
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
sysctl --system  # Take effect

# Add a Kubernetes repository.
cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF

# Install general packages.
yum install vim bash-completion net-tools gcc -y

# Install Docker.
wget https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo -O /etc/yum.repos.d/docker-ce.repo
yum -y install docker-ce
systemctl enable docker && systemctl start docker
cat > /etc/docker/daemon.json << EOF
{
  "exec-opts": ["native.cgroupdriver=systemd"]
}
EOF
systemctl restart docker

# Install kubeadm, kubectl, and kubelet.
yum install -y kubelet-1.23.0 kubeadm-1.23.0 kubectl-1.23.0

# Start kubelet.
# If kubelet fails to start, run the journalctl -xeu kubelet command to troubleshoot the issue.
systemctl enable kubelet && systemctl start kubelet

# Add the worker node to the cluster.
regionId=$(sed -n 's/.*"region-id": "\(.*\)".*/\1/p' /run/cloud-init/instance-data.json)
instanceId=$(sed -n 's/.*"instance_id": "\(.*\)".*/\1/p' /run/cloud-init/instance-data.json)
privateIpv4=$(sed -n 's/.*"private-ipv4": "\(.*\)".*/\1/p' /run/cloud-init/instance-data.json)
cat > kubeadm-config.yaml << EOF
apiVersion: kubeadm.k8s.io/v1beta2
kind: JoinConfiguration
discovery:
  bootstrapToken:
    token: "your-bootstrap-token"
    apiServerEndpoint: "<<YOUR_MASTER_NODE_IP>>:6443"
    caCertHashes:
      - "sha256:your-discovery-token-ca-cert-hash"
nodeRegistration:
  name: "$regionId-$privateIpv4"
  kubeletExtraArgs:
    provider-id: "$regionId.$instanceId"
EOF
kubeadm join --config=kubeadm-config.yaml

Note: Specify the --provider-id field for the worker nodes that you want to scale out. In the preceding script, the --provider-id field is set in the kubeletExtraArgs section.
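The token and the CA certificate hash in the preceding script are placeholders. If your cluster was set up with kubeadm, you can typically obtain the real values on the master node with commands similar to the following sketch; verify the output against your own cluster before you use it:

# Create a bootstrap token on the master node.
kubeadm token create
# Calculate the SHA-256 hash of the cluster CA certificate for the caCertHashes field.
openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | \
  openssl rsa -pubin -outform der 2>/dev/null | \
  openssl dgst -sha256 -hex | sed 's/^.* //'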
Click Create and enable the scaling configuration.
Optional. Check whether ECS instances that are scaled out in the scaling group can be added to the Kubernetes cluster as expected.
You can manually increase the minimum number of instances in the scaling group by one to trigger a scale-out event and check whether the scaled-out ECS instance is initialized and added to your self-managed Kubernetes cluster as expected.
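For example, you can run the following generic kubectl commands on the master node to confirm that the test instance has joined the cluster and that its provider-id is set, which CA relies on to map nodes to ECS instances:

# Confirm that the new worker node is registered and in the Ready state.
kubectl get nodes -o wide
# Confirm that the provider-id of each node is set.
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.providerID}{"\n"}{end}'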
Step 3: Deploy the Cluster Autoscaler component in your self-managed Kubernetes cluster
Encode the AccessKey ID and AccessKey secret that you created in the "Preparations" section of this topic, together with the region ID of your scaling group, in Base64 format.
# Replace AccessKeyId, AccessKeySecret, and RegionId with your actual values.
echo $AccessKeyId | tr -d '\n' | base64
echo $AccessKeySecret | tr -d '\n' | base64
echo $RegionId | tr -d '\n' | base64

Create a file named deploy-ca.yaml and modify the relevant fields based on your business requirements. Then, deploy the file to the kube-system namespace in your self-managed Kubernetes cluster. In this topic, the following deploy-ca.yaml file is used:
Important: Perform the following operations to update the access-key-id, access-key-secret, and region-id settings in the Secret and to update the scaling group ID in the command section below the containers field of the Deployment:
Replace <<YOUR_ACCESS_KEY_ID>> with the Base64-encoded AccessKey ID.
Replace <<YOUR_ACCESS_KEY_SECRET>> with the Base64-encoded AccessKey secret.
Replace <<YOUR_REGION_ID>> with the Base64-encoded region ID. For information about how to obtain the region ID, see Regions.
Replace <<YOUR_ESS_SCALING_GROUP_ID>> with the ID of the scaling group that you created.
Replace <<KUBERNETES_SERVICE_HOST>> with the IP address of the API Server component of your self-managed Kubernetes cluster.
---
apiVersion: v1
kind: Secret
metadata:
  name: cloud-config
type: Opaque
data:
  access-key-id: <<YOUR_ACCESS_KEY_ID>>
  access-key-secret: <<YOUR_ACCESS_KEY_SECRET>>
  region-id: <<YOUR_REGION_ID>>
---
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    k8s-addon: cluster-autoscaler.addons.k8s.io
    k8s-app: cluster-autoscaler
  name: cluster-autoscaler
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cluster-autoscaler
  labels:
    k8s-addon: cluster-autoscaler.addons.k8s.io
    k8s-app: cluster-autoscaler
rules:
  - apiGroups: [""]
    resources: ["events", "endpoints"]
    verbs: ["create", "patch"]
  - apiGroups: [""]
    resources: ["pods/eviction"]
    verbs: ["create"]
  - apiGroups: [""]
    resources: ["pods/status"]
    verbs: ["update"]
  - apiGroups: [""]
    resources: ["endpoints"]
    resourceNames: ["cluster-autoscaler"]
    verbs: ["get", "update"]
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["watch", "list", "get", "update"]
  - apiGroups: [""]
    resources: ["namespaces", "pods", "services", "replicationcontrollers", "persistentvolumeclaims", "persistentvolumes"]
    verbs: ["watch", "list", "get"]
  - apiGroups: ["extensions"]
    resources: ["replicasets", "daemonsets"]
    verbs: ["watch", "list", "get"]
  - apiGroups: ["policy"]
    resources: ["poddisruptionbudgets"]
    verbs: ["watch", "list"]
  - apiGroups: ["apps"]
    resources: ["statefulsets", "replicasets", "daemonsets"]
    verbs: ["watch", "list", "get"]
  - apiGroups: ["batch"]
    resources: ["jobs"]
    verbs: ["watch", "list", "get"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses", "csinodes", "csidrivers", "csistoragecapacities"]
    verbs: ["watch", "list", "get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: cluster-autoscaler
  namespace: kube-system
  labels:
    k8s-addon: cluster-autoscaler.addons.k8s.io
    k8s-app: cluster-autoscaler
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["create", "list", "watch"]
  - apiGroups: [""]
    resources: ["configmaps"]
    resourceNames: ["cluster-autoscaler-status", "cluster-autoscaler-priority-expander"]
    verbs: ["delete", "get", "update", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cluster-autoscaler
  labels:
    k8s-addon: cluster-autoscaler.addons.k8s.io
    k8s-app: cluster-autoscaler
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-autoscaler
subjects:
  - kind: ServiceAccount
    name: cluster-autoscaler
    namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: cluster-autoscaler
  namespace: kube-system
  labels:
    k8s-addon: cluster-autoscaler.addons.k8s.io
    k8s-app: cluster-autoscaler
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: cluster-autoscaler
subjects:
  - kind: ServiceAccount
    name: cluster-autoscaler
    namespace: kube-system
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: cluster-autoscaler
  name: cluster-autoscaler
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      dnsPolicy: "None"
      dnsConfig:
        nameservers:
          - 100.100.2.136
          - 100.100.2.138
        options:
          - name: timeout
            value: "1"
          - name: attempts
            value: "3"
      priorityClassName: system-cluster-critical
      serviceAccountName: cluster-autoscaler
      containers:
        - command:
            - ./cluster-autoscaler
            - '--v=2'
            - '--logtostderr=true'
            - '--stderrthreshold=info'
            - '--cloud-provider=alicloud'
            - '--expander=least-waste'
            - '--scan-interval=60s'
            - '--scale-down-enabled=true'
            - '--scale-down-delay-after-add=10m'
            - '--scale-down-delay-after-failure=1m'
            - '--scale-down-unready-time=2m'
            - '--ok-total-unready-count=1000'
            - '--max-empty-bulk-delete=50'
            - '--leader-elect=false'
            - '--max-node-provision-time=5m'
            - '--scale-up-from-zero=true'
            - '--daemonset-eviction-for-empty-nodes=false'
            - '--daemonset-eviction-for-occupied-nodes=false'
            - '--max-graceful-termination-sec=14400'
            - '--skip-nodes-with-system-pods=true'
            - '--skip-nodes-with-local-storage=false'
            - '--min-replica-count=0'
            - '--scale-down-unneeded-time=10m'
            - '--scale-down-utilization-threshold=0.3'
            - '--scale-down-gpu-utilization-threshold=0.3'
            - '--nodes=0:100:<<YOUR_ESS_SCALING_GROUP_ID>>'
          image: >-
            ess-cluster-autoscaler-registry.cn-hangzhou.cr.aliyuncs.com/ess-cluster-autoscaler/cluster-autoscaler:v1.7
          imagePullPolicy: Always
          name: cluster-autoscaler
          resources:
            requests:
              cpu: 100m
              memory: 300Mi
          securityContext:
            allowPrivilegeEscalation: true
            capabilities:
              add:
                - SYS_ADMIN
              drop:
                - ALL
          env:
            - name: ACCESS_KEY_ID
              valueFrom:
                secretKeyRef:
                  name: cloud-config
                  key: access-key-id
            - name: ACCESS_KEY_SECRET
              valueFrom:
                secretKeyRef:
                  name: cloud-config
                  key: access-key-secret
            - name: REGION_ID
              valueFrom:
                secretKeyRef:
                  name: cloud-config
                  key: region-id
            - name: KUBERNETES_SERVICE_HOST
              value: "<<KUBERNETES_SERVICE_HOST>>"
            - name: KUBERNETES_SERVICE_PORT
              value: "6443"
            - name: KUBERNETES_SERVICE_PORT_HTTPS
              value: "6443"

Note:
- You can configure the --scale-down-enabled parameter to enable automatic scale-in. After you enable automatic scale-in, CA routinely monitors the cluster status to identify nodes whose resource usage does not exceed 50%. You can configure the --scale-down-utilization-threshold parameter to specify a resource usage threshold.
- By default, CA does not terminate pods in the kube-system namespace. You can set the --skip-nodes-with-system-pods parameter to false to overwrite the default setting.
- By default, CA requires approximately 10 minutes to complete a scale-in operation. You can configure the scale-down-delay parameter to specify a custom waiting period. For example, if you set the --scale-down-delay parameter to 5m, the waiting period is 5 minutes.
- If you want to apply the deploy-ca.yaml file to multiple scaling groups, you can set the --expander parameter to random, most-pods, or least-waste.
  - random: randomly selects a scaling group during a scale-out event.
  - most-pods: selects the scaling group that has the largest number of pods during a scale-out event.
  - least-waste: selects the scaling group that uses the least CPU or memory resources during a scale-out event. If more than one scaling group uses the least CPU or memory resources, this parameter is automatically reset to random.
Run the following command to deploy CA to your self-managed Kubernetes cluster:
kubectl apply -f deploy-ca.yaml -n kube-system
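After you run the command, you can check whether the Cluster Autoscaler pod starts as expected; the following generic kubectl commands are one way to do this:

# Check the status of the Cluster Autoscaler pod.
kubectl get pods -n kube-system -l app=cluster-autoscaler
# View the Cluster Autoscaler logs to confirm that it can reach the API server and the scaling group.
kubectl logs -n kube-system deployment/cluster-autoscaler --tail=50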
(Optional) Verify the feature
If the Kubernetes cluster has pods in the Pending state due to resource shortages, CA triggers Auto Scaling to scale out nodes. If the resource usage of nodes consistently falls below the predefined threshold, CA triggers Auto Scaling to scale in nodes.
Deploy a sample application by using a file named nginx-demo.yaml to verify the automatic scale-out feature. In this example, the following nginx-demo.yaml file is used:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-demo
spec:
  selector:
    matchLabels:
      app: nginx-demo
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx-demo
    spec:
      containers:
        - name: nginx
          image: ess-cluster-autoscaler-registry.cn-hangzhou.cr.aliyuncs.com/ess-cluster-autoscaler/nginx-demo:v1.0
          ports:
            - containerPort: 80
              name: http
            - containerPort: 443
              name: https
          resources:
            requests:
              memory: 1Gi
              cpu: 1
            limits:
              memory: 1Gi
              cpu: '1'

Run the following command to deploy the nginx-demo.yaml file:
kubectl apply -f nginx-demo.yaml

Increase the number of replicas to check whether pods in the Pending state exist due to resource shortages. Run the following command to increase the number of replicas:
kubectl scale deployment nginx-demo --replicas=5

Wait for approximately 1 minute to check whether the scaling group has a scale-out process.
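While you wait, you can also observe the scale-out from the cluster side. For example, the following generic commands list the pending nginx-demo pods and show the cluster-autoscaler-status ConfigMap that CA maintains in the kube-system namespace:

# List nginx-demo pods that are pending because of insufficient resources.
kubectl get pods -l app=nginx-demo --field-selector=status.phase=Pending
# View the status ConfigMap that records the scale-out activity of CA.
kubectl get configmap cluster-autoscaler-status -n kube-system -o yaml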
After the scale-out process is complete in the scaling group, wait for 3 minutes to check whether new nodes are added to the Kubernetes cluster. Run the following command to view all nodes in the Kubernetes cluster and check whether new nodes are added to the cluster:
kubectl get nodes
To verify the automatic scale-in feature, reduce the number of nginx-demo replicas to lower the resource usage of nodes below the predefined threshold. Then, check whether the scaling group has a scale-in process.
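For example, the following commands are a sketch of this check; the replica count of 1 is an example value:

# Reduce the number of replicas so that the resource usage of nodes drops below the threshold.
kubectl scale deployment nginx-demo --replicas=1
# After the scale-in waiting period (approximately 10 minutes by default), check whether the extra nodes are removed.
kubectl get nodes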