
Container Compute Service: In-place vertical scaling based on CPU metrics

Last Updated: Jun 09, 2025

ack-advanced-vertical-pod-autoscaler (AVPA) is an in-place scaling component provided by Alibaba Cloud. It supports metrics-based in-place scaling and application startup acceleration. This topic describes vertical scaling based on CPU metrics and introduces the use scenarios, configuration, and limits of this feature.

Background information

AVPA supports startup acceleration and metrics-based vertical scaling. You can use these capabilities together with ACS in-place scaling to scale pod resources without business interruptions. This enables you to handle hotspots in business loads.

Comparison between AVPA and open source VPA

Scaling method
  • VPA: Recreates pods one by one.
  • AVPA: Performs an in-place hot upgrade.

Scaling scope
  • VPA: Upgrades or downgrades all pods.
  • AVPA: Upgrades or downgrades individual pods separately.

Applicable workloads
  • VPA: Deployments, StatefulSets (with limited effect), and CronJobs.
  • AVPA: Deployments, StatefulSets, Jobs (supported in AVPA 0.2.0 and later), and pods created by any workloads and selected by a selector (supported in AVPA 0.3.0 and later).

Use scenarios
  • VPA: Long-term and balanced workload changes.
  • AVPA: Periodic workload changes, unexpected workload fluctuations, and unbalanced workloads.

Use scenarios

  • Gaming businesses: hot upgrade and downgrade for periodically fluctuating CPU workloads.

  • Online businesses: vertically scale hotspot pods.

Limits

  • AVPA cannot be used together with open source VPA or HPA.

  • Only vertical scaling based on CPU metrics is supported. If the CPU specification no longer meets the ACS requirements after scaling, the scaling request is rejected. For example, the system cannot scale a pod from 2 vCPUs and 2 GiB of memory to 3 vCPUs and 2 GiB of memory. In addition, you must configure resource requests for the containers to be scaled.

  • Vertical scaling automatically adjusts the CPU requests and limits of pods. In ACK and ACK Serverless clusters, vertical scaling may be constrained by limited node resources.

  • In ACK and ACK serverless clusters, the version of ack-virtual-node must be at least v2.14.0.

  • AVPA does not create profiles and takes effect only on existing pods. Therefore, new pods are created based on the original resource specifications defined in the workload.

Note
  • In-place scaling is in invitational preview. To use this feature, submit a ticket.

  • You can perform in-place scaling for ACS pods of the ComputeClass=general-purpose and ComputeQoS=default types.

    • You can scale an ACS pod in the range of 50% to 200% of its original resources.

    • Currently, you can scale up to at most 16 vCPUs.

    For example, you can scale an ACS pod with 4 vCPUs and 8 GiB of memory in the range of 2 vCPUs and 8 GiB to 8 vCPUs and 8 GiB.
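    To check the compute class of your existing ACS pods, you can inspect their labels. This is an optional check: the alibabacloud.com/compute-class label key is taken from the selector example later in this topic, and the value in the output is expected to indicate the pod's compute class, which you can confirm against the requirements above.

    # List pods that carry the ACS compute class label, together with their labels.
    kubectl get pods -n default -l alibabacloud.com/compute-class --show-labels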

Procedure

In this topic, a sample workload and a shadow workload are used to demonstrate the procedure. The shadow workload is optional. In the following procedure, AVPA is configured for the sample workload. The shadow workload uses the same resource specifications as the sample workload but does not have AVPA configured. The example simulates CPU loads to demonstrate vertical scaling. The following figure shows the procedure.

(Figure: overview of the procedure)

The sample workload exposes a Service as the traffic ingress and includes a load simulation tool. You can call the tool's API to consume a specified amount of CPU resources for a specified duration, such as 500 millicores (0.5 vCPU) for 6,000 seconds.


Step 1: Enable the in-place scaling feature gate

  1. Log on to the ACS console. In the left-side navigation pane, click Clusters.

  2. On the Clusters page, find the cluster that you want to manage and click its ID. In the left-side navigation pane of the cluster details page, choose Operations > Add-ons.

  3. On the Core Components tab, select Kube API Server > Configuration. Set featureGates to InPlacePodVerticalScaling=true to enable the in-place scaling feature gate.


    Note

    If the status of the Kube API Server card displays Executing, the configuration is in progress. After the status changes to Installed, the feature gate is enabled.
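    If you prefer the command line, you can check whether the cluster's API schema serves the resizePolicy field that in-place resizing relies on (available in Kubernetes 1.27 and later). This is only a convenience check based on that assumption; the component status shown on the card remains the authoritative signal.

    # Prints the field documentation if the API server serves it; returns an error otherwise.
    kubectl explain pod.spec.containers.resizePolicy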

Step 2: Install AVPA

In the left-side navigation pane, choose Applications > Helm, and find and install ack-advanced-vertical-pod-autoscaler. For more information, see Use Helm to manage applications in ACS.

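After the installation is complete, you can optionally verify that the AVPA controller workload is running. The namespace and workload name depend on the chart, so the following check simply searches all namespaces for a matching Deployment.

kubectl get deployments -A | grep -i vertical-pod-autoscaler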

Step 3: Deploy the workload and create a local connection

  1. Create the YAML files. The sample workload is required. The shadow workload is optional and serves as a comparison baseline during stress testing.

    Create a YAML file named hello-avpa.yaml.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: hello-avpa
      namespace: default
    spec:
      replicas: 1
      selector:
        matchLabels:
          name: hello-avpa
      template:
        metadata:
          annotations:
            scaling.alibabacloud.com/enable-inplace-resource-resize: 'true'
          labels:
            name: hello-avpa
            vpa: enabled
        spec:
          containers:
            - image: 'registry.cn-hangzhou.aliyuncs.com/acs-demo-ns/simulation-resource-consumer:1.13'
              name: hello-avpa
              resources:
                limits:
                  cpu: '2'
                  memory: '4Gi'
                requests:
                  cpu: '2'
                  memory: '4Gi'
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: hello-avpa-svc
      namespace: default
    spec:
      ports:
        - port: 80
          protocol: TCP
          targetPort: 8080
      selector:
        name: hello-avpa
      type: ClusterIP

    (Optional) Create a YAML file named shadow-hello-avpa.yaml.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: shadow-hello-avpa
      namespace: default
    spec:
      replicas: 1
      selector:
        matchLabels:
          name: shadow-hello-avpa
      template:
        metadata:
          annotations:
            scaling.alibabacloud.com/enable-inplace-resource-resize: 'true'
          labels:
            vpa: enabled
            name: shadow-hello-avpa
        spec:
          containers:
            - image: 'registry.cn-hangzhou.aliyuncs.com/acs-demo-ns/simulation-resource-consumer:1.13'
              name: shadow-hello-avpa
              resources:
                limits:
                  cpu: '2'
                  memory: '4Gi'
                requests:
                  cpu: '2'
                  memory: '4Gi'
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: shadow-avpa-svc
      namespace: default
    spec:
      ports:
        - port: 80
          protocol: TCP
          targetPort: 8080
      selector:
        name: shadow-hello-avpa
      type: ClusterIP
  2. Deploy the workload.

    kubectl apply -f hello-avpa.yaml
    kubectl apply -f shadow-hello-avpa.yaml
  3. Run kubectl port-forward to create a local connection.

    Important

    Port forwarding set up by using kubectl port-forward is not reliable, secure, or scalable enough for production environments. It is intended only for development and debugging. Do not use this command to set up port forwarding in production environments. For more information about production-grade networking solutions for ACK clusters, see Ingress management.

    kubectl port-forward svc/hello-avpa-svc -n default 28080:80
    kubectl port-forward svc/shadow-avpa-svc -n default 28081:80
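    With the port forwards running, you can optionally send a small request to confirm that the load simulation API responds before you start the real stress test. The ConsumeCPU endpoint and its parameters are the same ones used in Step 5; the values here are deliberately small.

    # Consume 100 millicores for 60 seconds on the sample workload.
    curl --data "millicores=100&durationSec=60" http://localhost:28080/ConsumeCPU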

Step 4: Configure AVPA

You can create an AdvancedVerticalPodAutoscaler resource to configure vertical scaling.

  1. Create a YAML file named avpa.yaml.

    TargetRef mode

    The following configuration uses the TargetRef mode of AVPA to adjust the resources of the pods created by the Deployment named hello-avpa in the default namespace based on CPU utilization.

    apiVersion: autoscaling.alibabacloud.com/v1beta1
    kind: AdvancedVerticalPodAutoscaler
    metadata:
      name: hello-avpa
      namespace: default
    spec:
      metrics:
        - containerResource:
            container: hello-avpa
            name: cpu
            target:
              averageUtilization: 30
              type: Utilization
          type: ContainerResource
          watermark: low
        - containerResource:
            container: hello-avpa
            name: cpu
            target:
              averageUtilization: 50
              type: Utilization
          type: ContainerResource
          watermark: high
      scaleResourceLimit:
        maximum:
          cpu: '4'
        minimum:
          cpu: '1'
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: hello-avpa

    The following list describes some of the parameters.

    • scaleTargetRef (required): the target workload. Kubernetes-native Deployments, StatefulSets, and Jobs, as well as OpenKruise AdvancedStatefulSets and CloneSets, are currently supported.

    • metrics.containerResource (required): the resource whose metrics are collected and its utilization threshold.

      • container: the name of the container from which metrics are collected.

      • name: the name of the metric to collect. Only CPU metrics are supported.

      • target: the threshold information.

        • type: set to Utilization.

        • averageUtilization: the average utilization threshold.

    • metrics.watermark (required): the type of threshold. Valid values:

      • low: the low threshold. When the metric value drops below this threshold, pod resources are scaled in.

      • high: the high threshold. When the metric value exceeds this threshold, pod resources are scaled out.

    • metrics.type (required): the metric collection granularity. Metrics can be aggregated by container or by pod. The default is ContainerResource.

      Note

      Currently, metrics can be collected and aggregated only by container.

    • scaleResourceLimit.minimum (optional): the lower limit of vertical scaling (CPU only).

      • cpu: the default is 250m (millicores).

    • scaleResourceLimit.maximum (optional): the upper limit of vertical scaling (CPU only).

      • cpu: the default is 64 (cores).
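    As a rough illustration of what these thresholds mean for the sample workload (back-of-the-envelope arithmetic, not output from the component), the hello-avpa container requests 2 vCPUs, so the watermarks correspond approximately to the following usage levels:

    # Assumed: container CPU request = 2 vCPUs (2000m)
    # low watermark  (averageUtilization: 30) ~  600m average usage; sustained usage below this can trigger a scale-in
    # high watermark (averageUtilization: 50) ~ 1000m average usage; sustained usage above this can trigger a scale-out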

    Selector mode

    The Selector mode, which is introduced in AVPA 0.2.0, provides a more flexible selector that simplifies the pod configuration.

    apiVersion: autoscaling.alibabacloud.com/v1beta1
    kind: AdvancedVerticalPodAutoscaler
    metadata:
      name: hello-avpa
      namespace: default
    spec:
      metrics:
        - containerResource:
            container: "*"
            name: cpu
            target:
              averageUtilization: 30
              type: Utilization
          type: ContainerResource
          watermark: low
        - containerResource:
            container: "*"
            name: cpu
            target:
              averageUtilization: 50
              type: Utilization
          type: ContainerResource
          watermark: high
      scaleResourceLimit:
        maximum:
          cpu: '4'
        minimum:
          cpu: '1'
      # You can modify the following sample selector configuration on demand.
      selector:
        matchLabels:
          vpa: enabled
        matchExpressions:
        # A label that applies to all ACS pods.
        - key: alibabacloud.com/compute-class 
          operator: Exists
        - key: name 
          operator: In
          values: 
          - hello-avpa
        # A reserved switch to disable AVPA for certain pods.
        - key: alibabacloud.com/disable-avpa 
          operator: DoesNotExist

    The following list describes the parameters.

    Unlike scaleTargetRef, which takes effect on only one workload, the selector can select pods from any workloads to meet scaling requirements in different scenarios.

    • selector (optional): the pod selector. The selector and scaleTargetRef are mutually exclusive. The selector simplifies the AVPA configuration by centrally managing pods created by a type of workload, which saves you from configuring AVPA for each workload.

    • metrics.containerResource (required): the resource whose metrics are collected and its utilization threshold.

      • container: the name of the container from which metrics are collected.

        Note

        You can enter a wildcard character (*) to match all containers. If a container matches both the wildcard expression and a specific container match rule, the specific container match rule prevails.

      • name: the name of the metric to collect. Only CPU metrics are supported.

      • target: the threshold information.

        • type: set to Utilization.

        • averageUtilization: the average utilization threshold.

    • metrics.watermark (required): the type of threshold. Valid values:

      • low: the low threshold. When the metric value drops below this threshold, pod resources are scaled in.

      • high: the high threshold. When the metric value exceeds this threshold, pod resources are scaled out.

    • metrics.type (required): the metric collection granularity. Metrics can be aggregated by container or by pod. The default is ContainerResource.

      Note

      Currently, metrics can be collected and aggregated only by container.

    • scaleResourceLimit.minimum (optional): the lower limit of vertical scaling (CPU only).

      • cpu: the default is 250m (millicores).

    • scaleResourceLimit.maximum (optional): the upper limit of vertical scaling (CPU only).

      • cpu: the default is 64 (cores).
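    To preview which pods the preceding sample selector would match, you can run an equivalent label query with kubectl. The label keys and values come from the sample selector and the hello-avpa Deployment in this topic; adjust them if you changed the selector.

    # List the pods that satisfy the sample matchLabels and matchExpressions.
    kubectl get pods -n default \
      -l 'vpa=enabled,alibabacloud.com/compute-class,name in (hello-avpa),!alibabacloud.com/disable-avpa'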

  2. (Optional) Configure advanced AVPA settings.

    Advanced configuration

    You can modify the scaling steps in the following AVPA configuration as needed.

    apiVersion: autoscaling.alibabacloud.com/v1beta1
    kind: AdvancedVerticalPodAutoscaler
    metadata:
      name: hello-avpa
      namespace: default
    spec:
      behavior:
        parallelism: 1
        stabilizationWindowSeconds: 600
        scaleDown: # Specify the scale-in step and policy. We recommend that you scale in resources progressively and ensure that each scale-in step does not exceed the actual resource specification.
          policies:
          - type: CpuPercent
            value: 10%
            periodSeconds: 60
          - type: Cpus
            value: 500m
            periodSeconds: 60
          selectPolicy: Max
        scaleUp: # Specify the scale-out step and policy. We recommend that you scale out resources quickly to reduce the number of scale-out activities.
          policies:
          - type: CpuPercent
            value: 60%
            periodSeconds: 60
          - type: Cpus
            value: 500m
            periodSeconds: 60
          selectPolicy: Max
      metricObserveWindowSeconds: 600
      metrics:
        - containerResource:
            container: hello-avpa
            name: cpu
            target:
              averageUtilization: 30
              type: Utilization
          type: ContainerResource
          watermark: low
        - containerResource:
            container: hello-avpa
            name: cpu
            target:
              averageUtilization: 50
              type: Utilization
          type: ContainerResource
          watermark: high
      scaleResourceLimit:
        maximum:
          cpu: '4'
        minimum:
          cpu: '1'
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: hello-avpa

    The following list describes the parameters.

    The scaleDown and scaleUp settings in spec.behavior describe the scale-in and scale-out policies, respectively. Both settings use the same syntax.

    • policies[].type (optional): the type of scaling step. Valid values:

      • Cpus: an absolute CPU scaling step.

      • CpuPercent: a scaling step expressed as a percentage of the current CPU specification.

    • policies[].value (optional): the step value.

      • When type is set to Cpus, specify an absolute value, such as 250m or 1.

      • When type is set to CpuPercent, specify a percentage value, such as 5%.

    • selectPolicy (optional): the policy used when multiple policies are configured. Valid values:

      • Max: the largest calculated value is used as the step.

      • Min: the smallest calculated value is used as the step.

    • metricObserveWindowSeconds (optional): the metric collection time window. When the value of a metric changes within the time window, the loads are calculated to determine whether scaling is needed. Unit: seconds. The default is 600 and the minimum is 300.

    • behavior.parallelism (optional): the scaling concurrency, which is the number of pods that can be scaled concurrently. The default is 1.

    • behavior.stabilizationWindowSeconds (optional): the scaling cooldown period. Unit: seconds. The default is 600 and the minimum is 300. The cooldown period must not be shorter than metricObserveWindowSeconds.

    AVPA calculates the target resource specification based on the default policy: the scaling target is the average of the high and low watermarks. For example, assume a container occupies 1 vCPU, the high and low thresholds are set to 60 and 40, and the current utilization is 100%. The target utilization is then 50%, so the calculated scale-out amount is 1 vCPU and the CPU resources would be scaled out to 2 vCPUs.

    The behavior policies then limit each scaling step. With the preceding scaleUp policies (CpuPercent: 60% and Cpus: 500m, selectPolicy: Max), the larger of 60% of the current 1 vCPU (600m) and 500m is used. Therefore, the actual scale-out step is 600m.

    Note

    Suggested configuration: scale out quickly to handle spikes and scale in progressively to ensure stability.

    scaleUp: specify a large step to reduce the number of scale-out activities. For example, specify type: Cpus and value: 1.

    scaleDown: specify a small step to scale in progressively and ensure stability. For example, specify type: Cpus and value: 250m.
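    Applied to the advanced example above, this suggestion corresponds to a behavior section similar to the following sketch. Treat it as a starting point and tune the values for your workload; periodSeconds is carried over from the sample above.

    behavior:
      scaleUp: # Scale out quickly with one large absolute step.
        policies:
        - type: Cpus
          value: '1'
          periodSeconds: 60
      scaleDown: # Scale in progressively with small absolute steps.
        policies:
        - type: Cpus
          value: 250m
          periodSeconds: 60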

  3. Deploy the YAML file.

    kubectl apply -f avpa.yaml
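    After the AdvancedVerticalPodAutoscaler resource is created, you can check that it is tracking the target, as also described in the FAQ at the end of this topic.

    kubectl get avpa hello-avpa -n default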

Step 5: Perform stress testing

Send requests to the backend pods through the traffic ingress to generate CPU loads. Then, compare the monitoring data of the sample workload (with AVPA) and the shadow workload (without AVPA), which use the same resource specifications and receive the same loads.

  1. Send requests to generate loads.

    Each tested pod occupies 2 vCPUs. First, increase the loads to 50% by consuming 1,000 millicores and keep the loads for 2,000 seconds. Then, increase the loads by an additional 100 millicores every 60 seconds. After 30 minutes, 4,000 millicores are consumed.

    # Increase the loads. The upper limit of each command is 1,000 millicores.
    curl --data "millicores=1000&durationSec=2000" http://localhost:28080/ConsumeCPU
    curl --data "millicores=1000&durationSec=2000" http://localhost:28081/ConsumeCPU
    # Continuously increase the loads.
    for i in {1..30}
    do
      sleep 60
      curl --data "millicores=100&durationSec=2000" http://localhost:28080/ConsumeCPU
      curl --data "millicores=100&durationSec=2000" http://localhost:28081/ConsumeCPU
    done
  2. Monitor the metric data.

    In the left-side navigation pane, choose Operations > Prometheus Monitoring. On the Application Monitoring > Deployments tab, view the monitoring data.

    1. The CPU loads continuously increase and then reach the threshold. After 10 minutes, CPU resources are scaled out to 2 vCPUs. After 30 minutes, CPU resources are scaled out to 4 vCPUs. After the scale-out, the CPU loads significantly drop.


    2. Monitor the loads of the shadow workload. The CPU resources are exhausted within a short period of time.


  3. View pod events.

    The pod events show configuration changes in the YAML template during scaling.

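    You can also read the current CPU requests and limits directly from the pod spec to confirm the values that AVPA applied in place. The label below comes from the sample hello-avpa Deployment.

    # Print the resources of the first pod that matches the sample label.
    kubectl get pod -n default -l name=hello-avpa \
      -o jsonpath='{.items[0].spec.containers[0].resources}'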

  4. Simulate scale-in activities due to low loads.

    After the CPU loads drop below the threshold, CPU resources are scaled in.

    • After the loads reach the scale-in threshold, CPU resources are scaled in at the specified step to ensure stability.

    • If the loads consistently remain at a low level, the container resources are scaled in to the minimum specification of 1 vCPU declared in the AVPA configuration.


(Optional) Step 6: Delete resources

  1. Delete the workloads, Service, and AVPA resource.

    kubectl delete -f hello-avpa.yaml
    kubectl delete -f shadow-hello-avpa.yaml
    kubectl delete -f avpa.yaml
  2. In the left-side navigation pane, choose Applications > Helm. Click Delete in the Actions column of ack-advanced-vertical-pod-autoscaler.

FAQ

How do I view the status of AVPA?

Use kubectl to query the real-time status of AVPA.

# In this example, the cluster connection information is stored in the ~/.kube/acs-test file.
# export KUBECONFIG=~/.kube/acs-test 
kubectl get avpa -n [namespace] [-oyaml]

Expected output:

$ kubectl get avpa
NAME         TARGETTYPE   TARGETNAME   REPLICAS   UPDATING   WAITING   LASTSCALED   AGE
hello-avpa                             1          0          0         11d          11d

How do I query pods that are being scaled?

In AVPA versions later than 0.3.0, pods that are being scaled have the avpa.alibabacloud.com/resizing-lock label.

kubectl get po -n [namespace] -l avpa.alibabacloud.com/resizing-lock

How do I query pods that encounter scaling failures?

In AVPA versions later than 0.3.0, scaling failures are recorded in Kubernetes events. You can query scaling failures in the Logstore by specifying the InplaceResizedTimeoutFailed keyword.
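A rough kubectl alternative is to search recent cluster events for the same keyword. This is only a convenience check, not the official Logstore query, and it covers only events that have not yet expired from the cluster.

kubectl get events -A | grep InplaceResizedTimeoutFailed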

Note

You can create alert rules for these events. For more information, see Create alert rules.