ack-advanced-vertical-pod-autoscaler (AVPA) is an in-place scaling component provided by Alibaba Cloud. It supports metrics-based in-place scaling and application startup acceleration. This topic describes vertical scaling based on CPU metrics and introduces the use scenarios, configuration, and limits of this feature.
Background information
AVPA supports startup acceleration and metrics-based vertical scaling. You can use these capabilities together with ACS in-place scaling to scale pod resources without business interruptions. This enables you to handle hotspots in business loads.
Comparison between AVPA and open source VPA
|  | VPA | AVPA |
| --- | --- | --- |
| Scaling method | Recreates pods one by one. | Performs in-place hot upgrades. |
| Scaling scope | Upgrades or downgrades all pods. | Upgrades or downgrades individual pods separately. |
| Applicable workloads |  |  |
| Use scenarios | Long-term, balanced workload changes. | Periodic workload changes, unexpected workload fluctuations, and unbalanced workloads. |
Use scenarios
Gaming businesses: hot upgrade and downgrade for periodically fluctuating CPU workloads.
Online businesses: vertically scale hotspot pods.
Limits
AVPA cannot be used together with open source VPA or HPA.
Only vertical scaling based on CPU metrics is supported. If the CPU specification no longer meets the ACS specification requirements after scaling, the scaling request is rejected. For example, the system cannot upgrade from `2 vCPUs and 2 GiB` to `3 vCPUs and 2 GiB`. In addition, you must configure resource `requests` for the containers to be scaled.
Vertical scaling automatically adjusts the CPU requests and limits of pods. In ACK and ACK Serverless clusters, vertical scaling may be constrained by limited node resources.
In ACK and ACK Serverless clusters, the version of `ack-virtual-node` must be at least v2.14.0.
AVPA does not create profiles and takes effect only on existing pods. New pods are therefore created with the original resource specifications defined in the workload.
In-place scaling is in invitational preview. To use this feature, submit a ticket.
You can perform in-place scaling only for ACS pods of the `ComputeClass=general-purpose` and `ComputeQoS=default` types.
You can scale an ACS pod within a range of 50% below to 100% above its original resources. Currently, you can scale up to at most 16 vCPUs.
For example, you can scale an ACS pod with `4 vCPUs and 8 GiB of memory` in the range of `2 vCPUs and 8 GiB` to `8 vCPUs and 8 GiB`.
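The scaling-range limit above can be sketched as a small helper. This is illustrative only; the function name and the cap parameter are assumptions for this example, not part of AVPA or ACS:

```python
def acs_cpu_scaling_range(original_vcpus, cap_vcpus=16.0):
    """Return the (min, max) vCPU range for in-place scaling of an ACS pod.

    Per the documented limits: a pod can be scaled down by up to 50% of its
    original CPU specification and up by up to 100% (that is, doubled),
    capped at 16 vCPUs.
    """
    low = original_vcpus * 0.5                    # down by at most 50%
    high = min(original_vcpus * 2.0, cap_vcpus)   # up by at most 100%, capped
    return low, high
```

For the documented example, a 4 vCPU pod can be scaled between 2 and 8 vCPUs; a 12 vCPU pod would be capped at 16 vCPUs on the way up.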
Procedure
In this topic, a sample workload and a shadow workload are used to demonstrate the procedure. The shadow workload is optional. In the following procedure, AVPA is configured for the sample workload. The shadow workload uses the same resource specifications as the sample workload but does not have AVPA configured. The example simulates CPU loads to demonstrate vertical scaling. The following figure shows the procedure.
The sample workload uses a Service as the traffic ingress and includes a load simulation tool. The tool exposes an API that consumes a specified amount of CPU resources, such as 500 millicores (0.5 cores) for 6,000 seconds.
Step 1: Enable the in-place scaling feature gate
Log on to the ACS console. In the left-side navigation pane, click Clusters.
On the Clusters page, find the cluster that you want to manage and click its ID. In the left-side navigation pane of the cluster details page, choose Operations > Add-ons.
On the Core Components tab, find Kube API Server and click Configuration. Set featureGates to `InPlacePodVerticalScaling=true` to enable the in-place scaling feature gate.
Note: If the status of the Kube API Server card displays Executing, the configuration is in progress. After the status changes to Installed, the feature gate is enabled.
Step 2: Install AVPA
In the left-side navigation pane, choose Applications > Helm, and find and install ack-advanced-vertical-pod-autoscaler. For more information, see Use Helm to manage applications in ACS.

Step 3: Deploy the workload and create a local connection
Create a YAML file named hello-avpa.yaml. You can also create a YAML file named shadow-hello-avpa.yaml to serve as the shadow workload for stress testing.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-avpa
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      name: hello-avpa
  template:
    metadata:
      annotations:
        scaling.alibabacloud.com/enable-inplace-resource-resize: 'true'
      labels:
        name: hello-avpa
        vpa: enabled
    spec:
      containers:
        - image: 'registry.cn-hangzhou.aliyuncs.com/acs-demo-ns/simulation-resource-consumer:1.13'
          name: hello-avpa
          resources:
            limits:
              cpu: '2'
              memory: '4Gi'
            requests:
              cpu: '2'
              memory: '4Gi'
---
apiVersion: v1
kind: Service
metadata:
  name: hello-avpa-svc
  namespace: default
spec:
  ports:
    - port: 80
      protocol: TCP
      targetPort: 8080
  selector:
    name: hello-avpa
  type: ClusterIP
```
(Optional) Create a YAML file named shadow-hello-avpa.yaml.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: shadow-hello-avpa
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      name: shadow-hello-avpa
  template:
    metadata:
      annotations:
        scaling.alibabacloud.com/enable-inplace-resource-resize: 'true'
      labels:
        vpa: enabled
        name: shadow-hello-avpa
    spec:
      containers:
        - image: 'registry.cn-hangzhou.aliyuncs.com/acs-demo-ns/simulation-resource-consumer:1.13'
          name: shadow-hello-avpa
          resources:
            limits:
              cpu: '2'
              memory: '4Gi'
            requests:
              cpu: '2'
              memory: '4Gi'
---
apiVersion: v1
kind: Service
metadata:
  name: shadow-avpa-svc
  namespace: default
spec:
  ports:
    - port: 80
      protocol: TCP
      targetPort: 8080
  selector:
    name: shadow-hello-avpa
  type: ClusterIP
```
Deploy the workload.
```shell
kubectl apply -f hello-avpa.yaml
kubectl apply -f shadow-hello-avpa.yaml
```
Run `kubectl port-forward` to create a local connection.
Important: Port forwarding set up by using `kubectl port-forward` is not reliable, secure, or scalable in production environments. It is intended only for development and debugging. Do not use this command to set up port forwarding in production environments. For more information about networking solutions used for production in ACK clusters, see Ingress management.
```shell
kubectl port-forward svc/hello-avpa-svc -n default 28080:80
kubectl port-forward svc/shadow-avpa-svc -n default 28081:80
```
Step 4: Configure AVPA
You can create an AdvancedVerticalPodAutoscaler resource to configure elastic scaling.
Create a YAML file named avpa.yaml.
TargetRef mode
The following scaling configuration uses the TargetRef mode of AVPA to adjust the resources of the pods created by the Deployment named hello-avpa in the default namespace based on CPU utilization.
```yaml
apiVersion: autoscaling.alibabacloud.com/v1beta1
kind: AdvancedVerticalPodAutoscaler
metadata:
  name: hello-avpa
  namespace: default
spec:
  metrics:
    - containerResource:
        container: hello-avpa
        name: cpu
        target:
          averageUtilization: 30
          type: Utilization
      type: ContainerResource
      watermark: low
    - containerResource:
        container: hello-avpa
        name: cpu
        target:
          averageUtilization: 50
          type: Utilization
      type: ContainerResource
      watermark: high
  scaleResourceLimit:
    maximum:
      cpu: '4'
    minimum:
      cpu: '1'
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hello-avpa
```
The following table describes some of the parameters.
| Parameter | Required | Description |
| --- | --- | --- |
| scaleTargetRef | Yes | The target workload. Currently, Kubernetes-native Deployments, StatefulSets, and Jobs, as well as OpenKruise AdvancedStatefulSets and CloneSets, are supported. |
| metrics.containerResource | Yes | The type and utilization threshold of the resource whose metrics are collected. `container`: the name of the container from which metrics are collected. `name`: the name of the metric to collect; only CPU metrics are supported. `target`: the threshold information. `target.type`: set to `Utilization`. `target.averageUtilization`: the average utilization threshold. |
| metrics.watermark | Yes | The type of threshold. Valid values: `low`: the low watermark; when the metric value drops below this threshold, pod resources are scaled in. `high`: the high watermark; when the metric value exceeds this threshold, pod resources are scaled out. |
| metrics.type | Yes | The metric collection granularity. Metrics can be aggregated by container or by pod. Default: `ContainerResource`. Note: Currently, metrics can be collected and aggregated only by container. |
| scaleResourceLimit.minimum | No | The lower limit of vertical scaling (for `cpu` only). Default: 250m (millicores). |
| scaleResourceLimit.maximum | No | The upper limit of vertical scaling (for `cpu` only). Default: 64 (cores). |
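The watermark semantics in the table above can be sketched as a small decision helper. This is an illustrative simplification of the documented behavior, not AVPA code:

```python
def watermark_action(avg_utilization, low=30.0, high=50.0):
    """Decide a scaling direction from the low/high watermarks.

    Per the parameter descriptions: utilization above the high watermark
    triggers a scale-out, utilization below the low watermark triggers a
    scale-in, and anything in between needs no action. The defaults here
    match the sample AVPA configuration (low=30, high=50).
    """
    if avg_utilization > high:
        return "scale-out"
    if avg_utilization < low:
        return "scale-in"
    return "none"
```

With the sample thresholds, a container at 60% CPU utilization is scaled out, one at 20% is scaled in, and one at 40% is left alone.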
Selector mode
The Selector mode, introduced in AVPA 0.2.0, provides a more flexible selector that simplifies pod selection.
```yaml
apiVersion: autoscaling.alibabacloud.com/v1beta1
kind: AdvancedVerticalPodAutoscaler
metadata:
  name: hello-avpa
  namespace: default
spec:
  metrics:
    - containerResource:
        container: "*"
        name: cpu
        target:
          averageUtilization: 30
          type: Utilization
      type: ContainerResource
      watermark: low
    - containerResource:
        container: "*"
        name: cpu
        target:
          averageUtilization: 50
          type: Utilization
      type: ContainerResource
      watermark: high
  scaleResourceLimit:
    maximum:
      cpu: '4'
    minimum:
      cpu: '1'
  # You can modify the following sample selector configuration on demand.
  selector:
    matchLabels:
      vpa: enabled
    matchExpressions:
      # A label that applies to all ACS pods.
      - key: alibabacloud.com/compute-class
        operator: Exists
      - key: name
        operator: In
        values:
          - hello-avpa
      # A reserved switch to disable AVPA for certain pods.
      - key: alibabacloud.com/disable-avpa
        operator: DoesNotExist
```
The following table describes the parameters.
Unlike `scaleTargetRef`, which takes effect on only one workload, the `selector` can select any pods to meet scaling requirements in different scenarios.
| Parameter | Required | Description |
| --- | --- | --- |
| selector | No | The `selector` and `scaleTargetRef` parameters are mutually exclusive. The `selector` simplifies the AVPA configuration by centrally managing pods created by a type of workload. This saves you the need to configure AVPA for each workload. |
| metrics.containerResource | Yes | The type and utilization threshold of the resource whose metrics are collected. `container`: the name of the container from which metrics are collected. Note: You can enter a wildcard character (`*`) to match all containers; if a container matches both the wildcard expression and a specific container match rule, the container match rule prevails. `name`: the name of the metric to collect; only CPU metrics are supported. `target`: the threshold information. `target.type`: set to `Utilization`. `target.averageUtilization`: the average utilization threshold. |
| metrics.watermark | Yes | The type of threshold. Valid values: `low`: the low watermark; when the metric value drops below this threshold, pod resources are scaled in. `high`: the high watermark; when the metric value exceeds this threshold, pod resources are scaled out. |
| metrics.type | Yes | The metric collection granularity. Metrics can be aggregated by container or by pod. Default: `ContainerResource`. Note: Currently, metrics can be collected and aggregated only by container. |
| scaleResourceLimit.minimum | No | The lower limit of vertical scaling (for `cpu` only). Default: 250m (millicores). |
| scaleResourceLimit.maximum | No | The upper limit of vertical scaling (for `cpu` only). Default: 64 (cores). |
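The wildcard precedence rule above can be sketched as follows. This is a hypothetical helper that mirrors the documented precedence; AVPA's internal matching logic is not exposed:

```python
def metric_rule_for(container_name, rules):
    """Pick the containerResource rule that applies to a container.

    rules maps container names to their rule; "*" is the wildcard entry.
    Per the documentation, a rule that names the container explicitly
    takes precedence over the wildcard rule.
    """
    if container_name in rules:
        return rules[container_name]
    return rules.get("*")
```

For example, with a wildcard threshold of 30 and an explicit rule of 70 for `hello-avpa`, the container `hello-avpa` gets 70 while any other container falls back to 30.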
(Optional) The advanced AVPA configurations are as follows.
Advanced configuration
You can modify the scaling step in the following AVPA configuration on demand.
```yaml
apiVersion: autoscaling.alibabacloud.com/v1beta1
kind: AdvancedVerticalPodAutoscaler
metadata:
  name: hello-avpa
  namespace: default
spec:
  behavior:
    parallelism: 1
    stabilizationWindowSeconds: 600
    scaleDown:
      # Specify the scale-in step and policy. We recommend that you scale in
      # resources progressively and ensure that each scale-in activity does
      # not exceed the actual resource specifications.
      policies:
        - type: CpuPercent
          value: 10%
          periodSeconds: 60
        - type: Cpus
          value: 500m
          periodSeconds: 60
      selectPolicy: Max
    scaleUp:
      # Specify the scale-out step and policy. We recommend that you scale
      # out resources quickly to reduce the number of scale-out activities.
      policies:
        - type: CpuPercent
          value: 60%
          periodSeconds: 60
        - type: Cpus
          value: 500m
          periodSeconds: 60
      selectPolicy: Max
  metricObserveWindowSeconds: 600
  metrics:
    - containerResource:
        container: hello-avpa
        name: cpu
        target:
          averageUtilization: 30
          type: Utilization
      type: ContainerResource
      watermark: low
    - containerResource:
        container: hello-avpa
        name: cpu
        target:
          averageUtilization: 50
          type: Utilization
      type: ContainerResource
      watermark: high
  scaleResourceLimit:
    maximum:
      cpu: '4'
    minimum:
      cpu: '1'
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hello-avpa
```
The following table describes the parameters.
The `scaleDown` and `scaleUp` settings in `spec.behavior` describe the scale-in and scale-out policies. The settings included in them use the same syntax.
| Parameter | Required | Description |
| --- | --- | --- |
| policies[].type | No | `Cpus`: an absolute CPU step. `CpuPercent`: a percentage of the current CPU specification to scale by. |
| policies[].value | No | When `type` is set to `Cpus`, specify an absolute value, such as `250m` or `1`. When `type` is set to `CpuPercent`, specify a percentage, such as `5%`. |
| selectPolicy | No | The policy used when multiple policies are configured. Valid values: `Max`: the largest computed step is used. `Min`: the smallest computed step is used. |
| metricObserveWindowSeconds | No | The metric collection time window. Metric values collected within the window are used to calculate the loads and determine whether scaling is needed. Unit: seconds. Default: `600`. Minimum: `300`. |
| behavior.parallelism | No | The scaling concurrency, that is, the number of pods that can be scaled at the same time. Default: `1`. |
| behavior.stabilizationWindowSeconds | No | The scaling cooldown period. Unit: seconds. Default: `600`. Minimum: `300`. The cooldown period must not be shorter than `metricObserveWindowSeconds`. |

AVPA calculates the target resource specification based on the default policy: the target utilization is the average of the high and low watermarks. For example, a container occupies 1 vCPU, the high and low thresholds are set to 60 and 40, and the current utilization is 100%. The target utilization is then 50%, so the desired specification is 2 vCPUs. The step actually applied per period is bounded by the scaleUp policies: with the sample configuration above and `selectPolicy: Max`, the larger of 60% of the current 1 vCPU (600m) and 500m is used. Therefore, the scale-out step is 600m.
Note: Suggested configuration: scale out quickly to handle spikes and scale in progressively to ensure stability.
scaleUp: specify a large step to reduce the number of scale-out activities. For example, specify `type: Cpus` and `value: 1`.
scaleDown: specify a small step to scale in progressively and ensure stability. For example, specify `type: Cpus` and `value: 250m`.
Deploy the YAML file.
```shell
kubectl apply -f avpa.yaml
```
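The default target calculation and step bounding described in the advanced configuration can be sketched as follows. This is a simplified model based on the parameter descriptions above; the function names are illustrative, not AVPA APIs:

```python
def target_vcpus(current_vcpus, current_util_pct, low_pct, high_pct):
    # Assumed default policy: the target utilization is the average of the
    # low and high watermarks, and the desired size is the one that brings
    # the current load to that target utilization.
    target_util = (low_pct + high_pct) / 2.0
    return current_vcpus * current_util_pct / target_util

def step_vcpus(current_vcpus, policies, select_policy="Max"):
    # policies: list of (type, value) pairs, e.g. ("CpuPercent", 0.60) or
    # ("Cpus", 0.5). With selectPolicy Max/Min, the largest/smallest
    # computed step is applied per period.
    steps = []
    for ptype, value in policies:
        if ptype == "CpuPercent":
            steps.append(current_vcpus * value)
        elif ptype == "Cpus":
            steps.append(value)
    return max(steps) if select_policy == "Max" else min(steps)
```

For the documented example (watermarks 40/60, a 1 vCPU container at 100% utilization), the desired specification is 2 vCPUs; with scaleUp policies of 60% and 500m under `selectPolicy: Max`, the per-period step is 600m.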
Step 5: Perform stress testing
Send requests to the backend pod through the traffic Ingress to generate CPU loads. Observe the workload monitoring data before and after AVPA is enabled based on the same resource specifications and loads.
Send requests to generate loads.
The tested pod occupies 2 vCPUs. First increase the loads by 50% by occupying 1,000 millicores and keep the loads for 2,000 seconds. Then increase the loads on each workload by an additional 100 millicores every 60 seconds. After 30 minutes, 4,000 millicores are occupied.
```shell
# Increase the loads. The upper limit of each request is 1,000 millicores.
curl --data "millicores=1000&durationSec=2000" http://localhost:28080/ConsumeCPU
curl --data "millicores=1000&durationSec=2000" http://localhost:28081/ConsumeCPU
# Continuously increase the loads.
for i in {1..30}
do
  sleep 60
  curl --data "millicores=100&durationSec=2000" http://localhost:28080/ConsumeCPU
  curl --data "millicores=100&durationSec=2000" http://localhost:28081/ConsumeCPU
done
```
Monitor the metric data.
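The utilization trajectory in this test can be checked with simple arithmetic. The helper below is a sketch, not part of the tooling: the initial 1,000 millicores on a 2 vCPU pod is 50% utilization, right at the sample high watermark, and each additional 100 millicores pushes utilization further up until AVPA scales out:

```python
def cpu_utilization_pct(load_millicores, vcpus):
    # Utilization of the simulated load relative to the pod's current
    # CPU specification (1 vCPU = 1000 millicores).
    return 100.0 * load_millicores / (vcpus * 1000.0)
```

After 10 extra increments, the load reaches 2,000 millicores, which is 100% of a 2 vCPU pod; after a scale-out to 4 vCPUs, the same load drops back to 50% utilization.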
In the left-side navigation pane, go to the workload monitoring page and view the monitoring data.
The CPU loads continuously increase and then reach the threshold. After 10 minutes, CPU resources are scaled out to 2 vCPUs. After 30 minutes, CPU resources are scaled out to 4 vCPUs. After the scale-out, the CPU loads significantly drop.
Monitor the loads of the shadow workload. The CPU resources are exhausted within a short period of time.

View pod events.
The pod events show configuration changes in the YAML template during scaling.

Simulate scale-in activities due to low loads.
After the CPU loads drop below the threshold, CPU resources are scaled in.
After the loads reach the scale-in threshold, CPU resources are scaled in at the specified step to ensure stability.
The container resources are scaled in to the minimum specification of 1 vCPU claimed in the AVPA configuration if the loads consistently remain low.

(Optional) Step 6: Delete resources
Delete the workloads, Services, and the AVPA resource.
```shell
kubectl delete -f hello-avpa.yaml
kubectl delete -f shadow-hello-avpa.yaml
kubectl delete -f avpa.yaml
```
In the left-side navigation pane, choose Applications > Helm. Click Delete in the Actions column of ack-advanced-vertical-pod-autoscaler.
FAQ
How do I view the status of AVPA?
Use kubectl to query the real-time status of AVPA.
```shell
# In this example, the cluster connection information is stored in the ~/.kube/acs-test file.
# export KUBECONFIG=~/.kube/acs-test
kubectl get avpa -n [namespace] [-oyaml]
```
Expected output:
```shell
$ kubectl get avpa
NAME         TARGETTYPE   TARGETNAME   REPLICAS   UPDATING   WAITING   LASTSCALED   AGE
hello-avpa   1   0   0   11d   11d
```
How do I query pods that are being scaled?
In AVPA versions later than 0.3.0, pods that are being scaled have the avpa.alibabacloud.com/resizing-lock label.
```shell
kubectl get po -n [ns] -l avpa.alibabacloud.com/resizing-lock
```
How do I query pods that encounter scaling failures?
In AVPA versions later than 0.3.0, scaling failures are recorded in Kubernetes events. You can query scaling failures in the Logstore by specifying the InplaceResizedTimeoutFailed keyword.
You can create alert rules for these events. For more information, see Create alert rules.