
Container Compute Service: In-place vertical scaling based on CPU metrics

Last Updated: Jun 09, 2025

ack-advanced-vertical-pod-autoscaler (AVPA) is an in-place scaling component provided by Alibaba Cloud. It supports metrics-based in-place scaling and application startup acceleration. This topic describes vertical scaling based on CPU metrics and introduces the use scenarios, configuration, and limits of this feature.

Background information

AVPA supports startup acceleration and metrics-based vertical scaling. You can use these capabilities together with ACS in-place scaling to scale pod resources without business interruptions. This enables you to handle hotspots in business loads.

Comparison between AVPA and open source VPA

Scaling method
  • VPA: Recreates pods one by one.
  • AVPA: Performs an in-place hot upgrade.

Scaling scope
  • VPA: Upgrades or downgrades all pods.
  • AVPA: Upgrades or downgrades individual pods separately.

Applicable workloads
  • VPA: Deployments, StatefulSets (with limited effect), and CronJobs.
  • AVPA: Deployments, StatefulSets, Jobs (supported in AVPA 0.2.0 and later), and pods created by any workloads and selected by a selector (supported in AVPA 0.3.0 and later).

Use scenarios
  • VPA: Long-term and balanced workload changes.
  • AVPA: Periodic workload changes, unexpected workload fluctuations, and unbalanced workloads.

Use scenarios

  • Gaming businesses: hot upgrade and downgrade for periodically fluctuating CPU workloads.

  • Online businesses: vertically scale hotspot pods.

Limits

  • AVPA cannot be used together with open source VPA or HPA.

  • Only vertical scaling based on CPU metrics is supported. If the CPU specification no longer meets the ACS requirements after scaling, the scaling request is rejected. For example, the system cannot scale a pod from 2 vCPUs and 2 GiB of memory to 3 vCPUs and 2 GiB of memory. In addition, you must configure resource requests for the containers to be scaled.

  • Vertical scaling automatically adjusts the CPU requests and limits of pods. In ACK and ACK Serverless clusters, vertical scaling may be constrained by limited node resources.

  • In ACK and ACK serverless clusters, the version of ack-virtual-node must be at least v2.14.0.

  • AVPA does not create profiles and takes effect only on existing pods. Therefore, new pods are created based on the original resource specifications defined in the workload.

Note
  • In-place scaling is in invitational preview. To use this feature, submit a ticket.

  • You can perform in-place scaling for ACS pods of the ComputeClass=general-purpose and ComputeQoS=default types.

    • You can scale an ACS pod in the range of 50% to 200% of its original resources.

    • Currently, you can scale up to at most 16 vCPUs.

    For example, you can scale an ACS pod with 4 vCPUs and 8 GiB of memory in the range of 2 vCPUs and 8 GiB to 8 vCPUs and 8 GiB.
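    To check the compute class of your existing ACS pods, you can inspect their labels. This is an optional check: the alibabacloud.com/compute-class label key is taken from the selector example later in this topic, and the value in the output is expected to indicate the pod's compute class, which you can confirm against the requirements above.

    # List pods that carry the ACS compute class label, together with their labels.
    kubectl get pods -n default -l alibabacloud.com/compute-class --show-labels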

Procedure

In this topic, a sample workload and a shadow workload are used to demonstrate the procedure. The shadow workload is optional. In the following procedure, AVPA is configured for the sample workload. The shadow workload uses the same resource specifications as the sample workload but does not have AVPA configured. The example simulates CPU loads to demonstrate vertical scaling. The following figure shows the procedure.

(Figure: overview of the procedure)

The sample workload exposes a Service as the traffic ingress and includes a load simulation tool. You can call the tool's API to consume a specified amount of CPU resources for a specified duration, such as 500 millicores (0.5 vCPU) for 6,000 seconds.


Step 1: Enable the in-place scaling feature gate

  1. Log on to the ACS console. In the left-side navigation pane, click Clusters.

  2. On the Clusters page, find the cluster that you want to manage and click its ID. In the left-side navigation pane of the cluster details page, choose Operations > Add-ons.

  3. On the Core Components tab, select Kube API Server > Configuration. Set featureGates to InPlacePodVerticalScaling=true to enable the in-place scaling feature gate.


    Note

    If the status of the Kube API Server card displays Executing, the configuration is in progress. After the status changes to Installed, the feature gate is enabled.
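    If you prefer the command line, you can check whether the cluster's API schema serves the resizePolicy field that in-place resizing relies on (available in Kubernetes 1.27 and later). This is only a convenience check based on that assumption; the component status shown on the card remains the authoritative signal.

    # Prints the field documentation if the API server serves it; returns an error otherwise.
    kubectl explain pod.spec.containers.resizePolicy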

Step 2: Install AVPA

In the left-side navigation pane, choose Applications > Helm, and find and install ack-advanced-vertical-pod-autoscaler. For more information, see Use Helm to manage applications in ACS.

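After the installation is complete, you can optionally verify that the AVPA controller workload is running. The namespace and workload name depend on the chart, so the following check simply searches all namespaces for a matching Deployment.

kubectl get deployments -A | grep -i vertical-pod-autoscaler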

Step 3: Deploy the workload and create a local connection

  1. Create the YAML files. The sample workload is required. The shadow workload is optional and serves as a comparison baseline during stress testing.

    Create a YAML file named hello-avpa.yaml.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: hello-avpa
      namespace: default
    spec:
      replicas: 1
      selector:
        matchLabels:
          name: hello-avpa
      template:
        metadata:
          annotations:
            scaling.alibabacloud.com/enable-inplace-resource-resize: 'true'
          labels:
            name: hello-avpa
            vpa: enabled
        spec:
          containers:
            - image: 'registry.cn-hangzhou.aliyuncs.com/acs-demo-ns/simulation-resource-consumer:1.13'
              name: hello-avpa
              resources:
                limits:
                  cpu: '2'
                  memory: '4Gi'
                requests:
                  cpu: '2'
                  memory: '4Gi'
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: hello-avpa-svc
      namespace: default
    spec:
      ports:
        - port: 80
          protocol: TCP
          targetPort: 8080
      selector:
        name: hello-avpa
      type: ClusterIP

    (Optional) Create a YAML file named shadow-hello-avpa.yaml.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: shadow-hello-avpa
      namespace: default
    spec:
      replicas: 1
      selector:
        matchLabels:
          name: shadow-hello-avpa
      template:
        metadata:
          annotations:
            scaling.alibabacloud.com/enable-inplace-resource-resize: 'true'
          labels:
            vpa: enabled
            name: shadow-hello-avpa
        spec:
          containers:
            - image: 'registry.cn-hangzhou.aliyuncs.com/acs-demo-ns/simulation-resource-consumer:1.13'
              name: shadow-hello-avpa
              resources:
                limits:
                  cpu: '2'
                  memory: '4Gi'
                requests:
                  cpu: '2'
                  memory: '4Gi'
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: shadow-avpa-svc
      namespace: default
    spec:
      ports:
        - port: 80
          protocol: TCP
          targetPort: 8080
      selector:
        name: shadow-hello-avpa
      type: ClusterIP
  2. Deploy the workload.

    kubectl apply -f hello-avpa.yaml
    kubectl apply -f shadow-hello-avpa.yaml
  3. Run kubectl port-forward to create a local connection.

    Important

    Port forwarding set up by using kubectl port-forward is not reliable, secure, or scalable enough for production environments. It is intended only for development and debugging. Do not use this command to set up port forwarding in production environments. For more information about production-grade networking solutions for ACK clusters, see Ingress management.

    kubectl port-forward svc/hello-avpa-svc -n default 28080:80
    kubectl port-forward svc/shadow-avpa-svc -n default 28081:80
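    With the port forwards running, you can optionally send a small request to confirm that the load simulation API responds before you start the real stress test. The ConsumeCPU endpoint and its parameters are the same ones used in Step 5; the values here are deliberately small.

    # Consume 100 millicores for 60 seconds on the sample workload.
    curl --data "millicores=100&durationSec=60" http://localhost:28080/ConsumeCPU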

Step 4: Configure AVPA

You can create an AdvancedVerticalPodAutoscaler resource to configure vertical scaling.

  1. Create a YAML file named avpa.yaml.

    TargetRef mode

    The following configuration uses the TargetRef mode of AVPA to adjust the resources of the pods created by the Deployment named hello-avpa in the default namespace based on CPU utilization.

    apiVersion: autoscaling.alibabacloud.com/v1beta1
    kind: AdvancedVerticalPodAutoscaler
    metadata:
      name: hello-avpa
      namespace: default
    spec:
      metrics:
        - containerResource:
            container: hello-avpa
            name: cpu
            target:
              averageUtilization: 30
              type: Utilization
          type: ContainerResource
          watermark: low
        - containerResource:
            container: hello-avpa
            name: cpu
            target:
              averageUtilization: 50
              type: Utilization
          type: ContainerResource
          watermark: high
      scaleResourceLimit:
        maximum:
          cpu: '4'
        minimum:
          cpu: '1'
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: hello-avpa

    The following list describes some of the parameters.

    • scaleTargetRef (required): the target workload. Kubernetes-native Deployments, StatefulSets, and Jobs, as well as OpenKruise AdvancedStatefulSets and CloneSets, are currently supported.

    • metrics.containerResource (required): the resource whose metrics are collected and its utilization threshold.

      • container: the name of the container from which metrics are collected.

      • name: the name of the metric to collect. Only CPU metrics are supported.

      • target: the threshold information.

        • type: set to Utilization.

        • averageUtilization: the average utilization threshold.

    • metrics.watermark (required): the type of threshold. Valid values:

      • low: the low threshold. When the metric value drops below this threshold, pod resources are scaled in.

      • high: the high threshold. When the metric value exceeds this threshold, pod resources are scaled out.

    • metrics.type (required): the metric collection granularity. Metrics can be aggregated by container or by pod. The default is ContainerResource.

      Note

      Currently, metrics can be collected and aggregated only by container.

    • scaleResourceLimit.minimum (optional): the lower limit of vertical scaling (CPU only).

      • cpu: the default is 250m (millicores).

    • scaleResourceLimit.maximum (optional): the upper limit of vertical scaling (CPU only).

      • cpu: the default is 64 (cores).
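    As a rough illustration of what these thresholds mean for the sample workload (back-of-the-envelope arithmetic, not output from the component), the hello-avpa container requests 2 vCPUs, so the watermarks correspond approximately to the following usage levels:

    # Assumed: container CPU request = 2 vCPUs (2000m)
    # low watermark  (averageUtilization: 30) ~  600m average usage; sustained usage below this can trigger a scale-in
    # high watermark (averageUtilization: 50) ~ 1000m average usage; sustained usage above this can trigger a scale-out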

    Selector mode

    The Selector mode, which is introduced in AVPA 0.2.0, provides a more flexible selector that simplifies the pod configuration.

    apiVersion: autoscaling.alibabacloud.com/v1beta1
    kind: AdvancedVerticalPodAutoscaler
    metadata:
      name: hello-avpa
      namespace: default
    spec:
      metrics:
        - containerResource:
            container: "*"
            name: cpu
            target:
              averageUtilization: 30
              type: Utilization
          type: ContainerResource
          watermark: low
        - containerResource:
            container: "*"
            name: cpu
            target:
              averageUtilization: 50
              type: Utilization
          type: ContainerResource
          watermark: high
      scaleResourceLimit:
        maximum:
          cpu: '4'
        minimum:
          cpu: '1'
      # You can modify the following sample selector configuration on demand.
      selector:
        matchLabels:
          vpa: enabled
        matchExpressions:
        # A label that applies to all ACS pods.
        - key: alibabacloud.com/compute-class 
          operator: Exists
        - key: name 
          operator: In
          values: 
          - hello-avpa
        # A reserved switch to disable AVPA for certain pods.
        - key: alibabacloud.com/disable-avpa 
          operator: DoesNotExist

    The following list describes the parameters.

    Unlike scaleTargetRef, which takes effect on only one workload, the selector can select pods from any workloads to meet scaling requirements in different scenarios.

    • selector (optional): the pod selector. The selector and scaleTargetRef are mutually exclusive. The selector simplifies the AVPA configuration by centrally managing pods created by a type of workload, which saves you from configuring AVPA for each workload.

    • metrics.containerResource (required): the resource whose metrics are collected and its utilization threshold.

      • container: the name of the container from which metrics are collected.

        Note

        You can enter a wildcard character (*) to match all containers. If a container matches both the wildcard expression and a specific container match rule, the specific container match rule prevails.

      • name: the name of the metric to collect. Only CPU metrics are supported.

      • target: the threshold information.

        • type: set to Utilization.

        • averageUtilization: the average utilization threshold.

    • metrics.watermark (required): the type of threshold. Valid values:

      • low: the low threshold. When the metric value drops below this threshold, pod resources are scaled in.

      • high: the high threshold. When the metric value exceeds this threshold, pod resources are scaled out.

    • metrics.type (required): the metric collection granularity. Metrics can be aggregated by container or by pod. The default is ContainerResource.

      Note

      Currently, metrics can be collected and aggregated only by container.

    • scaleResourceLimit.minimum (optional): the lower limit of vertical scaling (CPU only).

      • cpu: the default is 250m (millicores).

    • scaleResourceLimit.maximum (optional): the upper limit of vertical scaling (CPU only).

      • cpu: the default is 64 (cores).
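    To preview which pods the preceding sample selector would match, you can run an equivalent label query with kubectl. The label keys and values come from the sample selector and the hello-avpa Deployment in this topic; adjust them if you changed the selector.

    # List the pods that satisfy the sample matchLabels and matchExpressions.
    kubectl get pods -n default \
      -l 'vpa=enabled,alibabacloud.com/compute-class,name in (hello-avpa),!alibabacloud.com/disable-avpa'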

  2. (Optional) Configure advanced AVPA settings.

    Advanced configuration

    You can modify the scaling steps in the following AVPA configuration as needed.

    apiVersion: autoscaling.alibabacloud.com/v1beta1
    kind: AdvancedVerticalPodAutoscaler
    metadata:
      name: hello-avpa
      namespace: default
    spec:
      behavior:
        parallelism: 1
        stabilizationWindowSeconds: 600
        scaleDown: # Specify the scale-in step and policy. We recommend that you scale in resources progressively and ensure that each scale-in step does not exceed the actual resource specification.
          policies:
          - type: CpuPercent
            value: 10%
            periodSeconds: 60
          - type: Cpus
            value: 500m
            periodSeconds: 60
          selectPolicy: Max
        scaleUp: # Specify the scale-out step and policy. We recommend that you scale out resources quickly to reduce the number of scale-out activities.
          policies:
          - type: CpuPercent
            value: 60%
            periodSeconds: 60
          - type: Cpus
            value: 500m
            periodSeconds: 60
          selectPolicy: Max
      metricObserveWindowSeconds: 600
      metrics:
        - containerResource:
            container: hello-avpa
            name: cpu
            target:
              averageUtilization: 30
              type: Utilization
          type: ContainerResource
          watermark: low
        - containerResource:
            container: hello-avpa
            name: cpu
            target:
              averageUtilization: 50
              type: Utilization
          type: ContainerResource
          watermark: high
      scaleResourceLimit:
        maximum:
          cpu: '4'
        minimum:
          cpu: '1'
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: hello-avpa

    The following list describes the parameters.

    The scaleDown and scaleUp settings in spec.behavior describe the scale-in and scale-out policies, respectively. Both settings use the same syntax.

    • policies[].type (optional): the type of scaling step. Valid values:

      • Cpus: an absolute CPU scaling step.

      • CpuPercent: a scaling step expressed as a percentage of the current CPU specification.

    • policies[].value (optional): the step value.

      • When type is set to Cpus, specify an absolute value, such as 250m or 1.

      • When type is set to CpuPercent, specify a percentage value, such as 5%.

    • selectPolicy (optional): the policy used when multiple policies are configured. Valid values:

      • Max: the largest calculated value is used as the step.

      • Min: the smallest calculated value is used as the step.

    • metricObserveWindowSeconds (optional): the metric collection time window. When the value of a metric changes within the time window, the loads are calculated to determine whether scaling is needed. Unit: seconds. The default is 600 and the minimum is 300.

    • behavior.parallelism (optional): the scaling concurrency, which is the number of pods that can be scaled concurrently. The default is 1.

    • behavior.stabilizationWindowSeconds (optional): the scaling cooldown period. Unit: seconds. The default is 600 and the minimum is 300. The cooldown period must not be shorter than metricObserveWindowSeconds.

    AVPA calculates the target resource specification based on the default policy: the scaling target is the average of the high and low watermarks. For example, assume a container occupies 1 vCPU, the high and low thresholds are set to 60 and 40, and the current utilization is 100%. The target utilization is then 50%, so the calculated scale-out amount is 1 vCPU and the CPU resources would be scaled out to 2 vCPUs.

    The behavior policies then limit each scaling step. With the preceding scaleUp policies (CpuPercent: 60% and Cpus: 500m, selectPolicy: Max), the larger of 60% of the current 1 vCPU (600m) and 500m is used. Therefore, the actual scale-out step is 600m.

    Note

    Suggested configuration: scale out quickly to handle spikes and scale in progressively to ensure stability.

    scaleUp: specify a large step to reduce the number of scale-out activities. For example, specify type: Cpus and value: 1.

    scaleDown: specify a small step to scale in progressively and ensure stability. For example, specify type: Cpus and value: 250m.
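    Applied to the advanced example above, this suggestion corresponds to a behavior section similar to the following sketch. Treat it as a starting point and tune the values for your workload; periodSeconds is carried over from the sample above.

    behavior:
      scaleUp: # Scale out quickly with one large absolute step.
        policies:
        - type: Cpus
          value: '1'
          periodSeconds: 60
      scaleDown: # Scale in progressively with small absolute steps.
        policies:
        - type: Cpus
          value: 250m
          periodSeconds: 60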

  3. Deploy the YAML file.

    kubectl apply -f avpa.yaml
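    After the AdvancedVerticalPodAutoscaler resource is created, you can check that it is tracking the target, as also described in the FAQ at the end of this topic.

    kubectl get avpa hello-avpa -n default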

Step 5: Perform stress testing

Send requests to the backend pods through the traffic ingress to generate CPU loads. Then, compare the monitoring data of the sample workload (with AVPA) and the shadow workload (without AVPA), which use the same resource specifications and receive the same loads.

  1. Send requests to generate loads.

    Each tested pod occupies 2 vCPUs. First, increase the loads to 50% by consuming 1,000 millicores and keep the loads for 2,000 seconds. Then, increase the loads by an additional 100 millicores every 60 seconds. After 30 minutes, 4,000 millicores are consumed.

    # Increase the loads. The upper limit of each command is 1,000 millicores.
    curl --data "millicores=1000&durationSec=2000" http://localhost:28080/ConsumeCPU
    curl --data "millicores=1000&durationSec=2000" http://localhost:28081/ConsumeCPU
    # Continuously increase the loads.
    for i in {1..30}
    do
      sleep 60
      curl --data "millicores=100&durationSec=2000" http://localhost:28080/ConsumeCPU
      curl --data "millicores=100&durationSec=2000" http://localhost:28081/ConsumeCPU
    done
  2. Monitor the metric data.

    In the left-side navigation pane, choose Operations > Prometheus Monitoring. On the Application Monitoring > Deployments tab, view the monitoring data.

    1. The CPU loads continuously increase and then reach the threshold. After 10 minutes, CPU resources are scaled out to 2 vCPUs. After 30 minutes, CPU resources are scaled out to 4 vCPUs. After the scale-out, the CPU loads significantly drop.


    2. Monitor the loads of the shadow workload. The CPU resources are exhausted within a short period of time.


  3. View pod events.

    The pod events show configuration changes in the YAML template during scaling.

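    You can also read the current CPU requests and limits directly from the pod spec to confirm the values that AVPA applied in place. The label below comes from the sample hello-avpa Deployment.

    # Print the resources of the first pod that matches the sample label.
    kubectl get pod -n default -l name=hello-avpa \
      -o jsonpath='{.items[0].spec.containers[0].resources}'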

  4. Simulate scale-in activities due to low loads.

    After the CPU loads drop below the threshold, CPU resources are scaled in.

    • After the loads reach the scale-in threshold, CPU resources are scaled in at the specified step to ensure stability.

    • If the loads consistently remain at a low level, the container resources are scaled in to the minimum specification of 1 vCPU declared in the AVPA configuration.


(Optional) Step 6: Delete resources

  1. Delete the workloads, Service, and AVPA resource.

    kubectl delete -f hello-avpa.yaml
    kubectl delete -f shadow-hello-avpa.yaml
    kubectl delete -f avpa.yaml
  2. In the left-side navigation pane, choose Applications > Helm. Click Delete in the Actions column of ack-advanced-vertical-pod-autoscaler.

FAQ

How do I view the status of AVPA?

Use kubectl to query the real-time status of AVPA.

# In this example, the cluster connection information is stored in the ~/.kube/acs-test file.
# export KUBECONFIG=~/.kube/acs-test 
kubectl get avpa -n [namespace] [-oyaml]

Expected output:

$ kubectl get avpa
NAME         TARGETTYPE   TARGETNAME   REPLICAS   UPDATING   WAITING   LASTSCALED   AGE
hello-avpa                             1          0          0         11d          11d

How do I query pods that are being scaled?

In AVPA versions later than 0.3.0, pods that are being scaled have the avpa.alibabacloud.com/resizing-lock label.

kubectl get po -n [namespace] -l avpa.alibabacloud.com/resizing-lock

How do I query pods that encounter scaling failures?

In AVPA versions later than 0.3.0, scaling failures are recorded in Kubernetes events. You can query scaling failures in the Logstore by specifying the InplaceResizedTimeoutFailed keyword.
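A rough kubectl alternative is to search recent cluster events for the same keyword. This is only a convenience check, not the official Logstore query, and it covers only events that have not yet expired from the cluster.

kubectl get events -A | grep InplaceResizedTimeoutFailed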

Note

You can create alert rules for these events. For more information, see Create alert rules.