Container Service for Kubernetes: Workload scaling FAQ

Last Updated: Dec 18, 2025

This topic answers frequently asked questions about workload scaling, such as Horizontal Pod Autoscaler (HPA) and CronHPA.


Why does the current field of HPA monitoring data show unknown?

If the current field in the HPA monitoring data shows unknown, the kube-controller-manager cannot access the monitoring data source to retrieve the required data, and HPA scaling fails. The following kubectl describe hpa output shows an example.

Name:                                                  kubernetes-tutorial-deployment
Namespace:                                             default
Labels:                                                <none>
Annotations:                                           <none>
CreationTimestamp:                                     Mon, 10 Jun 2019 11:46:48 +0530
Reference:                                             Deployment/kubernetes-tutorial-deployment
Metrics:                                               ( current / target )
  resource cpu on pods  (as a percentage of request):  <unknown> / 2%
Min replicas:                                          1
Max replicas:                                          4
Deployment pods:                                       1 current / 0 desired
Conditions:
  Type           Status  Reason                   Message
  ----           ------  ------                   -------
  AbleToScale    True    SucceededGetScale        the HPA controller was able to get the target's current scale
  ScalingActive  False   FailedGetResourceMetric  the HPA was unable to compute the replica count: unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: the server is currently unable to handle the request (get pods.metrics.k8s.io)
Events:
  Type     Reason                   Age                      From                       Message
  ----     ------                   ----                     ----                       -------
  Warning  FailedGetResourceMetric  3m3s (x1009 over 4h18m)  horizontal-pod-autoscaler  unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: the server is currently unable to handle the request (get pods.metrics.k8s.io)

Cause 1: The resource metrics data source is unavailable

First, run the kubectl top pod command to check whether metric data is returned. If no data is returned for any pods, run the kubectl get apiservice command to check the status of the data source that provides resource metrics.
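The commands are shown below for reference.

# Check whether resource metric data is returned for pods.
kubectl top pod

# Check the status of the registered API services, including the data source for resource metrics.
kubectl get apiservice

The following code block shows a sample output of kubectl get apiservice.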


NAME                                   SERVICE                      AVAILABLE   AGE
v1.                                    Local                        True        29h
v1.admissionregistration.k8s.io        Local                        True        29h
v1.apiextensions.k8s.io                Local                        True        29h
v1.apps                                Local                        True        29h
v1.authentication.k8s.io               Local                        True        29h
v1.authorization.k8s.io                Local                        True        29h
v1.autoscaling                         Local                        True        29h
v1.batch                               Local                        True        29h
v1.coordination.k8s.io                 Local                        True        29h
v1.monitoring.coreos.com               Local                        True        29h
v1.networking.k8s.io                   Local                        True        29h
v1.rbac.authorization.k8s.io           Local                        True        29h
v1.scheduling.k8s.io                   Local                        True        29h
v1.storage.k8s.io                      Local                        True        29h
v1alpha1.argoproj.io                   Local                        True        29h
v1alpha1.fedlearner.k8s.io             Local                        True        5h11m
v1beta1.admissionregistration.k8s.io   Local                        True        29h
v1beta1.alicloud.com                   Local                        True        29h
v1beta1.apiextensions.k8s.io           Local                        True        29h
v1beta1.apps                           Local                        True        29h
v1beta1.authentication.k8s.io          Local                        True        29h
v1beta1.authorization.k8s.io           Local                        True        29h
v1beta1.batch                          Local                        True        29h
v1beta1.certificates.k8s.io            Local                        True        29h
v1beta1.coordination.k8s.io            Local                        True        29h
v1beta1.events.k8s.io                  Local                        True        29h
v1beta1.extensions                     Local                        True        29h
...
v1beta1.metrics.k8s.io                 kube-system/metrics-server   True        29h
...
v1beta1.networking.k8s.io              Local                        True        29h
v1beta1.node.k8s.io                    Local                        True        29h
v1beta1.policy                         Local                        True        29h
v1beta1.rbac.authorization.k8s.io      Local                        True        29h
v1beta1.scheduling.k8s.io              Local                        True        29h
v1beta1.storage.k8s.io                 Local                        True        29h
v1beta2.apps                           Local                        True        29h
v2beta1.autoscaling                    Local                        True        29h
v2beta2.autoscaling                    Local                        True        29h

If the API service that corresponds to v1beta1.metrics.k8s.io is not kube-system/metrics-server, check whether it was overwritten by a Prometheus Operator installation. If it was, deploy the following YAML template to restore the service.

apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.metrics.k8s.io
spec:
  service:
    name: metrics-server
    namespace: kube-system
  group: metrics.k8s.io
  version: v1beta1
  insecureSkipTLSVerify: true
  groupPriorityMinimum: 100
  versionPriority: 100
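For example, you can save the template to a file and apply it with kubectl. The file name below is only an example. You can then verify that v1beta1.metrics.k8s.io points to kube-system/metrics-server again.

# Apply the APIService template. The file name is an example.
kubectl apply -f metrics-server-apiservice.yaml

# Check that the API service is served by kube-system/metrics-server and becomes Available.
kubectl get apiservice v1beta1.metrics.k8s.io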

If the issue persists, make sure that you have installed the metrics-server component on the Operations Management > Add-ons page of your cluster. For more information, see metrics-server.

Cause 2: Data cannot be retrieved during a rolling deployment or scale-out

By default, the metric collection interval of the metrics-server component is 1 minute. After a scale-out or update is complete, metrics-server cannot retrieve monitoring data for a short period of time. Check again about 2 minutes after the scale-out or update is complete.

Cause 3: The request field is not configured

By default, HPA calculates utilization as actual usage / request. Check whether the requests field is set in the resources field of the pod's containers. If no request is configured, HPA cannot compute a utilization percentage and the current field displays unknown.
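The following is a minimal sketch of a container spec with CPU and memory requests set. The values are illustrative; adjust them to your workload.

spec:
  containers:
  - name: nginx
    image: nginx:1.7.9
    resources:
      requests:
        cpu: 500m      # HPA computes CPU utilization against this request.
        memory: 256Mi  # Required if the HPA also uses a memory utilization metric.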

Cause 4: The metric name is incorrect

Check if the metric name is correct, including its case. For example, if you specify the HPA-supported metric cpu as CPU, the current field of the monitoring data displays unknown.
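The following is a minimal sketch of the metrics section of an HPA (autoscaling/v2) that targets CPU utilization. The target value is an example; the metric name must be the lowercase cpu.

metrics:
- type: Resource
  resource:
    name: cpu                    # Must be "cpu", not "CPU".
    target:
      type: Utilization
      averageUtilization: 50     # Example target utilization.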

What do I do if HPA scaling fails due to abnormal metric retrieval?

HPA scaling may fail when metric retrieval is abnormal. In this case, the current field of the HPA monitoring data displays unknown, HPA cannot retrieve the metrics required for scaling decisions, and the number of pods cannot be adjusted. For more information about how to troubleshoot this issue, see Node autoscaling FAQ.

Why does HPA scale out excess pods during a rolling deployment?

During a rolling deployment, the open source controller manager treats pods for which no monitoring data is available as if their usage were zero (zero-filling). This may cause HPA to scale out too many pods. You can use the following configurations to prevent this issue.

Cluster-level configuration

You can upgrade the ACK-provided metrics-server to the latest version and enable a switch in the startup parameters of metrics-server to prevent this issue.

This is a global switch. After you set this switch, it takes effect for all related workloads in the cluster.

# Add the following option to the startup parameters of metrics-server.
--enable-hpa-rolling-update-skipped=true  

Workload-level configuration

If you want to prevent this issue for only specific workloads, you can use one of the following two methods.

  • Method 1: To temporarily pause Horizontal Pod Autoscaler (HPA) evaluation during a rolling deployment, add the following annotation to the specified workload's template.

    # Add the annotation to spec.template.metadata.annotations of the workload to temporarily pause HPA evaluation during a rolling deployment.
    HPARollingUpdateSkipped: "true"

    The following sample code shows a complete Deployment with the annotation added.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx-deployment-basic
      labels:
        app: nginx
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: nginx
      template:
        metadata:
          labels:
            app: nginx
          annotations:
            HPARollingUpdateSkipped: "true"  # Skip HPA evaluation during a rolling deployment.
        spec:
          containers:
          - name: nginx
            image: nginx:1.7.9
            ports:
            - containerPort: 80
  • Method 2: You can add the following annotation to the template of the specified workload so that HPA takes effect only after a configured warm-up period has passed since the pod was created.

    # Add the annotation to spec.template.metadata.annotations of the workload. HPA starts to evaluate the pod only after the configured period has passed since the pod was created.
    HPAScaleUpDelay: 3m # 3m is an example. Set the time period as needed.

    The following sample code shows a complete configuration.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx-deployment-basic
      labels:
        app: nginx
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: nginx
      template:
        metadata:
          labels:
            app: nginx
          annotations:
            HPAScaleUpDelay: 3m  # m stands for minutes. This indicates that HPA takes effect 3 minutes after the pod is created. Valid units are s (seconds) and m (minutes).
        spec:
          containers:
          - name: nginx
            image: nginx:1.7.9
            ports:
            - containerPort: 80

Why does HPA not scale when a threshold is reached?

HPA scaling is triggered not only when CPU or memory usage exceeds or falls below a threshold. HPA also considers whether a scale-out or scale-in action might immediately trigger the opposite action. This helps reduce fluctuations.

For example, if your scale-out threshold is set to 80% and you have two pods that both have 70% CPU usage, a scale-in does not occur. This is because after a scale-in, the single remaining pod might have a CPU usage higher than 80%, which would trigger a scale-out. This behavior prevents repeated scaling fluctuations.
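In terms of the replica formula that HPA uses (see the question about the audit log below), desiredReplicas = ceil(currentReplicas * (currentMetricValue / desiredMetricValue)) = ceil(2 * (70 / 80)) = ceil(1.75) = 2. The calculated desired replica count equals the current replica count, so no scale-in occurs.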

How do I configure the HPA data collection interval?

For metrics-server versions later than v0.2.1-b46d98c-aliyun, you can set --metric-resolution in the startup parameters of metrics-server. For example, --metric-resolution=15s.

Is CronHPA compatible with HPA? How do I make them compatible?

Yes, CronHPA is compatible with HPA. Container Service for Kubernetes (ACK) sets the `scaleTargetRef` in CronHPA to the HPA object. Then, ACK uses the HPA object to find the real `scaleTargetRef`. This allows CronHPA to be aware of the current state of HPA. CronHPA does not directly adjust the number of replicas of a deployment. Instead, it uses HPA to manage the deployment. This approach avoids conflicts between HPA and CronHPA. For more information about how to make CronHPA compatible with HPA, see Enable CronHPA to work with HPA.
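As an illustration, the following is a minimal sketch of a CronHPA (CronHorizontalPodAutoscaler) whose scaleTargetRef points to the HPA object instead of the Deployment. The names, the API version of the HPA, the schedule, and the target size are example values.

apiVersion: autoscaling.alibabacloud.com/v1beta1
kind: CronHorizontalPodAutoscaler
metadata:
  name: cronhpa-sample                   # Example name.
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: autoscaling/v2           # API version of the existing HPA object (example).
    kind: HorizontalPodAutoscaler
    name: nginx-deployment-basic-hpa     # Example HPA name. Do not reference the Deployment directly.
  jobs:
  - name: scale-up-every-morning         # Example job name.
    schedule: "0 0 8 * * *"              # Format: second minute hour day month week. Example: 08:00 every day.
    targetSize: 10                       # Example replica count.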

How do I resolve the issue where HPA scales out excess pods due to CPU or memory spikes on startup?

For languages and frameworks that require a ramp-up period, such as Java, CPU and memory usage can spike for several minutes when a container starts. This may cause HPA to be triggered incorrectly. To resolve this issue, you can upgrade the metrics-server component that is provided by ACK to version 0.3.9.6 or later and add a switch in the pod's annotation to prevent incorrect triggers. For more information about how to upgrade the metrics-server component, see Upgrade the metrics-server component before you upgrade a cluster to Kubernetes 1.12.

The following sample YAML file shows a deployment that has a switch added to prevent incorrect triggers.


## A Deployment is used as an example.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment-basic
  labels:
    app: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
      annotations:
        HPAScaleUpDelay: 3m # m stands for minutes. This indicates that HPA takes effect 3 minutes after the pod is created. Valid units are s (seconds) and m (minutes).
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9 # Replace with your own image in the <image_name:tag> format.
        ports:
        - containerPort: 80 

Why did HPA scale the application when the value in the audit log did not reach the threshold?

Cause

HPA calculates the scaling ratio based on the current and desired metrics: desiredReplicas = ceil(currentReplicas * (currentMetricValue / desiredMetricValue)).

This formula shows that the accuracy of the desired number of replicas depends on the accuracy of the current number of replicas, the current metric value, and the desired metric value. Take resource metrics, which are widely used in HPA, as an example. To determine the current number of replicas, HPA first obtains the scale sub-resource of the object defined by scaleTargetRef. It then converts the selector in the status of the scale object into a label selector and uses it to match and retrieve pods. If the pods retrieved with this selector do not all belong to the object defined in scaleTargetRef, the calculated desired number of replicas may not meet expectations. For example, the application may be scaled out even though the real-time metric is below the threshold.
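For reference, the following command shows one way to view the selector in the status of the scale sub-resource of a Deployment. The namespace and Deployment names are placeholders.

kubectl get --raw "/apis/apps/v1/namespaces/{namespace_name}/deployments/{deployment_name}/scale"
# In the returned Scale object, status.selector is the label selector that HPA uses to match pods.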

Common reasons for an inaccurate number of matched pods include the following:

  • A rolling deployment is in progress.

  • Other pods that do not belong to the object in `scaleTargetRef` are tagged with the same label. You can run the following command to check if other such pods exist.

    kubectl get pods -n {namespace_name} -l {value_of_status.selector_of_scale_sub-resource}

Solution

  • For issues related to rolling deployments, see Node autoscaling FAQ.

  • For issues where other pods that do not belong to the object in `scaleTargetRef` are tagged with the same label, you must locate these pods. If they are still in use, you can change their labels. If they are no longer needed, you can delete them.

Can I determine the scale-in order of pods for HPA?

No, you cannot. HPA can automatically increase or decrease the number of pods based on defined metrics, but it cannot directly determine which pods are terminated first. The termination order of pods, graceful shutdown time, and other attributes are determined by the controller that manages the pods.

In scenarios such as hybrid deployments of ECS and ECI instances or multiple node pools, you can configure a ResourcePolicy for your application to specify the scale-in priority. When HPA is used together with a Deployment and a ResourcePolicy, pods on ECI resources can be released before pods on ECS resources during a scale-in. For more information, see Custom elastic resource priority scheduling.
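The following is a minimal sketch of such a ResourcePolicy, assuming a workload whose pods carry the label app: nginx. The name, label, and unit order are example values.

apiVersion: scheduling.alibabacloud.com/v1alpha1
kind: ResourcePolicy
metadata:
  name: nginx-resource-policy      # Example name.
  namespace: default
spec:
  selector:
    app: nginx                     # Matches the pods of the target workload.
  strategy: prefer
  units:                           # Scale-out fills the units in order. Scale-in releases them in reverse order.
  - resource: ecs                  # Schedule pods to ECS nodes first.
  - resource: eci                  # Overflow to ECI. These pods are released first during a scale-in.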

What do the units of HPA utilization metrics mean?

Utilization metrics are typically unitless integers or integers that use m as the unit, with a conversion rate of 1000m to 1. For example, a tcp_connection_counts value of 70000m is equivalent to 70.

What do I do if the target column shows unknown after I run kubectl get hpa?

You can perform the following operations to resolve the issue.

  1. Run kubectl describe hpa <hpa_name> to confirm the reason for the HPA failure.

    • If the Conditions field shows that AbleToScale is False, confirm that the deployment is normal.

    • If the Conditions field shows that ScalingActive is False, proceed to the next step.

  2. Run kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/". If Error from server (NotFound): the server could not find the requested resource is returned, confirm the startup status of alibaba-cloud-metrics-adapter.

    If alibaba-cloud-metrics-adapter is in the Normal state, check if the HPA metrics are related to an Ingress. If they are, you must deploy the Simple Log Service component in advance. For more information, see Collect and analyze Nginx Ingress access logs.

  3. Confirm that the HPA metric is entered correctly. The value of sls.ingress.route must be in the <namespace>-<svc>-<port> format, as in the sketch after this list.

    • namespace: the namespace where the Ingress is located.

    • svc: the name of the Service that corresponds to the Ingress.

    • port: the name of the port of the Service that corresponds to the Ingress.
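The following is a minimal sketch of an HPA that scales based on the sls_ingress_qps external metric. The namespace, Service name, port name, workload name, and target value are example values.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ingress-qps-hpa                            # Example name.
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-deployment-basic                   # Example workload.
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metric:
        name: sls_ingress_qps
        selector:
          matchLabels:
            sls.ingress.route: default-nginx-80    # <namespace>-<svc>-<port>
      target:
        type: AverageValue
        averageValue: "10"                         # Example target QPS per pod.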

How do I find the metric names that HPA supports?

For more information, see Alibaba Cloud HPA metrics. The following table lists commonly used metrics.

Metric name                  Description                                                     Additional parameter
sls_ingress_qps              The queries per second (QPS) of the specified Ingress route.    sls.ingress.route
sls_alb_ingress_qps          The QPS of the specified ALB Ingress route.                     sls.ingress.route
sls_ingress_latency_avg      The average latency of all requests.                            sls.ingress.route
sls_ingress_latency_p50      The 50th percentile (median) request latency.                   sls.ingress.route
sls_ingress_latency_p95      The 95th percentile request latency.                            sls.ingress.route
sls_ingress_latency_p99      The 99th percentile request latency.                            sls.ingress.route
sls_ingress_latency_p9999    The 99.99th percentile request latency.                         sls.ingress.route
sls_ingress_inflow           The inbound bandwidth of the Ingress.                           sls.ingress.route

How do I adapt the configuration after I customize the Nginx Ingress log format?

For more information about how to use SLS Ingress metrics for horizontal pod autoscaling, see Horizontal pod autoscaling based on metrics of Alibaba Cloud services. You must enable and correctly configure Nginx Ingress log collection to Simple Log Service for your cluster.

  • Simple Log Service is enabled by default when you create a cluster. If you keep the default settings, you can view Nginx Ingress access log analysis reports and monitor the real-time status of Nginx Ingress in the Simple Log Service console after the cluster is created.

  • If you manually disable Simple Log Service when you create a cluster, you must re-enable or configure Simple Log Service after the cluster is created if you want to use SLS Ingress metrics for horizontal pod autoscaling. For more information, see Collect and analyze Nginx Ingress access logs.

  • If you want to customize the Nginx Ingress log format, note that the AliyunLogConfig Custom Resource Definition (CRD) that is deployed when you first enable Simple Log Service in the cluster applies only to the default log format of the ACK Ingress Controller. If you have modified the access log format of the Ingress Controller, you must also modify the processor_regex settings for regular expression extraction in the CRD configuration. For more information, see Use a DaemonSet and a CRD to collect container logs.

How do I obtain the QPS metric sls_ingress_qps using the command line?

You can query data using the following command. The `sls_ingress_qps` metric is used as an example.

kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/*/sls_ingress_qps?labelSelector=sls.project={{SLS_Project}},sls.logstore=nginx-ingress"

In the command, {{SLS_Project}} is the name of the SLS project that corresponds to the ACK cluster. If you do not customize the configuration, the default name is k8s-log-{{ClusterId}}, where {{ClusterId}} is the ID of the cluster.

If the following result is returned:

Error from server: {
    "httpCode": 400,
    "errorCode": "ParameterInvalid",
    "errorMessage": "key (slb_pool_name) is not config as key value config,if symbol : is  in your log,please wrap : with quotation mark \"",
    "requestID": "xxxxxxx"
}

This indicates that no data is available for this metric. This may occur because you did not configure an ALB Ingress but used the `sls_alb_ingress_qps` metric to query data.

If the output is similar to the following:

{
  "kind": "ExternalMetricValueList",
  "apiVersion": "external.metrics.k8s.io/v1beta1",
  "metadata": {},
  "items": [
    {
      "metricName": "sls_ingress_qps",
      "timestamp": "2025-02-26T16:45:00Z", 
      "value": "50",   # The QPS value
      "metricLabels": {
        "sls.project": "your-sls-project-name",
        "sls.logstore": "nginx-ingress"
      }
    }
  ]
}

This indicates that the external Kubernetes QPS metric was found, and its value is specified in the value field.

Failed to pull alibaba-cloud-metrics-adapter image

Symptom

When you upgrade the ack-alibaba-cloud-metrics-adapter component to version 1.3.7, the image fails to be pulled and an error message similar to the following is reported:

Failed to pull image "registry-<region-id>-vpc.ack.aliyuncs.com/acs/alibaba-cloud-metrics-adapter-amd64:v0.2.9-ba634de-aliyun".

Cause

The ack-alibaba-cloud-metrics-adapter component does not support direct updates.

Solution

You can upgrade the component by following these steps.

  1. Back up the current component configuration.

  2. Uninstall the old version of the component.

  3. Use the backup configuration to install the latest version of the component.

Important

During the uninstallation and reinstallation, related HPA policies pause scaling because monitoring data retrieval is stopped.