Container Service for Kubernetes: Workload scaling FAQ

Last Updated: Dec 18, 2025

This topic answers frequently asked questions about workload scaling, such as Horizontal Pod Autoscaler (HPA) and CronHPA.


Why does the current field of HPA monitoring data show unknown?

If the current field in the HPA monitoring data shows unknown, the kube-controller-manager cannot access the monitoring data source to retrieve the required data, and HPA scaling fails. The following kubectl describe hpa output shows an example.

Name:                                                  kubernetes-tutorial-deployment
Namespace:                                             default
Labels:                                                <none>
Annotations:                                           <none>
CreationTimestamp:                                     Mon, 10 Jun 2019 11:46:48 +0530
Reference:                                             Deployment/kubernetes-tutorial-deployment
Metrics:                                               ( current / target )
  resource cpu on pods  (as a percentage of request):  <unknown> / 2%
Min replicas:                                          1
Max replicas:                                          4
Deployment pods:                                       1 current / 0 desired
Conditions:
  Type           Status  Reason                   Message
  ----           ------  ------                   -------
  AbleToScale    True    SucceededGetScale        the HPA controller was able to get the target's current scale
  ScalingActive  False   FailedGetResourceMetric  the HPA was unable to compute the replica count: unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: the server is currently unable to handle the request (get pods.metrics.k8s.io)
Events:
  Type     Reason                   Age                      From                       Message
  ----     ------                   ----                     ----                       -------
  Warning  FailedGetResourceMetric  3m3s (x1009 over 4h18m)  horizontal-pod-autoscaler  unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: the server is currently unable to handle the request (get pods.metrics.k8s.io)

Cause 1: The resource metrics data source is unavailable

First, run the kubectl top pod command to check whether metric data is returned. If no data is returned for any pods, run the kubectl get apiservice command to check the status of the data source that provides resource metrics.
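The commands are shown below for reference.

# Check whether resource metric data is returned for pods.
kubectl top pod

# Check the status of the registered API services, including the data source for resource metrics.
kubectl get apiservice

The following code block shows a sample output of kubectl get apiservice.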


NAME                                   SERVICE                      AVAILABLE   AGE
v1.                                    Local                        True        29h
v1.admissionregistration.k8s.io        Local                        True        29h
v1.apiextensions.k8s.io                Local                        True        29h
v1.apps                                Local                        True        29h
v1.authentication.k8s.io               Local                        True        29h
v1.authorization.k8s.io                Local                        True        29h
v1.autoscaling                         Local                        True        29h
v1.batch                               Local                        True        29h
v1.coordination.k8s.io                 Local                        True        29h
v1.monitoring.coreos.com               Local                        True        29h
v1.networking.k8s.io                   Local                        True        29h
v1.rbac.authorization.k8s.io           Local                        True        29h
v1.scheduling.k8s.io                   Local                        True        29h
v1.storage.k8s.io                      Local                        True        29h
v1alpha1.argoproj.io                   Local                        True        29h
v1alpha1.fedlearner.k8s.io             Local                        True        5h11m
v1beta1.admissionregistration.k8s.io   Local                        True        29h
v1beta1.alicloud.com                   Local                        True        29h
v1beta1.apiextensions.k8s.io           Local                        True        29h
v1beta1.apps                           Local                        True        29h
v1beta1.authentication.k8s.io          Local                        True        29h
v1beta1.authorization.k8s.io           Local                        True        29h
v1beta1.batch                          Local                        True        29h
v1beta1.certificates.k8s.io            Local                        True        29h
v1beta1.coordination.k8s.io            Local                        True        29h
v1beta1.events.k8s.io                  Local                        True        29h
v1beta1.extensions                     Local                        True        29h
...
v1beta1.metrics.k8s.io                 kube-system/metrics-server   True        29h
...
v1beta1.networking.k8s.io              Local                        True        29h
v1beta1.node.k8s.io                    Local                        True        29h
v1beta1.policy                         Local                        True        29h
v1beta1.rbac.authorization.k8s.io      Local                        True        29h
v1beta1.scheduling.k8s.io              Local                        True        29h
v1beta1.storage.k8s.io                 Local                        True        29h
v1beta2.apps                           Local                        True        29h
v2beta1.autoscaling                    Local                        True        29h
v2beta2.autoscaling                    Local                        True        29h

If the API service that corresponds to v1beta1.metrics.k8s.io is not kube-system/metrics-server, check whether it was overwritten by a Prometheus Operator installation. If it was, deploy the following YAML template to restore the service.

apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.metrics.k8s.io
spec:
  service:
    name: metrics-server
    namespace: kube-system
  group: metrics.k8s.io
  version: v1beta1
  insecureSkipTLSVerify: true
  groupPriorityMinimum: 100
  versionPriority: 100
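For example, you can save the template to a file and apply it with kubectl. The file name below is only an example. You can then verify that v1beta1.metrics.k8s.io points to kube-system/metrics-server again.

# Apply the APIService template. The file name is an example.
kubectl apply -f metrics-server-apiservice.yaml

# Check that the API service is served by kube-system/metrics-server and becomes Available.
kubectl get apiservice v1beta1.metrics.k8s.io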

If the issue persists, make sure that you have installed the metrics-server component on the Operations Management > Add-ons page of your cluster. For more information, see metrics-server.

Cause 2: Data cannot be retrieved during a rolling deployment or scale-out

By default, the metric collection interval of the metrics-server component is 1 minute. After a scale-out or update is complete, metrics-server cannot retrieve monitoring data for a short period of time. Check again about 2 minutes after the scale-out or update is complete.

Cause 3: The request field is not configured

By default, HPA calculates utilization as actual usage / request. Check whether the requests field is set in the resources field of the pod's containers. If no request is configured, HPA cannot compute a utilization percentage and the current field displays unknown.
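The following is a minimal sketch of a container spec with CPU and memory requests set. The values are illustrative; adjust them to your workload.

spec:
  containers:
  - name: nginx
    image: nginx:1.7.9
    resources:
      requests:
        cpu: 500m      # HPA computes CPU utilization against this request.
        memory: 256Mi  # Required if the HPA also uses a memory utilization metric.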

Cause 4: The metric name is incorrect

Check if the metric name is correct, including its case. For example, if you specify the HPA-supported metric cpu as CPU, the current field of the monitoring data displays unknown.
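The following is a minimal sketch of the metrics section of an HPA (autoscaling/v2) that targets CPU utilization. The target value is an example; the metric name must be the lowercase cpu.

metrics:
- type: Resource
  resource:
    name: cpu                    # Must be "cpu", not "CPU".
    target:
      type: Utilization
      averageUtilization: 50     # Example target utilization.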

What do I do if HPA scaling fails due to abnormal metric retrieval?

HPA scaling may fail when metric retrieval is abnormal. In this case, the current field of the HPA monitoring data displays unknown, HPA cannot retrieve the metrics required for scaling decisions, and the number of pods cannot be adjusted. For more information about how to troubleshoot this issue, see Node autoscaling FAQ.

Why does HPA scale out excess pods during a rolling deployment?

During a rolling deployment, the open source controller manager treats pods for which no monitoring data is available as if their usage were zero (zero-filling). This may cause HPA to scale out too many pods. You can use the following configurations to prevent this issue.

Cluster-level configuration

You can upgrade the ACK-provided metrics-server to the latest version and enable a switch in the startup parameters of metrics-server to prevent this issue.

This is a global switch. After you set this switch, it takes effect for all related workloads in the cluster.

# Add the following option to the startup parameters of metrics-server.
--enable-hpa-rolling-update-skipped=true  

Workload-level configuration

If you want to prevent this issue for only specific workloads, you can use one of the following two methods.

  • Method 1: To temporarily pause Horizontal Pod Autoscaler (HPA) evaluation during a rolling deployment, add the following annotation to the specified workload's template.

    # Add the annotation to spec.template.metadata.annotations of the workload to temporarily pause HPA evaluation during a rolling deployment.
    HPARollingUpdateSkipped: "true"

    The following sample code shows a complete Deployment with the annotation added.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx-deployment-basic
      labels:
        app: nginx
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: nginx
      template:
        metadata:
          labels:
            app: nginx
          annotations:
            HPARollingUpdateSkipped: "true"  # Skip HPA evaluation during a rolling deployment.
        spec:
          containers:
          - name: nginx
            image: nginx:1.7.9
            ports:
            - containerPort: 80
  • Method 2: You can add the following annotation to the template of the specified workload so that HPA takes effect only after a configured warm-up period has passed since the pod was created.

    # Add the annotation to spec.template.metadata.annotations of the workload. HPA starts to evaluate the pod only after the configured period has passed since the pod was created.
    HPAScaleUpDelay: 3m # 3m is an example. Set the time period as needed.

    The following sample code shows a complete configuration.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx-deployment-basic
      labels:
        app: nginx
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: nginx
      template:
        metadata:
          labels:
            app: nginx
          annotations:
            HPAScaleUpDelay: 3m  # m stands for minutes. This indicates that HPA takes effect 3 minutes after the pod is created. Valid units are s (seconds) and m (minutes).
        spec:
          containers:
          - name: nginx
            image: nginx:1.7.9
            ports:
            - containerPort: 80

Why does HPA not scale when a threshold is reached?

HPA scaling is triggered not only when CPU or memory usage exceeds or falls below a threshold. HPA also considers whether a scale-out or scale-in action might immediately trigger the opposite action. This helps reduce fluctuations.

For example, if your scale-out threshold is set to 80% and you have two pods that both have 70% CPU usage, a scale-in does not occur. This is because after a scale-in, the single remaining pod might have a CPU usage higher than 80%, which would trigger a scale-out. This behavior prevents repeated scaling fluctuations.
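In terms of the replica formula that HPA uses (see the question about the audit log below), desiredReplicas = ceil(currentReplicas * (currentMetricValue / desiredMetricValue)) = ceil(2 * (70 / 80)) = ceil(1.75) = 2. The calculated desired replica count equals the current replica count, so no scale-in occurs.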

How do I configure the HPA data collection interval?

For metrics-server versions later than v0.2.1-b46d98c-aliyun, you can set --metric-resolution in the startup parameters of metrics-server. For example, --metric-resolution=15s.

Is CronHPA compatible with HPA? How do I make them compatible?

Yes, CronHPA is compatible with HPA. Container Service for Kubernetes (ACK) sets the `scaleTargetRef` in CronHPA to the HPA object. Then, ACK uses the HPA object to find the real `scaleTargetRef`. This allows CronHPA to be aware of the current state of HPA. CronHPA does not directly adjust the number of replicas of a deployment. Instead, it uses HPA to manage the deployment. This approach avoids conflicts between HPA and CronHPA. For more information about how to make CronHPA compatible with HPA, see Enable CronHPA to work with HPA.
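As an illustration, the following is a minimal sketch of a CronHPA (CronHorizontalPodAutoscaler) whose scaleTargetRef points to the HPA object instead of the Deployment. The names, the API version of the HPA, the schedule, and the target size are example values.

apiVersion: autoscaling.alibabacloud.com/v1beta1
kind: CronHorizontalPodAutoscaler
metadata:
  name: cronhpa-sample                   # Example name.
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: autoscaling/v2           # API version of the existing HPA object (example).
    kind: HorizontalPodAutoscaler
    name: nginx-deployment-basic-hpa     # Example HPA name. Do not reference the Deployment directly.
  jobs:
  - name: scale-up-every-morning         # Example job name.
    schedule: "0 0 8 * * *"              # Format: second minute hour day month week. Example: 08:00 every day.
    targetSize: 10                       # Example replica count.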

How do I resolve the issue where HPA scales out excess pods due to CPU or memory spikes on startup?

For languages and frameworks that require a ramp-up period, such as Java, CPU and memory usage can spike for several minutes when a container starts. This may cause HPA to be triggered incorrectly. To resolve this issue, you can upgrade the metrics-server component that is provided by ACK to version 0.3.9.6 or later and add a switch in the pod's annotation to prevent incorrect triggers. For more information about how to upgrade the metrics-server component, see Upgrade the metrics-server component before you upgrade a cluster to Kubernetes 1.12.

The following sample YAML file shows a deployment that has a switch added to prevent incorrect triggers.


## A Deployment is used as an example.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment-basic
  labels:
    app: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
      annotations:
        HPAScaleUpDelay: 3m # m stands for minutes. This indicates that HPA takes effect 3 minutes after the pod is created. Valid units are s (seconds) and m (minutes).
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9 # Replace with your own image in the <image_name:tag> format.
        ports:
        - containerPort: 80 

Why did HPA scale the application when the value in the audit log did not reach the threshold?

Cause

HPA calculates the scaling ratio based on the current and desired metrics: desiredReplicas = ceil(currentReplicas * (currentMetricValue / desiredMetricValue)).

This formula shows that the accuracy of the desired number of replicas depends on the accuracy of the current number of replicas, the current metric value, and the desired metric value. Take resource metrics, which are widely used in HPA, as an example. To determine the current number of replicas, HPA first obtains the scale sub-resource of the object defined by scaleTargetRef. It then converts the selector in the status of the scale object into a label selector and uses it to match and retrieve pods. If the pods retrieved with this selector do not all belong to the object defined in scaleTargetRef, the calculated desired number of replicas may not meet expectations. For example, the application may be scaled out even though the real-time metric is below the threshold.
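For reference, the following command shows one way to view the selector in the status of the scale sub-resource of a Deployment. The namespace and Deployment names are placeholders.

kubectl get --raw "/apis/apps/v1/namespaces/{namespace_name}/deployments/{deployment_name}/scale"
# In the returned Scale object, status.selector is the label selector that HPA uses to match pods.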

Common reasons for an inaccurate number of matched pods include the following:

  • A rolling deployment is in progress.

  • Other pods that do not belong to the object in `scaleTargetRef` are tagged with the same label. You can run the following command to check if other such pods exist.

    kubectl get pods -n {namespace_name} -l {value_of_status.selector_of_scale_sub-resource}

Solution

  • For issues related to rolling deployments, see Node autoscaling FAQ.

  • For issues where other pods that do not belong to the object in `scaleTargetRef` are tagged with the same label, you must locate these pods. If they are still in use, you can change their labels. If they are no longer needed, you can delete them.

Can I determine the scale-in order of pods for HPA?

No, you cannot. HPA can automatically increase or decrease the number of pods based on defined metrics, but it cannot directly determine which pods are terminated first. The termination order of pods, graceful shutdown time, and other attributes are determined by the controller that manages the pods.

In scenarios such as hybrid deployments of ECS and ECI instances or multiple node pools, you can configure a ResourcePolicy for your application to specify the scale-in priority. When HPA is used together with a Deployment and a ResourcePolicy, pods on ECI resources can be released before pods on ECS resources during a scale-in. For more information, see Custom elastic resource priority scheduling.
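The following is a minimal sketch of such a ResourcePolicy, assuming a workload whose pods carry the label app: nginx. The name, label, and unit order are example values.

apiVersion: scheduling.alibabacloud.com/v1alpha1
kind: ResourcePolicy
metadata:
  name: nginx-resource-policy      # Example name.
  namespace: default
spec:
  selector:
    app: nginx                     # Matches the pods of the target workload.
  strategy: prefer
  units:                           # Scale-out fills the units in order. Scale-in releases them in reverse order.
  - resource: ecs                  # Schedule pods to ECS nodes first.
  - resource: eci                  # Overflow to ECI. These pods are released first during a scale-in.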

What do the units of HPA utilization metrics mean?

Utilization metrics are typically unitless integers or integers that use m as the unit, with a conversion rate of 1000m to 1. For example, a tcp_connection_counts value of 70000m is equivalent to 70.

What do I do if the target column shows unknown after I run kubectl get hpa?

You can perform the following operations to resolve the issue.

  1. Run kubectl describe hpa <hpa_name> to confirm the reason for the HPA failure.

    • If the Conditions field shows that AbleToScale is False, confirm that the deployment is normal.

    • If the Conditions field shows that ScalingActive is False, proceed to the next step.

  2. Run kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/". If Error from server (NotFound): the server could not find the requested resource is returned, confirm the startup status of alibaba-cloud-metrics-adapter.

    If alibaba-cloud-metrics-adapter is in the Normal state, check if the HPA metrics are related to an Ingress. If they are, you must deploy the Simple Log Service component in advance. For more information, see Collect and analyze Nginx Ingress access logs.

  3. Confirm that the HPA metric is entered correctly. The value of sls.ingress.route must be in the <namespace>-<svc>-<port> format, as in the sketch after this list.

    • namespace: the namespace where the Ingress is located.

    • svc: the name of the Service that corresponds to the Ingress.

    • port: the name of the port of the Service that corresponds to the Ingress.
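The following is a minimal sketch of an HPA that scales based on the sls_ingress_qps external metric. The namespace, Service name, port name, workload name, and target value are example values.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ingress-qps-hpa                            # Example name.
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-deployment-basic                   # Example workload.
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metric:
        name: sls_ingress_qps
        selector:
          matchLabels:
            sls.ingress.route: default-nginx-80    # <namespace>-<svc>-<port>
      target:
        type: AverageValue
        averageValue: "10"                         # Example target QPS per pod.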

How do I find the metric names that HPA supports?

For more information, see Alibaba Cloud HPA metrics. The following table lists commonly used metrics.

Metric name                  Description                                                     Additional parameter
sls_ingress_qps              The queries per second (QPS) of the specified Ingress route.    sls.ingress.route
sls_alb_ingress_qps          The QPS of the specified ALB Ingress route.                     sls.ingress.route
sls_ingress_latency_avg      The average latency of all requests.                            sls.ingress.route
sls_ingress_latency_p50      The 50th percentile (median) request latency.                   sls.ingress.route
sls_ingress_latency_p95      The 95th percentile request latency.                            sls.ingress.route
sls_ingress_latency_p99      The 99th percentile request latency.                            sls.ingress.route
sls_ingress_latency_p9999    The 99.99th percentile request latency.                         sls.ingress.route
sls_ingress_inflow           The inbound bandwidth of the Ingress.                           sls.ingress.route

How do I adapt the configuration after I customize the Nginx Ingress log format?

For more information about how to use SLS Ingress metrics for horizontal pod autoscaling, see Horizontal pod autoscaling based on metrics of Alibaba Cloud services. You must enable and correctly configure Nginx Ingress log collection to Simple Log Service for your cluster.

  • Simple Log Service is enabled by default when you create a cluster. If you keep the default settings, you can view Nginx Ingress access log analysis reports and monitor the real-time status of Nginx Ingress in the Simple Log Service console after the cluster is created.

  • If you manually disable Simple Log Service when you create a cluster, you must re-enable or configure Simple Log Service after the cluster is created if you want to use SLS Ingress metrics for horizontal pod autoscaling. For more information, see Collect and analyze Nginx Ingress access logs.

  • If you want to customize the Nginx Ingress log format, note that the AliyunLogConfig Custom Resource Definition (CRD) that is deployed when you first enable Simple Log Service in the cluster applies only to the default log format of the ACK Ingress Controller. If you have modified the access log format of the Ingress Controller, you must also modify the processor_regex settings for regular expression extraction in the CRD configuration. For more information, see Use a DaemonSet and a CRD to collect container logs.

How do I obtain the QPS metric sls_ingress_qps using the command line?

You can query data using the following command. The `sls_ingress_qps` metric is used as an example.

kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/*/sls_ingress_qps?labelSelector=sls.project={{SLS_Project}},sls.logstore=nginx-ingress"

In the command, {{SLS_Project}} is the name of the SLS project that corresponds to the ACK cluster. If you do not customize the configuration, the default name is k8s-log-{{ClusterId}}, where {{ClusterId}} is the ID of the cluster.

If the following result is returned:

Error from server: {
    "httpCode": 400,
    "errorCode": "ParameterInvalid",
    "errorMessage": "key (slb_pool_name) is not config as key value config,if symbol : is  in your log,please wrap : with quotation mark \"",
    "requestID": "xxxxxxx"
}

This indicates that no data is available for this metric. This may occur because you did not configure an ALB Ingress but used the `sls_alb_ingress_qps` metric to query data.

If the output is similar to the following:

{
  "kind": "ExternalMetricValueList",
  "apiVersion": "external.metrics.k8s.io/v1beta1",
  "metadata": {},
  "items": [
    {
      "metricName": "sls_ingress_qps",
      "timestamp": "2025-02-26T16:45:00Z", 
      "value": "50",   # The QPS value
      "metricLabels": {
        "sls.project": "your-sls-project-name",
        "sls.logstore": "nginx-ingress"
      }
    }
  ]
}

This indicates that the external Kubernetes QPS metric was found, and its value is specified in the value field.

Failed to pull alibaba-cloud-metrics-adapter image

Symptom

When you upgrade the ack-alibaba-cloud-metrics-adapter component to version 1.3.7, the image fails to be pulled and an error message similar to the following is reported:

Failed to pull image "registry-<region-id>-vpc.ack.aliyuncs.com/acs/alibaba-cloud-metrics-adapter-amd64:v0.2.9-ba634de-aliyun".

Cause

The ack-alibaba-cloud-metrics-adapter component does not support direct updates.

Solution

You can upgrade the component by following these steps.

  1. Back up the current component configuration.

  2. Uninstall the old version of the component.

  3. Use the backup configuration to install the latest version of the component.

Important

During the uninstallation and reinstallation, related HPA policies pause scaling because monitoring data retrieval is stopped.