This topic provides answers to some frequently asked questions about workload scaling (HPA and CronHPA).
Table of contents
What do I do if unknown is displayed in the current field in the HPA metrics?
What do I do if HPA cannot collect metrics and fails to perform scaling?
What do I do if excess pods are added by HPA during a rolling update?
What do I do if HPA does not scale pods when the scaling threshold is reached?
How do I configure the metric collection interval of HPA?
Can CronHPA and HPA interact without conflicts? How do I enable CronHPA to interact with HPA?
How do I fix the issue that excess pods are added by HPA when CPU or memory usage rapidly increases?
What do I do if HPA scales out an application while the metric value in the audit log is lower than the threshold?
Can HPA control the order in which pods are scaled in?
What does the unit of the utilization metric collected by HPA mean?
What do I do if unknown is displayed in the TARGETS column after I run the kubectl get hpa command?
How do I find the metrics that are supported by HPA?
How do I configure horizontal autoscaling after I customize the format of NGINX Ingress logs?
How do I query the sls_ingress_qps metric from the command line?
Failed to pull alibaba-cloud-metrics-adapter image
What do I do if unknown is displayed in the current field in the HPA metrics?
If the current field of the Horizontal Pod Autoscaler (HPA) metrics shows unknown, it indicates that kube-controller-manager cannot collect resource metrics from the monitoring data source. Consequently, HPA fails to perform scaling. The following example shows the output of the kubectl describe hpa command when this issue occurs:
Name: kubernetes-tutorial-deployment
Namespace: default
Labels: <none>
Annotations: <none>
CreationTimestamp: Mon, 10 Jun 2019 11:46:48 +0530
Reference: Deployment/kubernetes-tutorial-deployment
Metrics: ( current / target )
resource cpu on pods (as a percentage of request): <unknown> / 2%
Min replicas: 1
Max replicas: 4
Deployment pods: 1 current / 0 desired
Conditions:
Type Status Reason Message
---- ------ ------ -------
AbleToScale True SucceededGetScale the HPA controller was able to get the target's current scale
ScalingActive False FailedGetResourceMetric the HPA was unable to compute the replica count: unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: the server is currently unable to handle the request (get pods.metrics.k8s.io)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedGetResourceMetric 3m3s (x1009 over 4h18m) horizontal-pod-autoscaler unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: the server is currently unable to handle the request (get pods.metrics.k8s.io)
Cause 1: The data source from which resource metrics are collected is unavailable
Run the kubectl top pod command to check whether metric data of monitored pods is returned. If no metric data is returned, run the kubectl get apiservice command to check whether the metrics-server add-on is available.
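For reference, on a healthy cluster the apiservice check should show an entry similar to the following (the AGE column and, on customized clusters, the service name will differ):

```
$ kubectl get apiservice v1beta1.metrics.k8s.io
NAME                     SERVICE                      AVAILABLE   AGE
v1beta1.metrics.k8s.io   kube-system/metrics-server   True        30d
```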
If the API service for v1beta1.metrics.k8s.io is not kube-system/metrics-server, check whether metrics-server is overwritten by Prometheus Operator. If metrics-server is overwritten by Prometheus Operator, use the following YAML template to redeploy metrics-server:
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
name: v1beta1.metrics.k8s.io
spec:
service:
name: metrics-server
namespace: kube-system
group: metrics.k8s.io
version: v1beta1
insecureSkipTLSVerify: true
groupPriorityMinimum: 100
versionPriority: 100
If the issue persists, go to the cluster details page in the ACK console and choose Operations > Add-ons to check whether metrics-server is installed. For more information, see metrics-server.
Cause 2: Metrics cannot be collected during a rolling update or scale-out activity
By default, metrics-server collects metrics at intervals of 1 minute. However, metrics-server must wait a few minutes before it can collect metrics after a rolling update or scale-out activity is completed. We recommend that you query metrics 2 minutes after a rolling update or scale-out activity is completed.
Cause 3: The request field is not specified
By default, HPA obtains the CPU or memory usage of the pod by calculating the value of resource usage/resource request. If the resource request is not specified in the pod configurations, HPA cannot calculate the resource usage. Therefore, you must make sure that the request field is specified in the resource parameter of the pod configurations.
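For example, a container must declare resource requests so that HPA can compute utilization as usage/request. The following is a minimal sketch; the workload name and image are illustrative:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-app                # illustrative name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: demo-app
  template:
    metadata:
      labels:
        app: demo-app
    spec:
      containers:
      - name: app
        image: nginx:1.25       # illustrative image
        resources:
          requests:             # required for HPA utilization metrics
            cpu: 500m
            memory: 256Mi
```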
Cause 4: The metric name is incorrect
Check whether the metric name is correct. The metric name is case-sensitive. For example, if the cpu metric supported by HPA is accidentally written as CPU, unknown is displayed in the current field.
What do I do if HPA cannot collect metrics and fails to perform scaling?
If HPA cannot collect the metrics that it uses to determine whether to perform scaling, unknown is displayed in the current field in the HPA metrics and HPA does not scale pods. Refer to FAQ about node auto scaling to troubleshoot and fix this issue.
What do I do if excess pods are added by HPA during a rolling update?
During a rolling update, kube-controller-manager performs zero filling on pods whose monitoring data cannot be collected. This may cause HPA to add an excessive number of pods. You can perform the following steps to fix this issue.
Fix this issue for all workloads in the cluster
Update metrics-server to the latest version and add the following parameter to the startup settings of metrics-server.
The following configuration takes effect on all workloads in the cluster.
# Add the following configuration to the startup settings of metrics-server.
--enable-hpa-rolling-update-skipped=true
Fix this issue for specific workloads
You can use one of the following methods to fix this issue for specific workloads:
Method 1: Add the following annotation to the template of a workload to skip HPA during rolling updates.
# Add the following annotation to the spec.template.metadata.annotations parameter of the workload configuration to skip HPA during rolling updates.
HPARollingUpdateSkipped: "true"
Method 2: Add the following annotation to the template of a workload to skip the warm-up period before rolling updates.
# Add the following annotation to the spec.template.metadata.annotations parameter of the workload configuration to skip the warm-up period before rolling updates.
HPAScaleUpDelay: 3m # You can change the value based on your business requirements.
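A minimal sketch of where these annotations go in a workload manifest (the Deployment name is illustrative; use only one of the two annotations, depending on the method you choose):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-app                        # illustrative name
spec:
  template:
    metadata:
      annotations:
        HPARollingUpdateSkipped: "true" # Method 1: skip HPA during rolling updates
        # HPAScaleUpDelay: 3m           # Method 2: skip the warm-up period instead
```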
What do I do if HPA does not scale pods when the scaling threshold is reached?
HPA may not scale pods even if the CPU or memory usage drops below the scale-in threshold or exceeds the scale-out threshold. HPA also takes other factors into consideration when it scales pods. For example, HPA checks whether the current scale-out activity triggers a scale-in activity or the current scale-in activity triggers a scale-out activity. This prevents repetitive scaling and unnecessary resource consumption.
For example, if the scale-out threshold is 80% and you have two pods whose CPU utilizations are both 70%, the pods are not scaled in. This is because the CPU utilization of one pod may be higher than 80% after the pods are scaled in. This triggers another scale-out activity.
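This reasoning can be checked against the HPA replica formula, desired = ceil(current replicas × current metric / target metric), in a short sketch:

```python
import math

def desired_replicas(current_replicas, current_metric, target_metric):
    """HPA formula: desired = ceil(current * (current metric / target metric))."""
    return math.ceil(current_replicas * current_metric / target_metric)

# Two pods at 70% CPU with an 80% target: the desired count stays at 2.
print(desired_replicas(2, 70, 80))   # 2

# If HPA scaled in to one pod, that pod would carry roughly 140% CPU,
# which would immediately trigger a scale-out back to two pods.
print(desired_replicas(1, 140, 80))  # 2
```

Because scaling in would immediately trigger another scale-out, HPA keeps the replica count unchanged.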
How do I configure the metric collection interval of HPA?
For metrics-server versions later than 0.2.1-b46d98c-aliyun, specify the --metric-resolution parameter in the startup settings. Example: --metric-resolution=15s.
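Assuming metrics-server runs as a Deployment in the kube-system namespace, the parameter is added to the container args. The following excerpt is a sketch; the exact container layout may differ in your metrics-server version:

```yaml
# Excerpt of the metrics-server Deployment spec (kube-system namespace).
spec:
  template:
    spec:
      containers:
      - name: metrics-server
        args:
        - --metric-resolution=15s   # collect metrics every 15 seconds
```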
Can CronHPA and HPA interact without conflicts? How do I enable CronHPA to interact with HPA?
CronHPA and HPA can interact without conflicts. ACK modifies the CronHPA configurations by setting scaleTargetRef to the scaling object of HPA. This way, only HPA scales the application that is specified by scaleTargetRef. This also enables CronHPA to detect the status of HPA. CronHPA does not directly change the number of pods for the Deployment. CronHPA triggers HPA to scale the pods. This prevents conflicts between CronHPA and HPA. For more information about how to enable CronHPA and HPA to interact without conflicts, see Interaction between CronHPA and HPA.
How do I fix the issue that excess pods are added by HPA when CPU or memory usage rapidly increases?
When the pods of Java applications or applications powered by Java frameworks start, the CPU and memory usage may be high for a few minutes during the warm-up period. This may trigger HPA to scale out the pods. To fix this issue, update the version of metrics-server provided by ACK to 0.3.9.6 or later and add annotations to the pod configurations to prevent HPA from accidentally triggering scaling activities. For more information about how to update metrics-server, see Update metrics-server before updating the Kubernetes version to 1.12.
You can use the HPAScaleUpDelay annotation, described in the preceding rolling-update question, in the pod configurations to prevent HPA from accidentally triggering scaling activities in this scenario.
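The following sketch is based on the HPAScaleUpDelay annotation described in the rolling-update question above; the workload name is illustrative, and the delay should match your application's warm-up time:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: java-demo                 # illustrative name
spec:
  template:
    metadata:
      annotations:
        # Skip the warm-up period so that the CPU/memory spike during
        # Java startup does not trigger a scale-out.
        HPAScaleUpDelay: 3m       # adjust to your warm-up duration
```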
What do I do if HPA scales out an application while the metric value in the audit log is lower than the threshold?
Cause
HPA calculates the desired number of replicated pods based on the ratio of the current metric value to the desired metric value: Desired number of replicated pods = ceil[Current number of replicated pods × (Current metric value/Desired metric value)].
The formula indicates that the accuracy of the result depends on the accuracies of the current number of replicated pods, the current metric value, and the desired metric value. For example, when HPA collects metrics about the current number of replicated pods, HPA first queries the subresource named scale of the object specified by the scaleTargetRef parameter and then selects pods based on the label specified in the Selector field in the status section of the scale subresource. If some pods queried by HPA do not belong to the object specified by the scaleTargetRef parameter, the desired number of replicated pods calculated by HPA may not meet your expectations. For example, HPA may scale out the application while the real-time metric value is lower than the threshold.
The number of matching pods may be inaccurate due to the following reasons:
A rolling update is in progress.
Pods that do not belong to the object specified by the scaleTargetRef parameter have the label specified in the Selector field in the status section of the scale subresource. Run the following command to query the pods:
kubectl get pods -n {Namespace} -l {Value of the Selector field in the status section of the subresource named scale}
Solution
If a rolling update is in progress, refer to FAQ about node auto scaling to resolve this issue.
If pods that do not belong to the object specified by the scaleTargetRef parameter have the label specified in the Selector field in the status section of the scale subresource, locate these pods and then change the label. You can also delete the pods that you no longer require.
Can HPA control the order in which pods are scaled in?
No, the HPA itself cannot directly determine which specific pods should be terminated first during a scale-in event. The HPA's responsibility is only to increase or decrease the replica count of a workload based on the metrics it monitors. The actual order of pod termination, as well as the graceful shutdown period, is controlled by the workload controller that manages the pods (such as a Deployment or StatefulSet).
However, if you are running a mix of compute types—such as a combination of Elastic Compute Service (ECS) instances and serverless Container Compute Service (ACS) / Elastic Container Instance (ECI) resources, or workloads spanning multiple node pools—you can influence the scale-down priority by using a custom ResourcePolicy.
By using an HPA with a Deployment and a ResourcePolicy, you can achieve a prioritized scale-in. For example, you can configure the system to first terminate pods running on serverless ACS/ECI resources before terminating pods on standard ECS instances, thereby optimizing costs.
For a detailed guide on this feature, see Customize elastic resource priority scheduling.
What does the unit of the utilization metric collected by HPA mean?
The unit of the utilization metric collected by HPA is m, which stands for the prefix milli-. The prefix means one thousandth. The value of the utilization metric is an integer. For example, if the value of the tcp_connection_counts metric is 70000m, the value is equal to 70.
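A quick sketch of the conversion:

```python
def from_milli(value: str) -> float:
    """Convert a milli-suffixed metric value such as '70000m' to its plain number."""
    assert value.endswith("m"), "expected a milli-suffixed value"
    return int(value[:-1]) / 1000

print(from_milli("70000m"))  # 70.0
```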
What do I do if unknown is displayed in the TARGETS column after I run the kubectl get hpa command?
Perform the following operations to troubleshoot the issue:
Run the kubectl describe hpa <hpa_name> command to check why HPA becomes abnormal.
If the value of AbleToScale is False in the Conditions field, check whether the Deployment is created as expected.
If the value of ScalingActive is False in the Conditions field, proceed to the next step.
Run the kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/" command. If Error from server (NotFound): the server could not find the requested resource is returned, verify the status of alibaba-cloud-metrics-adapter.
If the status of alibaba-cloud-metrics-adapter is normal, check whether the HPA metrics are related to the Ingress. If the metrics are related to the Ingress, make sure that you deploy the Simple Log Service (SLS) add-on before deploying ack-alibaba-cloud-metrics-adapter. For more information, see Analyze and monitor the access log of nginx-ingress.
Make sure that the values of the HPA metrics are valid. The value of sls.ingress.route must be in the <namespace>-<svc>-<port> format, where:
namespace: the namespace to which the Ingress belongs.
svc: the name of the Service that you selected when you created the Ingress.
port: the port of the Service.
How do I find the metrics that are supported by HPA?
For more information about the metrics that are supported by HPA, see Alibaba Cloud metrics adapter. The following table describes the commonly used metrics.
Metric | Description | Additional parameter |
--- | --- | --- |
sls_ingress_qps | The number of requests that the Ingress can process per second based on a specific routing rule. | sls.ingress.route |
sls_alb_ingress_qps | The number of requests that the ALB Ingress can process per second based on a specific routing rule. | sls.ingress.route |
sls_ingress_latency_avg | The average latency of all requests. | sls.ingress.route |
sls_ingress_latency_p50 | The maximum latency for the fastest 50% of all requests. | sls.ingress.route |
sls_ingress_latency_p95 | The maximum latency for the fastest 95% of all requests. | sls.ingress.route |
sls_ingress_latency_p99 | The maximum latency for the fastest 99% of all requests. | sls.ingress.route |
sls_ingress_latency_p9999 | The maximum latency for the fastest 99.99% of all requests. | sls.ingress.route |
sls_ingress_inflow | The inbound bandwidth of the Ingress. | sls.ingress.route |
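As an illustration, an HPA that scales on sls_ingress_qps might look like the following sketch. The names, thresholds, and routing value are assumptions; the sls.ingress.route value must follow the <namespace>-<svc>-<port> format described earlier:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ingress-qps-hpa            # illustrative name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: demo-app                 # illustrative workload
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metric:
        name: sls_ingress_qps
        selector:
          matchLabels:
            sls.ingress.route: default-demo-svc-80  # <namespace>-<svc>-<port>
      target:
        type: AverageValue
        averageValue: "10"         # target QPS per pod (illustrative)
```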
How do I configure horizontal autoscaling after I customize the format of NGINX Ingress logs?
Refer to Implement horizontal auto scaling based on Alibaba Cloud metrics to perform horizontal pod autoscaling based on the Ingress metrics that are collected by SLS. You must configure SLS to collect NGINX Ingress logs.
By default, SLS is enabled when you create a cluster. If you use the default log collection settings, you can view the log analysis reports and real-time status of NGINX Ingresses in the SLS console after you create the cluster.
If you disable SLS when you create an ACK cluster, you cannot perform horizontal pod autoscaling based on the Ingress metrics that are collected by SLS. You must enable SLS for the cluster before you can use this feature. For more information, see Analyze and monitor the access log of nginx-ingress-controller.
The AliyunLogConfig that is generated when you enable SLS for the cluster for the first time applies only to the default log format that ACK defines for the Ingress controller. If you have changed the log format, you must modify the processor_regex settings in the AliyunLogConfig. For more information, see Use CRDs to collect container logs in DaemonSet mode.
How do I query the sls_ingress_qps metric from the command line?
You can query the external metric directly from the Kubernetes API. The following example shows how to query for sls_ingress_qps.
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/*/sls_ingress_qps?labelSelector=sls.project={{SLS_Project}},sls.logstore=nginx-ingress"
{{SLS_Project}}: The name of the SLS Project associated with this ACK cluster. If you have not customized it, the default name is k8s-log-{{ClusterId}}, where {{ClusterId}} is the ID of your cluster.
Analyzing the results
If you receive an error similar to the following:
Error from server: { "httpCode": 400, "errorCode": "ParameterInvalid", "errorMessage": "key (slb_pool_name) is not config as key value config,if symbol : is in your log,please wrap : with quotation mark \"", "requestID": "xxxxxxx" }
This indicates that no data was found for this metric. A common cause is trying to query an ALB Ingress metric (sls_alb_ingress_qps) when you are not using an ALB Ingress.
If the query is successful, the output looks like the following:
{
  "kind": "ExternalMetricValueList",
  "apiVersion": "external.metrics.k8s.io/v1beta1",
  "metadata": {},
  "items": [
    {
      "metricName": "sls_ingress_qps",
      "timestamp": "2025-02-26T16:45:00Z",
      "value": "50",
      "metricLabels": {
        "sls.project": "your-sls-project-name",
        "sls.logstore": "nginx-ingress"
      }
    }
  ]
}
This output confirms that the Kubernetes external metrics API successfully retrieved the QPS data. The value field contains the current QPS (50 in this example).
Failed to pull alibaba-cloud-metrics-adapter image
Symptom
When attempting to upgrade ack-alibaba-cloud-metrics-adapter to version 1.3.7, the image pull fails with an error similar to this:
Failed to pull image "registry-<region-id>-vpc.ack.aliyuncs.com/acs/alibaba-cloud-metrics-adapter-amd64:v0.2.9-ba634de-aliyun"
Cause
ack-alibaba-cloud-metrics-adapter does not currently support in-place upgrades.
Solution
You must upgrade it by performing a clean reinstallation.
Back up your current add-on configuration.
Uninstall the old version of the add-on.
Install the new version using your backed-up configuration.
During the uninstall and reinstall process, any HPAs that rely on this metrics adapter will be unable to fetch metrics and will temporarily suspend all scaling operations.