
Container Service for Kubernetes:FAQ about auto scaling

Last Updated: Aug 16, 2023

This topic provides answers to some frequently asked questions about auto scaling in Container Service for Kubernetes (ACK).

FAQ about HPA

What do I do if the current field shows unknown in the HPA metrics?

If the current field of the Horizontal Pod Autoscaler (HPA) metrics shows unknown, kube-controller-manager cannot access the data sources from which resource metrics are collected, as shown in the following example output of the kubectl describe hpa command:

Name:                                                  kubernetes-tutorial-deployment
Namespace:                                             default
Labels:                                                <none>
Annotations:                                           <none>
CreationTimestamp:                                     Mon, 10 Jun 2019 11:46:48 +0530
Reference:                                             Deployment/kubernetes-tutorial-deployment
Metrics:                                               ( current / target )
  resource cpu on pods  (as a percentage of request):  <unknown> / 2%
Min replicas:                                          1
Max replicas:                                          4
Deployment pods:                                       1 current / 0 desired
Conditions:
  Type           Status  Reason                   Message
  ----           ------  ------                   -------
  AbleToScale    True    SucceededGetScale        the HPA controller was able to get the target's current scale
  ScalingActive  False   FailedGetResourceMetric  the HPA was unable to compute the replica count: unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: the server is currently unable to handle the request (get pods.metrics.k8s.io)
Events:
  Type     Reason                   Age                      From                       Message
  ----     ------                   ----                     ----                       -------
  Warning  FailedGetResourceMetric  3m3s (x1009 over 4h18m)  horizontal-pod-autoscaler  unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: the server is currently unable to handle the request (get pods.metrics.k8s.io)

Possible causes:

  • Cause 1: The data sources from which resource metrics are collected are unavailable. Run the kubectl top pod command to check whether the metric data of monitored pods is returned. If no metric data is returned, run the kubectl get apiservice command to check whether the metrics-server component is available. The following output shows an example of the returned data:


    NAME                                   SERVICE                      AVAILABLE   AGE
    v1.                                    Local                        True        29h
    v1.admissionregistration.k8s.io        Local                        True        29h
    v1.apiextensions.k8s.io                Local                        True        29h
    v1.apps                                Local                        True        29h
    v1.authentication.k8s.io               Local                        True        29h
    v1.authorization.k8s.io                Local                        True        29h
    v1.autoscaling                         Local                        True        29h
    v1.batch                               Local                        True        29h
    v1.coordination.k8s.io                 Local                        True        29h
    v1.monitoring.coreos.com               Local                        True        29h
    v1.networking.k8s.io                   Local                        True        29h
    v1.rbac.authorization.k8s.io           Local                        True        29h
    v1.scheduling.k8s.io                   Local                        True        29h
    v1.storage.k8s.io                      Local                        True        29h
    v1alpha1.argoproj.io                   Local                        True        29h
    v1alpha1.fedlearner.k8s.io             Local                        True        5h11m
    v1beta1.admissionregistration.k8s.io   Local                        True        29h
    v1beta1.alicloud.com                   Local                        True        29h
    v1beta1.apiextensions.k8s.io           Local                        True        29h
    v1beta1.apps                           Local                        True        29h
    v1beta1.authentication.k8s.io          Local                        True        29h
    v1beta1.authorization.k8s.io           Local                        True        29h
    v1beta1.batch                          Local                        True        29h
    v1beta1.certificates.k8s.io            Local                        True        29h
    v1beta1.coordination.k8s.io            Local                        True        29h
    v1beta1.events.k8s.io                  Local                        True        29h
    v1beta1.extensions                     Local                        True        29h
    ...
    v1beta1.metrics.k8s.io                 kube-system/metrics-server   True        29h
    ...
    v1beta1.networking.k8s.io              Local                        True        29h
    v1beta1.node.k8s.io                    Local                        True        29h
    v1beta1.policy                         Local                        True        29h
    v1beta1.rbac.authorization.k8s.io      Local                        True        29h
    v1beta1.scheduling.k8s.io              Local                        True        29h
    v1beta1.storage.k8s.io                 Local                        True        29h
    v1beta2.apps                           Local                        True        29h
    v2beta1.autoscaling                    Local                        True        29h
    v2beta2.autoscaling                    Local                        True        29h

    If the APIService for v1beta1.metrics.k8s.io is not kube-system/metrics-server, check whether metrics-server is overridden by Prometheus Operator. If it is, use the following YAML template to redeploy metrics-server:

    apiVersion: apiregistration.k8s.io/v1beta1
    kind: APIService
    metadata:
      name: v1beta1.metrics.k8s.io
    spec:
      service:
        name: metrics-server
        namespace: kube-system
      group: metrics.k8s.io
      version: v1beta1
      insecureSkipTLSVerify: true
      groupPriorityMinimum: 100
      versionPriority: 100
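
    You can then save the template to a file and apply it to the cluster. The file name in the following command is an example:

    kubectl apply -f metrics-server-apiservice.yaml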

    If no error is found after you have performed the preceding checks, go to the Operations > Add-ons page and check whether metrics-server is installed. For more information, see metrics-server.

  • Cause 2: Metrics cannot be collected during a rolling update or scale-out activity.

    By default, metrics-server collects metrics at intervals of 1 minute. However, metrics-server must wait a few minutes before it can collect metrics after a rolling update or scale-out activity. We recommend that you query metrics 2 minutes after a rolling update or scale-out activity is complete.

  • Cause 3: The request field is not specified for the pod.

    By default, HPA calculates the CPU or memory usage of a pod by dividing the used resources by the requested resources. If resource requests are not specified in the pod configurations, HPA cannot calculate the resource usage. Therefore, you must make sure that requests are specified in the resources field of the pod configurations, as shown in the following example.
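
    The following is a minimal sketch of a container spec with resource requests specified. The values are examples only:

    containers:
    - name: nginx
      image: nginx:1.7.9            # Example image.
      resources:
        requests:                   # HPA uses these values as the denominator when it calculates resource usage.
          cpu: 500m
          memory: 256Mi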

What do I do if excess pods are added by HPA during a rolling update?

During a rolling update, kube-controller-manager performs zero filling on pods whose monitoring data cannot be collected. This may cause HPA to add an excessive number of pods. You can perform the following steps to fix this issue.

  • Fix this issue for all workloads in the cluster:

    To fix this issue, we recommend that you update metrics-server to the latest version and add the following parameter to the startup settings of metrics-server:

    The following configuration takes effect on all workloads in the cluster.

    ## Add the following configuration to the startup settings of metrics-server. 
    --enable-hpa-rolling-update-skipped=true  
  • Fix this issue for specified workloads. You can use one of the following methods:

    • Method 1: Add the following annotation to the template of a workload to skip HPA during rolling updates.

      ## Add the following annotation to the spec.template.metadata.annotations parameter of the workload configuration to skip HPA during rolling updates. 
      HPARollingUpdateSkipped: "true"


      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: nginx-deployment-basic
        labels:
          app: nginx
      spec:
        replicas: 2
        selector:
          matchLabels:
            app: nginx
        template:
          metadata:
            labels:
              app: nginx
            annotations:
              HPARollingUpdateSkipped: "true"  # Skip HPA during rolling updates. 
          spec:
            containers:
            - name: nginx
              image: nginx:1.7.9
              ports:
              - containerPort: 80

    • Method 2: Add the following annotation to the template of a workload to skip the warm-up period before rolling updates.

      ## Add the following annotation to the spec.template.metadata.annotations parameter of the workload configuration to skip the warm-up period before rolling updates. 
      HPAScaleUpDelay: 3m # You can change the value based on your business requirements.


      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: nginx-deployment-basic
        labels:
          app: nginx
      spec:
        replicas: 2
        selector:
          matchLabels:
            app: nginx
        template:
          metadata:
            labels:
              app: nginx
            annotations:
              HPAScaleUpDelay: 3m  # This setting indicates that HPA takes effect 3 minutes after the pods are created. Valid units: s and m. s indicates seconds and m indicates minutes. 
          spec:
            containers:
            - name: nginx
              image: nginx:1.7.9
              ports:
              - containerPort: 80

What do I do if HPA does not scale pods when the scaling threshold is reached?

HPA may not scale pods even if the CPU or memory usage drops below the scale-in threshold or exceeds the scale-out threshold. HPA also takes other factors into consideration when it scales pods. For example, HPA checks whether the current scale-out activity triggers a scale-in activity or the current scale-in activity triggers a scale-out activity. This avoids repetitive scaling and prevents unnecessary resource consumption.

For example, if the scale-out threshold is 80% and the CPU utilization of each of your two pods is 70%, the pods are not scaled in. This is because the CPU utilization of the remaining pod might exceed 80% after a scale-in activity, which would immediately trigger another scale-out activity.
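
In terms of the formula that HPA uses to calculate the desired number of replicas (described in What do I do if HPA scales out an application while the metric value in the audit log is lower than the threshold?), the result for this example is: Desired number of replicated pods = ceil[2 × (70%/80%)] = ceil(1.75) = 2. The replica count therefore stays at 2 and no scale-in activity is triggered.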

How do I configure the metric collection interval of HPA?

For metrics-server versions later than 0.2.1-b46d98c-aliyun, specify the --metric_resolution parameter in the startup settings. Example: --metric_resolution=15s.

Can CronHPA and HPA interact without conflicts?

CronHPA and HPA can interact without conflicts. ACK modifies the CronHPA configurations by setting scaleTargetRef to the scaling object of HPA. This way, only HPA scales the application that is specified by scaleTargetRef, and CronHPA can detect the state of HPA. CronHPA does not directly change the number of pods for the Deployment. Instead, CronHPA triggers HPA to scale the pods, as shown in the sketch after this paragraph. This resolves conflicts between CronHPA and HPA. For more information about how to enable CronHPA and HPA to interact without conflicts, see CronHPA.
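
The following is a minimal sketch of a CronHPA that triggers HPA instead of directly scaling the Deployment, assuming the CronHorizontalPodAutoscaler CRD that is provided by the CronHPA component. The resource names, the schedule, and the target size are examples only:

apiVersion: autoscaling.alibabacloud.com/v1beta1
kind: CronHorizontalPodAutoscaler
metadata:
  name: cronhpa-sample                 # Example name.
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: autoscaling/v2beta2
    kind: HorizontalPodAutoscaler      # Point scaleTargetRef at the HPA, not at the Deployment.
    name: nginx-deployment-basic-hpa   # Example HPA name.
  jobs:
  - name: scale-up-every-morning
    schedule: "0 0 8 * * *"            # Format: Seconds Minutes Hours DayOfMonth Month DayOfWeek.
    targetSize: 10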

How do I fix the issue that excess pods are added by HPA when CPU or memory usage rapidly increases?

When the pods of Java applications or applications powered by Java frameworks start to run, the CPU and memory usage may be high for a few minutes during the warm-up period. This may trigger HPA to scale out the pods. To fix this issue, update metrics-server to the latest version and add annotations to the pod configurations to prevent HPA from triggering scaling activities. For more information about how to update metrics-server, see Update the metrics-server component.

The following YAML template provides the sample pod configurations that prevent HPA from triggering scaling activities in this scenario.


## A Deployment is used in this example.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment-basic
  labels:
    app: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
      annotations:
        HPAScaleUpDelay: 3m # This setting indicates that HPA takes effect 3 minutes after the pods are created. Valid units: s and m. s indicates seconds and m indicates minutes. 
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9 # Replace with your own <image_name:tag>.
        ports:
        - containerPort: 80 

FAQ about cluster-autoscaler

Why does cluster-autoscaler fail to add nodes after a scale-out activity is triggered?

Check whether the following situations exist:

  • The instance types in the scaling group cannot fulfill the resource requests of the pods. Some of the resources provided by the specified ECS instance type are reserved or occupied by the system. For example, resources are reserved for kubelet and system components. As a result, the allocatable resources on a node are less than the resources defined by the instance type.

  • Cross-zone scale-out activities cannot be triggered for pods that have limits on zones.

  • The RAM role does not have the permissions to manage the Kubernetes cluster. You must configure RAM roles for each Kubernetes cluster that is involved in the scale-out activity. For more information about the authorization, see Step 2: Perform authorization.

  • The following issues occur when you activate Auto Scaling:

    • The instance fails to be added to the cluster and a timeout error occurs.

    • The node is not ready and a timeout error occurs.

    To ensure that nodes can be accurately scaled, cluster-autoscaler does not perform any scaling activities before it fixes the abnormal nodes.

Why does cluster-autoscaler fail to remove nodes after a scale-in activity is triggered?

Check whether the following situations exist:

  • The ratio of the resources requested by the pods on the node to the resources provided by the node is higher than the specified scale-in threshold.

  • Pods that belong to the kube-system namespace are running on the node.

  • A scheduling policy forces the pods to run on the current node. Therefore, the pods cannot be scheduled to other nodes.

  • PodDisruptionBudget is set for the pods on the node and the minimum value of PodDisruptionBudget is reached.

For more information, see the FAQ of the open source cluster-autoscaler component.

How does the system choose a scaling group for a scaling activity?

When pods cannot be scheduled to nodes, cluster-autoscaler simulates the scheduling of the pods based on the configurations of scaling groups. The configurations include labels, taints, and instance specifications. If a scaling group meets the requirements, this scaling group is selected for the scale-out activity. If more than one scaling group meets the requirements, the system selects the scaling group that has the fewest idle resources after the simulation.

What types of pods can prevent cluster-autoscaler from removing nodes?

Pods in the following situations can prevent cluster-autoscaler from removing a node: pods that belong to the kube-system namespace, pods that cannot be evicted because the PodDisruptionBudget limit would be violated, pods that cannot be scheduled to other nodes due to scheduling constraints, and pods that have the "cluster-autoscaler.kubernetes.io/safe-to-evict": "false" annotation. For more information, see Why does cluster-autoscaler fail to remove nodes after a scale-in activity is triggered?

What scheduling policies does cluster-autoscaler use to determine whether the unschedulable pods can be scheduled to a node pool that has the auto scaling feature enabled?

The following list describes the scheduling policies used by cluster-autoscaler.

  • PodFitsResources

  • GeneralPredicates

  • PodToleratesNodeTaints

  • MaxGCEPDVolumeCount

  • NoDiskConflict

  • CheckNodeCondition

  • CheckNodeDiskPressure

  • CheckNodeMemoryPressure

  • CheckNodePIDPressure

  • CheckVolumeBinding

  • MaxAzureDiskVolumeCount

  • MaxEBSVolumeCount

  • MatchInterPodAffinity

  • NoVolumeZoneConflict

FAQ about HPA based on Alibaba Cloud metrics

  • What do I do if the TARGETS column shows <unknown> after I run the kubectl get hpa command?

    Perform the following operations to troubleshoot the issue:

    1. Run the kubectl describe hpa <hpa_name> command to check why HPA does not function as expected.

      • If the value of AbleToScale is False in the Conditions field, check whether the Deployment was created as expected.

      • If the value of ScalingActive is False in the Conditions field, proceed to the next step.

    2. Run the kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/" command. If Error from server (NotFound): the server could not find the requested resource is returned, verify the status of alibaba-cloud-metrics-adapter.

      If the status of alibaba-cloud-metrics-adapter is normal, check whether the HPA metrics are related to the Ingress. If the metrics are related to the Ingress, make sure that you deploy the Log Service component before ack-alibaba-cloud-metrics-adapter is deployed. For more information, see Analyze and monitor the access log of nginx-ingress-controller.

    3. Make sure that the values of the HPA metrics are valid. The value of sls.ingress.route must be in the <namespace>-<svc>-<port> format, as shown in the example after the following list.

      • namespace specifies the namespace to which the Ingress belongs.

      • svc specifies the name of the Service that you selected when you created the Ingress.

      • port specifies the port of the Service.
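
      For example, for an Ingress in the default namespace that routes to a Service named nginx on port 80, the value of sls.ingress.route is default-nginx-80. The following is a minimal sketch of an HPA that scales a workload based on one of these metrics. The metadata names, the workload, and the threshold are examples, and the sls.project and sls.logstore labels are assumptions that you must replace with your own Log Service project and Logstore:

      apiVersion: autoscaling/v2beta2
      kind: HorizontalPodAutoscaler
      metadata:
        name: ingress-hpa                        # Example name.
        namespace: default
      spec:
        scaleTargetRef:
          apiVersion: apps/v1
          kind: Deployment
          name: nginx                            # Example workload.
        minReplicas: 2
        maxReplicas: 10
        metrics:
        - type: External
          external:
            metric:
              name: sls_ingress_qps
              selector:
                matchLabels:
                  sls.project: "k8s-log-example"     # Assumption: your Log Service project.
                  sls.logstore: "nginx-ingress"      # Assumption: your Logstore.
                  sls.ingress.route: "default-nginx-80"
            target:
              type: AverageValue
              averageValue: 10                   # Example: scale out when QPS per pod exceeds 10.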

  • Where can I find the metrics that are supported by HPA?

    For more information about the metrics that are supported by HPA, see Alibaba Cloud metrics adapter. The following table describes the commonly used metrics.

    Metric                    | Description                                                                                           | Additional parameter
    --------------------------|-------------------------------------------------------------------------------------------------------|---------------------
    sls_ingress_qps           | The number of requests that the Ingress can process per second based on a specific routing rule.     | sls.ingress.route
    sls_alb_ingress_qps       | The number of requests that the ALB Ingress can process per second based on a specific routing rule. | sls.ingress.route
    sls_ingress_latency_avg   | The average latency of all requests.                                                                 | sls.ingress.route
    sls_ingress_latency_p50   | The maximum latency for the fastest 50% of all requests.                                             | sls.ingress.route
    sls_ingress_latency_p95   | The maximum latency for the fastest 95% of all requests.                                             | sls.ingress.route
    sls_ingress_latency_p99   | The maximum latency for the fastest 99% of all requests.                                             | sls.ingress.route
    sls_ingress_latency_p9999 | The maximum latency for the fastest 99.99% of all requests.                                          | sls.ingress.route
    sls_ingress_inflow        | The inbound bandwidth of the Ingress.                                                                 | sls.ingress.route

  • How do I collect NGINX Ingress logs in a custom format?

    In this topic, horizontal pod autoscaling is performed based on the Ingress metrics that are collected by Log Service. You must configure Log Service to collect NGINX Ingress logs.

    • When you create an ACK cluster, Log Service is enabled for the cluster by default. If you use the default log collection settings, you can view the log analysis reports and real-time status of NGINX Ingresses in the Log Service console after you create the cluster.

    • If you disable Log Service when you create an ACK cluster, you cannot perform horizontal pod autoscaling based on the Ingress metrics that are collected by Log Service. You must enable Log Service for the cluster before you can use this feature. For more information, see Analyze and monitor the access log of nginx-ingress-controller.

    • The AliyunLogConfig that is generated the first time you enable Log Service applies only to the default log format that ACK defines for the Ingress controller. If you have changed the log format, you must modify the processor_regex settings in the AliyunLogConfig. For more information, see Use CRDs to collect container logs in DaemonSet mode.

Why does a pod fail to be scheduled to a node that is added by Cluster Autoscaler?

The estimated amount of resources available on a node that is added by Cluster Autoscaler may be greater than the actual amount of resources available on the node. This is caused by the precision of the underlying resource calculation by Cluster Autoscaler. For more information, see Why does a purchased instance have a memory size different from that defined in the instance type? If the resources requested by the pods on a node exceed 70% of the computing resources provided by the node, we recommend that you check whether the pods can be scheduled to another node of the same instance type.

Cluster Autoscaler checks only the resource requests of pending pods and pods created by DaemonSets when it determines whether the nodes in your cluster can provide sufficient resources for pod scheduling. If the nodes in your cluster host static pods that are not created by DaemonSets, we recommend that you reserve resources for these pods.

How do I configure a pod to allow Cluster Autoscaler to remove the node that hosts the pod or prevent Cluster Autoscaler from removing the node that hosts the pod?

You can add the cluster-autoscaler.kubernetes.io/safe-to-evict annotation to a pod to control whether Cluster Autoscaler can remove the node that hosts the pod, as shown in the sketch after the following list.

  • To configure a pod to prevent Cluster Autoscaler from removing the node that hosts the pod, add the annotation "cluster-autoscaler.kubernetes.io/safe-to-evict": "false" to the pod configuration.

  • To configure a pod to allow Cluster Autoscaler to remove the node that hosts the pod, add the annotation "cluster-autoscaler.kubernetes.io/safe-to-evict": "true" to the pod configuration.
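
The following is a minimal sketch of a pod that Cluster Autoscaler is not allowed to evict. The pod name and image are examples only:

apiVersion: v1
kind: Pod
metadata:
  name: critical-pod                 # Example name.
  annotations:
    cluster-autoscaler.kubernetes.io/safe-to-evict: "false"   # Prevent Cluster Autoscaler from removing the host node.
spec:
  containers:
  - name: nginx
    image: nginx:1.7.9               # Example image.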

How do I enable or disable pod eviction for a DaemonSet pod?

Cluster Autoscaler decides whether to evict DaemonSet pods based on the Enable DaemonSet Pod Eviction setting of the cluster. The setting takes effect on all DaemonSet pods in the cluster. If you want to enable DaemonSet pod eviction for a DaemonSet pod, add the annotation "cluster-autoscaler.kubernetes.io/enable-ds-eviction": "true" to the pod configuration, for example by using the kubectl annotate command shown after the following note.

If you want to disable DaemonSet pod eviction for a DaemonSet pod, add the annotation "cluster-autoscaler.kubernetes.io/enable-ds-eviction": "false" to the pod configuration.

Note
  • If the DaemonSet pod eviction feature is disabled, DaemonSet pods with the annotation are evicted only when the node hosts pods other than DaemonSet pods. If you want to use the annotation to drain a node that hosts only DaemonSet pods, you need to first enable the DaemonSet pod eviction feature.

  • You need to add the preceding annotation to the DaemonSet pods, not to the DaemonSet itself.

  • This annotation does not take effect on pods that are not created by DaemonSets.

  • By default, Cluster Autoscaler does not wait for DaemonSet pod eviction to complete. DaemonSet pod eviction is performed simultaneously with other tasks. If you want Cluster Autoscaler to wait until all DaemonSet pods are evicted before it removes a node, add the annotation "cluster-autoscaler.kubernetes.io/wait-until-evicted": "true" to the pod configuration.
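
For example, the following command adds the enable-ds-eviction annotation to a running DaemonSet pod. The pod name and namespace are placeholders:

kubectl annotate pod <daemonset-pod-name> -n <namespace> cluster-autoscaler.kubernetes.io/enable-ds-eviction="true"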

What do I do if HPA scales out an application while the metric value in the audit log is lower than the threshold?

Possible causes

HPA calculates the desired number of replicated pods based on the ratio of the current metric value to the desired metric value: Desired number of replicated pods = ceil[Current number of replicated pods × (Current metric value/Desired metric value)].

The formula shows that the accuracy of the result depends on the accuracies of the current number of replicated pods, the current metric value, and the desired metric value. For example, when HPA queries metrics about the current number of replicated pods, HPA first queries the subresource named scale of the object specified by the scaleTargetRef parameter and then selects pods based on the label specified in the Selector field in the status section of the subresource. If some pods queried by HPA do not belong to the object specified by the scaleTargetRef parameter, the desired number of replicated pods calculated by HPA may not meet your expectations. For example, HPA may scale out the application while the real-time metric value is lower than the threshold.

The number of matching pods may be inaccurate due to the following reasons:

  • A rolling update is in progress.

  • Pods that do not belong to the object specified by the scaleTargetRef parameter have the label specified in the Selector field in the status section of the scale subresource. Run the following command to query the pods:

    kubectl get pods -n {Namespace} -l {Value of the Selector field in the status section of the subresource named scale}
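
    To view the value of the Selector field, you can also query the scale subresource directly. The following command assumes a Deployment; the name and namespace are placeholders:

    kubectl get --raw "/apis/apps/v1/namespaces/<namespace>/deployments/<deployment-name>/scale"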

Solution

  • If a rolling update is in progress, refer to What do I do if excess pods are added by HPA during a rolling update? to resolve this issue.

  • If pods that do not belong to the object specified by the scaleTargetRef parameter have the label specified in the Selector field in the status section of the scale subresource, locate these pods by running the preceding query command, and then change their labels or delete the pods.

How do I prevent a node from being removed by Cluster Autoscaler?

To prevent a node from being removed by Cluster Autoscaler, add the annotation "cluster-autoscaler.kubernetes.io/scale-down-disabled": "true" to the node configurations. Run the following command to add the annotation:

kubectl annotate node <nodename> cluster-autoscaler.kubernetes.io/scale-down-disabled=true
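
To allow Cluster Autoscaler to remove the node again, remove the annotation. A trailing hyphen in the kubectl annotate command removes the specified annotation:

kubectl annotate node <nodename> cluster-autoscaler.kubernetes.io/scale-down-disabled-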

How do I set a scale-out delay in Cluster Autoscaler for unschedulable pods?

You can add the annotation cluster-autoscaler.kubernetes.io/pod-scale-up-delay to set a scale-out delay for all pods. If pods are still unschedulable after the delay ends, Cluster Autoscaler may add nodes to schedule the pods. Example: "cluster-autoscaler.kubernetes.io/pod-scale-up-delay": "600s".

How do I add custom resources to node pools that have auto scaling enabled?

To enable Cluster Autoscaler to identify custom resources provided by a node pool that has auto scaling enabled, or to identify the amounts of specific resource types provided by the node pool, you need to add an Elastic Compute Service (ECS) tag in the following format to the node pool.

k8s.io/cluster-autoscaler/node-template/resource/{Resource name}:{Resource amount}

Example

k8s.io/cluster-autoscaler/node-template/resource/hugepages-1Gi:2Gi

What operations can trigger the system to automatically update Cluster Autoscaler?

The following operations can trigger the system to automatically update Cluster Autoscaler. This ensures that the configurations of Cluster Autoscaler are up to date and that its version is compatible with the cluster.

  • Updating the auto scaling settings

  • Creating, deleting, or updating node pools that have auto scaling enabled

  • Successfully updating the cluster

How does Auto Scaling evaluate the resource capacity of a scaling group that uses multiple types of instances?

For a scaling group that uses multiple types of instances, Auto Scaling evaluates the resource capacity of the scaling group based on the least amount of resources that the scaling group can provide.

For example, a scaling group uses two instance types. One instance type provides 4 vCores and 32 GB of memory, and the other provides 8 vCores and 16 GB of memory. In this scenario, Auto Scaling assumes that each instance added by the scaling group provides only 4 vCores and 16 GB of memory, which is the minimum of each resource dimension. If a pending pod requests more than 4 vCores or more than 16 GB of memory, the pod is not scheduled.

You still need to take resource reservation into consideration after you specify multiple instance types for a scaling group. For more information, see Why does cluster-autoscaler fail to add nodes after a scale-out activity is triggered?