
Taking Full Advantage of Horizontal Pod Autoscaler in Kubernetes (Continued)

This tutorial will continue to discuss how auto scaling works with Kubernetes's horizontal pod autoscaler.

By Alwyn Botha, Alibaba Cloud Community Blog author.

This tutorial is the second part of a two-part series. You can find the first tutorial in this series here.

In this second part, we will cover how scaling down works with Kubernetes's horizontal pod autoscaler and then consider the overall sequence of events for auto scaling. After that, this article works through another example, goes into the details of cleanup, discusses the scaling algorithm, and looks at what exactly downscale delay is and how you can configure it.

Scaling Down

So far in this two-part tutorial, we have seen that scaling up works well. Now it's time to see how auto scaling works in the other direction. In theory, if we reduce the workload, the autoscaler should automatically scale the deployment down as well.

To test this theory, switch to the terminal running the wget loop and press Ctrl+C to stop it. The workload on the deployment drops to zero, and over time the horizontal pod autoscaler should reduce the number of Pods accordingly.

The horizontal pod autoscaler does support scaling down, but it applies a cooldown/delay before doing so. For more information, see the Support for cooldown/delay page in the Kubernetes documentation.

Our results, shown below, confirm this. It took around five minutes for the down scaling to occur.

kubectl get hpa
NAME                REFERENCE                      TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
my-hpa-deployment   Deployment/my-hpa-deployment   37%/45%   1         10        5          7m10s
Thu Feb 21 15:45:09 SAST 2019

kubectl get hpa
NAME                REFERENCE                      TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
my-hpa-deployment   Deployment/my-hpa-deployment   37%/45%   1         10        5          7m43s

NAME                REFERENCE                      TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
my-hpa-deployment   Deployment/my-hpa-deployment   1%/45%    1         10        5          8m44s

NAME                REFERENCE                      TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
my-hpa-deployment   Deployment/my-hpa-deployment   0%/45%    1         10        5          9m20s

NAME                REFERENCE                      TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
my-hpa-deployment   Deployment/my-hpa-deployment   0%/45%    1         10        5          10m

NAME                REFERENCE                      TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
my-hpa-deployment   Deployment/my-hpa-deployment   0%/45%    1         10        5          12m

NAME                REFERENCE                      TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
my-hpa-deployment   Deployment/my-hpa-deployment   0%/45%    1         10        1          13m

To summarize the above output: it took around three minutes for the metrics server to report the overall deployment CPU load as 0%, which you can see in the TARGETS column. REPLICAS stayed at 5 even while CPU was at 0%. At eight minutes and 44 seconds the CPU use was down to 1%, and only at 13 minutes did REPLICAS scale down to one.
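Incidentally, instead of re-running kubectl get hpa by hand every half minute or so, you can let kubectl stream changes as they happen with its watch flag:

kubectl get hpa --watch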

According to this page about the horizontal pod autoscaler on GitHub:

Autoscaler works in a conservative way. If a new user load appears, it is important for us to rapidly increase the number of pods, so that user requests will not be rejected. In other words, lowering the number of pods is not that urgent.

Starting and stopping pods may introduce noise to the metric (for instance, starting may temporarily increase CPU). So, after each action, the autoscaler should wait some time for reliable data.

Scale-up can only happen if there was no rescaling within the last 3 minutes.

Scale-down will wait for 5 minutes from the last rescaling.

You will see evidence for the above description throughout this tutorial. The output of the kubectl describe command, in particular the last line, explains it perfectly:

horizontal-pod-autoscaler  New size: 1; reason: All metrics below target

Below is what the entire output looks like:

kubectl describe horizontalpodautoscaler.autoscaling/my-hpa-deployment

Name:                                                  my-hpa-deployment
Metrics:                                               ( current / target )
  resource cpu on pods  (as a percentage of request):  0% (0) / 45%
Min replicas:                                          1
Max replicas:                                          10
Deployment pods:                                       1 current / 1 desired
Conditions:
  Type            Status  Reason               Message
  ----            ------  ------               -------
  AbleToScale     True    ScaleDownStabilized  recent recommendations were higher than current one, applying the highest recent recommendation
  ScalingActive   True    ValidMetricFound     the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
  ScalingLimited  False   DesiredWithinRange   the desired count is within the acceptable range
Events:
  Type     Reason                        Age                From                       Message
  ----     ------                        ----               ----                       -------
  Normal   SuccessfulRescale             10m                horizontal-pod-autoscaler  New size: 4; reason: cpu resource utilization (percentage of request) above target
  Normal   SuccessfulRescale             10m                horizontal-pod-autoscaler  New size: 5; reason:
  Normal   SuccessfulRescale             48s                horizontal-pod-autoscaler  New size: 1; reason: All metrics below target

Consider the Conditions output above. ScaleDownStabilized was reported, as expected. Again, remember that our autoscale definition is:

kubectl autoscale deployment my-hpa-deployment --cpu-percent=45 --min=1 --max=10  

Now, given that min = 1 Pod, this line in the output makes sense:

Deployment pods:   1 current / 1 desired

On the other hand, if our auto scale definition was:

kubectl autoscale deployment my-hpa-deployment --cpu-percent=45 --min=3 --max=10  

Then, given that min = 3 Pods, we can expect the line to be:

Deployment pods:   3 current / 1 desired

Based on this, the horizontal pod autoscaler won't scale below the specified minimum of 3. We would therefore expect ScalingLimited to be True in that case, because scaling is limited: the autoscaler cannot scale down to the desired Pod count of one.
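As an aside, the autoscaler we created imperatively with kubectl autoscale could equally be written as a declarative manifest. A minimal sketch, assuming the autoscaling/v1 API, would look something like this:

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: my-hpa-deployment
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-hpa-deployment
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 45

Creating this with kubectl create -f gives the same result as the kubectl autoscale command shown above.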

The final HPA status output is as follows:

kubectl get hpa
NAME                REFERENCE                      TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
my-hpa-deployment   Deployment/my-hpa-deployment   0%/45%    1         10        1          15m

Within a running time of 15 minutes, we saw auto scaling in action in both the up and down directions. Below is the current state of the running Apache child processes; the six child processes are idle.

  PID USER      PR  NI    VIRT    RES  %CPU  %MEM     TIME+ S COMMAND
 5771 www-data  20   0  217.8m  10.9m   0.0   0.5   0:49.64 S apache2 -DFOREGROUND
 5772 www-data  20   0  217.8m  10.9m   0.0   0.5   0:49.48 S apache2 -DFOREGROUND
 5773 www-data  20   0  217.8m  10.9m   0.0   0.5   0:49.84 S apache2 -DFOREGROUND
 5774 www-data  20   0  217.8m  10.9m   0.0   0.5   0:49.61 S apache2 -DFOREGROUND
 5775 www-data  20   0  217.8m  10.9m   0.0   0.5   0:49.75 S apache2 -DFOREGROUND
 6937 www-data  20   0  217.8m  10.9m   0.0   0.5   0:48.63 S apache2 -DFOREGROUND

In the first part of this tutorial, we only used kubectl describe horizontalpodautoscaler.autoscaling/my-hpa-deployment to obtain the status information of our horizontal pod autoscaler.

An alternative, used below, is kubectl describe deployment.extensions/my-hpa-deployment. This command provides much more detailed information about the deployment being scaled, although in this case it offers no new insights.

kubectl describe deployment.extensions/my-hpa-deployment
Name:                   my-hpa-deployment
CreationTimestamp:      Thu, 21 Feb 2019 15:37:52 +0200
Labels:                 app=my-hpa-deploy
Selector:               app=my-hpa-pod
Replicas:               1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType:           RollingUpdate
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:  app=my-hpa-pod
  Containers:
   my-hpa-container:
    Image:      hpa-example:latest
    Requests:
      cpu:        500m
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Progressing    True    NewReplicaSetAvailable
  Available      True    MinimumReplicasAvailable
NewReplicaSet:   my-hpa-deployment-78d4586d7f (1/1 replicas created)
Events:
  Type    Reason             Age    From                   Message
  ----    ------             ----   ----                   -------
  Normal  ScalingReplicaSet  15m    deployment-controller  Scaled up replica set my-hpa-deployment-78d4586d7f to 1
  Normal  ScalingReplicaSet  12m    deployment-controller  Scaled up replica set my-hpa-deployment-78d4586d7f to 4
  Normal  ScalingReplicaSet  12m    deployment-controller  Scaled up replica set my-hpa-deployment-78d4586d7f to 5
  Normal  ScalingReplicaSet  2m52s  deployment-controller  Scaled down replica set my-hpa-deployment-78d4586d7f to 1

The Overall Sequence of Events for Auto Scaling

A beginner to all of this may think that the overall sequence of events for auto scaling is as follows.

  • The metrics server measures the average CPU usage of each Pod.
  • The HPA reads these metrics and our requirements.
  • The HPA calculates the required number of replicas.
  • The HPA sends this required replica count to the deployment.
  • The deployment signals the ReplicaSet the number of replicas to use.
  • The ReplicaSet scales the replicas up or down as needed.
  • Our load generator Pod sends work via the single IP of the service.
  • The service distributes the work among all the Pod replicas.
  • Child PHP processes in the Pods process the work.
  • Apache manages the child PHP processes.

However, this is not quite correct. A more accurate description, based on this document, is as follows:

  • The Kubernetes controller manager is a daemon that embeds the core control loops shipped with Kubernetes
  • A controller is a control loop that watches the shared state of the cluster through the apiserver and makes changes attempting to move the current state towards the desired state.

So, given these facts, the deployment does not signal the ReplicaSet. Rather, the ReplicaSet control loop watches the shared state of the cluster through the API server and scales up or down as needed based on the information in the API database, that is, the etcd data store.

Similarly, the horizontal pod autoscaler doesn't send the required replicas to the deployment. Rather, the HPA only updates the API data store with the calculated number of replicas. The deployment control loop then watches the API data store for changes to the number of required replicas.
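You can observe this division of responsibilities yourself. The HPA records its calculated replica count in its own status, and the deployment's spec is updated to match; one quick way to compare the two (purely illustrative) is with jsonpath queries:

kubectl get hpa my-hpa-deployment -o jsonpath='{.status.desiredReplicas}{"\n"}'
kubectl get deployment my-hpa-deployment -o jsonpath='{.spec.replicas}{"\n"}'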

This ten-point description may be adequate to explain to beginners how the different Kubernetes components interact to make horizontal pod auto scaling possible. Later, beginners may want to read a more in-depth description of how Kubernetes internally controls all its components: each controller reads shared object state from the API data store and makes changes attempting to move the current state towards the desired state.

Note that, for this tutorial in particular, I am running everything on a single node, so auto scaling does nothing to lessen the overall CPU load on that node. Rather, the workload on the single node is spread over more Pods, with each Pod receiving less stress on its CPU.

In a multi-node cluster, the purpose of horizontal pod auto scaling is to calculate the number of replicas needed for our desired CPU load percentage, so that the Kubernetes scheduler can distribute the Pods among several different hardware nodes. Horizontal pod auto scaling thus helps distribute CPU load among Pods on several different nodes.

Note that if you run kubectl get rs, you will also see a ReplicaSet. Our deployment is the ReplicaSet manager that gets its instructions from our horizontal pod autoscaler. I have, for the most part, ignored the ReplicaSet in this tutorial, but it still plays a crucial role; it simply adds another layer to understand, which is why we have focused only on the HPA commands and their output in this two-part tutorial series.

Now our first auto scaling demo is finished. Delete the deployment with the following command:

kubectl delete -f myHPA-Deployment.yaml
deployment.apps "my-hpa-deployment" deleted

Note that deleting the deployment deletes the deployment object and its related API information, as well as the ReplicaSet and all the Pod replicas.
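If you want to confirm this, a quick check (purely illustrative) is to list all three resource types; none of the my-hpa objects should be listed any more, although the load generator Pod, which is not part of the deployment, will still be there:

kubectl get deployments,rs,pods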

Another Horizontal Pod Auto Scaling Example

In the previous example covered in this tutorial, our deployment had to scale from one to five Pods in order to bring CPU utilization down to the percentage we specified.

Now, in this example, we will start our deployment with three Pods from the beginning. I have already run this ahead of time; my expectation was that the deployment would scale to five replicas more quickly, since fewer auto scale adjustments are needed in this scenario.

So, let's see what happens. To start, make just one change to your deployment below: set replicas to 3.

nano myHPA-Deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-hpa-deployment
  labels:
    app: my-hpa-deploy
spec:
  replicas: 3
  strategy: 
    type: RollingUpdate
  selector:
    matchLabels:
      app: my-hpa-pod
  template:
    metadata:
      labels:
        app: my-hpa-pod
    spec:
      containers:
      - name: my-hpa-container
        image: hpa-example:latest
        imagePullPolicy: IfNotPresent
        resources:
          requests:
            cpu: 500m
      terminationGracePeriodSeconds: 0

Now, create this three-replica deployment with the following command:

kubectl create -f myHPA-Deployment.yaml
deployment.apps/my-hpa-deployment created
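Note that this second example assumes the horizontal pod autoscaler is still in place with the same settings as before. If you deleted it, or want to start fresh, re-create it with the same command used in the first example:

kubectl autoscale deployment my-hpa-deployment --cpu-percent=45 --min=1 --max=10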

Next, you'll need to generate a load using your second terminal again. Note that, if you exited your exec command, you'll need to exec into the load generator Pod again, using the following command.

kubectl exec -it myloadgenpod -- /bin/sh

Enter this at the shell again:

while true; do wget -q -O- http://172.17.0.7:80; done

As usual at the shell, pressing the up arrow to recall the previous command is quicker than retyping it. While the loop is running in the foreground, you can monitor its effects in your original terminal.

Just as before, monitor by running the kubectl get hpa command every 30 seconds or so. The following is the output:

kubectl get hpa
NAME                REFERENCE                      TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
my-hpa-deployment   Deployment/my-hpa-deployment   <unknown>/45%   1         10        3          23s

NAME                REFERENCE                      TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
my-hpa-deployment   Deployment/my-hpa-deployment   <unknown>/45%   1         10        3          55s

NAME                REFERENCE                      TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
my-hpa-deployment   Deployment/my-hpa-deployment   0%/45%    1         10        3          65s

NAME                REFERENCE                      TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
my-hpa-deployment   Deployment/my-hpa-deployment   0%/45%    1         10        3          108s

kubectl get hpa
NAME                REFERENCE                      TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
my-hpa-deployment   Deployment/my-hpa-deployment   48%/45%   1         10        3          2m

NAME                REFERENCE                      TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
my-hpa-deployment   Deployment/my-hpa-deployment   48%/45%   1         10        3          2m46s

During the first three minutes, the replica count stayed at three. Moreover, the metrics server took one minute to transition from unknown CPU use to 0%.

In reality, the 0% is wrong, because it is not up to date with what is actually going on: actual CPU utilization across the Pods is averaging around 48%. The metrics server takes another minute to update its CPU measurement to the accurate and current 48 percent.

Therefore, one important lesson here is that we cannot evaluate HPA functionality from a single output measurement. Rather, we need to consider what happens over time. On that note, at the end of this tutorial, I will explain how to let the metrics server update more quickly.

Now, let's continue by investigating the detailed status of the deployment. We'll see that just two seconds ago it scaled up from three to five replicas. In particular, consider this line of the output:

Replicas:               5 desired | 5 updated | 5 total | 3 available | 2 unavailable

To explain the above output: the deployment requires five replicas and has created all five, but only three of them are available at this particular second. This is because the last two replicas are still busy starting up, which is why two are shown as unavailable.

kubectl describe deployment.extensions/my-hpa-deployment
Name:                   my-hpa-deployment
Labels:                 app=my-hpa-deploy
Selector:               app=my-hpa-pod
Replicas:               5 desired | 5 updated | 5 total | 3 available | 2 unavailable
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:  app=my-hpa-pod
  Containers:
   my-hpa-container:
    Image:      hpa-example:latest
    Requests:
      cpu:        500m
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Progressing    True    NewReplicaSetAvailable
  Available      False   MinimumReplicasUnavailable
NewReplicaSet:   my-hpa-deployment-78d4586d7f (5/5 replicas created)
Events:
  Type    Reason             Age    From                   Message
  ----    ------             ----   ----                   -------
  Normal  ScalingReplicaSet  3m23s  deployment-controller  Scaled up replica set my-hpa-deployment-78d4586d7f to 3
  Normal  ScalingReplicaSet  2s     deployment-controller  Scaled up replica set my-hpa-deployment-78d4586d7f to 5

We see below that the horizontal pod autoscaler did scale up to five replicas:

kubectl describe horizontalpodautoscaler.autoscaling/my-hpa-deployment

Name:                                                  my-hpa-deployment
Metrics:                                               ( current / target )
  resource cpu on pods  (as a percentage of request):  62% (311m) / 45%
Min replicas:                                          1
Max replicas:                                          10
Deployment pods:                                       3 current / 5 desired
Conditions:
  Type            Status  Reason              Message
  ----            ------  ------              -------
  AbleToScale     True    SucceededRescale    the HPA controller was able to update the target scale to 5
  ScalingActive   True    ValidMetricFound    the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
  ScalingLimited  False   DesiredWithinRange  the desired count is within the acceptable range
Events:
  Type     Reason                        Age                    From                       Message
  ----     ------                        ----                   ----                       -------
  Warning  FailedGetResourceMetric       2m22s (x3 over 2m52s)  horizontal-pod-autoscaler  unable to get metrics for resource cpu: no metrics returned from resource metrics API
  Warning  FailedComputeMetricsReplicas  2m22s (x3 over 2m52s)  horizontal-pod-autoscaler  failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API
  Normal   SuccessfulRescale             15s                    horizontal-pod-autoscaler  New size: 5; reason: cpu resource utilization (percentage of request) above target

Below we confirm that the replica count is now five. Note, though, that it took from three minutes and 18 seconds until five minutes and 21 seconds for the reported CPU utilization percentage to drop from around 60% to the correct 35% per pod.

kubectl get hpa
NAME                REFERENCE                      TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
my-hpa-deployment   Deployment/my-hpa-deployment   62%/45%   1         10        5          3m18s

NAME                REFERENCE                      TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
my-hpa-deployment   Deployment/my-hpa-deployment   59%/45%   1         10        5          4m4s

NAME                REFERENCE                      TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
my-hpa-deployment   Deployment/my-hpa-deployment   59%/45%   1         10        5          4m46s

NAME                REFERENCE                      TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
my-hpa-deployment   Deployment/my-hpa-deployment   35%/45%   1         10        5          5m21s

NAME                REFERENCE                      TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
my-hpa-deployment   Deployment/my-hpa-deployment   36%/45%   1         10        5          6m13s

Note that the Conditions: output below is a bit confusing:

kubectl describe horizontalpodautoscaler.autoscaling/my-hpa-deployment

Name:                                                  my-hpa-deployment
Namespace:                                             default
CreationTimestamp:                                     Thu, 21 Feb 2019 16:10:15 +0200
Reference:                                             Deployment/my-hpa-deployment
Metrics:                                               ( current / target )
  resource cpu on pods  (as a percentage of request):  36% (180m) / 45%
Min replicas:                                          1
Max replicas:                                          10
Deployment pods:                                       5 current / 5 desired
Conditions:
  Type            Status  Reason               Message
  ----            ------  ------               -------
  AbleToScale     True    ScaleDownStabilized  recent recommendations were higher than current one, applying the highest recent recommendation
  ScalingActive   True    ValidMetricFound     the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
  ScalingLimited  False   DesiredWithinRange   the desired count is within the acceptable range
Events:
  Type     Reason                        Age                   From                       Message
  ----     ------                        ----                  ----                       -------
  Warning  FailedGetResourceMetric       5m33s (x3 over 6m3s)  horizontal-pod-autoscaler  unable to get metrics for resource cpu: no metrics returned from resource metrics API
  Warning  FailedComputeMetricsReplicas  5m33s (x3 over 6m3s)  horizontal-pod-autoscaler  failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API
  Normal   SuccessfulRescale             3m26s                 horizontal-pod-autoscaler  New size: 5; reason: cpu resource utilization (percentage of request) above target

From this output, we can see that the horizontal pod autoscaler scaled up from three to five, which would make us expect the reason to read something like ScaleUpStabilized. Despite this apparent contradiction, the longer prose message, shown below, describes what is happening accurately.

  AbleToScale     True    ScaleDownStabilized  recent recommendations were higher than current one, applying the highest recent recommendation

Remember that our previous example demonstrated scaling all the way down to one Pod. Now it's time to demonstrate scaling down by just one or two Pods rather than all the way to the minimum. For this, of course, what is needed is a slightly lower workload.

Interestingly, inserting a sleep .05 into the loop is perfect for this demo on my four-core server. Go to your second terminal, stop the while loop by pressing Ctrl+C, and then enter the command below.

while true; do wget -q -O- http://172.17.0.7:80; sleep .05 ;done
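The 0.05-second pause caps the loop at roughly 20 requests per second (1 / 0.05 = 20, and fewer in practice once each wget call's own duration is counted), which is why this loop generates a noticeably lighter load than the unthrottled one.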

Within a few minutes, the metrics server should register the slightly lighter workload and the autoscaler should scale the Pods downward. To see this, monitor the HPA every 30 seconds or so. The output is as follows:

kubectl get hpa

NAME                REFERENCE                      TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
my-hpa-deployment   Deployment/my-hpa-deployment   17%/45%   1         10        5          10m

NAME                REFERENCE                      TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
my-hpa-deployment   Deployment/my-hpa-deployment   19%/45%   1         10        5          14m

NAME                REFERENCE                      TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
my-hpa-deployment   Deployment/my-hpa-deployment   31%/45%   1         10        3          15m

NAME                REFERENCE                      TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
my-hpa-deployment   Deployment/my-hpa-deployment   31%/45%   1         10        3          15m

As expected, replicas are scaled down from five to three.

kubectl describe horizontalpodautoscaler.autoscaling/my-hpa-deployment
Name:                                                  my-hpa-deployment
Metrics:                                               ( current / target )
  resource cpu on pods  (as a percentage of request):  31% (157m) / 45%
Min replicas:                                          1
Max replicas:                                          10
Deployment pods:                                       3 current / 3 desired
Conditions:
  Type            Status  Reason              Message
  ----            ------  ------              -------
  AbleToScale     True    ReadyForNewScale    recommended size matches current size
  ScalingActive   True    ValidMetricFound    the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
  ScalingLimited  False   DesiredWithinRange  the desired count is within the acceptable range
Events:
  Type     Reason                        Age                From                       Message
  ----     ------                        ----               ----                       -------
  Normal   SuccessfulRescale             12m                horizontal-pod-autoscaler  New size: 5; reason: cpu resource utilization (percentage of request) above target
  Normal   SuccessfulRescale             69s                horizontal-pod-autoscaler  New size: 3; reason: All metrics below target

kubectl get hpa
NAME                REFERENCE                      TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
my-hpa-deployment   Deployment/my-hpa-deployment   32%/45%   1         10        3          16m

This output shows that the horizontal pod autoscaler can automatically scale both upward and downward in reaction to changing workloads.

Now, remove all workload from your service. To do this, go to the second terminal and press Ctrl+C to break out of the loop. Within five minutes, the horizontal pod autoscaler will scale the Pod count from three down to just one; it first scales down to two, then to one.

kubectl get hpa
NAME                REFERENCE                      TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
my-hpa-deployment   Deployment/my-hpa-deployment   27%/45%   1         10        3          17m

NAME                REFERENCE                      TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
my-hpa-deployment   Deployment/my-hpa-deployment   0%/45%    1         10        3          18m

NAME                REFERENCE                      TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
my-hpa-deployment   Deployment/my-hpa-deployment   0%/45%    1         10        3          20m

NAME                REFERENCE                      TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
my-hpa-deployment   Deployment/my-hpa-deployment   0%/45%    1         10        3          21m

NAME                REFERENCE                      TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
my-hpa-deployment   Deployment/my-hpa-deployment   0%/45%    1         10        2          22m

Now, let's investigate in more detail:

kubectl describe horizontalpodautoscaler.autoscaling/my-hpa-deployment
Name:                                                  my-hpa-deployment
Namespace:                                             default
Labels:                                                <none>
Annotations:                                           <none>
CreationTimestamp:                                     Thu, 21 Feb 2019 16:10:15 +0200
Reference:                                             Deployment/my-hpa-deployment
Metrics:                                               ( current / target )
  resource cpu on pods  (as a percentage of request):  0% (0) / 45%
Min replicas:                                          1
Max replicas:                                          10
Deployment pods:                                       2 current / 2 desired
Conditions:
  Type            Status  Reason               Message
  ----            ------  ------               -------
  AbleToScale     True    ScaleDownStabilized  recent recommendations were higher than current one, applying the highest recent recommendation
  ScalingActive   True    ValidMetricFound     the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
  ScalingLimited  False   DesiredWithinRange   the desired count is within the acceptable range
Events:
  Type     Reason                        Age                From                       Message
  ----     ------                        ----               ----                       -------
  Warning  FailedGetResourceMetric       21m (x3 over 22m)  horizontal-pod-autoscaler  unable to get metrics for resource cpu: no metrics returned from resource metrics API
  Warning  FailedComputeMetricsReplicas  21m (x3 over 22m)  horizontal-pod-autoscaler  failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API
  Normal   SuccessfulRescale             19m                horizontal-pod-autoscaler  New size: 5; reason: cpu resource utilization (percentage of request) above target
  Normal   SuccessfulRescale             7m44s              horizontal-pod-autoscaler  New size: 3; reason: All metrics below target
  Normal   SuccessfulRescale             44s                horizontal-pod-autoscaler  New size: 2; reason: All metrics below target

The current status of your horizontal pod autoscaler deployment is as follows:

kubectl get hpa
NAME                REFERENCE                      TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
my-hpa-deployment   Deployment/my-hpa-deployment   0%/45%    1         10        1          24m

And, the final output for your horizontal pod autoscaler is as follows:

kubectl describe horizontalpodautoscaler.autoscaling/my-hpa-deployment
Name:                                                  my-hpa-deployment
Reference:                                             Deployment/my-hpa-deployment
Events:
  Type     Reason                        Age                From                       Message
  ----     ------                        ----               ----                       -------
  Normal   SuccessfulRescale             96s                horizontal-pod-autoscaler  New size: 1; reason: All metrics below target

Cleanup

Now, let's perform some cleanup by deleting everything left over from the demonstration above. First, delete the load generator Pod using the following command:

kubectl delete pod/myloadgenpod
pod "myloadgenpod" deleted

Next, you can delete the horizontal pod autoscaler:

kubectl delete horizontalpodautoscaler.autoscaling/my-hpa-deployment
horizontalpodautoscaler.autoscaling "my-hpa-deployment" deleted

Deleting your horizontal pod autoscaler only deletes the API definition of the autoscaler itself, such as the minimum and maximum Pod numbers and the target CPU utilization percentage. It does not delete your deployments or Pods, so you'll also want to delete those separately.
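To confirm what is and is not deleted, you can list the resources again; kubectl get hpa should now report that no resources are found, while the deployment and its Pods are still present:

kubectl get hpa
kubectl get deployments,pods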

First, let's delete the deployment:

kubectl delete -f myHPA-Deployment.yaml
deployment.apps "my-hpa-deployment" deleted

This will delete the deployment, its ReplicaSet, and all its Pods. Then, delete the service with the following command:

kubectl delete svc/my-service
service "my-service" deleted

Last, if you no longer need the Docker PHP and Apache image, you may delete it as well with the following command:

docker rmi hpa-example:latest

Algorithm Details

The exercises we went through in the demonstrations above show us exactly how the auto scaling algorithm works. Consider the following excerpt from the algorithm details page of the Kubernetes documentation.

...if the current metric value is 200m, and the desired value is 100m, the number of replicas will be doubled, since 200.0 / 100.0 == 2.0

If the current value is instead 50m, we'll halve the number of replicas, since 50.0 / 100.0 == 0.5.

We'll skip scaling if the ratio is sufficiently close to 1.0 (within a globally-configurable tolerance, from the --horizontal-pod-autoscaler-tolerance flag, which defaults to 0.1).

... the currentMetricValue is computed by taking the average of the given metric across all Pods in the HorizontalPodAutoscaler's scale target.
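We can check this against our second example. The documented rule boils down to desiredReplicas = ceil(currentReplicas x currentMetricValue / desiredMetricValue). With three replicas averaging 62% CPU against the 45% target, the calculation is:

ceil(3 x 62 / 45) = ceil(4.13) = 5

which matches the scale-up from three to five replicas that we observed earlier.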

Horizontal Pod Autoscaler Downscale Delay

Now, let's discuss the delay that we noticed throughout this tutorial. As we saw, the HPA waits five minutes before scaling down the number of replicas. This is only the default setting, and it can be changed: you can reduce this time with the --horizontal-pod-autoscaler-downscale-delay flag.
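This is a flag on the kube-controller-manager, not on kubectl, so how you set it depends on how your cluster runs. As a rough sketch only: on a minikube-style single-node cluster you could pass it at start time (the 2m0s value here is just an example):

minikube start --extra-config=controller-manager.horizontal-pod-autoscaler-downscale-delay=2m0s

On a cluster where the controller manager runs as a static Pod, the equivalent is to add --horizontal-pod-autoscaler-downscale-delay=2m0s to its command line in the manifest. Also note that newer Kubernetes releases replace this flag with a downscale stabilization window (--horizontal-pod-autoscaler-downscale-stabilization).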

As a point of reference, consider the information presented in the document Support for cooldown/delay.

Before you implement any changes, be aware of the consequences. The points below were taken from the above document:

  • When tuning these parameter values, a cluster operator should be aware of the possible consequences.
  • If the delay (cooldown) value is set too long, there could be complaints that the Horizontal Pod Autoscaler is not responsive to workload changes.
  • However, if the delay value is set too short, the scale of the replicas set may keep thrashing as usual.