Taking Full Advantage of the Horizontal Pod Autoscaler in Kubernetes

This tutorial will look at how you can perform auto scaling based on CPU usage with the horizontal pod autoscaler.

By Alwyn Botha, Alibaba Cloud Community Blog author.

According to the Horizontal Pod Autoscaler page of the Kubernetes Documentation, the Horizontal Pod Autoscaler (or HPA for short) of Kubernetes can be described as follows:

The Horizontal Pod Autoscaler automatically scales the number of pods in a replication controller, deployment or replica set based on observed CPU utilization (or, with custom metrics support, on some other application-provided metrics).

This tutorial focuses on the first part of that definition: auto scaling based on observed CPU utilization. Once you understand these basics, you can go on to scale on several different metrics at the same time, and to use various software to provide you with additional metrics.

This tutorial is the first part of a two-part series. You can find the other tutorial in this series here.

This part covers the Docker build, the Kubernetes deployment, and monitoring horizontal pod autoscaler functions using kubectl.


Auto scaling itself is relatively easy to understand. However, to be able to implement auto scaling well, you'll need the knowledge and know-how of the following operations and topics:

As a point of reference, this tutorial was made with minikube running locally on Windows 10. Minikube also works on Linux. As an alternative, you can use a full installation of Kubernetes. You'll need one Kubernetes node with at least two cores; for the example in this tutorial, I'll be using a four-core Kubernetes node.

Another important consideration is that the term auto scaling, as far as we are concerned, is unrelated to scaling of a workload across CPUs, which is the job of the operating system. Rather, the term relates to the scaling in the number of Kubernetes Pods. Through auto scaling capabilities, for example, instead of having one Pod using 400% CPU at one time, you can have eight Pods using 50% CPU each. In this tutorial, you will probably only be running the CPU at 25% to 45% usage for around two hours or so. Therefore, you won't benefit much from faster CPUs and more cores in this tutorial.
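As a rough illustration of that arithmetic, the sketch below spreads a fixed amount of CPU demand evenly over a number of Pods. It assumes the service balances requests perfectly, which a real cluster only approximates, and the function name is made up for this example.

```python
def per_pod_cpu_percent(total_cpu_percent: float, pod_count: int) -> float:
    """Average CPU per Pod when the same total demand is spread evenly."""
    if pod_count < 1:
        raise ValueError("pod_count must be at least 1")
    return total_cpu_percent / pod_count


# One Pod carrying 400% CPU versus the same demand over more Pods.
for pods in (1, 2, 4, 8):
    print(f"{pods} Pod(s): {per_pod_cpu_percent(400, pods):.0f}% CPU each")
```

The last line printed corresponds to the eight Pods at 50% CPU each mentioned above.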

The metrics server gathers stats every minute by default, and therefore auto scaling only happens every five minutes or so. As such, your workload has to run for several minutes before you'll be able to see the HPA act automatically. Therefore, it's best to run this tutorial on a dedicated server in the cloud. For this, consider using an Alibaba Cloud ECS instance.

Overall Plan

In this tutorial, you will complete the following operations:

  • Create a custom Docker image that runs Apache and PHP processes.
  • Use a Kubernetes deployment with Pods running your custom Docker image.
  • Define a Kubernetes service to access all working Pods through one IP address.
  • Use a BusyBox Pod to send work to the service using wget in a shell loop.
  • Monitor the horizontal pod autoscaler (HPA) in action using HPA-specific commands provided by Kubernetes.
  • Interpret HPA data output and make adjustments accordingly.

Creating the Docker Image

For the first leg of this tutorial, you'll need to create a custom Docker image that runs Apache and PHP. This Docker image will do CPU-intensive work through a short PHP program, which is taken from the Kubernetes website. To start, add the following two files to a temporary working directory.

  • Dockerfile:
FROM php:5-apache
ADD index.php /var/www/html/index.php
RUN chmod a+rx index.php
  • index.php:
  <?php
  $x = 0.0001;
  for ($i = 0; $i <= 1000000; $i++) {
    $x += sqrt($x);
  }
  echo "OK!";
  ?>

Next, use docker build to build your image. It will download the php:5-apache base image (about 360 MB in total) if you do not have it already. Also, note that hpa-example:latest is the name of your Docker image. Docker repository names must be lowercase. You'll need to refer to this image from your worker Pod.

docker build -t hpa-example .

Sending build context to Docker daemon  5.632kB
Step 1/3 : FROM php:5-apache
5-apache: Pulling from library/php
5e6ec7f28fb7: Pull complete
cf165947b5b7: Pull complete
7bd37682846d: Pull complete
99daf8e838e1: Pull complete
ae320713efba: Pull complete
ebcb99c48d8c: Pull complete
9867e71b4ab6: Pull complete
936eb418164a: Pull complete
bc298e7adaf7: Pull complete
ccd61b587bcd: Pull complete
b2d4b347f67c: Pull complete
56e9dde34152: Pull complete
9ad99b17eb78: Pull complete
Digest: sha256:0a40fd273961b99d8afe69a61a68c73c04bc0caa9de384d3b2dd9e7986eec86d
Status: Downloaded newer image for php:5-apache
 ---> 24c791995c1e
Step 2/3 : ADD index.php /var/www/html/index.php
 ---> 9ccff8324890
Step 3/3 : RUN chmod a+rx index.php
 ---> Running in ab82b65295b9
Removing intermediate container ab82b65295b9
 ---> 8a322330700f
Successfully built 8a322330700f
Successfully tagged hpa-example:latest

Next, make sure that you have this docker image available:

docker images

REPOSITORY                                TAG                 IMAGE ID            CREATED             SIZE
hpa-example                               latest              8a322330700f        26 seconds ago      355MB

Now that all of this is complete, let's turn to Kubernetes.

Deploying Kubernetes

Kubernetes cannot automatically scale single Pods. Therefore, you'll need a higher level manager that inherently allows itself to be adjusted. You can learn about one at the Deployments page of the Kubernetes documentation.

The required specifications for your deployment are as follows:

  • replicas: 1: You only need one Pod running to start. Your horizontal pod autoscaler (HPA) definition will cause the number of replicas to be scaled automatically based on your requirements.
  • image: hpa-example:latest: You'll need to use your own custom Docker image with your work-generator PHP program.
  • app: my-hpa-pod: This is a label. Your service must refer to this label. The service is a front-end to this Pod.
  • requests: cpu: 500m: You'll need to have such a request definition. The HPA expresses CPU utilization as a percentage of this request, so without it the HPA has nothing to compare the measured usage against.
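To make the role of the CPU request concrete, here is a minimal sketch of how a usage figure relates to a request such as 500m. The helper names are hypothetical, and rounding down to a whole percent is an assumption made for this illustration.

```python
def cpu_to_millicores(quantity: str) -> int:
    """Convert a Kubernetes CPU quantity such as '500m' or '2' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def utilization_percent(usage: str, request: str) -> int:
    """CPU usage as a whole-number percentage of the request (rounded down)."""
    return cpu_to_millicores(usage) * 100 // cpu_to_millicores(request)


print(utilization_percent("250m", "500m"))  # 50
print(utilization_percent("1", "500m"))     # 200
```

A Pod can therefore report well over 100%: the percentage is relative to what the Pod requested, not to what the node can supply.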

Now consider the following:

nano myHPA-Deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-hpa-deployment
  labels:
    app: my-hpa-deploy
spec:
  replicas: 1
  strategy:
    type: RollingUpdate
  selector:
    matchLabels:
      app: my-hpa-pod
  template:
    metadata:
      labels:
        app: my-hpa-pod
    spec:
      containers:
      - name: my-hpa-container
        image: hpa-example:latest
        imagePullPolicy: IfNotPresent
        resources:
          requests:
            cpu: 500m
      terminationGracePeriodSeconds: 0

Next, define the service. Something important to note is its selector, app: my-hpa-pod. It matches the label on the Pods running in your deployment defined above, so the service acts as a front-end to those Pods. It will be accessed through port 80. You can read more about what a Service is here.

nano myService.yaml

kind: Service
apiVersion: v1
metadata:
  name: my-service
spec:
  selector:
    app: my-hpa-pod
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
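The way the selector picks Pods can be sketched in a few lines. This is only an illustration of equality-based label matching, not Kubernetes code, and the function name is hypothetical.

```python
def selector_matches(selector: dict, labels: dict) -> bool:
    """True when every key/value pair in the selector appears in the labels."""
    return all(labels.get(key) == value for key, value in selector.items())


service_selector = {"app": "my-hpa-pod"}

pod_labels = {"app": "my-hpa-pod"}            # labels from the Pod template
deployment_labels = {"app": "my-hpa-deploy"}  # labels on the Deployment object

print(selector_matches(service_selector, pod_labels))         # True
print(selector_matches(service_selector, deployment_labels))  # False
```

The service selects Pods by their own labels, which is why the Pod template label and the service selector must agree while the label on the deployment itself may differ.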

Now, you'll want to create your service.

kubectl create -f myService.yaml
service/my-service created

Next, list services with the kubectl get command. For more information, you can read about this command here in the Kubernetes documentation. This document provides many useful examples.

kubectl get svc

kubernetes   ClusterIP        <none>        443/TCP   16d
my-service   ClusterIP   <none>        80/TCP    4s

Next, create your deployment:

kubectl create -f myHPA-Deployment.yaml
deployment.apps/my-hpa-deployment created

And list deployments with the following command. You'll see your desired Pod running.

kubectl get deploy
NAME                DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
my-hpa-deployment   1         1         1            1           8s

Define your auto scaling requirements as follows:

  • Set the autoscaler to deployment: my-hpa-deployment.
  • Set --cpu-percent=45 so that the autoscaler aims for an average CPU utilization of 45% of the requested CPU across the Pods.
  • Set --min=1 --max=10. By doing so, you define that the autoscaler may scale the number of Pods in your deployment from 1 up to a maximum of 10.

kubectl autoscale deployment my-hpa-deployment --cpu-percent=45 --min=1 --max=10
horizontalpodautoscaler.autoscaling/my-hpa-deployment autoscaled
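For readers who prefer a declarative definition, the sketch below builds what should be an equivalent HPA object and prints it as JSON, which kubectl accepts alongside YAML. It assumes the autoscaling/v1 API; the variable name is made up for this example.

```python
import json

# Assumed autoscaling/v1 equivalent of the kubectl autoscale command above.
hpa_manifest = {
    "apiVersion": "autoscaling/v1",
    "kind": "HorizontalPodAutoscaler",
    "metadata": {"name": "my-hpa-deployment"},
    "spec": {
        "scaleTargetRef": {
            "apiVersion": "apps/v1",
            "kind": "Deployment",
            "name": "my-hpa-deployment",
        },
        "minReplicas": 1,                      # --min=1
        "maxReplicas": 10,                     # --max=10
        "targetCPUUtilizationPercentage": 45,  # --cpu-percent=45
    },
}

print(json.dumps(hpa_manifest, indent=2))
```

Saved to a file, the printed JSON could be passed to kubectl create -f in place of the kubectl autoscale command. Use one or the other, not both.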

Next, you'll want to determine the status of your HPA:

kubectl get hpa
NAME                REFERENCE                      TARGETS        MINPODS   MAXPODS   REPLICAS   AGE
my-hpa-deployment   Deployment/my-hpa-deployment   <unknown>/45%  1         10        0          8s

In the above, the TARGETS column shows the current CPU utilization against the 45% target. The current value is still <unknown> because it takes several minutes for the metrics server values to become available. However, at this stage you have:

  • an Apache Pod that is running PHP processes in your deployment.
  • a service that is pointing to the Pod.
  • a horizontal pod autoscaler (HPA) definition that sets a target CPU utilization of 45% per Pod.

Note that, if you run the kubectl get rs command, you will also see a ReplicaSet. Your deployment manages this ReplicaSet, and the HPA tells the deployment how many replicas to run.

All this is running, so now the only thing you still need is CPU load on your PHP Pod. For this, you'll need to define the load generator Pod. It is just a basic BusyBox Pod that you can exec into. From there, you will wget PHP webpages in a shell loop.

nano myLoad-Generator.yaml

apiVersion: v1
kind: Pod
metadata:
  name: myloadgenpod
  labels:
    app: my-loadgen-pod
spec:
  containers:
  - name: my-loadgen-container
    image: busybox
    imagePullPolicy: IfNotPresent

    command: ['sh', '-c', 'sleep 3600']

  restartPolicy: Never
  terminationGracePeriodSeconds: 0

Create the load generator Pod.

kubectl create -f myLoad-Generator.yaml
pod/myloadgenpod created

Next, use the kubectl exec command to open a shell in myloadgenpod. For reference, check out this document. Run the following in a separate terminal window, so that while the loop is running in the foreground there, you can monitor its effects in your original terminal.

kubectl exec -it myloadgenpod -- /bin/sh
/ # cd tmp
/tmp # wget
Connecting to (
index.html           100% |************************************************************************|    26  0:00:00 ETA
/tmp # cat index.html
/tmp #

The session above is a quick check that your load generator Pod can reach your PHP webpage. Change to the tmp directory with cd tmp. Then, use wget to fetch your index.php page through the service IP address. Finally, cat the downloaded index.html: if it contains OK, then it works.

Now that you know it works, you can send some workload to index.php. Enter this at the shell:

while true; do wget -q -O-; done

The above is an endless loop that keeps fetching your index.php page.

Now, go back to your original terminal to investigate the horizontal pod autoscaler's behavior under this workload. If your terminal becomes very slow because your node is overwhelmed by this work, you can reduce the load to improve your shell's response time.

If you have to, experiment by lessening the load and checking CPU utilization levels with the top command. A CPU utilization of around 50% is fine. With four cores, as in my example, total capacity is 4 * 100 = 400%, so a usage of 200% is absolutely fine.

You should now have one Pod running Apache with five PHP child processes that are processing this workload. Below are suggested sleep values to experiment with:

while true; do wget -q -O-; sleep .05 ; done
while true; do wget -q -O-; sleep .09 ; done
while true; do wget -q -O-; sleep .2 ; done

If you're using four cores like me, then no sleep is required to lessen the overall CPU load.
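To get a feel for how much each sleep value throttles the loop, the sketch below computes an upper bound on the request rate of a sequential fetch-then-sleep loop. The time for one request is a made-up figure for illustration, so measure your own before relying on the numbers.

```python
def max_requests_per_second(sleep_seconds: float, request_seconds: float) -> float:
    """Upper bound on the request rate of a sequential fetch-then-sleep loop."""
    return 1.0 / (sleep_seconds + request_seconds)


ASSUMED_REQUEST_SECONDS = 0.05  # hypothetical time for one index.php fetch

for sleep in (0.0, 0.05, 0.09, 0.2):
    rate = max_requests_per_second(sleep, ASSUMED_REQUEST_SECONDS)
    print(f"sleep {sleep:>4}: at most {rate:.1f} requests/second")
```

Under that assumption, even the shortest suggested sleep roughly halves the load compared with no sleep at all.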

Monitor HPA Functions with kubectl get hpa

Now, you will monitor HPA functionality with kubectl get hpa. For this tutorial, enter this command roughly every 30 seconds in your original terminal. As the results below show, within a few minutes you can see auto scaling taking place.

kubectl get hpa
NAME                REFERENCE                      TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
my-hpa-deployment   Deployment/my-hpa-deployment   0%/45%    1         10        1          75s

NAME                REFERENCE                      TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
my-hpa-deployment   Deployment/my-hpa-deployment   0%/45%    1         10        1          110s

NAME                REFERENCE                      TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
my-hpa-deployment   Deployment/my-hpa-deployment   40%/45%   1         10        1          3m17s

NAME                REFERENCE                      TARGETS    MINPODS   MAXPODS   REPLICAS   AGE
my-hpa-deployment   Deployment/my-hpa-deployment   187%/45%   1         10        1          3m31s

NAME                REFERENCE                      TARGETS    MINPODS   MAXPODS   REPLICAS   AGE
my-hpa-deployment   Deployment/my-hpa-deployment   187%/45%   1         10        4          3m52s

kubectl get hpa
NAME                REFERENCE                      TARGETS    MINPODS   MAXPODS   REPLICAS   AGE
my-hpa-deployment   Deployment/my-hpa-deployment   186%/45%   1         10        5          4m31s

The metrics server takes about three minutes to record that CPU use is more than 0%, and it's not until around three minutes and 30 seconds that it measures CPU use of 187% for the one Pod. The horizontal pod autoscaler then decides to scale to four Pods because the CPU utilization is above the target you set. About 40 seconds later, it scales to five Pods for the same reason.
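If you would rather not read these rows by eye, a small parser can turn them into a timeline. The sketch below works on rows copied from the output above, with column spacing condensed. It assumes the seven-column layout shown there, and the function name is hypothetical.

```python
SAMPLE_ROWS = """\
my-hpa-deployment   Deployment/my-hpa-deployment   0%/45%     1   10   1   75s
my-hpa-deployment   Deployment/my-hpa-deployment   40%/45%    1   10   1   3m17s
my-hpa-deployment   Deployment/my-hpa-deployment   187%/45%   1   10   1   3m31s
my-hpa-deployment   Deployment/my-hpa-deployment   187%/45%   1   10   4   3m52s
my-hpa-deployment   Deployment/my-hpa-deployment   186%/45%   1   10   5   4m31s
"""


def parse_hpa_row(row: str) -> dict:
    """Split one data row of `kubectl get hpa` output into named fields."""
    name, reference, targets, min_pods, max_pods, replicas, age = row.split()
    current, target = targets.split("/")
    return {
        "name": name,
        # '<unknown>' appears until the metrics server has reported a value.
        "current_percent": None if current == "<unknown>" else int(current.rstrip("%")),
        "target_percent": int(target.rstrip("%")),
        "replicas": int(replicas),
        "age": age,
    }


for line in SAMPLE_ROWS.splitlines():
    row = parse_hpa_row(line)
    current = row["current_percent"]
    over = current is not None and current > row["target_percent"]
    print(f'{row["age"]:>6}  {current}%  replicas={row["replicas"]}  over target: {over}')
```

The header line is deliberately left out of the sample, since the parser expects data rows only.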

Unfortunately, the measured CPU use stays at 186% with five Pods. The reported value lags behind the scaling: metrics are gathered only periodically, so the figure is not immediately averaged across the five Pods. Later, we will see this more clearly.
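The scaling decision itself follows the rule given in the Kubernetes documentation: the desired replica count is the current count multiplied by the ratio of measured to target utilization, rounded up. Below is a minimal sketch of that rule, with a tolerance of 0.1 and the 1 to 10 range from this tutorial as defaults. The per-step cap in capped_scale_up is an assumption about the controller version used in this run, included only because it would account for the intermediate step to four Pods.

```python
import math


def desired_replicas(current_replicas: int, current_percent: float,
                     target_percent: float, min_replicas: int = 1,
                     max_replicas: int = 10, tolerance: float = 0.1) -> int:
    """Documented HPA rule: ceil(current * currentMetric / targetMetric)."""
    ratio = current_percent / target_percent
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to the target: no change
    wanted = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, wanted))


def capped_scale_up(current_replicas: int, wanted: int) -> int:
    """ASSUMPTION: one scale-up step is capped at max(2 * current, 4)."""
    return min(wanted, max(2 * current_replicas, 4))


wanted = desired_replicas(1, 187, 45)
print(wanted)                      # 5
print(capped_scale_up(1, wanted))  # 4
```

With one Pod at 187% and a 45% target, the rule asks for five Pods, which is where this run ends up.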

At this point, you can view the horizontal pod autoscaler details for this deployment to see when it scaled up or down and why it did so. In particular, you can use the kubectl describe command for that:

kubectl describe horizontalpodautoscaler.autoscaling/my-hpa-deployment

Name:                                                  my-hpa-deployment
Namespace:                                             default
Labels:                                                <none>
Annotations:                                           <none>
CreationTimestamp:                                     Thu, 21 Feb 2019 15:37:59 +0200
Reference:                                             Deployment/my-hpa-deployment
Metrics:                                               ( current / target )
  resource cpu on pods  (as a percentage of request):  186% (933m) / 45%
Min replicas:                                          1
Max replicas:                                          10
Deployment pods:                                       5 current / 5 desired
Conditions:
  Type            Status  Reason              Message
  ----            ------  ------              -------
  AbleToScale     True    ReadyForNewScale    recommended size matches current size
  ScalingActive   True    ValidMetricFound    the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
  ScalingLimited  False   DesiredWithinRange  the desired count is within the acceptable range
Events:
  Type     Reason                        Age                    From                       Message
  ----     ------                        ----                   ----                       -------
  Warning  FailedGetResourceMetric       3m37s (x4 over 4m22s)  horizontal-pod-autoscaler  unable to get metrics for resource cpu: no metrics returned from resource metrics API
  Warning  FailedComputeMetricsReplicas  3m37s (x4 over 4m22s)  horizontal-pod-autoscaler  failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API
  Normal   SuccessfulRescale             74s                    horizontal-pod-autoscaler  New size: 4; reason: cpu resource utilization (percentage of request) above target
  Normal   SuccessfulRescale             59s                    horizontal-pod-autoscaler  New size: 5; reason:

Now, let's discuss the conditions and events seen above.


  • AbleToScale True ReadyForNewScale: This indicates whether or not the HPA is able to fetch and update scales, as well as whether or not any backoff-related conditions would prevent scaling.
  • ScalingActive True ValidMetricFound: In this condition, ScalingActive indicates whether or not the horizontal pod autoscaler is enabled, or more specifically if the replica count of the target is not zero, and is able to calculate desired scales. When ScalingActive is False, it usually means that there exist problems in fetching the corresponding metrics.
  • ScalingLimited False: With this condition, you have minimum replica number of 1 and a maximum of 10. The current number of Pods is 5. This deployment is able to scale up or down within that range. Scaling is not limited.
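The three conditions can also be checked mechanically. The sketch below parses the condition rows copied from the describe output above and applies one possible definition of a healthy autoscaler. Both function names and that definition are assumptions made for illustration.

```python
CONDITIONS_TEXT = """\
AbleToScale     True    ReadyForNewScale    recommended size matches current size
ScalingActive   True    ValidMetricFound    the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
ScalingLimited  False   DesiredWithinRange  the desired count is within the acceptable range
"""


def parse_conditions(text: str) -> dict:
    """Map each condition type to its (status, reason) pair."""
    conditions = {}
    for line in text.splitlines():
        cond_type, status, reason, _message = line.split(None, 3)
        conditions[cond_type] = (status == "True", reason)
    return conditions


def is_healthy(conditions: dict) -> bool:
    """Healthy here means: able to scale, metrics valid, not pinned at a limit."""
    return (conditions["AbleToScale"][0]
            and conditions["ScalingActive"][0]
            and not conditions["ScalingLimited"][0])


conditions = parse_conditions(CONDITIONS_TEXT)
print(conditions["ScalingActive"])  # (True, 'ValidMetricFound')
print(is_healthy(conditions))       # True
```

A ScalingLimited value of True is not an error in itself. It only means the HPA wanted a replica count outside the 1 to 10 range and was held at the boundary.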


Of the events seen above, the first two, Warning FailedGetResourceMetric and Warning FailedComputeMetricsReplicas, are abnormal. They indicate that the metrics server was still busy gathering data and had not yet sent any data for the HPA to use. The last two events are normal. They show the autoscaler in action: what it did and why.

You still need to see the CPU use per Pod come down. If you continue to monitor with kubectl get hpa, you'll see that happen after a few minutes. In fact, after five minutes and 32 seconds, the average CPU use per Pod is at 37%, and it continues to hover around that value since the workload is a steady stream of wget requests.

kubectl get hpa
NAME                REFERENCE                      TARGETS    MINPODS   MAXPODS   REPLICAS   AGE
my-hpa-deployment   Deployment/my-hpa-deployment   186%/45%   1         10        5          5m6s

NAME                REFERENCE                      TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
my-hpa-deployment   Deployment/my-hpa-deployment   37%/45%   1         10        5          5m32s

NAME                REFERENCE                      TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
my-hpa-deployment   Deployment/my-hpa-deployment   36%/45%   1         10        5          6m33s

NAME                REFERENCE                      TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
my-hpa-deployment   Deployment/my-hpa-deployment   36%/45%   1         10        5          6m45s
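As a consistency check on these numbers, the sketch below compares the total CPU in use before and after scaling, assuming the 500m request from the deployment and an even spread of load. It also evaluates the documented replica formula at the steady-state reading of 37%.

```python
import math

REQUEST_MILLICORES = 500  # requests: cpu: 500m from the deployment
TARGET_PERCENT = 45


def total_millicores(replicas: int, average_percent: int) -> float:
    """Total CPU in use when `replicas` Pods average `average_percent` of the request."""
    return replicas * average_percent * REQUEST_MILLICORES / 100


before = total_millicores(1, 186)  # one Pod at 186% of its request
after = total_millicores(5, 37)    # five Pods at 37% each
print(before, after)               # 930.0 925.0

# Replica count the documented formula asks for at 37% with five Pods.
print(math.ceil(5 * 37 / TARGET_PERCENT))  # 5
```

The total work is nearly unchanged: it has simply been divided over five Pods. At 37% per Pod the formula still asks for five replicas, which is consistent with the count holding steady.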