Kubernetes Horizontal Pod Autoscaler

Pods can be created as standalone objects, or as part of a scalable ReplicaSet or a Deployment. The latter two objects are used to deploy not just one pod but a multitude of them. The idea is that pods are fungible: if one pod has too much traffic, two more can spin up and take the extra load. However, it is important to note that both ReplicaSet and Deployment objects carry a hard-coded number of pod replicas that they intend to run.

If the replica count is set to 100 and demand is low, all 100 pods will still be up and running, wasting CPU and memory. Yes, this offers reliability: if a node crashes and the pods on it die, the ReplicaSet controller brings the pod count back up to 100 by spawning replacements on other nodes, and the application stays online.

In more abstract terms, the ReplicaSet controller continuously compares the current state of the cluster against the desired state and works out how to reconcile the two.

However, we would like something a bit more sensitive to real-world demand. Enter the Horizontal Pod Autoscaler. It is the job of the Horizontal Pod Autoscaler to scale the application up when there is demand for it and to scale it back down once the workload drops.

Why use a Horizontal Pod Autoscaler?

As the name suggests, this component scales your application automatically. In the cloud, this can substantially reduce the compute and memory resources you are billed for. Since the autoscaler is sensitive to resource utilization, it scales the application down when it sees many pods sitting idle, and when demand on those pods increases it scales the application up by creating new pods, distributing the load across them.

It can save you both valuable time and compute resources. You won't have to worry about what the replica count should be when writing a deployment; the autoscaler manages that for you.

Initial Setup

The first and foremost requirement is a running Kubernetes cluster; the Katacoda playground is perfect for experimenting with and learning about Kubernetes. The next thing you need is a metrics server.

This add-on to your Kubernetes system (it runs in the kube-system namespace) gathers metrics such as CPU and memory usage from two different perspectives:

  1. Resources used by each pod
  2. Resources consumed on each node

Metrics from both perspectives are crucial in helping the autoscaler decide what its next move should be. To add a metrics server to your Kubernetes cluster, follow this guide. Now we are ready to see the Horizontal Pod Autoscaler in action.
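For reference, if you install the metrics server from the upstream kubernetes-sigs/metrics-server project, the steps look roughly like this (the manifest URL is the one published by that project's releases; `kubectl top` is a quick way to verify that metrics are flowing):

```
$ kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
$ kubectl top nodes
$ kubectl top pods --all-namespaces
```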

Using the Autoscaler

To see the Autoscaler working, we need a test application. Let’s create a simple php-apache server and expose it as a service.

$ kubectl run php-apache --image=k8s.gcr.io/hpa-example --requests=cpu=200m --port=80 --expose

The image used here is one of the sample images provided by the Kubernetes project. It performs a CPU-intensive task on every request, which makes the scaling behaviour much more apparent.
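On newer clusters, `kubectl run` creates a bare pod rather than a deployment, so you may prefer the declarative route. A rough sketch of an equivalent Deployment and Service manifest (the label name `run: php-apache` is illustrative, chosen to mimic what `kubectl run` used to generate):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-apache
spec:
  replicas: 1
  selector:
    matchLabels:
      run: php-apache
  template:
    metadata:
      labels:
        run: php-apache
    spec:
      containers:
      - name: php-apache
        image: k8s.gcr.io/hpa-example
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: 200m
---
apiVersion: v1
kind: Service
metadata:
  name: php-apache
spec:
  selector:
    run: php-apache
  ports:
  - port: 80
```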

To autoscale this deployment, we need to tell the autoscaler the minimum and maximum number of pods we will allow and the target CPU utilization, as a percentage of what the pods requested. Newer versions of the autoscaler can also act on other metrics, such as memory usage or custom metrics.

$ kubectl autoscale deployments/php-apache --cpu-percent=50 --min=1 --max=10
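The same autoscaler can also be created declaratively. A sketch of the equivalent manifest, using the autoscaling/v1 API (save it to a file of your choice and apply it with `kubectl apply -f`):

```yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 50
```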

In the current state, since no one is consuming this service, it will most likely stay at the minimum value. You can check the state of all autoscaled deployments in the default namespace by running:

$ kubectl get hpa
NAME         REFERENCE               TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache   0%/50%    1         10        1          2m

Generating Load and Testing the Autoscale Feature

You can see that the number of replicas is still only one and the CPU load is insignificant. We can generate additional load and watch how the autoscaler responds. The service that exposes our php-apache pods is not reachable from the outside world, so we will create a temporary pod and open an interactive shell session inside it.

This will allow us to communicate with all the services available in the cluster, including the php-apache service.

$ kubectl run -i --tty busybox --image=busybox --restart=Never -- sh
/ #

You will notice that the prompt changes, indicating that we are inside the container. Let's now put some load on our service by making requests repeatedly. At the new prompt, run the following while loop:

/ # while true; do wget -q -O- http://php-apache.default.svc.cluster.local; done

Open a new terminal, since we can't let this loop terminate just yet. Inspecting the autoscaler now shows the increased CPU utilization, and listing the pods shows that there are now multiple instances of the php-apache server:

$ kubectl get hpa

NAME         REFERENCE               TARGETS    MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache   121%/50%   1         10        4          1h
$ kubectl get pods

NAME                          READY     STATUS    RESTARTS   AGE
busybox                       1/1       Running   0          6m
php-apache-8699449574-7qwxd   1/1       Running   0          28s
php-apache-8699449574-c9v54   1/1       Running   0          10h
php-apache-8699449574-h9s5f   1/1       Running   0          28s
php-apache-8699449574-sg4hz   1/1       Running   0          28s

Terminate the while loop, and the number of pods will drop back down to one within a few minutes.
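The replica counts the autoscaler converges on follow a simple rule documented by the Kubernetes project: the desired replica count is the current count scaled by the ratio of observed to target utilization, rounded up, then clamped to the configured bounds. A small Python sketch of that calculation (the function name and the min/max defaults are illustrative, matching our `--min=1 --max=10` setup):

```python
import math

def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas=1, max_replicas=10):
    """HPA scaling rule:
    desired = ceil(currentReplicas * currentMetric / targetMetric),
    clamped to the configured [min, max] range."""
    desired = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(desired, max_replicas))

# With 1 replica at 121% average CPU against a 50% target:
print(desired_replicas(1, 121, 50))  # -> 3
```

This is why the pod count ramps up in steps under sustained load: each evaluation cycle starts from the current replica count, so a big utilization spike may take a couple of cycles to reach the final count.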


So that’s a simple demonstration of the Horizontal Pod Autoscaler. Remember to have a functional metrics server in your cluster, and when creating a deployment, keep the replica count at 1. The Horizontal Pod Autoscaler will take care of the rest.

About the author

Ranvir Singh

I am a tech and science writer with quite a diverse range of interests, and a strong believer in the Unix philosophy. A few of the things I am passionate about include system administration, computer hardware, and physics.