Autoscaling strategy in GKE | C2C Community

Autoscaling strategy in GKE

  • 25 July 2022
  • 7 replies

My team and I are migrating on-prem workloads to GCP. The details are below:

We have two calculation workloads (large and small) deployed on GKE. Both workloads listen simultaneously to a single on-prem queue (IBM MQ).
A threshold parameter within each message distinguishes large messages from small ones. Once a message is identified as large or small, it is redirected to the corresponding calc component.

We now want to devise an autoscaling setup for the GKE calc components so that resource utilization is optimized. Can anyone help us set up a custom or external metric for this autoscaling?


Best answer by yuval 25 July 2022, 19:54



Hello @gcparch2022 and welcome to C2C!


@yuval, since you’re good with GKE, perhaps you could offer an opinion? Thanks in advance!



Hello @gcparch2022 and thanks @Dimitris Petrakis for mentioning me on this!


@gcparch2022 this is a very good question. If you implemented the large/small workloads as Deployments, you can scale them on custom metrics, but it will be harder to define a metric that counts only certain messages within a single queue. It will be easier to send messages to two queues, a small one and a large one, and set an HPA object for each Deployment.
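To make the two-queue idea concrete, here is a minimal sketch of a splitter that routes messages by threshold and reports a depth per queue. It uses in-memory queues rather than IBM MQ, and `SIZE_THRESHOLD`, `Message`, `size_param` and `route` are hypothetical names; the thread does not say what the threshold parameter actually measures.

```python
from collections import deque
from dataclasses import dataclass

# Hypothetical threshold value, chosen only for illustration.
SIZE_THRESHOLD = 1000


@dataclass
class Message:
    msg_id: str
    size_param: int  # stand-in for the threshold parameter carried in the message


def route(messages, threshold=SIZE_THRESHOLD):
    """Split one incoming stream into a small queue and a large queue."""
    small, large = deque(), deque()
    for msg in messages:
        # Assumption: messages at or above the threshold count as large.
        if msg.size_param >= threshold:
            large.append(msg)
        else:
            small.append(msg)
    return small, large


incoming = [
    Message("a", 10),
    Message("b", 5000),
    Message("c", 999),
    Message("d", 1000),
]
small_q, large_q = route(incoming)

# Each depth is the kind of number an HPA for that Deployment could scale on.
depths = {"small_queue_depth": len(small_q), "large_queue_depth": len(large_q)}
print(depths)
```

With separate queues, each Deployment's backlog is a plain queue depth, so no metric has to look inside a shared queue to tell large messages from small ones.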

You can also consider using K8s Jobs:

Hi @gcparch2022!


Regarding your question, the GKE Cluster Autoscaler by itself has two profiles, “balanced” and “optimize-utilization” (details on autoscaling profiles here), which determine how aggressively nodes are added to or removed from node pools.

I think your question is more about placing your deployments on either small or large clusters based on an external metric, and scaling each deployment on that metric as well. You can expose that value as a custom metric and use it with HPA (Horizontal Pod Autoscaler) to scale the number of replicas needed. Below are a couple of useful links/tutorials for doing that, as well as another post from this community that does something similar based on queue size:
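As a rough illustration of combining a metric with HPA, the sketch below builds one `autoscaling/v2` HorizontalPodAutoscaler manifest per Deployment and prints it as JSON. The Deployment names (`calc-small`, `calc-large`), the metric names and the per-pod targets are all hypothetical, and the sketch assumes a metrics adapter that serves the External Metrics API is already installed in the cluster.

```python
import json


def build_queue_hpa(deployment, metric_name, target_per_pod,
                    min_replicas=1, max_replicas=10):
    """Return an autoscaling/v2 HPA manifest that scales on an external metric."""
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": f"{deployment}-hpa"},
        "spec": {
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": deployment,
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [
                {
                    "type": "External",
                    "external": {
                        # Hypothetical metric name; use whatever your adapter exposes.
                        "metric": {"name": metric_name},
                        # AverageValue divides the metric by the current pod count.
                        "target": {
                            "type": "AverageValue",
                            "averageValue": str(target_per_pod),
                        },
                    },
                }
            ],
        },
    }


# Assumed names and targets, for illustration only.
small_hpa = build_queue_hpa("calc-small", "mq_small_queue_depth", 30)
large_hpa = build_queue_hpa("calc-large", "mq_large_queue_depth", 5, max_replicas=20)

for manifest in (small_hpa, large_hpa):
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with `kubectl apply -f`, which accepts JSON as well as YAML. A lower per-pod target for the large workload reflects the assumption that each large message takes longer to process.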




Thanks @gtesseyre, @Dimitris Petrakis and @yuval for taking the time to respond so quickly. As @gtesseyre mentioned, we are trying to scale both GKE calc components while they listen to a single queue, and that is the challenge. I will read through the links shared and see if we land on a solution.


Hi @gcparch2022,

Let us know if any of the above links or answers helped you.


Hello @gcparch2022,

Thanks for your question. I have read all of the valuable replies and summarised the following details. I hope this additional resource helps with your use case and gives you full confidence in Google Kubernetes Engine.

Google Kubernetes Engine has horizontal and vertical solutions for automatically scaling your pods as well as your infrastructure. When it comes to cost-optimization, these tools become extremely useful in ensuring that your workloads are being run as efficiently as possible and that you're only paying for what you're using.

In the lab referenced below, you will set up and observe Horizontal Pod Autoscaling and Vertical Pod Autoscaling for pod-level scaling, and Cluster Autoscaler (horizontal infrastructure solution) and Node Auto Provisioning (vertical infrastructure solution) for node-level scaling. First you'll use these autoscaling tools to save as many resources as possible and shrink your cluster's size during a period of low demand. Then you will increase the demands on your cluster and observe how autoscaling maintains availability.

Strategies you should apply:

  • Decrease number of replicas for a Deployment with Horizontal Pod Autoscaler

  • Decrease CPU request of a Deployment with Vertical Pod Autoscaler

  • Decrease number of nodes used in cluster with Cluster Autoscaler

  • Automatically create an optimized node pool for workload with Node Auto Provisioning

  • Test the autoscaling behavior against a spike in demand

  • Overprovision your cluster with Pause Pods
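To show how the Horizontal Pod Autoscaler arrives at a lower or higher replica count in the first and fifth steps above, here is a minimal sketch of the replica formula from the Kubernetes documentation, desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), with the default 10% tolerance. It deliberately leaves out stabilization windows, scaling policies and pod-readiness handling, so treat it as an approximation rather than the controller's full behaviour. The function name and the example numbers are illustrative.

```python
import math


def desired_replicas(current_replicas, current_value, target_value,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Approximate the HPA replica calculation for a single metric."""
    ratio = current_value / target_value
    # Within the tolerance band the HPA leaves the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # The result is always clamped to the HPA's min/max bounds.
    return max(min_replicas, min(max_replicas, desired))


# Low demand: 8 replicas averaging 20% CPU against a 50% target.
low_demand = desired_replicas(8, 20, 50)
# Spike: 4 replicas averaging 90% CPU against a 50% target.
spike = desired_replicas(4, 90, 50)
# Close to target: no change.
steady = desired_replicas(5, 52, 50)
# Very large spike: capped at max_replicas.
capped = desired_replicas(4, 200, 50)

print(low_demand, spike, steady, capped)
```

The same formula applies when the metric is a queue depth per pod instead of CPU utilization.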

The lab covers the benefits of the different Google Kubernetes Engine autoscaling strategies: Horizontal Pod Autoscaling and Vertical Pod Autoscaling for pod-level scaling, and Cluster Autoscaler and Node Auto Provisioning for node-level scaling.

Due to the text limit restriction, I haven’t summarised the full answer here.

Follow the Google Cloud Skills Boost lab link below for the details:

Lab: Understanding and Combining GKE Autoscaling Strategies


Additionally, you can browse the following series of labs covering practical and theoretical use-case scenarios.

Quest: Optimize Costs for Google Kubernetes Engine

The quest contains the following labs:


Note: To get all of the details, simply click the lab link. If you want hands-on practice with the labs, create a Google Cloud Skills Boost account and use a small amount of credit.






Hello there, @gcparch2022 ! I hope you are doing great! :)

I just wanted to follow up quickly and ask whether you read through the links shared by @yuval, @gtesseyre and @malamin, and whether they helped you out! :)

If you have any further questions, please don’t hesitate to ask! But it would be great to know if you found the answers good, if you had any progress, or if you have any further questions! 😎