Introduction
When sizing a Google Kubernetes Engine (GKE) cluster, several factors come into play, such as the expected workload, resource requirements, and performance needs. Like performance, scalability is critical to the success of a growing organisation and should be considered during the design of a cluster.
Scalability of a Kubernetes cluster is not one-dimensional but a combination of many components. A useful way to think about it: within all possible configurations of the cluster, there is a cube that wraps an envelope of configurations offering good stability and performance. As long as you stay within that envelope, that is, within the limits of scalability on multiple dimensions, you are safe and can say you are within the scalability limits.
However, it’s important to note that the dimensions describing the configurations of the cube are not independent: moving further along any one dimension may contract the viable space in the others. In practical terms, increasing scalability in one area may come at the cost of reduced scalability in another.
One potential solution to this issue is to decompose the cube into independent sub-cubes. This approach can help reduce the dimensionality of the problem and make it easier to focus on specific areas of scalability. By breaking down the cube into smaller, more manageable parts, it is possible to more effectively analyze and optimize each individual component.
Defining a Kubernetes cluster
A Kubernetes cluster can scale within a range of limits to accommodate workloads and resource demands effectively. The Kubernetes scalability envelope defines the boundaries and constraints of scalability based on factors such as cluster size, node capacity, resource allocation, and performance requirements. The scalability envelope helps cluster administrators and operators understand the cluster’s capabilities and plan scaling appropriately.
The key considerations during designing a cluster include:
- Cluster Size: Sizing a cluster correctly to accommodate all workloads is very important. Resources are not cheap: if we oversize a cluster, we waste money 🤦, and if we undersize it, we get production issues 😛. Sizing takes into account the desired number of worker nodes in the cluster and defines the minimum and maximum limits for the number of nodes that can be added or removed to handle workload fluctuations efficiently.
- Node Capacity: Each worker node in the cluster has its own capacity in terms of CPU, memory, and other resources, and Kubernetes can work with a wide range of node sizes. Choose appropriate worker node instance types based on CPU, memory, and other resource specifications. Using the most cost-effective node size the cloud provider offers seems optimal, but oftentimes larger nodes work out to be cheaper. Larger nodes increase pod density, however, so pods are less scattered across the cluster, which undermines pod replication and high availability in case of a node failure. In those cases, try adding some smaller nodes too to help with redundancy (a minimal sizing sketch follows this list).
- Resource Allocation: The scalability envelope includes resource allocation considerations such as CPU, memory, storage, and network bandwidth. It defines the minimum and maximum limits for resource allocation per pod, ensuring that the cluster can scale within those limits without overcommitting or under-utilizing resources.
- High Availability: Ensure sufficient redundancy and fault tolerance by sizing your cluster with multiple worker nodes. This helps distribute the workload and prevent a single point of failure.
- Future Growth: Consider future growth and scalability needs of your application or addition of new applications. Plan for potential increases in workload, additional features, or user base expansion. Ensure that the cluster sizing accounts for these future requirements.
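To make the node-capacity trade-off concrete, here is a minimal Python sketch (the function and the example machine shapes are illustrative, not from the original text) that estimates how many nodes of a given shape a workload needs:

```python
import math

def nodes_needed(total_vcpu: float, total_mem_gb: float,
                 node_vcpu: float, node_mem_gb: float,
                 headroom: float = 1.0) -> int:
    """Smallest node count whose aggregate capacity covers the workload,
    scaled by a headroom factor (e.g. 2.0 for a 2x buffer)."""
    by_cpu = math.ceil(total_vcpu * headroom / node_vcpu)
    by_mem = math.ceil(total_mem_gb * headroom / node_mem_gb)
    return max(by_cpu, by_mem)

# Bigger nodes pack pods more densely but concentrate the blast radius:
print(nodes_needed(90, 250, 8, 32))   # 12 nodes of an 8 vCPU / 32 GB shape
print(nodes_needed(90, 250, 16, 64))  # 6 nodes of a 16 vCPU / 64 GB shape
```

Fewer, larger nodes mean each node failure takes out a bigger share of the workload, which is why mixing in smaller nodes can help with redundancy.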
Kubernetes Scalability Thresholds
Quoting the official Kubernetes community documentation, scalability thresholds are defined on the basis of the “You Promise, We Promise” principle:
If you promise to:
- correctly configure your cluster
- use extensibility features “reasonably”
- keep the load in the cluster within recommended limits
then we promise that your cluster scales, i.e.:
- all the SLOs are satisfied
More on Kubernetes scalability thresholds here and here
Below 👇 is an example of sizing a real-world production Kubernetes cluster. This calculation can be taken as a reference and modified as per your requirements.
Sizing a Kubernetes Cluster, an example
For reference, we are using GKE as our cloud provider.
Goal
The goal is to define the configurations for a Kubernetes Cluster.
Calculating Total Capacity
Consider a total requirement in the cluster of 90 vCPU and 250 GB of memory. The total number of replicas (pods) in the cluster is 40, and the mean requirement of a pod is 2 vCPU and 4 GB.
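As a quick sanity check, these figures can be scripted. Note that reading the cluster totals as the sum of mean per-pod requests plus headroom is our assumption, not stated above:

```python
replicas = 40
mean_vcpu, mean_mem_gb = 2, 4          # mean per-pod requirement

sum_vcpu = replicas * mean_vcpu        # 80 vCPU across all pods
sum_mem_gb = replicas * mean_mem_gb    # 160 GB across all pods

# The stated cluster totals (90 vCPU / 250 GB) exceed these sums,
# leaving headroom for uneven pods and system overhead (assumption).
total_vcpu, total_mem_gb = 90, 250
assert sum_vcpu <= total_vcpu and sum_mem_gb <= total_mem_gb
```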
Node size (number of nodes)
- Decision: n2-standard-8
- Rationale: We do not want to go with a bigger node size to avoid the larger blast radius in case of a single node failure.
Max nodes in the Cluster
- Decision: 32 total, 28 usable
- Rationale: For a maximum requirement of 90 vCPU and 250 GB of memory in the cluster, 12 n2-standard-8 nodes provide 96 (= 12 × 8) vCPU. Keeping twice the required capacity as a buffer, we need a total of 24 nodes. As per GKE recommendations, we use a /27 primary range, which results in a 28-node cluster: of the 32 addresses in a /27, the first two and last two IP addresses of a primary IP address range are reserved by Google Cloud. (A sketch of this arithmetic follows below.)
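A minimal sketch of the arithmetic above (the 2x buffer and the four reserved addresses are as described in the rationale):

```python
import math

total_vcpu = 90
node_vcpu = 8                                  # n2-standard-8

min_nodes = math.ceil(total_vcpu / node_vcpu)  # 12 nodes -> 96 vCPU
buffered_nodes = 2 * min_nodes                 # 24 nodes with a 2x buffer

# A /27 primary range holds 2**(32 - 27) = 32 addresses; Google Cloud
# reserves the first two and the last two, leaving 28 usable node IPs.
usable_node_ips = 2 ** (32 - 27) - 4           # 28
assert buffered_nodes <= usable_node_ips       # 24 nodes fit in a /27
```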
Pods/node
- Decision: 32
- Rationale: As per GKE Autopilot recommendations, Autopilot clusters can have a maximum of 32 Pods per node. As with GKE Standard, this results in a /26 range being provisioned per node, which is 64 IPs. (The per-node range rule is sketched below.)
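The per-node range rule can be sketched as follows (a minimal illustration of the doubling behaviour described above, not GKE’s actual implementation):

```python
import math

max_pods_per_node = 32

# GKE provisions roughly double the maximum pods per node, rounded up
# to the nearest power of two, as the per-node pod range.
ips_per_node = 2 ** math.ceil(math.log2(2 * max_pods_per_node))  # 64
per_node_prefix = 32 - int(math.log2(ips_per_node))              # 26
print(f"/{per_node_prefix} per node = {ips_per_node} IPs")
```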
Cluster CIDRs
Primary subnet IP range
- Decision: /26
- Rationale: The primary range is used to assign IPs to nodes in the cluster. With a maximum of 32 nodes in the cluster, we would need a /27 range. We double this range to /26 to support IP exhaustion cases during cluster upgrades or node pool drains.
Secondary subnet IP range for pods
- Decision: /21
- Rationale: The secondary pod range is used to assign IPs to pods in the cluster. With a maximum of 32 pods per node, GKE provisions a /26 (64 IPs) per node, so a 32-node cluster needs 32 × 64 = 2048 IPs in total, which is a /21. (Multiplying nodes by pods alone, 32 × 32 = 1024, would suggest a /22, but the per-node /26 allocation is what actually consumes the range.)
Secondary subnet IP range for services
- Decision: /25
- Rationale: The number of applications being deployed in this cluster is 26. Keeping a buffer for the introduction of a few new services, we can use a /25, which is 128 IPs. (The arithmetic for all three ranges is sketched below.)
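Pulling the three ranges together, here is a minimal sketch (the prefix_for helper is ours, not GKE tooling) that reproduces the CIDR sizes above:

```python
import math

def prefix_for(n_ips: int) -> int:
    """Smallest /N prefix whose block holds at least n_ips addresses."""
    return 32 - math.ceil(math.log2(n_ips))

max_nodes = 32
per_node_pod_ips = 64                            # a /26 per node

print(prefix_for(2 * max_nodes))                 # 26: primary range for nodes
print(prefix_for(max_nodes * per_node_pod_ips))  # 21: secondary range for pods
print(prefix_for(128))                           # 25: secondary range for services
```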
Conclusion
Sizing a Kubernetes cluster is nothing less than an art, and doing it properly is a crucial step in ensuring optimal performance, resource utilization, and scalability for your applications. I’ve tried to cover the basics here, but by carefully considering factors such as HPA, VPA, Quotas and LimitRanges, workload scalability, node capacity, high availability, performance objectives, storage needs, networking considerations, scaling policies, and future growth, you can design a well-sized cluster that meets your application’s demands.
Proper cluster sizing involves estimating CPU and memory requirements, selecting suitable worker node instance types, and ensuring redundancy and fault tolerance through multiple nodes. It also requires setting thresholds and policies for scaling based on resource utilization and workload demand, as well as implementing monitoring and observability to continuously monitor the cluster’s performance.
Remember that these sizing guidelines provide a starting point, and the specific requirements may vary based on your workload and application characteristics. Regular monitoring and adjustments are necessary to optimize resource allocation, accommodate growth, and maintain optimal performance.