Research on the trade-offs when choosing an instance type for a Kubernetes cluster
Find more research at: https://learnk8s.io/research
How to contribute: Leave a comment or drop us a line at research@learnk8s.io
License: Apache 2.0
Last updated: February 14, 2022

Check out the Kubernetes instance calculator
What is it?
Not all memory and CPU in a Node can be used to run Pods. The resources are partitioned into four parts:

1. Memory and CPU reserved for the operating system and system daemons such as SSH
2. Memory and CPU reserved for the Kubelet and Kubernetes agents such as the CRI
3. Memory reserved for the hard eviction threshold
4. Memory and CPU available to Pods

The graph shows the total memory available for running Pods after subtracting the reserved memory.
Memory (GiB)   GKE       EKS       AKS
1              55.00%    0.00%     0.00%
2              65.00%    16.75%    32.50%
4              70.00%    58.38%    53.75%
8              75.00%    79.19%    66.88%
16             82.50%    89.59%    78.44%
64             91.13%    97.40%    90.11%
128            92.56%    97.50%    92.05%
192            93.88%    98.33%    93.54%
256            94.91%    98.75%    94.65%
Example
If you have a Kubernetes cluster in GKE with a single Node with 2GB of memory, only 65% of the memory is available to run Pods. The remaining memory is needed to run the OS, Kubelet, CRI, CNI, etc.
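As a rough cross-check, the GKE column can be reproduced from Google's documented kube-reserved memory formula plus the 100MiB hard eviction threshold and (an assumption on our part) a flat 100MiB for OS daemons. A minimal sketch in Python; it matches the GKE column up to 128GiB, while the largest Node sizes deviate slightly:

```python
# Sketch of GKE memory reservations, assuming Google's documented
# kube-reserved brackets, the 100MiB hard eviction threshold, and an
# ASSUMED flat 100MiB reservation for OS/system daemons.

EVICTION_GIB = 0.1  # 100MiB hard eviction threshold
SYSTEM_GIB = 0.1    # assumed flat OS/system daemon reservation

def gke_allocatable_memory(total_gib: float) -> float:
    """GiB left for Pods on a GKE Node with `total_gib` GiB of memory."""
    brackets = [
        (4, 0.25),             # 25% of the first 4GiB
        (4, 0.20),             # 20% of the next 4GiB (up to 8GiB)
        (8, 0.10),             # 10% of the next 8GiB (up to 16GiB)
        (112, 0.06),           # 6% of the next 112GiB (up to 128GiB)
        (float("inf"), 0.02),  # 2% of anything above 128GiB
    ]
    reserved, remaining = 0.0, total_gib
    for size, rate in brackets:
        chunk = min(remaining, size)
        reserved += chunk * rate
        remaining -= chunk
    return total_gib - reserved - EVICTION_GIB - SYSTEM_GIB

for gib in [1, 2, 4, 8, 16, 64, 128]:
    alloc = gke_allocatable_memory(gib)
    print(f"{gib:>3}GiB node -> {alloc:6.2f}GiB for Pods ({alloc / gib:.2%})")
```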
Notes
GKE and AKS reach 90% utilisation only with instances of 64GB or more. EKS is roughly 90% efficient starting at 16GB.
What is it?
Not all memory and CPU in a Node can be used to run Pods. The resources are partitioned into four parts:

1. Memory and CPU reserved for the operating system and system daemons such as SSH
2. Memory and CPU reserved for the Kubelet and Kubernetes agents such as the CRI
3. Memory reserved for the hard eviction threshold
4. Memory and CPU available to Pods

The graph shows the total CPU available for running Pods after subtracting the reserved CPU.
CPU (vCPUs)   GKE      EKS      AKS
1             84.00%   84.00%   84.00%
2             91.50%   91.50%   90.00%
4             95.50%   95.50%   94.00%
8             97.63%   97.63%   96.50%
16            98.69%   98.69%   97.75%
32            99.22%   99.22%   98.38%
64            99.48%   99.40%   98.69%
Example
If you have a Kubernetes cluster in AKS with a single Node with 2 vCPUs, 90% of the CPU is available to run Pods. The remaining CPU is needed to run the OS, Kubelet, CRI, CNI, etc.
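A similar sketch for CPU, assuming the kubelet reservation formula that GKE documents (the EKS column is nearly identical) plus an assumed flat 100 millicores for OS/system daemons; it matches the GKE column above to rounding:

```python
# Sketch of CPU reservations, assuming GKE's documented kubelet brackets
# and an ASSUMED flat 100 millicores for OS/system daemons.

SYSTEM_MILLICORES = 100  # assumed flat OS/system daemon reservation

def kubelet_reserved_millicores(vcpus: int) -> float:
    """Millicores the kubelet keeps back on a Node with `vcpus` cores."""
    brackets = [
        (1, 0.06),               # 6% of the first core
        (1, 0.01),               # 1% of the next core (up to 2 cores)
        (2, 0.005),              # 0.5% of the next 2 cores (up to 4 cores)
        (float("inf"), 0.0025),  # 0.25% of any core above 4
    ]
    reserved, remaining = 0.0, vcpus
    for size, rate in brackets:
        chunk = min(remaining, size)
        reserved += chunk * rate * 1000
        remaining -= chunk
    return reserved

for cores in [1, 2, 4, 8, 16, 32, 64]:
    available = cores * 1000 - kubelet_reserved_millicores(cores) - SYSTEM_MILLICORES
    print(f"{cores:>2} vCPUs -> {available:.0f}m for Pods ({available / (cores * 1000):.2%})")
```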
Notes
As long as you use Nodes with at least 2 vCPUs, you should be fine.
What is it?
There's an upper limit on the number of Pods that you can run on each Node.

Each cloud provider has a different limit.

Most of the time, the limit is independent of the Node size (e.g. GKE, AKS).

In some cases, the number of Pods depends on the Node size (notably EKS).
Memory (GiB)   Max Pods (GKE)   Max Pods (EKS)   Max Pods (AKS)
1              110              110              250
2              110              110              250
4              110              110              250
8              110              110              250
16             110              110              250
64             110              110              250
128            110              250              250
192            110              250              250
256            110              250              250

Notes
This metric is relevant to measure your blast radius: if a Node is lost, how many Pods are affected?
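For EKS, the size dependency comes from networking: with the AWS VPC CNI, every Pod receives a VPC IP address, so the documented Pod limit is derived from the instance's ENI and IP capacity rather than being a flat number. A sketch, using the published ENI/IP counts for two common instance types (verify the counts for your instance type in the AWS documentation):

```python
# Sketch of the documented EKS Pod limit with the classic AWS VPC CNI:
#   max_pods = ENIs * (IPv4 addresses per ENI - 1) + 2

def eks_max_pods(enis: int, ips_per_eni: int) -> int:
    # One IP per ENI is used by the ENI itself; +2 covers Pods on the
    # host network (e.g. kube-proxy and the CNI plugin itself).
    return enis * (ips_per_eni - 1) + 2

print(eks_max_pods(enis=3, ips_per_eni=10))  # m5.large  -> 29
print(eks_max_pods(enis=4, ips_per_eni=15))  # m5.xlarge -> 58
```

The flat 110/250 values in the table above suggest a different CNI configuration (such as prefix delegation), where AWS recommends capping Pods at 110 per Node on smaller instances and 250 on larger ones.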
What is it?
Nodes have an upper limit on the number of Pods that they can run.

Assuming that you run the maximum number of Pods for that Node, how much memory is available to each Pod?

This metric divides the available Node memory by the max number of Pods for that instance type.
Memory (GiB)   GKE (GiB/Pod)   EKS (GiB/Pod)   AKS (GiB/Pod)
1              0.01            0.00            0.00
2              0.01            0.00            0.00
4              0.03            0.02            0.01
8              0.05            0.06            0.02
16             0.12            0.13            0.05
64             0.53            0.57            0.23
128            1.08            0.50            0.47
192            1.64            0.76            0.72
256            2.21            1.01            0.97

Example
If you have a Kubernetes cluster in GKE with a single Node of 128GB of memory, you can run up to 110 Pods, and each of them can use up to 1.08GB of memory.
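The figures follow directly from the two previous tables; a quick sketch that divides the allocatable memory (Node size times allocatable fraction) by the Pod limit, using the GKE values from this document:

```python
# Sketch: per-Pod memory at the Pod limit, combining the allocatable-memory
# fractions and the max-Pods values from the tables above (GKE numbers).

gke = {
    # Node GiB: (allocatable fraction, max Pods)
    16: (0.8250, 110),
    64: (0.9113, 110),
    128: (0.9256, 110),
    192: (0.9388, 110),
}

for total, (fraction, max_pods) in gke.items():
    per_pod = total * fraction / max_pods
    print(f"{total:>3}GiB node -> {per_pod:.2f}GiB per Pod")
# The 128GiB Node yields 1.08GiB per Pod, matching the table.
```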
Notes
It's not possible to run small workloads (less than 1GB of memory) efficiently on GKE when the Node size is greater than 128GB of memory.

EKS has a peak at 192GB of memory: that's where you get the most Pods with the largest memory available to each (234 Pods with 810MiB of memory each).
What is it?
If all your Pods use 1GB of memory, what instance type should you use to maximise the memory available to them?

The chart presents six scenarios: what if all the Pods on the Node have limits of 0.5, 1, 2, 4, 8 or 16 GiB?

The chart shows how utilised the Node is in each case.
                   Node memory
Pod memory limit   1GiB     2GiB     4GiB     8GiB     16GiB    64GiB    128GiB   192GiB   256GiB
0.5 GiB            50.00%   50.00%   62.50%   75.00%   81.25%   85.94%   42.97%   28.65%   21.48%
1 GiB              0.00%    50.00%   50.00%   75.00%   81.25%   90.63%   85.94%   57.29%   42.97%
2 GiB              0.00%    0.00%    50.00%   75.00%   75.00%   90.63%   92.19%   93.75%   85.94%
4 GiB              0.00%    0.00%    0.00%    50.00%   75.00%   87.50%   90.63%   93.75%   93.75%
8 GiB              0.00%    0.00%    0.00%    0.00%    50.00%   87.50%   87.50%   91.67%   93.75%
16 GiB             0.00%    0.00%    0.00%    0.00%    0.00%    75.00%   87.50%   91.67%   93.75%
Example
When all Pods in your cluster have a 1GB limit, the Node that can allocate the most Pods is one with 64GB of memory.

Values before the peak mean that the Node is underutilised (there's still space, but not enough to fit another Pod).

Values after the peak mean that you've reached the Pod limit for that Node and can't schedule any more Pods on it.
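A sketch of the underlying arithmetic: the number of Pods that fit is the allocatable memory divided by the Pod limit, capped by the Node's Pod limit; utilisation is the memory those Pods claim over the Node total. The allocatable fractions and the 110-Pod cap below are the GKE values from earlier in this document:

```python
# Sketch: Node utilisation for a given uniform Pod memory limit, using the
# GKE allocatable fractions and 110-Pod cap from the tables above.
import math

def utilisation(total_gib: float, allocatable_fraction: float,
                pod_limit_gib: float, max_pods: int = 110) -> float:
    allocatable = total_gib * allocatable_fraction
    pods = min(math.floor(allocatable / pod_limit_gib), max_pods)
    return pods * pod_limit_gib / total_gib

# 1GiB Pods on a 64GiB Node: 58 Pods fit -> 90.63% utilised (the peak).
print(f"{utilisation(64, 0.9113, 1):.2%}")
# 1GiB Pods on a 192GiB Node: capped at 110 Pods -> 57.29% utilised.
print(f"{utilisation(192, 0.9388, 1):.2%}")
```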