Which VM size should I choose for an AKS node?

There are many nuances when choosing a node size in Azure Kubernetes Service, and not all of them are obvious. I decided to write a short post to discuss the most important ones. At the end of the day, this is going to be a trade-off: some factors are going to drive you towards larger nodes, and others towards smaller ones. Depending on what is more important to you, the decision will fall to one side or the other.

Why would I choose a large node?

The first one is performance: a larger node size is going to give you more performance, not only in terms of CPU and memory, but also in terms of I/O. Each VM size in Azure has an I/O limit, which gets higher as the nodes grow larger. If your workloads need a high level of I/O, larger nodes will give you that edge. See Azure VM Sizes for more details.
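If you want to check the concrete I/O caps for a given size, the Azure CLI can query them. A minimal sketch, assuming the region and size are just examples and that the capability names match the `az vm list-skus` output:

```bash
# Show the uncached disk IOPS and throughput caps for a sample VM size
# (region and size are examples; capability names assumed from list-skus output)
az vm list-skus --location westeurope --resource-type virtualMachines \
    --size Standard_D4s_v4 \
    --query "[0].capabilities[?name=='UncachedDiskIOPS' || name=='UncachedDiskBytesPerSecond']" \
    --output table
```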

Another not so evident one is the number of data disks that each VM size can hold. This number limits how many disk-based persistent volumes you can have per node. If you are using disk-based PVs profusely, you might want to have larger node sizes.
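You can compare the data disk limits across sizes directly from the CLI; a quick sketch (the region is just an example):

```bash
# List VM sizes in a region; the MaxDataDiskCount column is what bounds
# the number of disk-based PVs you can attach per node
az vm list-sizes --location westeurope --output table
```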

The next reason why you might want larger nodes is to limit system pod overhead: each node has to run certain system pods (kube-proxy, the CNI plugin, monitoring agents, and so on) for the cluster to operate correctly. The more nodes you have, the more instances of those pods you will run, and the more aggregate CPU and memory they will consume.
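You can see this overhead on your own cluster by listing the pods in the kube-system namespace:

```bash
# Inspect the per-node system pods that AKS runs in the kube-system namespace;
# the NODE column shows how they are replicated across your nodes
kubectl get pods --namespace kube-system --output wide
```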

Note that if you are using the Azure CNI plugin with large node sizes, you will probably want to increase the maximum number of pods per node, which defaults to 30. Otherwise, no matter how much free CPU or memory your nodes have, they will not be able to host more than 30 pods. This parameter can only be set when the cluster or an individual node pool is created.
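A minimal sketch of how that could look with the Azure CLI (the resource group, cluster, and pool names are hypothetical):

```bash
# Hypothetical names; --max-pods can only be set at creation time,
# not changed afterwards on an existing pool
az aks create \
    --resource-group myResourceGroup \
    --name myAKSCluster \
    --network-plugin azure \
    --node-vm-size Standard_D8s_v4 \
    --max-pods 60

# The same setting on a node pool added later to the cluster
az aks nodepool add \
    --resource-group myResourceGroup \
    --cluster-name myAKSCluster \
    --name largepool \
    --node-vm-size Standard_D8s_v4 \
    --max-pods 60
```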

Why would I choose a small node?

The most obvious one is scaling granularity: the smaller the node, the smoother your scaling curve will be. For example, if you have a 2-node cluster, scaling out by one node means adding 50% more capacity. But if you have a 10-node cluster, scaling out will only add 10% more capacity.
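In practice you would usually let the cluster autoscaler add and remove those nodes one at a time. A minimal sketch, with hypothetical resource names and bounds:

```bash
# Hypothetical names; lets AKS scale node by node within the given bounds
az aks update \
    --resource-group myResourceGroup \
    --name myAKSCluster \
    --enable-cluster-autoscaler \
    --min-count 3 \
    --max-count 12
```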

Another related factor to consider is the blast radius of a node failure. If you have a 2-node cluster and you lose one node, you lose 50% of your cluster capacity. But if you have a 10-node cluster, you lose only 10%.

A not so obvious one is that certain resources, such as ephemeral SNAT ports for outbound connections to the public Internet, are allocated per node; by default each node gets 1,024 ports, which caps the number of simultaneous connections to the same destination. This number is configurable though, see here for more details.
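A sketch of how the allocation could be raised on a cluster using the Standard Load Balancer (the names are hypothetical, and the total must fit within your outbound public IPs, each of which provides 64,000 ports to share across nodes):

```bash
# Hypothetical names; raise the SNAT port allocation per node
az aks update \
    --resource-group myResourceGroup \
    --name myAKSCluster \
    --load-balancer-outbound-ports 4000
```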

Additionally, in certain situations you want a minimum number of nodes per Availability Zone (AZ). For example, if you are using disk-based Persistent Volumes, you want to restrict pods to certain AZs so that they don't try to mount disks from a different AZ. You would then typically have 2-3 nodes per AZ, that is, cluster sizes of 6-9 nodes.
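One common way to avoid the cross-AZ mount problem, assuming the Azure Disk CSI driver, is a zone-aware StorageClass: with WaitForFirstConsumer, the disk is only provisioned once the pod is scheduled, so it lands in that pod's AZ. A minimal sketch (recent AKS built-in storage classes already behave this way):

```bash
# Zone-aware StorageClass: WaitForFirstConsumer delays disk creation until
# the pod is scheduled, so the disk is created in the pod's AZ
kubectl apply -f - <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: managed-zoned
provisioner: disk.csi.azure.com
parameters:
  skuName: Premium_LRS
volumeBindingMode: WaitForFirstConsumer
EOF
```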

So what do I do?

As you can see above, many factors influence the decision, and it greatly depends on the resource requirements of your workload. If you still don't know what to do, here is a possible first approximation:

  • Start with 4-core VMs such as the Standard_D4s_v4 (a sample creation command follows this list).
  • Consider larger VMs if:
    • Your cluster would grow too large (for example, larger than 15-20 nodes)
    • Your workloads require more I/O performance per node
    • You need more than 8 data disks per node
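Putting that first approximation together, a minimal sketch with hypothetical resource names:

```bash
# Hypothetical names; a 3-node starter cluster spread across three AZs
# (zone support must be available in the chosen region)
az aks create \
    --resource-group myResourceGroup \
    --name myAKSCluster \
    --node-vm-size Standard_D4s_v4 \
    --node-count 3 \
    --zones 1 2 3 \
    --generate-ssh-keys
```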

Did I forget any other factor to consider when sizing nodes? Please let me know via comments!
