Which Azure network design is cheaper?

If you have been reading some of my blog posts, you probably know that I have been working on Azure networking for a while. Part of that work has consisted of helping customers to create network architectures based on their requirements. Last week I got a similar ask from a colleague for a large-scale hub-and-spoke design, and I decided to do something I hadn’t done before: calculate the price of each option.

You might (rightfully) wonder why the heck I wouldn’t have done that before. The answer is easy: laziness. The Azure networking pricing model can be complex, and comparing different designs to each other is often an apples-to-oranges discussion. On top of that, the cost depends on the traffic flows that every organization has (Cynthia Treger has blogged about data transfer costs in Azure), and very often the details about expected flows are not available at all, so you cannot calculate the final price.

Still, I thought it would be interesting to make some assumptions, so I opened up the second best solution to every problem (of course I mean Excel) and I started putting numbers in a spreadsheet to generate cost comparison charts like this one. Please don’t focus too much on the columns here, the goal is to give you an idea of the topics we will dive into in the rest of the post:

Let me give you a quick summary of what I learned:

You shouldn’t neglect the cost of data processing and traffic peering in your calculations, since it can be a significant chunk of your overall bill.
From that perspective, designs with transit VNets will be more expensive, especially when you have heavy VNet-to-VNet flows.
The structure of Virtual WAN traffic processing cost makes it cheaper than self-managed hub-and-spoke for many scenarios.

Disclaimer: price is not everything!

I would like to highlight that this post is all about pricing. The fact that one solution is more expensive doesn’t mean that nobody should pick it up. You need to weigh its added value against its price, and decide whether it is worth it.

The different architectures evaluated here vary in functionality and flexibility, but mostly in their operational overhead. Price is going to be an important factor to consider in the decision of which one is the best for your organization, but it shouldn’t be the only one. Especially for a foundational technology such as networking, that could impact positively or negatively the rest of your environment.

But don’t let me stall any longer and let’s start our trip into the rabbit hole!

The design options

The initial question is simple enough: which Azure networking architecture is best for around 3,000 VNets over two regions with Azure Firewall and ExpressRoute? Different options exist for large-scale hub-and-spoke environments; don’t miss Adam Stuart’s blog on that topic. Out of those, I selected four options as the most attractive.

The first one is the traditional customer-managed hub-and-spoke environment, where you connect all of the hubs together via (global) VNet peering. The maximum number of spokes per environment is 500, since that is the maximum number of peerings per VNet. Until recently, the bottleneck was the number of UDR routes per route table (more concretely, the route table associated to the GatewaySubnet), but that number has been raised from 400 to 600.

The second option is very similar, but using Virtual WAN. Here the limit of spokes is 600, as stated in the documentation:

The third option is an indirect spoke design. The idea is to consolidate the ExpressRoute gateways on a core layer. This core layer, provided by Virtual WAN, will also interconnect all spoke blocks between each other eliminating the need for the full mesh between the hub VNets:

And the fourth option is using Azure Virtual Network Manager to create a full mesh between the spokes in every spoke block and the hubs, and to manage routing in the GatewaySubnets. From a topology perspective this is identical to option 1, but with different scalability numbers, since with AVNM both the peerings and the static routes can go up to 1,000:

Cost analysis with 3,000 spokes

So let’s start with the first cost analysis. I configured the calculation parameters to 3,000 spokes over 2 regions, and I set these traffic flow parameters:

100 MB per month between every two spokes.
1 GB per month from every spoke to onprem.
10 GB per month from onprem to every spoke.

The last parameter (traffic from onprem to Azure) will impact the size of the ExpressRoute gateways, since traffic in the opposite direction (from Azure to onprem) bypasses the gateways (with the exception of private endpoints, but I didn’t consider that here).

I also left some margin in the maximum number of spokes per block:

For option 1 (hub and spoke) I used 490 instead of 500, to leave room for the peerings between hub virtual networks.
For option 2 (VWAN) I used 590 instead of 600.
For option 3 (indirect spokes) I used 490 instead of 500, to save 10 peerings in every transit VNet for additional purposes (actually 9, since one peering is consumed by the connection to Virtual WAN).
Finally, for option 4 (AVNM) I used 990 instead of 1,000, also to leave some room for additional routes or peerings.
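
The per-block limits above determine how many spoke blocks (and therefore hubs, firewalls and gateways) each option needs. Here is a minimal sketch of that arithmetic, assuming the 3,000 spokes are split evenly across the two regions (the post does not say how the spreadsheet distributes them, so treat the even split as an assumption):

```python
import math

# Assumption (not stated in the post): spokes are split evenly across regions.
TOTAL_SPOKES = 3000
REGIONS = 2

# Maximum spokes per block used in the post, after leaving some margin.
MAX_SPOKES_PER_BLOCK = {
    "1-HnS": 490,
    "2-VWAN": 590,
    "3-HnS+VWAN": 490,
    "4-AVNM": 990,
}


def blocks_needed(total_spokes: int, regions: int, max_per_block: int) -> int:
    """Return the total number of spoke blocks across all regions."""
    spokes_per_region = math.ceil(total_spokes / regions)
    blocks_per_region = math.ceil(spokes_per_region / max_per_block)
    return blocks_per_region * regions


block_counts = {
    option: blocks_needed(TOTAL_SPOKES, REGIONS, limit)
    for option, limit in MAX_SPOKES_PER_BLOCK.items()
}
print(block_counts)
# {'1-HnS': 8, '2-VWAN': 6, '3-HnS+VWAN': 8, '4-AVNM': 4}
```

Under that assumption, options 1 and 3 need 8 blocks, option 2 needs 6, and option 4 needs only 4.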

The first block of costs is the VNet peerings. They are identical in options 1, 3 and 4, but in Virtual WAN (option 2) you don’t pay for the peering side at the hub:

	1-HnS	2-VWAN	3-HnS+VWAN	4-AVNM
VNet peerings all spokes	18,654	9,327	18,654	18,654

You might argue that the (lack of) VWAN peering costs is absorbed by the hub data processing, but let’s table that discussion for a bit later.
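
The peering figures in the table above can be reproduced with a few lines of arithmetic. This is only a sketch under assumptions inferred from the numbers, not a description of the actual spreadsheet: each spoke exchanges 100 MB with each of the other 2,999 spokes, 1 GB is counted as 1,000 MB, and each side of a peering is billed 0.01 per GB (please check the current pricing page before relying on that rate).

```python
# Assumptions inferred from the table, not stated in the post:
# - each spoke exchanges 100 MB with each of the other 2,999 spokes
# - 1 GB = 1,000 MB
# - each side of a VNet peering is billed 0.01 per GB
SPOKES = 3000
V2V_MB_PER_PAIR = 100
SPOKE_TO_ONPREM_MB = 1_000
ONPREM_TO_SPOKE_MB = 10_000
PRICE_PER_GB_PER_SIDE = 0.01  # assumed rate


def peering_cost(billed_sides: int) -> float:
    """Cost of all spoke peerings when `billed_sides` ends of each peering are charged."""
    mb_per_spoke = (
        (SPOKES - 1) * V2V_MB_PER_PAIR + SPOKE_TO_ONPREM_MB + ONPREM_TO_SPOKE_MB
    )
    gb_total = SPOKES * mb_per_spoke / 1000
    return round(gb_total * PRICE_PER_GB_PER_SIDE * billed_sides, 2)


print(peering_cost(billed_sides=2))  # spoke side and hub side billed: 18654.0
print(peering_cost(billed_sides=1))  # Virtual WAN, spoke side only: 9327.0
```

Every spoke pushes 310.9 GB per month through its peering (299.9 GB of spoke-to-spoke traffic plus 11 GB to and from onprem), which is why the spoke-to-spoke parameter dominates this row.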

The second block of costs is the firewall costs. Here you need to break them down into the fixed and variable (per GB) components:

	1-HnS	2-VWAN	3-HnS+VWAN	4-AVNM
FW bandwidth	27,528	26,064	26,424	19,824
FW fixed	7,300	5,475	7,300	3,650
FW total	34,828	31,539	33,724	23,474

As you can see, the variable costs because of bandwidth are similar in all designs except for AVNM, since the full-mesh interconnection of spokes inside of one block removes some traffic from reaching the firewall. The small variances across options 1, 2 and 3 are due to the size of the spoke blocks: the smaller the spoke blocks, the more flows need to traverse two firewalls instead of just one.

The lower cost of the AVNM option also for the fixed costs is easy to understand: if you have larger blocks (1,000 VNets), you will have fewer firewalls.
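
To make the relationship between block size and fixed firewall cost concrete, the sketch below divides the "FW fixed" row by the number of firewalls each option would need. It assumes one firewall per spoke block and 1,500 spokes per region; both are assumptions for illustration, since the post does not spell them out.

```python
import math

# "FW fixed" row from the table above.
FW_FIXED = {"1-HnS": 7300, "2-VWAN": 5475, "3-HnS+VWAN": 7300, "4-AVNM": 3650}

# Spokes per block used in the post.
BLOCK_LIMIT = {"1-HnS": 490, "2-VWAN": 590, "3-HnS+VWAN": 490, "4-AVNM": 990}

# Assumptions: even split of 3,000 spokes over 2 regions, one firewall per block.
SPOKES_PER_REGION = 1500
REGIONS = 2

per_firewall = {}
for option, fixed_cost in FW_FIXED.items():
    firewalls = math.ceil(SPOKES_PER_REGION / BLOCK_LIMIT[option]) * REGIONS
    per_firewall[option] = fixed_cost / firewalls
    print(f"{option}: {firewalls} firewalls, {per_firewall[option]:.2f} each")
```

Every option comes out at 912.50 per firewall, so under these assumptions the differences in the fixed row are explained entirely by the number of blocks (8, 6, 8 and 4 firewalls respectively).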

Let’s move on to the trickiest part, which I call the inter-block costs. These costs will depend on the topology:

	1-HnS	2-VWAN	3-HnS+VWAN	4-AVNM
Transit-to-vhub peering			7,860
HnS H2H peering (intra-region)	3,375			1,530
HnS H2H peering (cross-region)	4,500			4,500
VWAN H2H transfer (intra-region)		0	6,060
VWAN H2H transfer (cross-region)		9,000	27,000
Interblock total	7,875	9,000	40,920	6,030

The first row (transit-to-vhub) is only relevant for option 3, the indirect spoke design.
Options 1 (hub and spoke) and 4 (AVNM) have the same structure, where you pay for traffic traversing the peerings between the hubs. These peerings can be local or global, so you need to differentiate between intra-region and cross-region. AVNM has less intra-region traffic between blocks, because the blocks are larger, but exactly the same cross-region traffic.
For option 2 you don’t pay for intra-region hub-to-hub traffic, or at least I haven’t found a price for that. You do pay for cross-region data transfer, but not for the vHub data processing, since traffic goes through the firewalls and not through the virtual hub routers.
In option 3 you pay for both vHub data processing and cross-region data transfer at the VWAN layer.

You can see how the indirect spoke option fares very badly here. Now that we are talking about virtual hubs, let’s have a look at their actual cost:

	1-HnS	2-VWAN	3-HnS+VWAN	4-AVNM
vhub fixed		1,095	365
vhub RUs		438	146
Virtual hub total	0	1,533	511	0

We only have virtual hub costs in options 2 and 3, because the other options do not include Virtual WAN. Option 3 has only two virtual hubs (one per region), so the costs are lower. However, in absolute terms these are mostly negligible compared to the cost of traffic that we saw earlier.

We are almost there, let’s look at the costs for the ExpressRoute gateways, which are calculated differently in hub-and-spoke (per gateway) and Virtual WAN (scale units and connection units):

	1-HnS	2-VWAN	3-HnS+VWAN	4-AVNM
ERGW SKU	ErGw1AZ			ErGw1AZ
ERGW cost	2,108.24			1,054.12
VWAN ER scale units per hub		1	1
VWAN ER conn. per hub		2	2
ER GW total	2,108.24	2,277.60	759.20	1,054.12

Here again the indirect spoke model is the cheapest, since it only has gateways in 2 hubs. Next comes option 4, since it reduces the number of spoke blocks due to the higher 1,000 VNet limit, and finally options 1 and 2, which are pretty close to each other. Still, these numbers are not going to make a dent in the overall costs.

And finally, the AVNM component, which is pretty straightforward (and significant):

	1-HnS	2-VWAN	3-HnS+VWAN	4-AVNM
AVNM costs	0	0	0	43,858

So what’s the final verdict? Here you go:

The firewall costs are roughly the same in options 1, 2 and 3, and noticeably lower for AVNM (due to the larger blocks and to the mesh between the spokes).
The inter-block traffic cost makes option 3 (indirect spokes) unattractive, although it has operational advantages over options 1 and 2.
AVNM costs are also significant. They can be justified by the automation that AVNM offers, since it greatly reduces the operational cost of managing the network. AVNM also brings extra value such as UDR management and security admin rules, which needs to be factored into the comparison.
Traditional Virtual WAN is cheaper than hub-and-spoke, mostly due to the lower cost of the spoke VNet peerings. Besides, it has lower administrative overhead than the customer-managed hub-and-spoke, so I would say it is the clear winner for this scenario.
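
For reference, the per-component tables in this section can be added up into one total per option. The sketch below simply sums the rows quoted above (the "total" row of each table); the totals themselves do not appear in the text of the post, so read them as derived values in whatever unit the tables use.

```python
OPTIONS = ["1-HnS", "2-VWAN", "3-HnS+VWAN", "4-AVNM"]

# Total row of each cost table in this section, one value per option.
COSTS = {
    "VNet peerings": [18654, 9327, 18654, 18654],
    "Firewall": [34828, 31539, 33724, 23474],
    "Inter-block": [7875, 9000, 40920, 6030],
    "Virtual hubs": [0, 1533, 511, 0],
    "ER gateways": [2108.24, 2277.60, 759.20, 1054.12],
    "AVNM": [0, 0, 0, 43858],
}

totals = {
    option: round(sum(row[i] for row in COSTS.values()), 2)
    for i, option in enumerate(OPTIONS)
}

# Print the options from cheapest to most expensive.
for option, total in sorted(totals.items(), key=lambda item: item[1]):
    print(f"{option:<12}{total:>12,.2f}")
```

With the figures above, option 2 (Virtual WAN) totals 53,676.60, option 1 totals 63,465.24, option 4 totals 93,070.12 and option 3 totals 94,568.20.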

Other scenarios

As I mentioned at the top of the post, this analysis greatly depends on the input parameters. What about if there is absolutely no VNet-to-VNet traffic in the design, as opposed to the 100 MB of traffic between the spokes I used earlier? Here is what you get:

The cost difference between the first three options almost disappears. The choice between them would probably come down to the complexity, functionality and operational overhead.

You might be asking yourself, what if I don’t have 2 regions, but 6? Let’s try that, leaving the rest of the parameters unchanged (3,000 VNets, no V2V traffic, 1GB spoke-to-onprem per spoke, 10GB onprem-to-spoke per spoke):

Option 2 (Virtual WAN) still in the (pricing) lead!

What if you significantly increase the traffic to/from onprem, for example because you have bandwidth-intensive private endpoints? We can increase those parameters from 1GB/10GB to 10GB/100GB and go back to 2 regions, with no significant relative difference across the options, but all of them getting more expensive in absolute terms:

You can see that the pricing for each option has risen again, but Virtual WAN stays as the most cost-effective choice.

So far we haven’t seen any scenario where AVNM comes out cheaper. The full mesh between spokes that AVNM provides helps to keep spoke-to-spoke traffic outside of the firewall, so AVNM will be cheaper when there is a lot of it. If we try 1800 spokes spread over 2 regions with 10GB of monthly traffic between every pair of spokes, this is what we get:
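
A rough way to see why the mesh matters so much in this scenario is to count how many spoke pairs sit in the same block. The sketch below assumes the 1,800 spokes are split evenly over the two regions and that each region fits in a single AVNM block (900 spokes is below the 990 used earlier); both are illustrative assumptions rather than the spreadsheet's logic.

```python
from math import comb

SPOKES = 1800
REGIONS = 2
GB_PER_PAIR = 10

# Assumptions: even split across regions, one meshed AVNM block per region.
spokes_per_block = SPOKES // REGIONS

pairs_total = comb(SPOKES, 2)
pairs_same_block = REGIONS * comb(spokes_per_block, 2)

# Traffic between spokes of the same block uses the mesh and skips the firewall.
gb_bypassing_firewall = pairs_same_block * GB_PER_PAIR
share = pairs_same_block / pairs_total

print(f"{pairs_same_block:,} of {pairs_total:,} pairs are in the same block")
print(f"{gb_bypassing_firewall:,} GB per month bypass the firewall ({share:.1%})")
```

Under these assumptions roughly half of all spoke-to-spoke traffic (about 8.1 million GB per month) never reaches a firewall, which is the effect that makes AVNM competitive when spoke-to-spoke flows are heavy.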

Wrapping up

Congratulations on making it all the way down here, this was a long post! The lesson for me out of this exercise is that you cannot ignore your traffic patterns when comparing different networking designs in Azure. Feel free to use the spreadsheet to simulate your own traffic, and if you happen to find a bug or you have an improvement suggestion, please let me know!
