I remember that in the early days the Core/Distribution/Access model was the bible for designing Local Area Networks. It provided a great way to “separate complexity from complexity”, one of my favorite architecture design principles, while at the same time allowing you to collapse different tiers together to save costs. Best of all, it wasn’t specific to Cisco: it was a model you could apply when building networks with any hardware vendor. If you don’t know what I am talking about, here is a picture courtesy of Cisco Press:
Alas, along came Clos networks (apologies, I mean “fabrics”) and the need to push functionality to the edge for better scalability, with hypervisor-based overlays taking this approach to the extreme by removing functionality from the traditional network altogether and moving it into the server virtualization layer. Hence the Core/Distribution/Access model fell into oblivion for many folks, especially those focused on data center network design.
Fast forward 5 years: public cloud networking is in! However, there is no common network architecture paradigm for public cloud networking, not even within a single cloud provider! Every cloud network design is hand-crafted, a process that reminds me more of the pet than of the cattle model. It is difficult to compare network designs with each other, and I see many architects who are not aware of the trade-offs they make when they choose one design option over another.
Would it be possible to apply a design pattern similar to the old Core/Distribution/Access model to public cloud networking architectures? Here we go!
Multi-cloud network design tiers
Obviously the public cloud network tiers need to be different, but they still need to have specific functions, following the principle “separate complexity from complexity”. This is the way I look at it:
- Access: this is where you connect your VMs/instances to VNets/VPCs. It provides fabric-based traffic segmentation (NACLs, NSGs) and private connectivity to managed services (Private Link).
- Regional Aggregation: it provides regional network-centric services such as firewalling, Internet access or DNS. Ideally it should summarize the routes of the different VNets/VPCs towards the next layer.
- Core: the main function of this layer is to provide inter-regional communication and, optionally, network slicing for designs that require some level of multitenancy. This is the goal of services such as Azure Virtual WAN, AWS Cloud WAN or Google Network Connectivity Center.
- Hybrid Aggregation: this is where you connect your on-premises sites or other clouds, either via dedicated connections (ExpressRoute, Direct Connect, etc.), VPN or 3rd-party appliances that will typically provide some sort of SD-WAN functionality.
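To make the route summarization idea of the Regional Aggregation tier a bit more concrete, here is a minimal sketch using Python's standard `ipaddress` module. The spoke prefixes are invented for illustration; in a real design they would come from your own address plan, and whether the cloud fabric actually lets you advertise the summary depends on the platform.

```python
import ipaddress

# Hypothetical prefixes of four spoke VNets/VPCs behind one regional hub
# (illustrative values only, not taken from any real design).
spoke_prefixes = [
    ipaddress.ip_network("10.1.0.0/24"),
    ipaddress.ip_network("10.1.1.0/24"),
    ipaddress.ip_network("10.1.2.0/24"),
    ipaddress.ip_network("10.1.3.0/24"),
]

# collapse_addresses merges adjacent and overlapping networks into the
# smallest equivalent list of prefixes.
summary = list(ipaddress.collapse_addresses(spoke_prefixes))

# The region would advertise this instead of four individual routes.
print(summary)  # [IPv4Network('10.1.0.0/22')]
```

The summary only collapses to a single prefix because the spokes were allocated from one contiguous, aligned block, which is an argument for planning address space per region up front.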
Here is what the model looks like when applied to Azure and AWS (my AWS knowledge is quite limited, so I would appreciate any feedback about the diagram):
“Ah, this looks like an awful lot of boxes, my cloud vendor is trying to sell me as much stuff as it can”. No, that is not it. The purpose of the model is to identify the different functional requirements that your network design needs to fulfill. If that is too many boxes, you can collapse tiers together, knowing that you are deviating from the principle “separate complexity from complexity”, and hence your design will be more expensive to operate. Long story short, you can collapse tiers, trading lower cost for higher complexity.
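One rough way to reason about this trade-off is to write the tiers down as data and see how many functions end up in a single component once tiers are collapsed. The sketch below is only an illustration: the function names paraphrase the tier list above, and counting functions per component is an assumed, very crude proxy for operational complexity.

```python
# Functions per tier, paraphrased from the tier descriptions in this post.
TIER_FUNCTIONS = {
    "access": {"workload attachment", "traffic segmentation",
               "private access to managed services"},
    "regional_aggregation": {"firewalling", "internet access", "DNS",
                             "route summarization"},
    "core": {"inter-regional communication", "network slicing"},
    "hybrid_aggregation": {"dedicated connections", "VPN",
                           "SD-WAN appliances"},
}


def separate_design():
    """One component per tier: the fully separated model."""
    return {tier: {tier} for tier in TIER_FUNCTIONS}


def collapse(design, target, *tiers):
    """Return a new design where `tiers` are merged into `target`."""
    merged = {name: set(members) for name, members in design.items()}
    for tier in tiers:
        merged[target] |= merged.pop(tier)
    return merged


def functions_per_component(design):
    """Union of the functions each component has to implement."""
    return {
        name: set().union(*(TIER_FUNCTIONS[t] for t in members))
        for name, members in design.items()
    }


# Core and Hybrid Aggregation collapsed into Regional Aggregation.
hub_design = collapse(separate_design(), "regional_aggregation",
                      "core", "hybrid_aggregation")
for name, funcs in functions_per_component(hub_design).items():
    print(name, len(funcs))
```

Fewer components means fewer things to pay for, but the remaining hub now carries nine functions instead of four, which is where the extra operational effort comes from.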
For example, a relatively typical collapsed design is when the Core and Hybrid Aggregation are collapsed into the Regional Aggregation level. This is the design that we commonly know in Azure as the “Do-It-Yourself Hub and Spoke”, since you need to implement the functionality of the Core tier in your regional aggregation, hence making your design cheaper but more complex:
You could take the collapsing further and merge everything into the Regional Aggregation layer. This design has few benefits and some clear disadvantages: at least in Azure, having a single VNet per region means that all your VMs are in the same subscription. However, it is perfectly possible within our network design model:
By the way, the fully-collapsed model in AWS wouldn’t impose the same restrictions as in Azure, since AWS has the concept of “Shared VPC” spanning multiple accounts. The corollary is that you still need to consider the particularities of your specific cloud when you do your design.
Collapsing into the Core
The Core services offered by the public cloud providers (the ones I am aware of are AWS Cloud WAN, Azure Virtual WAN and Google Network Connectivity Center) include some network services of their own, which might be enough if your requirements in this area are not too sophisticated. In that case, you can collapse the Regional Aggregation layer into the Core. For example, at the time of this writing Azure Virtual WAN doesn’t support multi-regional firewalling. Hence, if you don’t need firewalling in Azure (because you do network segmentation with NSGs), your design might look like this:
With time, these network core services like Virtual WAN will enhance their functionality, so core-collapsed designs will become more and more frequent, depending on the network services that are required in each project.
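The decision of whether a core-collapsed design is viable boils down to comparing what the project requires with what the core service offers. A minimal sketch of that check follows; the capability set is a made-up placeholder and not a statement about what any real product supports today.

```python
def missing_from_core(required_services, core_services):
    """Return the required services that the core service cannot provide."""
    return set(required_services) - set(core_services)


# Placeholder capabilities of a hypothetical managed core service.
example_core_services = {"inter-regional communication", "network slicing",
                         "internet access"}

# Project A segments traffic at the Access tier and needs no central firewall.
project_a = {"inter-regional communication", "internet access"}
# Project B requires firewalling between regions.
project_b = {"inter-regional communication", "multi-regional firewalling"}

for name, required in (("A", project_a), ("B", project_b)):
    gaps = missing_from_core(required, example_core_services)
    verdict = "can collapse into the Core" if not gaps else f"keep Regional Aggregation for {sorted(gaps)}"
    print(f"Project {name}: {verdict}")
```

As the core services gain features, the gap set shrinks and more projects land in the first category.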
In AWS this design is probably the most obvious one when using Cloud WAN, since Cloud WAN is very much based on TGW technology. Hence, in the majority of Cloud WAN diagrams you will probably see a picture similar to this one:
This is the way I structure my network architecture sessions, and I thought it might help others. Network architecture is one of those topics where everybody has their own opinion, and everybody might be right at the same time, so I would really love your comments around this. Thanks for reading!