Amongst the many Ignite announcements this year, my favourite is the new Azure Route Server, in public preview now, since it has the potential to dramatically change how networks are built in Azure. If you are thinking “here he comes with his BGP thing again”… You are right! Let me explain:
In public cloud there are typically two ways of doing things: the “managed” way, and the “DIY” (Do-It-Yourself) way. The managed way is architected to simplify operations, while in the DIY way you accept some complexity in exchange for higher flexibility. While I am a fan of the KISS principle, I have seen many customers going the DIY way in networking, because keeping all options open is of paramount importance for them in this area.
So far, if you wanted to go the DIY way with Azure Networking there were some serious limitations. For example, if you wanted to deploy your own firewall, you needed to rely on everybody configuring User-Defined Routes to send traffic to that firewall. Or if you deployed your own VPN appliance, you had no way of injecting your site-to-site or point-to-site prefixes into the Virtual Network, again relying on static UDRs. Let me go over some of those use cases a bit more in detail, and how the Azure Route Server will help here.
Say you deploy the firewall of your vendor of choice. How are you going to attract traffic to it? You need to configure UDRs in all of your subnets. Let me repeat that: all of your subnets. If you forget one subnet, or if somebody overwrites your UDRs, chances are some traffic unexpectedly bypasses your firewall.
Enter Route Server: your firewall appliance can advertise routes to the Route Server (even a 0.0.0.0/0), and those routes would be plumbed into each and every subnet of both the VNet where the Route Server is deployed (typically your hub VNet) and the directly peered spokes. So no chance of accidental misconfigurations! (Although you might couple that with some policies/RBAC configs to prevent people from adding UDRs, since those would override routes learnt from the Route Server.)
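To make the "UDRs override learnt routes" caveat concrete, here is a minimal Python sketch of how a subnet could pick its effective route. It assumes the usual Azure selection order (longest prefix match first, then UDR over BGP over system routes for the same prefix). The function name, the tuple layout and the 10.0.1.4 firewall address are all made up for illustration; this is not an Azure API.

```python
import ipaddress

# Lower number wins when two routes share the same prefix.
# Assumption: tie-break order is UDR, then BGP-learnt, then system routes.
SOURCE_PRIORITY = {"udr": 0, "bgp": 1, "system": 2}


def effective_route(routes, destination):
    """Pick the route a subnet would use for `destination`.

    `routes` is a list of (prefix, source, next_hop) tuples.
    Longest prefix match comes first; source priority breaks ties.
    """
    dest = ipaddress.ip_address(destination)
    candidates = [
        (ipaddress.ip_network(prefix), source, next_hop)
        for prefix, source, next_hop in routes
        if dest in ipaddress.ip_network(prefix)
    ]
    if not candidates:
        return None
    best = min(candidates, key=lambda r: (-r[0].prefixlen, SOURCE_PRIORITY[r[1]]))
    return str(best[0]), best[1], best[2]


subnet_routes = [
    ("0.0.0.0/0", "system", "Internet"),
    ("0.0.0.0/0", "bgp", "10.0.1.4"),  # hypothetical firewall, learnt via Route Server
]
print(effective_route(subnet_routes, "8.8.8.8"))
# → ('0.0.0.0/0', 'bgp', '10.0.1.4')

# Somebody adds a UDR for the same prefix: the firewall is now bypassed.
subnet_routes.append(("0.0.0.0/0", "udr", "Internet"))
print(effective_route(subnet_routes, "8.8.8.8"))
# → ('0.0.0.0/0', 'udr', 'Internet')
```

The second call is exactly the situation the policies/RBAC guardrails are meant to prevent.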
Easier Active/Passive NVA Clusters
There are essentially two ways of building an NVA cluster in Azure: either using Azure Load Balancers, or UDRs. The UDR way is mostly used in active/passive clusters, where you have a UDR pointing to the active NVA. “Something” would monitor the active NVA, and change that UDR to point to the passive NVA if need be.
There are two main drawbacks to this design. The first is deciding what that “something” monitoring the availability of the primary NVA should be. You could have external components (which should be redundant too), or an internal agent in the secondary NVA, but either option introduces additional complexity. The second is the time it takes to detect the failure of the primary NVA, send the Azure API call to change the UDR, and have the change propagated to the platform, which often results in convergence times of over two minutes.
But now we can influence the VNet routing without UDRs! The primary NVA could send a preferred route to the Azure Route Server, and the secondary NVA a worse route for the same prefixes, for example spiced with some AS-path-prepending. In a normal scenario the primary NVA route would attract the traffic, but when that route disappears, the secondary route kicks in in a matter of seconds, with no agents or API calls involved.
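The failover logic can be sketched in a few lines of Python. This is a deliberately simplified model that only compares AS-path length (real BGP best-path selection looks at more attributes, and I am not claiming this is how the Route Server is implemented). The ASN 65001 and the NVA addresses are hypothetical.

```python
def best_path(advertisements):
    """Return the advertisement with the shortest AS path, or None if empty.

    Simplification: only AS-path length is modeled, no other BGP attributes.
    """
    if not advertisements:
        return None
    return min(advertisements, key=lambda adv: len(adv["as_path"]))


# Both NVAs advertise the same prefix; the secondary prepends its own ASN twice.
primary = {"next_hop": "10.0.1.4", "as_path": [65001]}
secondary = {"next_hop": "10.0.1.5", "as_path": [65001, 65001, 65001]}

table = [primary, secondary]
print(best_path(table)["next_hop"])
# → 10.0.1.4

# The primary NVA fails and its route is withdrawn: the secondary takes over.
table.remove(primary)
print(best_path(table)["next_hop"])
# → 10.0.1.5
```

Note that nothing in this flow touches the Azure API: the switch happens purely because the better route disappeared.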
VPN to ExpressRoute Transit Routing
This has been a limitation in many situations, where for example you want P2S users connected over VPN on Azure to get access to corporate resources over ExpressRoute. This functionality is available in Virtual WAN, but to my knowledge it is not available when using standalone Azure Virtual Network Gateways for VPN.
With the Azure Route Server you could terminate those P2S tunnels in the appliance of your favorite vendor, and announce those prefixes to the ExpressRoute gateway. The VPN NVA would learn the ExpressRoute prefixes from the Route Server as well, effectively providing transit routing between VPN and ExpressRoute.
In the paragraph above I have used the example of P2S because I see it quite often. The same would be valid for S2S tunnels though, where you have some branches connected over VPN, some over ExpressRoute, and you need branch-to-branch connectivity.
No More Static Routes in your VPN NVA
Another configuration piece which needed to be manually maintained is the static routing in VPN NVAs. Let me explain: when you have a Site-To-Site VPN, the Azure side often advertises the Azure prefixes to the on-premises side. Up to now there was no way for the NVA on the Azure side to “learn” which prefixes belong to Azure, so this was a static configuration. Whenever a new spoke was added to the setup, somebody had to go and add a static route to the NVA.
However, now that VPN NVA can learn all Azure VNet prefixes (both hubs and spokes) from the Azure Route Server dynamically, so there is no need for any manual interaction when spokes are added, changed or removed. For example when adding spokes, the VPN NVA will learn the new route and advertise it to the on-premises side automatically.
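Conceptually, what the VPN NVA does when the set of learnt prefixes changes is just a diff. Here is a tiny Python sketch of that idea. The function name and the prefixes are made up, and a real NVA would of course do this inside its BGP implementation rather than in a script.

```python
def advertisement_changes(previously_learnt, currently_learnt):
    """Compare two sets of learnt prefixes.

    Returns (to_announce, to_withdraw): prefixes the NVA should start
    or stop advertising to the on-premises side.
    """
    before, now = set(previously_learnt), set(currently_learnt)
    return sorted(now - before), sorted(before - now)


# Hypothetical hub and spoke prefixes learnt from the Route Server.
before = ["10.0.0.0/16", "10.1.0.0/16"]
after = ["10.0.0.0/16", "10.1.0.0/16", "10.2.0.0/16"]  # a new spoke got peered

print(advertisement_changes(before, after))
# → (['10.2.0.0/16'], [])
```

With static routes, that first list is what somebody had to type into the NVA by hand every time a spoke was added.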
No More Dummy VNets for Indirect Spokes
If you know what I am talking about, you might already be clapping. The problem here is that ExpressRoute gateways will only advertise to on-premises the prefixes of the VNet where those gateways are, as well as those of directly peered VNets. If you have VNets more than one peering hop away, today there is no easy way of advertising those prefixes.
One trick is creating “dummy VNets” peered to the hub with the prefixes you want to advertise, and UDRs in the GatewaySubnet to make sure that traffic is not black-holed to those dummy VNets. This is quite a nice example of “duct tape networking”, and not something I would do in production.
But guess what: you could have an NVA in your hub that advertises the prefixes of the indirect spokes to ExpressRoute via the Azure Route Server. You still need UDRs in your indirect spokes, but getting rid of those pesky dummy VNets is quite a nice thing.
Example: Multi-Region NVA Designs
Multi-region designs are a good example that combines some of the use cases described above. I am really curious to see how people are going to use the Route Server here, but this is a variant I think would be feasible without too much effort:
Please note that the overall diagram is superficial on purpose: things like ExpressRoute Global Reach or Site-to-Site tunnels are not represented to keep it “simple” (it already has a lot going on).
So what do you think? Do you have any specific case I did not cover? Any experience with Azure Route Server already?