You want to use AS-path as your virtual hub routing preference

Wow, that was a long title. Let me give you another one: if you haven’t tested your High Availability (HA) or Disaster Recovery (DR) plans, you shouldn’t rely on them. This is of course regardless of whether your infrastructure runs on your premises, on public cloud, or anywhere else.

In this post I am going to describe a redundancy configuration that in principle should work, and yet it does not. A customer using it tested it during their project implementation phase, which allowed them to correct the configuration (by setting the Virtual WAN hub routing preference to AS path). Otherwise they would have found this out during an actual outage. Not the best time to learn new things about Azure.

What are we talking about here?

Mostly about this statement:

For any reason, if the VPN connection becomes the primary medium for the virtual hub to learn routes from (e.g failover scenarios between ExpressRoute and VPN), unless the VPN site has a longer AS Path length, the virtual hub will continue to share VPN learned routes with the ExpressRoute gateway. This causes the Microsoft Edge routers to prefer VPN routes over on-premises routes.

Virtual WAN FAQ

But let’s go step by step. We are talking about ExpressRoute. It is one of the ways you can connect to your workloads deployed in Azure Virtual Networks over private lines, which gives you guaranteed bandwidth as compared to other types of hybrid connectivity such as site-to-site IPsec tunnels over the public Internet.

Each ExpressRoute “circuit” (that is the name of the service unit for ExpressRoute) consists of two “connections”, a primary and a secondary one (more on this in Designing for high availability with ExpressRoute). Each of those connections is terminated on a separate physical router on Microsoft’s side (and ideally on the customer’s side too, although that is your own business). So you could say “hey, if there are two connections, that is already redundant enough, so I don’t need extra redundancy, right?”.

As usual, it depends on your requirements. If this connection is not critical, that might be fine. If on the other hand the ExpressRoute connectivity is critical, meaning that there would be substantial financial damage to your organization if those two connections went down at the same time, for example if the whole ExpressRoute location (not owned by Microsoft) went offline, you do want to plan for the worst case.

There are two primary ways of getting redundancy:

  • You could get a second ExpressRoute circuit in a different ExpressRoute location: this would be very interesting if your on-premises infrastructure gravitates around two different locations as well, and in this case you could load balance traffic across both of those ExpressRoute circuits.
  • Or if this is only for the worst case scenario, you could just configure IPsec over the Internet and make it so that this tunnel only kicks in when both ExpressRoute connections in your circuit are down.

In this post I will describe how you need to set the Hub Routing Preference of Virtual WAN to “ASPath” so that redundancy between ExpressRoute and VPN works correctly. Otherwise, failover from ExpressRoute to VPN will work just fine in the unlikely case of a full ExpressRoute outage, but connectivity will not fail back to ExpressRoute once it is restored.

What does it look like?

If you are using Virtual WAN in Azure, this is what the topology might look like (or at least this is what my testbed looks like):

The MSEE (Microsoft Enterprise Edge) device represents the pair of routers where your ExpressRoute connections would be terminated. They are not in an Azure Region, but in an ExpressRoute location co-located in facilities owned by data center providers such as Equinix or Interxion (more information about ExpressRoute locations can be found in ExpressRoute peering locations and connectivity partners).

In Virtual WAN you can have a look at the effective routes in the route table, to make sure that everything is working as expected. You can use the Azure portal or any other way you prefer. In my case, I will use Azure CLI:

❯ az network vhub get-effective-routes --resource-type RouteTable --resource-id $hub1_default_rt_id -g $rg -n hub1 --query 'value[].{Prefix:addressPrefixes[0],ASPath:asPath,NextHopType:nextHopType,NextHop:nextHops[0],Origin:routeOrigin}
' -o table | awk '{ gsub(/\/subscriptions\/'$subscription_id'\/resourceGroups\/'$rg'\/providers\/Microsoft.Network\//,""); print }'
Prefix          ASPath              NextHopType                 NextHop                                                              Origin
--------------  ------------------  --------------------------  -------------------------------------------------------------------  ---------------------------------------------------------------
172.16.0.0/24   12076-133937-16550  ExpressRouteGateway         expressRouteGateways/hub1ergw                                        expressRouteGateways/hub1ergw
10.1.1.0/24                         Virtual Network Connection  virtualHubs/hub1/hubVirtualNetworkConnections/spoke11-westcentralus  virtualHubs/hub1/hubVirtualNetworkConnections/spoke11-westcentralus
10.1.2.0/24                         Virtual Network Connection  virtualHubs/hub1/hubVirtualNetworkConnections/spoke12                virtualHubs/hub1/hubVirtualNetworkConnections/spoke12

One remark here is that if your virtual hub is secured, the effective routes in the default route table will not tell you much, since neither the ExpressRoute nor the VPN connections will be propagating to it. In that case, you want to check the routes in your ExpressRoute circuit directly.
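One possible way to look at the circuit's routes from the CLI is sketched below. It is not from the original test: the helper name `show_circuit_routes` and the circuit name you pass to it are hypothetical, and it assumes the routes are exchanged over the private peering (`AzurePrivatePeering`).

```shell
# Hypothetical helper: dump the BGP route table of both connections
# (primary and secondary) of an ExpressRoute circuit.
# Arguments: resource group, circuit name (both are placeholders for your own values).
show_circuit_routes() {
  local rg="$1" circuit="$2" path
  for path in primary secondary; do
    echo "== $path =="
    az network express-route list-route-tables -g "$rg" -n "$circuit" \
      --path "$path" --peering-name AzurePrivatePeering -o table
  done
}
```

You would call it as `show_circuit_routes "$rg" "$circuit_name"`, where `$circuit_name` is whatever your circuit is called.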

The AS path you are seeing in the previous output for the route from ExpressRoute contains the ASNs 12076 (corresponding to the Azure MSEE routers), 133937 (owned by Megaport, I am using their Megaport Cloud Router to connect to ExpressRoute), and 16550 (which belongs to Google, since I am “faking” my on-premises environment in a Google VPC).

You can see how the 172.16.0.0/24 route is preferred via the ExpressRoute gateway. This is because the hub’s routing preference is configured to always prefer the ExpressRoute gateway if the same route comes from multiple sources:


❯ az network vhub list -g $rg -o table
AddressPrefix    AllowBranchToBranchTraffic    HubRoutingPreference    Location       Name    PreferredRoutingGateway    ProvisioningState    ResourceGroup    RoutingState    Sku       VirtualRouterAsn
---------------  ----------------------------  ----------------------  -------------  ------  -------------------------  -------------------  ---------------  --------------  --------  ------------------
192.168.0.0/23   False                         ExpressRoute            westcentralus  hub1    ExpressRoute               Succeeded            vwan             Provisioned               65515

More on the Hub Routing Preference later though.

Failover

Alright, let’s bring down that ExpressRoute connection. You can do this in multiple ways, but I just disable the BGP connection to the on-premises routers. The ExpressRoute route should disappear, and the VPN one should kick in. Let’s inspect the effective routes in Virtual WAN again:

❯ az network vhub get-effective-routes --resource-type RouteTable --resource-id $hub1_default_rt_id -g $rg -n hub1 --query 'value[].{Prefix:addressPrefixes[0],ASPath:asPath,NextHopType:nextHopType,NextHop:nextHops[0],Origin:routeOrigin}
' -o table | awk '{ gsub(/\/subscriptions\/'$subscription_id'\/resourceGroups\/'$rg'\/providers\/Microsoft.Network\//,""); print }'
Prefix          ASPath             NextHopType                 NextHop                                                              Origin
--------------  -----------------  --------------------------  -------------------------------------------------------------------  -------------------------------------------------------------------
172.16.0.0/24   65501              VPN_S2S_Gateway             vpnGateways/hubvpn1                                                  vpnGateways/hubvpn1
10.1.1.0/24                        Virtual Network Connection  virtualHubs/hub1/hubVirtualNetworkConnections/spoke11-westcentralus  virtualHubs/hub1/hubVirtualNetworkConnections/spoke11-westcentralus
10.1.2.0/24                        Virtual Network Connection  virtualHubs/hub1/hubVirtualNetworkConnections/spoke12-westcentralus  virtualHubs/hub1/hubVirtualNetworkConnections/spoke12-westcentralus

Boom, the VPN route took over, fantastic! As a minor detail, you see that the on-premises VPN device is using the ASN 65501 for its BGP session. Before you ask: no, I haven’t tested this with static routing on the VPN side, that would be a nice one to catch up with when I get time (a scarce resource).

Failback

Let’s now restore the ExpressRoute connectivity. If you wait a few minutes… and then a few minutes more… Eventually you will get tired of waiting, because that VPN route is not going anywhere: failback to ExpressRoute is broken, and the routes with a healthy ExpressRoute will look exactly as they did in the previous section during the failover.

Give me some AS-Path preference

We will repeat the test, but this time we will change the hub routing preference to “ASPath”. This will tell Virtual WAN not to prefer ExpressRoute over VPN routes (or vice versa) blindly, but just to select the route with the shortest AS path length (as any other BGP router would normally do).
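To make the "shortest AS path wins" rule concrete, here is a toy sketch in shell. It only models the length comparison, using the AS paths seen in this test; `pick_route` is a made-up name, and the tie handling (first candidate listed wins) is an assumption of the sketch, not a description of how Virtual WAN breaks ties.

```shell
# Toy model: read "source aspath" lines on stdin and print the source whose
# hyphen-separated AS path has the fewest ASNs.
# Ties keep the first candidate listed (assumption of this sketch only).
pick_route() {
  awk '{
    n = split($2, hops, "-")          # number of ASNs in the path
    if (best == "" || n < min) { min = n; best = $1 }
  } END { print best }'
}

# A single-ASN VPN path beats the three-ASN ExpressRoute path:
printf '%s\n' 'ExpressRoute 12076-133937-16550' 'VPN 65501' | pick_route
# prints "VPN"

# With the VPN path prepended to three ASNs it is no longer shorter:
printf '%s\n' 'ExpressRoute 12076-133937-16550' 'VPN 65501-65501-65501' | pick_route
# prints "ExpressRoute"
```

The first case is the reason the VPN routes need a longer AS path before this preference is useful: otherwise the backup path would look better than the primary one.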

Note that for backwards compatibility reasons, this is not the default in Virtual WAN, but it is definitely what I would recommend you use. Our virtual hub is now configured for AS-Path routing preference:

❯ az network vhub list -g $rg -o table
AddressPrefix    AllowBranchToBranchTraffic    HubRoutingPreference    Location       Name    PreferredRoutingGateway    ProvisioningState    ResourceGroup    RoutingState    Sku       VirtualRouterAsn
---------------  ----------------------------  ----------------------  -------------  ------  -------------------------  -------------------  ---------------  --------------  --------  ------------------
192.168.0.0/23   False                         ASPath                  westcentralus  hub1    ExpressRoute               Succeeded            vwan             Provisioned               65515
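The post shows the result but not the command that changes the setting. A minimal sketch of one way to do it with the Azure CLI follows; the helper name `set_hub_routing_preference` is hypothetical, and it assumes your CLI version (with the virtual-wan extension) supports the `--hub-routing-preference` argument.

```shell
# Hypothetical helper: change the routing preference of a virtual hub.
# Arguments: resource group, hub name, preference (ExpressRoute, VpnGateway or ASPath).
set_hub_routing_preference() {
  local rg="$1" hub="$2" pref="$3"
  case "$pref" in
    ExpressRoute|VpnGateway|ASPath) ;;
    *) echo "unsupported preference: $pref" >&2; return 1 ;;
  esac
  az network vhub update -g "$rg" -n "$hub" --hub-routing-preference "$pref" -o none
}
```

For the hub in this test that would be `set_hub_routing_preference "$rg" hub1 ASPath`, after which `az network vhub list` should show the new value in the HubRoutingPreference column.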

I have modified my VPN on-premises device to prepend its routes, to make sure that they are not preferred over the ExpressRoute ones during normal operations. Without prepending, the VPN path (a single ASN) would be shorter than the three-ASN ExpressRoute path and would win. Hence, before breaking ExpressRoute, everything will look exactly the same. However, when ExpressRoute connectivity disappears, we will see the VPN route with the prepending:

❯ az network vhub get-effective-routes --resource-type RouteTable --resource-id $hub1_default_rt_id -g $rg -n hub1 --query 'value[].{Prefix:addressPrefixes[0],ASPath:asPath,NextHopType:nextHopType,NextHop:nextHops[0],Origin:routeOrigin}
' -o table | awk '{ gsub(/\/subscriptions\/'$subscription_id'\/resourceGroups\/'$rg'\/providers\/Microsoft.Network\//,""); print }'
Prefix          ASPath             NextHopType                 NextHop                                                              Origin
--------------  -----------------  --------------------------  -------------------------------------------------------------------  -------------------------------------------------------------------
172.16.0.0/24   65501-65501-65501  VPN_S2S_Gateway             vpnGateways/hubvpn1                                                  vpnGateways/hubvpn1
10.1.1.0/24                        Virtual Network Connection  virtualHubs/hub1/hubVirtualNetworkConnections/spoke11-westcentralus  virtualHubs/hub1/hubVirtualNetworkConnections/spoke11-westcentralus
10.1.2.0/24                        Virtual Network Connection  virtualHubs/hub1/hubVirtualNetworkConnections/spoke12                virtualHubs/hub1/hubVirtualNetworkConnections/spoke12

Now we bring back ExpressRoute, and sure enough, the route from ExpressRoute takes over again:

❯ az network vhub get-effective-routes --resource-type RouteTable --resource-id $hub1_default_rt_id -g $rg -n hub1 --query 'value[].{Prefix:addressPrefixes[0],ASPath:asPath,NextHopType:nextHopType,NextHop:nextHops[0],Origin:routeOrigin}
' -o table | awk '{ gsub(/\/subscriptions\/'$subscription_id'\/resourceGroups\/'$rg'\/providers\/Microsoft.Network\//,""); print }'
Prefix          ASPath              NextHopType                 NextHop                                                              Origin
--------------  ------------------  --------------------------  -------------------------------------------------------------------  -----------------------------------------------------------------
172.16.0.0/24   12076-133937-16550  ExpressRouteGateway         expressRouteGateways/hub1ergw                                        expressRouteGateways/hub1ergw
10.1.1.0/24                         Virtual Network Connection  virtualHubs/hub1/hubVirtualNetworkConnections/spoke11-westcentralus  virtualHubs/hub1/hubVirtualNetworkConnections/spoke11-westcentralus
10.1.2.0/24                         Virtual Network Connection  virtualHubs/hub1/hubVirtualNetworkConnections/spoke12                virtualHubs/hub1/hubVirtualNetworkConnections/spoke12

Conclusion

Well, there are three of them here:

  • Whatever you do, be sure to test your resiliency mechanism on a regular basis.
  • If you happen to use Azure Virtual WAN, I would suggest you configure your hubs to use ASPath routing preference.
  • If you have a similar design but using Azure Route Server with branch-to-branch connectivity enabled instead of Virtual WAN, make sure you test your configuration.
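If you want to make the failback check repeatable, something along these lines could be scripted. This is only a sketch: the two helper names are hypothetical, the query reuses the fields from the `get-effective-routes` commands above, and the next hop type strings are the ones printed in those outputs.

```shell
# Hypothetical helper: print the next hop type the hub currently uses for a prefix.
# Arguments: resource group, hub name, route table resource ID, prefix.
hub_next_hop_type() {
  az network vhub get-effective-routes \
    --resource-type RouteTable --resource-id "$3" -g "$1" -n "$2" \
    --query "value[?addressPrefixes[0]=='$4'].nextHopType | [0]" -o tsv
}

# Hypothetical check: succeeds only if the prefix is reached over ExpressRoute.
assert_on_expressroute() {
  [ "$(hub_next_hop_type "$@")" = "ExpressRouteGateway" ]
}
```

After restoring ExpressRoute you could then run `assert_on_expressroute "$rg" hub1 "$hub1_default_rt_id" 172.16.0.0/24 || echo "still on VPN"` and see whether failback actually happened.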
