In a previous blog post I described the features of the new Azure Route Server that I am most excited about, as well as a possible setup to create a hub-and-spoke design with firewall NVAs (Network Virtual Appliances) across multiple regions. In this one I will focus on how to integrate that topology with ExpressRoute, and how NVAs can advertise routes to ExpressRoute and learn routes from it.
For this purpose I have enhanced the multi-region design with an ExpressRoute private peering connecting the Azure environment to an on-premises network, represented by the prefix 1.2.3.0/24.

The first thing to do is to enable the Route Server to interact with the ExpressRoute gateway, which you can do by setting the “allow branch to branch traffic” flag. In our case, I already did it:
az network routeserver show -n $hub1_rs_name -g $rg --query allowBranchToBranchTraffic
true
First, let’s remember that our Azure Route Server actually consists of two different instances, each with its own IP address:
az network routeserver show -n $hub1_rs_name -g $rg --query virtualRouterIps -o tsv
10.1.0.4
10.1.0.5
Now that we know the Route Server’s IP addresses, you will recognize them when you show the BGP neighbors of the ExpressRoute gateway. By the way, 10.1.3.0/24 is the GatewaySubnet, so 10.1.3.4 and 10.1.3.5 are the actual ExpressRoute gateway addresses:
az network vnet-gateway list-bgp-peer-status -n er1 -g $rg -o table
Neighbor ASN State ConnectedDuration RoutesReceived MessagesSent MessagesReceived
---------- ----- --------- ------------------- ---------------- -------------- ------------------
10.1.3.4 12076 Connected 01:43:11.4004661 2 139 144
10.1.3.5 12076 Connected 01:42:45.9564815 6 138 144
10.1.0.4 65515 Connected 01:43:19.0347253 4 350 144
10.1.0.5 65515 Connected 01:43:18.9878490 4 471 144
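To make it easier to read that table, here is a small sketch that labels each neighbor by its IP address and ASN. The addresses and ASNs are copied from the output above; the role names and the `classify_peer` helper are my own illustration and not part of any Azure tooling.

```python
import ipaddress

# Values copied from the peer table above; the role labels are hypothetical names.
GATEWAY_SUBNET = ipaddress.ip_network("10.1.3.0/24")
ROUTE_SERVER_IPS = {
    ipaddress.ip_address("10.1.0.4"),
    ipaddress.ip_address("10.1.0.5"),
}

def classify_peer(neighbor, asn):
    """Return a readable role for a BGP neighbor of the ExpressRoute gateway."""
    ip = ipaddress.ip_address(neighbor)
    if ip in ROUTE_SERVER_IPS and asn == 65515:
        return "route-server"
    if ip in GATEWAY_SUBNET and asn == 12076:
        return "gateway-subnet"
    return "unknown"

peers = [
    ("10.1.3.4", 12076),
    ("10.1.3.5", 12076),
    ("10.1.0.4", 65515),
    ("10.1.0.5", 65515),
]
roles = {ip: classify_peer(ip, asn) for ip, asn in peers}
for ip, role in roles.items():
    print(f"{ip:10} {role}")
```

The two neighbors with ASN 65515 are the Route Server instances, which matches the `virtualRouterIps` output shown earlier.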
Note that the Azure Route Server will not show the additional BGP peerings with the ExpressRoute gateways. In our case, it is only showing the peering configured to the NVA, but nothing about the ExpressRoute gateways:
az network routeserver peering list --vrouter-name hub1rs -g $rg -o table
Name PeerAsn PeerIp ProvisioningState ResourceGroup
------- --------- -------- ------------------- ---------------
hub1nva 65001 10.1.1.4 Succeeded routeserver
To refresh, let’s see the prefixes that the Route Server is learning from the NVA: there are two summary routes, one per hub. 10.1.0.0/16 contains the prefixes for hub1 and is originated by the NVA in hub1 (ASN 65001), and 10.2.0.0/16 is originated by the NVAs in hub2 (ASN 65002):
az network routeserver peering list-learned-routes -n hub1nva --vrouter-name $hub1_rs_name -g $rg --query 'RouteServiceRole_IN_0' -o table
LocalAddress Network NextHop SourcePeer Origin AsPath Weight
-------------- ----------- --------- ------------ -------- ----------- --------
10.1.0.4 10.1.0.0/16 10.1.1.4 10.1.1.4 EBgp 65001 32768
10.1.0.4 10.2.0.0/16 10.1.1.4 10.1.1.4 EBgp 65001-65002 32768
And sure enough, both routes are forwarded by the Route Server to the gateways, and appear in the ExpressRoute gateway BGP table. Note that the ASN of the Route Server (65515) is not visible in the AS path of these routes, which is a bit weird. We will live with that for the time being:
az network vnet-gateway list-learned-routes -n er1 -g $rg -o table
Network Origin SourcePeer AsPath Weight NextHop
--------------- -------- ------------ ------------ -------- ---------
10.1.0.0/20 Network 10.1.3.13 32768
10.1.0.0/20 IBgp 10.1.0.4 32768 10.1.0.4
10.1.0.0/20 IBgp 10.1.0.5 32768 10.1.0.5
10.1.16.0/24 IBgp 10.1.0.4 32768 10.1.0.4
10.1.16.0/24 IBgp 10.1.0.5 32768 10.1.0.5
10.1.17.0/24 Network 10.1.3.13 32768
10.1.17.0/24 IBgp 10.1.0.4 32768 10.1.0.4
10.1.17.0/24 IBgp 10.1.0.5 32768 10.1.0.5
169.254.1.88/30 EBgp 10.1.3.4 12076-133937 32779 10.1.3.4
169.254.1.88/30 EBgp 10.1.3.5 12076-133937 32779 10.1.3.5
10.2.0.0/16 IBgp 10.1.0.4 65001-65002 32768 10.1.1.4
10.2.0.0/16 IBgp 10.1.0.5 65001-65002 32768 10.1.1.4
10.1.0.0/16 IBgp 10.1.0.4 65001 32768 10.1.1.4
10.1.0.0/16 IBgp 10.1.0.5 65001 32768 10.1.1.4
169.254.1.92/30 EBgp 10.1.3.4 12076-133937 32779 10.1.3.4
169.254.1.92/30 EBgp 10.1.3.5 12076-133937 32779 10.1.3.5
1.2.3.0/24 EBgp 10.1.3.5 12076-133937 32779 10.1.3.5
10.1.0.0/20 EBgp 10.1.3.4 12076-12076 32779 10.1.3.12
10.1.16.0/24 EBgp 10.1.3.4 12076-12076 32779 10.1.3.12
10.1.17.0/24 EBgp 10.1.3.4 12076-12076 32779 10.1.3.12
1.2.3.0/24 IBgp 10.1.0.4 12076-133937 32768 10.1.3.5
1.2.3.0/24 IBgp 10.1.0.5 12076-133937 32768 10.1.3.5
This is great! We have announced the 10.2.0.0/16 route to the ExpressRoute gateway to provide connectivity to indirect spokes, which was very difficult to do before the Route Server existed.
Now if you look at the other routes in the 10.1.x.x range, you can see that the VNet prefixes of hub1 are already advertised by the Route Server to the ExpressRoute gateway (10.1.0.0/20, 10.1.16.0/24 and 10.1.17.0/24), so we would not have to advertise the 10.1.0.0/16 summary. However, we cannot just remove it from NVA1, because that is the summary installed in the effective routes of the spoke virtual machines.
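As a quick sanity check of that reasoning, the following sketch confirms that each of the three VNet prefixes falls inside the 10.1.0.0/16 summary. It only uses the prefixes quoted above and Python's standard `ipaddress` module; the variable names are mine.

```python
import ipaddress

# Summary advertised by the NVA and the VNet prefixes the Route Server already advertises.
summary = ipaddress.ip_network("10.1.0.0/16")
vnet_prefixes = [
    ipaddress.ip_network(p)
    for p in ("10.1.0.0/20", "10.1.16.0/24", "10.1.17.0/24")
]

# subnet_of() is True when the prefix is fully contained in the summary.
covered = {str(p): p.subnet_of(summary) for p in vnet_prefixes}
print(covered)
```

All three prefixes are contained in the summary, so from the ExpressRoute gateway's point of view the /16 adds nothing for the hub1 VNets.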
So how do we advertise a route to the Route Server so that it is plumbed into the spokes, but not advertised to the ExpressRoute gateways? We can use well-known BGP communities, such as “no-advertise”. By tagging routes with this community, we tell the receiving router not to advertise them to any of its eBGP or iBGP peers. “No-advertise” is just a friendly name, since BGP communities are numeric. The number for the “no-advertise” BGP community is 65535:65282. The following configuration snippet shows how the NVA in hub1 has been configured to mark the 10.1.0.0/16 route with the “no-advertise” community when sending it to the Azure Route Server:
filter TO_RS {
    # Drop long prefixes
    if ( net ~ [ 0.0.0.0/0{30,32} ] ) then { reject; }
    # Do not export to ER/VPN
    if ( net = 10.1.0.0/16 ) then {
        bgp_community.add((65535,65282));
        accept;
    }
    # Rest of routes
    else accept;
}
And now you can see that the 10.1.0.0/16 summary is not in the ExpressRoute gateway BGP table any more.
az network vnet-gateway list-learned-routes -n er1 -g $rg -o table
Network Origin SourcePeer AsPath Weight NextHop
--------------- -------- ------------ ------------ -------- ---------
10.1.0.0/20 Network 10.1.3.12 32768
10.1.0.0/20 IBgp 10.1.0.4 32768 10.1.0.4
10.1.0.0/20 IBgp 10.1.0.5 32768 10.1.0.5
10.1.16.0/24 Network 10.1.3.12 32768
10.1.16.0/24 IBgp 10.1.0.4 32768 10.1.0.4
10.1.16.0/24 IBgp 10.1.0.5 32768 10.1.0.5
10.1.17.0/24 Network 10.1.3.12 32768
10.1.17.0/24 IBgp 10.1.0.4 32768 10.1.0.4
10.1.17.0/24 IBgp 10.1.0.5 32768 10.1.0.5
169.254.1.88/30 EBgp 10.1.3.4 12076-133937 32779 10.1.3.4
169.254.1.88/30 EBgp 10.1.3.5 12076-133937 32779 10.1.3.5
10.2.0.0/16 IBgp 10.1.0.4 65001-65002 32768 10.1.1.4
10.2.0.0/16 IBgp 10.1.0.5 65001-65002 32768 10.1.1.4
169.254.1.92/30 EBgp 10.1.3.4 12076-133937 32779 10.1.3.4
169.254.1.92/30 EBgp 10.1.3.5 12076-133937 32779 10.1.3.5
1.2.3.0/24 EBgp 10.1.3.5 12076-133937 32779 10.1.3.5
10.1.0.0/20 EBgp 10.1.3.5 12076-12076 32779 10.1.3.13
10.1.16.0/24 EBgp 10.1.3.5 12076-12076 32779 10.1.3.13
10.1.17.0/24 EBgp 10.1.3.5 12076-12076 32779 10.1.3.13
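To make the numeric form of the community a bit more concrete, here is a minimal sketch of how a receiving router could treat a route tagged with “no-advertise”. The community values come from RFC 1997; the function names and the tiny route dictionary are hypothetical and only model the two summaries discussed above, not how the Route Server is actually implemented.

```python
# Well-known community values defined in RFC 1997.
NO_EXPORT = (65535, 65281)
NO_ADVERTISE = (65535, 65282)

def community_to_int(community):
    """Encode a (high, low) community pair as the 32-bit value carried in BGP."""
    high, low = community
    return (high << 16) | low

def may_readvertise(communities):
    """A route carrying NO_ADVERTISE must not be advertised to any peer."""
    return NO_ADVERTISE not in communities

# Routes as tagged by the TO_RS filter in the NVA (illustrative model only).
routes = {
    "10.1.0.0/16": {NO_ADVERTISE},
    "10.2.0.0/16": set(),
}
forwarded = [prefix for prefix, comms in routes.items() if may_readvertise(comms)]

print(hex(community_to_int(NO_ADVERTISE)))  # 0xffffff02
print(forwarded)                            # ['10.2.0.0/16']
```

This mirrors what the table shows: 10.2.0.0/16 is still present in the ExpressRoute gateway, while 10.1.0.0/16 is gone.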
We have explored the prefixes that ExpressRoute gateways learn from NVAs, and how to control them. What about the opposite direction? Here is the list of prefixes that the ExpressRoute gateway advertises to one of the Route Server instances:
az network vnet-gateway list-advertised-routes -n er1 -g $rg --peer 10.1.0.4 -o table
Network NextHop Origin AsPath Weight
--------------- --------- ---------- ------------ --------
10.1.0.0/20 10.1.3.13 Igp 0
10.1.16.0/24 10.1.3.13 Igp 0
10.1.17.0/24 10.1.3.13 Igp 0
169.254.1.88/30 10.1.3.4 Incomplete 12076-133937 0
169.254.1.92/30 10.1.3.4 Incomplete 12076-133937 0
1.2.3.0/24 10.1.3.5 Incomplete 12076-133937 0
Other than the 10.1.x.x prefixes (the local prefixes of the hub, and hence useless to the Azure Route Server because it already knows them), we have three other prefixes: the two /30 routes are the transit subnets between the MSEE and the customer, and as we will see they are not plumbed into the effective routes. And then we have 1.2.3.0/24, which is our on-premises network.
If we now look at the routes that are advertised from the Azure Route Server to the NVA, we do see the on-premises network 1.2.3.0/24, but not the /30 transit subnets, as it should be:
az network routeserver peering list-advertised-routes -n hub1nva --vrouter-name hub1rs -g $rg --query 'RouteServiceRole_IN_0' -o table
LocalAddress Network NextHop Origin AsPath Weight
-------------- ------------ --------- ---------- ------------------ --------
10.1.0.4 10.1.0.0/20 10.1.0.4 Igp 65515 0
10.1.0.4 10.1.16.0/24 10.1.0.4 Igp 65515 0
10.1.0.4 10.1.17.0/24 10.1.0.4 Igp 65515 0
10.1.0.4 1.2.3.0/24 10.1.0.4 Incomplete 65515-12076-133937 0
In the NVA we can take a deeper look at the routes. Other than the BGP community 65517:65517 (I don’t know what it means), you can see that the 1.2.3.0/24 route has the expected AS path, and the Route Server’s IP address as its next hop:
bird> show route protocol rs0 all
10.1.0.0/20 via 10.1.1.1 on eth0 [rs0 23:20:43 from 10.1.0.4] * (100/?) [AS65515i]
Type: BGP unicast univ
BGP.origin: IGP
BGP.as_path: 65515
BGP.next_hop: 10.1.0.4
BGP.local_pref: 100
1.2.3.0/24 via 10.1.1.1 on eth0 [rs0 23:20:43 from 10.1.0.4] * (100/?) [AS133937?]
Type: BGP unicast univ
BGP.origin: Incomplete
BGP.as_path: 65515 12076 133937
BGP.next_hop: 10.1.0.4
BGP.local_pref: 100
BGP.community: (65517,65517)
10.1.16.0/24 via 10.1.1.1 on eth0 [rs0 23:20:43 from 10.1.0.4] * (100/?) [AS65515i]
Type: BGP unicast univ
BGP.origin: IGP
BGP.as_path: 65515
BGP.next_hop: 10.1.0.4
BGP.local_pref: 100
10.1.17.0/24 via 10.1.1.1 on eth0 [rs0 23:20:43 from 10.1.0.4] * (100/?) [AS65515i]
Type: BGP unicast univ
BGP.origin: IGP
BGP.as_path: 65515
BGP.next_hop: 10.1.0.4
BGP.local_pref: 100
Note that the Route Server is plumbing 1.2.3.0/24 into the effective routes of the NICs in the directly peered spokes, such as spoke11. By the way, the next hop (10.2.146.35 in this case) is an IP address in Microsoft’s IP address space, not something you will find in your VNet:
az network nic show-effective-route-table --ids $spoke11_vm_nic_id -o table
Source State Address Prefix Next Hop Type Next Hop IP
--------------------- ------- ---------------- --------------------- -------------
Default Active 10.1.16.0/24 VnetLocal
Default Active 10.1.0.0/20 VNetPeering
VirtualNetworkGateway Active 10.2.0.0/16 VirtualNetworkGateway 10.1.1.4
VirtualNetworkGateway Active 10.1.0.0/16 VirtualNetworkGateway 10.1.1.4
VirtualNetworkGateway Active 1.2.3.0/24 VirtualNetworkGateway 10.2.146.35
Default Active 0.0.0.0/0 Internet
Default Active 10.0.0.0/8 None
Default Active 100.64.0.0/10 None
Default Active 192.168.0.0/16 None
Default Active 25.33.80.0/20 None
Default Active 25.41.3.0/25 None
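To see how a table like this one is consulted, here is a rough sketch of a longest-prefix-match lookup over a subset of the spoke11 routes above. It is a simplification: it only models prefix length and ignores the tie-breaking between route sources that Azure applies when prefixes are identical. The `lookup` helper is hypothetical.

```python
import ipaddress

# (prefix, next hop type, next hop IP) copied from the spoke11 table above.
EFFECTIVE_ROUTES = [
    ("10.1.16.0/24", "VnetLocal", None),
    ("10.1.0.0/20", "VNetPeering", None),
    ("10.2.0.0/16", "VirtualNetworkGateway", "10.1.1.4"),
    ("10.1.0.0/16", "VirtualNetworkGateway", "10.1.1.4"),
    ("1.2.3.0/24", "VirtualNetworkGateway", "10.2.146.35"),
    ("0.0.0.0/0", "Internet", None),
    ("10.0.0.0/8", "None", None),
]

def lookup(destination):
    """Return the matching route with the longest prefix for a destination IP."""
    dest = ipaddress.ip_address(destination)
    matches = [r for r in EFFECTIVE_ROUTES if dest in ipaddress.ip_network(r[0])]
    # 0.0.0.0/0 always matches, so the list is never empty.
    return max(matches, key=lambda r: ipaddress.ip_network(r[0]).prefixlen)

for target in ("1.2.3.10", "10.1.17.4", "10.1.0.4", "10.2.16.4"):
    prefix, hop_type, hop_ip = lookup(target)
    print(f"{target:10} -> {prefix:14} {hop_type} {hop_ip or ''}")
```

Traffic to the on-premises network matches 1.2.3.0/24, traffic to the other spoke in hub1 falls back to the 10.1.0.0/16 summary and goes to the NVA, and traffic to the hub VNet itself still uses the more specific peering route.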
If we shift our focus to hub2, we can see that our NVAs there are learning the on-premises prefixes from the NVA in hub1:
bird> show route protocol hub1
10.1.0.0/16 via 192.168.0.1 on vxlan0 [hub1 23:20:43] * (100/0) [AS65001i]
10.1.0.0/20 via 192.168.0.1 on vxlan0 [hub1 23:20:43] * (100/0) [AS65515i]
10.1.0.5/32 via 192.168.0.1 on vxlan0 [hub1 23:20:43] * (100/0) [AS65001i]
1.2.3.0/24 via 192.168.0.1 on vxlan0 [hub1 23:20:43] * (100/0) [AS133937?]
10.1.16.0/24 via 192.168.0.1 on vxlan0 [hub1 23:20:43] * (100/0) [AS65515i]
10.1.0.4/32 via 192.168.0.1 on vxlan0 [hub1 23:20:43] * (100/0) [AS65001i]
10.1.17.0/24 via 192.168.0.1 on vxlan0 [hub1 23:20:43] * (100/0) [AS65515i]
192.168.0.2/32 via 192.168.0.1 on vxlan0 [hub1 23:20:43] * (100/0) [AS65001i]
192.168.0.6/32 via 192.168.0.1 on vxlan0 [hub1 23:20:43] * (100/0) [AS65001i]
But if we look at the effective routes in spoke21, we do not find the 1.2.3.0/24 prefix anywhere:
az network nic show-effective-route-table --ids $spoke21_vm_nic_id -o table
Source State Address Prefix Next Hop Type Next Hop IP
--------------------- ------- ---------------- --------------------- -------------
Default Active 10.2.16.0/24 VnetLocal
Default Active 10.2.0.0/20 VNetPeering
VirtualNetworkGateway Active 10.2.0.0/16 VirtualNetworkGateway 10.2.1.4
VirtualNetworkGateway Active 10.1.0.0/16 VirtualNetworkGateway 10.2.1.4
Default Active 0.0.0.0/0 Internet
Default Active 10.0.0.0/8 None
Default Active 100.64.0.0/10 None
Default Active 192.168.0.0/16 None
Default Active 25.33.80.0/20 None
Default Active 25.41.3.0/25 None
If we look at the routes that the Route Server in hub2 is learning from the NVA, 1.2.3.0/24 is not there either!
az network routeserver peering list-learned-routes -n hub2nva --vrouter-name $hub2_rs_name -g $rg --query 'RouteServiceRole_IN_0' -o table
LocalAddress Network NextHop SourcePeer Origin AsPath Weight
-------------- ----------- --------- ------------ -------- ----------- --------
10.2.0.4 10.2.0.0/16 10.2.1.4 10.2.1.4 EBgp 65002 32768
10.2.0.4 10.1.0.0/16 10.2.1.4 10.2.1.4 EBgp 65002-65001 32768
Why? Let’s have a closer look at the 1.2.3.0/24 prefix in NVA2 in hub2:
bird> show route 1.2.3.0/24 all
1.2.3.0/24 via 192.168.0.1 on vxlan0 [hub1 23:20:43] * (100/0) [AS133937?]
Type: BGP unicast univ
BGP.origin: Incomplete
BGP.as_path: 65001 65515 12076 133937
BGP.next_hop: 192.168.0.1
BGP.local_pref: 100
BGP.community: (65517,65517)
Since this route came from the Route Server in hub1, it includes the ASN 65515 in the AS path. But guess what, 65515 is also the ASN of the Route Server in hub2, so it will drop the route following the BGP loop prevention mechanism. If we want the Route Server to learn this prefix, we should remove 65515 from the AS path. In bird (which is what we are using for BGP in the Linux-based NVAs) there is an easy way to remove certain ASNs from the AS path:
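filter TO_RS {
    # Drop long prefixes
    if ( net ~ [ 0.0.0.0/0{30,32} ] ) then { reject; }
    # Strip the Route Server ASN from the on-premises prefix, so that
    # the Route Server in hub2 does not discard it as a loop
    if ( net = 1.2.3.0/24 ) then {
        bgp_path.delete(65515);
        accept;
    }
    # Rest of routes
    else accept;
}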
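The loop check and the effect of deleting the ASN can be sketched in a few lines. This is only a model of the behavior described above, using the AS path seen in NVA2; the function names are hypothetical, and `delete_asn` just mimics what `bgp_path.delete` does in bird (removing every occurrence of the given ASN).

```python
ROUTE_SERVER_ASN = 65515
NVA2_ASN = 65002

def accepts(as_path, local_asn=ROUTE_SERVER_ASN):
    """eBGP loop prevention: reject a route whose AS path already contains our ASN."""
    return local_asn not in as_path

def delete_asn(as_path, asn):
    """Remove every occurrence of asn from the path, like bgp_path.delete(asn)."""
    return [hop for hop in as_path if hop != asn]

# AS path of 1.2.3.0/24 as seen in NVA2 (learned from the NVA in hub1).
learned = [65001, 65515, 12076, 133937]

# NVA2 prepends its own ASN when advertising to the Route Server in hub2.
unfiltered = [NVA2_ASN] + learned
filtered = [NVA2_ASN] + delete_asn(learned, ROUTE_SERVER_ASN)

print(unfiltered, accepts(unfiltered))  # rejected: path contains 65515
print(filtered, accepts(filtered))      # accepted: 65515 has been removed
```

The filtered path is 65002, 65001, 12076, 133937, which is the AS path the Route Server in hub2 reports once the filter is in place.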
Now, the Route Server does learn the 1.2.3.0/24 route from the NVA:
az network routeserver peering list-learned-routes -n hub2nva --vrouter-name $hub2_rs_name -g $rg --query 'RouteServiceRole_IN_0' -o table
LocalAddress Network NextHop SourcePeer Origin AsPath Weight
-------------- ----------- --------- ------------ -------- ------------------------ --------
10.2.0.4 10.2.0.0/16 10.2.1.4 10.2.1.4 EBgp 65002 32768
10.2.0.4 10.1.0.0/16 10.2.1.4 10.2.1.4 EBgp 65002-65001 32768
10.2.0.4 1.2.3.0/24 10.2.1.4 10.2.1.4 EBgp 65002-65001-12076-133937 32768
And the route gets injected into the spoke’s effective routes, with the next hop being the NVA in hub2:
az network nic show-effective-route-table --ids $spoke21_vm_nic_id -o table
Source State Address Prefix Next Hop Type Next Hop IP
--------------------- ------- ---------------- --------------------- -------------
Default Active 10.2.16.0/24 VnetLocal
Default Active 10.2.0.0/20 VNetPeering
VirtualNetworkGateway Active 10.2.0.0/16 VirtualNetworkGateway 10.2.1.4
VirtualNetworkGateway Active 10.1.0.0/16 VirtualNetworkGateway 10.2.1.4
VirtualNetworkGateway Active 1.2.3.0/24 VirtualNetworkGateway 10.2.1.4
Default Active 0.0.0.0/0 Internet
Default Active 10.0.0.0/8 None
Default Active 100.64.0.0/10 None
Default Active 192.168.0.0/16 None
Default Active 25.33.80.0/20 None
Default Active 25.41.3.0/25 None
Summary
In this post we explored how routes are advertised from NVAs to ExpressRoute, for example to provide connectivity to remote spokes, and how to prevent some of those routes from being advertised by using the “no-advertise” community.
We also saw how routes from ExpressRoute gateways can be transported via BGP to indirect spokes (spokes not directly peered to the VNet where the ExpressRoute gateway is), which requires removing the 65515 ASN from the AS path.
Hi José, thanks for your posts, they are very useful and very well explained. I have a question for you: you mention the new way of announcing the BGP routes of indirect spokes to on-prem, which was difficult before, and in your last article you mentioned the “dummy vnet trick”. Is there any other official and supported solution (from the Azure side), or is the only way to modify the on-prem routers?
Thanks again!
Thanks, glad you enjoy the posts! The only possibility I know with customer-managed vnets outside of the Route Server is the “dummy vnet” trick. Static routing in the onprem routers will not work, because the MSEE will not know the indirect spoke prefixes. Virtual WAN does support a topology with indirect spokes, since they have something very similar to the Route Server in the Microsoft-managed hubs.
BTW, for such a detailed discussion, the best would probably be opening a FastTrack engagement: https://azure.microsoft.com/en-us/programs/azure-fasttrack/#overview
thank you for your clarification!
Hi Jose! I assume this design requires the VNG to have BGP enabled, correct? I mean, it seems obvious, but wanted to be sure.
Sorry, dumb question. This is an expressroute VNG, not VPN.
Ha ha yes, ER and BGP is like peanut butter and jelly: always together 🙂
Hi Jose,
Great and deep article on Azure Route Server (ARS).
I have a quite similar use case on my side, with a multi-region network where each region contains a hub (Azure LB + 3rd-party NVA). Each region is connected to the on-premises network (similar to your initial post from 2021-03-03).
For the initial stage of the project, I don’t require ARS. I used to declare each spoke CIDR in a dedicated route table attached to the ER gateway subnet.
Now, I have to consider the complete failure of one of the hubs (especially the NVA) and still have communication from on-premises to the spokes (of the failed hub) and between spokes in different regions.
Questions:
1- The design you presented can be applied in case of NVA (Hub1) failure? i.e. the spokes in Hub1 can still communicate with the spokes in Hub2? What will be the network path in this case?
2- If yes, I have also to deal with the costing, so, avoiding using additional resources, such as ARS. Do you know if we can have an alternative of using it?
3a- In your article dating from the 2021-08-19 you mentioned the way to use Azure Route Server without overlay (not possible to match my scenario as I’m limited in NIC modifications on my NVA). I have another use case, is it possible (still with ARS) to don’t have to activate BGP on NVA (not the case in my current architecture – only static route on NVA)?
3b- Can we have an alternative of Global VNet peering between Hubs like VPN IPSec? And also find a solution for the NVA configures without BGP routing?
4- Last question, what about a design with both Azure regions connected to on-premise network with a unique ER circuit? can this scenario simplify the networking configurations and avoid the ARS (still my concern :)))?
It is shame that I cannot share my diagrams which will be much better to understand the alternatives I’m trying to present.
Thanks again for your articles.
Regards
Fatih
Wow those are a lot of good questions!
On 1) you could dual-home each spoke to both hubs. The question here is how you configure routing. UDRs are probably not an option because they are too static, so you could configure a second ARS in each region to inject the routes.
2) I don’t see how… You cannot use the dummy VNet trick, since it would overlap with the spoke prefixes 😦 An exception is if you don’t have ExpressRoute but only S2S VPN, in which case you can get away with static routing
3a) see above…
3b) sure, that is always an option, but it is typically associated with performance degradation (latency and/or bandwidth)
4) You would still want a fallback to cover the outage of a whole ER location, such as an IPsec VPN. So at first sight, I don’t see how it would reduce complexity in a significant way…