Connecting your NVAs to ExpressRoute with Azure Route Server

In a previous blog post I described the features of the new Azure Route Server that I am most excited about, as well as a possible setup to create a hub-and-spoke design with firewall NVAs (Network Virtual Appliances) across multiple regions. In this one I will focus on how to integrate that topology with ExpressRoute, and how NVAs can advertise routes to ExpressRoute and learn routes from it.

For this purpose I have enhanced the multi-region design with an ExpressRoute private peering connecting the Azure environment to an on-premises network, represented by the prefix 1.2.3.0/24.

Topology with ExpressRoute, Route Server and NVAs across 2 regions

The first thing to do is to enable the Route Server to interact with the ExpressRoute gateway, which you can do by setting the “allow branch to branch traffic” flag. In our case, I already did it:

az network routeserver show -n $hub1_rs_name -g $rg --query allowBranchToBranchTraffic
true

First, let’s remember that our Azure Route Server actually consists of two different instances, each with its own IP address:

az network routeserver show -n $hub1_rs_name -g $rg --query virtualRouterIps -o tsv
10.1.0.4
10.1.0.5

Now that we know the Route Server’s IP addresses, you will recognize them when you show the BGP neighbors of the ExpressRoute gateway. By the way, 10.1.3.0/24 is the GatewaySubnet, so 10.1.3.4 and 10.1.3.5 are the actual ExpressRoute gateway addresses:

az network vnet-gateway list-bgp-peer-status -n er1 -g $rg -o table
Neighbor    ASN    State      ConnectedDuration    RoutesReceived    MessagesSent    MessagesReceived
----------  -----  ---------  -------------------  ----------------  --------------  ------------------
10.1.3.4    12076  Connected  01:43:11.4004661     2                 139             144
10.1.3.5    12076  Connected  01:42:45.9564815     6                 138             144
10.1.0.4    65515  Connected  01:43:19.0347253     4                 350             144
10.1.0.5    65515  Connected  01:43:18.9878490     4                 471             144

Note that the Azure Route Server will not show the additional BGP peerings with the ExpressRoute gateways. In our case, it is only showing the peering configured to the NVA, but nothing about the ExpressRoute gateways:

az network routeserver peering list --vrouter-name hub1rs -g $rg -o table
Name     PeerAsn    PeerIp    ProvisioningState    ResourceGroup
-------  ---------  --------  -------------------  ---------------
hub1nva  65001      10.1.1.4  Succeeded            routeserver

As a refresher, let’s see the prefixes that the Route Server is learning from the NVA: two summary routes, one per hub. 10.1.0.0/16 contains the prefixes for hub1 and is originated by the NVA in hub1 (ASN 65001), and 10.2.0.0/16 is originated by the NVAs in hub2 (ASN 65002):

az network routeserver peering list-learned-routes -n hub1nva --vrouter-name $hub1_rs_name -g $rg --query 'RouteServiceRole_IN_0' -o table
LocalAddress    Network      NextHop    SourcePeer    Origin    AsPath       Weight
--------------  -----------  ---------  ------------  --------  -----------  --------
10.1.0.4        10.1.0.0/16  10.1.1.4   10.1.1.4      EBgp      65001        32768
10.1.0.4        10.2.0.0/16  10.1.1.4   10.1.1.4      EBgp      65001-65002  32768

And sure enough, both routes are forwarded by the Route Server to the gateways, and appear in the ExpressRoute gateway BGP table. Note that the ASN of the Route Server (65515) is not visible in the AS path of these routes, which is a bit weird. We will live with that for the time being:

az network vnet-gateway list-learned-routes -n er1 -g $rg -o table
Network          Origin    SourcePeer    AsPath        Weight    NextHop
---------------  --------  ------------  ------------  --------  ---------
10.1.0.0/20      Network   10.1.3.13                   32768
10.1.0.0/20      IBgp      10.1.0.4                    32768     10.1.0.4
10.1.0.0/20      IBgp      10.1.0.5                    32768     10.1.0.5
10.1.16.0/24     IBgp      10.1.0.4                    32768     10.1.0.4
10.1.16.0/24     IBgp      10.1.0.5                    32768     10.1.0.5
10.1.17.0/24     Network   10.1.3.13                   32768
10.1.17.0/24     IBgp      10.1.0.4                    32768     10.1.0.4
10.1.17.0/24     IBgp      10.1.0.5                    32768     10.1.0.5
169.254.1.88/30  EBgp      10.1.3.4      12076-133937  32779     10.1.3.4
169.254.1.88/30  EBgp      10.1.3.5      12076-133937  32779     10.1.3.5
10.2.0.0/16      IBgp      10.1.0.4      65001-65002   32768     10.1.1.4
10.2.0.0/16      IBgp      10.1.0.5      65001-65002   32768     10.1.1.4
10.1.0.0/16      IBgp      10.1.0.4      65001         32768     10.1.1.4
10.1.0.0/16      IBgp      10.1.0.5      65001         32768     10.1.1.4
169.254.1.92/30  EBgp      10.1.3.4      12076-133937  32779     10.1.3.4
169.254.1.92/30  EBgp      10.1.3.5      12076-133937  32779     10.1.3.5
1.2.3.0/24       EBgp      10.1.3.5      12076-133937  32779     10.1.3.5
10.1.0.0/20      EBgp      10.1.3.4      12076-12076   32779     10.1.3.12
10.1.16.0/24     EBgp      10.1.3.4      12076-12076   32779     10.1.3.12
10.1.17.0/24     EBgp      10.1.3.4      12076-12076   32779     10.1.3.12
1.2.3.0/24       IBgp      10.1.0.4      12076-133937  32768     10.1.3.5
1.2.3.0/24       IBgp      10.1.0.5      12076-133937  32768     10.1.3.5

This is great! We have announced the 10.2.0.0/16 route to the ExpressRoute gateway to provide connectivity to indirect spokes, which was very difficult to do before the Route Server existed.

Now if you look at other routes in the 10.1.x.x range, you can see that the VNet prefixes of hub1 are already advertised by the Route Server to the ExpressRoute gateway (10.1.0.0/20, 10.1.16.0/24 and 10.1.17.0/24), so we would not have to advertise the 10.1.0.0/16 summary. However we cannot just remove it from NVA1, because that is the summary installed in the effective routes of the spoke virtual machines.
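The overlap described here can be checked with a small Python sketch. It only uses the prefixes quoted in this post; the helper name `covered_by` is hypothetical and exists only for illustration.

```python
import ipaddress

# Summary advertised by the NVA in hub1, and the hub1 prefixes that the
# Route Server already advertises to the ExpressRoute gateway.
SUMMARY = ipaddress.ip_network("10.1.0.0/16")
HUB1_PREFIXES = ["10.1.0.0/20", "10.1.16.0/24", "10.1.17.0/24"]

def covered_by(summary, prefixes):
    """Map each prefix to True if it falls inside the summary."""
    return {p: ipaddress.ip_network(p).subnet_of(summary) for p in prefixes}

coverage = covered_by(SUMMARY, HUB1_PREFIXES)
for prefix, inside in coverage.items():
    print(prefix, "inside", SUMMARY, "->", inside)
```

Every hub1 prefix sits inside the summary, which is why the summary adds nothing for the ExpressRoute gateway even though the spokes still depend on it.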

So how can we advertise a route to the Route Server so that it is plumbed into the spokes, but not advertised to the ExpressRoute gateways? We can use well-known BGP communities, such as “no-advertise”. By tagging routes with this community, we tell the receiving router not to advertise them to any of its eBGP or iBGP peers. “No-advertise” is just a friendly name, since BGP communities are numeric. The number for the “no-advertise” BGP community is 65535:65282. The following configuration snippet shows how the NVA in hub1 has been configured to mark the 10.1.0.0/16 route with the “no-advertise” community when sending it to the Azure Route Server:

filter TO_RS {
      # Drop long prefixes
      if ( net ~ [ 0.0.0.0/0{30,32} ] ) then { reject; }
      # Do not export to ER/VPN
      if ( net = 10.1.0.0/16 ) then {
           bgp_community.add((65535,65282));
           accept;
      }
      # Rest of routes
      else accept;
}
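To make the logic of the filter above easier to follow, here is a rough Python model of it. This is only a sketch of the decision flow, not how bird evaluates filters internally; the function names (`to_rs_filter`, `may_readvertise`, `community_to_int`) are made up for this illustration.

```python
import ipaddress

# Well-known "no-advertise" community, written as (high 16 bits, low 16 bits).
NO_ADVERTISE = (65535, 65282)
SUMMARY = ipaddress.ip_network("10.1.0.0/16")

def community_to_int(community):
    """Pack a (high, low) community pair into its 32-bit value."""
    high, low = community
    return (high << 16) | low

def to_rs_filter(prefix, communities=()):
    """Model of the TO_RS filter: returns (accepted, communities)."""
    net = ipaddress.ip_network(prefix)
    # Drop long prefixes (/30 to /32), like 0.0.0.0/0{30,32} in the filter
    if net.prefixlen >= 30:
        return False, tuple(communities)
    # Tag the hub1 summary so the receiver does not pass it on
    if net == SUMMARY:
        return True, tuple(communities) + (NO_ADVERTISE,)
    # Rest of routes
    return True, tuple(communities)

def may_readvertise(communities):
    """A receiver honouring no-advertise keeps tagged routes to itself."""
    return NO_ADVERTISE not in communities

for prefix in ("10.1.0.0/16", "10.2.0.0/16", "10.1.0.4/32"):
    accepted, tags = to_rs_filter(prefix)
    print(prefix, accepted, tags, may_readvertise(tags))
print(hex(community_to_int(NO_ADVERTISE)))
```

In this model the summary is still accepted (so it can reach the spokes), but it carries the tag that tells the receiver not to pass it on, while 10.2.0.0/16 goes through untagged.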

And now you can see that the 10.1.0.0/16 summary is not in the ExpressRoute gateway BGP table any more:

az network vnet-gateway list-learned-routes -n er1 -g $rg -o table
Network          Origin    SourcePeer    AsPath        Weight    NextHop
---------------  --------  ------------  ------------  --------  ---------
10.1.0.0/20      Network   10.1.3.12                   32768
10.1.0.0/20      IBgp      10.1.0.4                    32768     10.1.0.4
10.1.0.0/20      IBgp      10.1.0.5                    32768     10.1.0.5
10.1.16.0/24     Network   10.1.3.12                   32768
10.1.16.0/24     IBgp      10.1.0.4                    32768     10.1.0.4
10.1.16.0/24     IBgp      10.1.0.5                    32768     10.1.0.5
10.1.17.0/24     Network   10.1.3.12                   32768
10.1.17.0/24     IBgp      10.1.0.4                    32768     10.1.0.4
10.1.17.0/24     IBgp      10.1.0.5                    32768     10.1.0.5
169.254.1.88/30  EBgp      10.1.3.4      12076-133937  32779     10.1.3.4
169.254.1.88/30  EBgp      10.1.3.5      12076-133937  32779     10.1.3.5
10.2.0.0/16      IBgp      10.1.0.4      65001-65002   32768     10.1.1.4
10.2.0.0/16      IBgp      10.1.0.5      65001-65002   32768     10.1.1.4
169.254.1.92/30  EBgp      10.1.3.4      12076-133937  32779     10.1.3.4
169.254.1.92/30  EBgp      10.1.3.5      12076-133937  32779     10.1.3.5
1.2.3.0/24       EBgp      10.1.3.5      12076-133937  32779     10.1.3.5
10.1.0.0/20      EBgp      10.1.3.5      12076-12076   32779     10.1.3.13
10.1.16.0/24     EBgp      10.1.3.5      12076-12076   32779     10.1.3.13
10.1.17.0/24     EBgp      10.1.3.5      12076-12076   32779     10.1.3.13

We have explored the prefixes that ExpressRoute gateways learn from NVAs, and how to control them. What about the opposite direction? Here is the list of prefixes that the ExpressRoute gateway advertises to one of the Route Server instances:

az network vnet-gateway list-advertised-routes -n er1 -g $rg --peer 10.1.0.4 -o table
Network          NextHop    Origin      AsPath        Weight
---------------  ---------  ----------  ------------  --------
10.1.0.0/20      10.1.3.13  Igp                       0
10.1.16.0/24     10.1.3.13  Igp                       0
10.1.17.0/24     10.1.3.13  Igp                       0
169.254.1.88/30  10.1.3.4   Incomplete  12076-133937  0
169.254.1.92/30  10.1.3.4   Incomplete  12076-133937  0
1.2.3.0/24       10.1.3.5   Incomplete  12076-133937  0

Other than the 10.1.x.x prefixes (which are the local prefixes of the hub, and hence useless to the Azure Route Server because it already knows them), we have three other prefixes: the two /30 routes are the transit subnets between the MSEE and the customer, and as we will see they are not plumbed into the effective routes. And then we have 1.2.3.0/24, which is our on-premises network.

If we now look at the routes that are advertised from the Azure Route Server to the NVA, we do see the on-premises network 1.2.3.0/24, but not the /30 transit subnets, as it should be:

az network routeserver peering list-advertised-routes -n hub1nva --vrouter-name hub1rs -g $rg --query 'RouteServiceRole_IN_0' -o table
LocalAddress    Network       NextHop    Origin      AsPath              Weight
--------------  ------------  ---------  ----------  ------------------  --------
10.1.0.4        10.1.0.0/20   10.1.0.4   Igp         65515               0
10.1.0.4        10.1.16.0/24  10.1.0.4   Igp         65515               0
10.1.0.4        10.1.17.0/24  10.1.0.4   Igp         65515               0
10.1.0.4        1.2.3.0/24    10.1.0.4   Incomplete  65515-12076-133937  0

In the NVA we can have a deeper look at the routes. Other than the BGP community 65517:65517 (I don’t know what it means), you can see that the on-premises route has the correct AS path and the Route Server’s IP address as next hop:

bird> show route protocol rs0 all
10.1.0.0/20        via 10.1.1.1 on eth0 [rs0 23:20:43 from 10.1.0.4] * (100/?) [AS65515i]
        Type: BGP unicast univ
        BGP.origin: IGP
        BGP.as_path: 65515
        BGP.next_hop: 10.1.0.4
        BGP.local_pref: 100
1.2.3.0/24         via 10.1.1.1 on eth0 [rs0 23:20:43 from 10.1.0.4] * (100/?) [AS133937?]
        Type: BGP unicast univ
        BGP.origin: Incomplete
        BGP.as_path: 65515 12076 133937
        BGP.next_hop: 10.1.0.4
        BGP.local_pref: 100
        BGP.community: (65517,65517)
10.1.16.0/24       via 10.1.1.1 on eth0 [rs0 23:20:43 from 10.1.0.4] * (100/?) [AS65515i]
        Type: BGP unicast univ
        BGP.origin: IGP
        BGP.as_path: 65515
        BGP.next_hop: 10.1.0.4
        BGP.local_pref: 100
10.1.17.0/24       via 10.1.1.1 on eth0 [rs0 23:20:43 from 10.1.0.4] * (100/?) [AS65515i]
        Type: BGP unicast univ
        BGP.origin: IGP
        BGP.as_path: 65515
        BGP.next_hop: 10.1.0.4
        BGP.local_pref: 100

Note that the Route Server is plumbing 1.2.3.0/24 into the effective routes of the NICs in the directly peered spokes, such as spoke11. By the way, the next hop (10.2.146.35 in this case) is an IP address in Microsoft’s IP address space, nothing you will find in your VNet:

az network nic show-effective-route-table --ids $spoke11_vm_nic_id -o table
Source                 State    Address Prefix    Next Hop Type          Next Hop IP
---------------------  -------  ----------------  ---------------------  -------------
Default                Active   10.1.16.0/24      VnetLocal
Default                Active   10.1.0.0/20       VNetPeering
VirtualNetworkGateway  Active   10.2.0.0/16       VirtualNetworkGateway  10.1.1.4
VirtualNetworkGateway  Active   10.1.0.0/16       VirtualNetworkGateway  10.1.1.4
VirtualNetworkGateway  Active   1.2.3.0/24        VirtualNetworkGateway  10.2.146.35
Default                Active   0.0.0.0/0         Internet
Default                Active   10.0.0.0/8        None
Default                Active   100.64.0.0/10     None
Default                Active   192.168.0.0/16    None
Default                Active   25.33.80.0/20     None
Default                Active   25.41.3.0/25      None
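To read a table like this one, it helps to remember that the most specific matching prefix wins. The following Python sketch applies plain longest-prefix matching to the spoke11 routes listed above. It is a simplification: it assumes all listed routes are active and does not model any tie-breaking between route sources for identical prefixes. The `lookup` helper is hypothetical.

```python
import ipaddress

# (prefix, next hop type, next hop IP) copied from the spoke11 table above
EFFECTIVE_ROUTES = [
    ("10.1.16.0/24", "VnetLocal", None),
    ("10.1.0.0/20", "VNetPeering", None),
    ("10.2.0.0/16", "VirtualNetworkGateway", "10.1.1.4"),
    ("10.1.0.0/16", "VirtualNetworkGateway", "10.1.1.4"),
    ("1.2.3.0/24", "VirtualNetworkGateway", "10.2.146.35"),
    ("0.0.0.0/0", "Internet", None),
    ("10.0.0.0/8", "None", None),
    ("100.64.0.0/10", "None", None),
    ("192.168.0.0/16", "None", None),
    ("25.33.80.0/20", "None", None),
    ("25.41.3.0/25", "None", None),
]

def lookup(destination):
    """Return the route with the longest prefix that contains destination."""
    addr = ipaddress.ip_address(destination)
    candidates = [
        route for route in EFFECTIVE_ROUTES
        if addr in ipaddress.ip_network(route[0])
    ]
    # 0.0.0.0/0 matches every IPv4 address, so candidates is never empty
    return max(candidates, key=lambda route: ipaddress.ip_network(route[0]).prefixlen)

for destination in ("1.2.3.10", "10.2.16.4", "10.1.0.4"):
    print(destination, "->", lookup(destination))
```

With these inputs, on-premises traffic follows the 1.2.3.0/24 route learned through the Route Server, traffic to hub2 follows the 10.2.0.0/16 summary towards the NVA, and traffic to the hub itself still prefers the more specific peering route over the 10.1.0.0/16 summary.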

If we shift our focus to hub2, we can see that our NVAs there are learning the on-premises prefixes from the NVA in hub1:

bird> show route protocol hub1
10.1.0.0/16        via 192.168.0.1 on vxlan0 [hub1 23:20:43] * (100/0) [AS65001i]
10.1.0.0/20        via 192.168.0.1 on vxlan0 [hub1 23:20:43] * (100/0) [AS65515i]
10.1.0.5/32        via 192.168.0.1 on vxlan0 [hub1 23:20:43] * (100/0) [AS65001i]
1.2.3.0/24         via 192.168.0.1 on vxlan0 [hub1 23:20:43] * (100/0) [AS133937?]
10.1.16.0/24       via 192.168.0.1 on vxlan0 [hub1 23:20:43] * (100/0) [AS65515i]
10.1.0.4/32        via 192.168.0.1 on vxlan0 [hub1 23:20:43] * (100/0) [AS65001i]
10.1.17.0/24       via 192.168.0.1 on vxlan0 [hub1 23:20:43] * (100/0) [AS65515i]
192.168.0.2/32     via 192.168.0.1 on vxlan0 [hub1 23:20:43] * (100/0) [AS65001i]
192.168.0.6/32     via 192.168.0.1 on vxlan0 [hub1 23:20:43] * (100/0) [AS65001i]

But if we look at the effective routes in spoke21, we do not find the 1.2.3.0/24 prefix anywhere:

az network nic show-effective-route-table --ids $spoke21_vm_nic_id -o table
Source                 State    Address Prefix    Next Hop Type          Next Hop IP
---------------------  -------  ----------------  ---------------------  -------------
Default                Active   10.2.16.0/24      VnetLocal
Default                Active   10.2.0.0/20       VNetPeering
VirtualNetworkGateway  Active   10.2.0.0/16       VirtualNetworkGateway  10.2.1.4
VirtualNetworkGateway  Active   10.1.0.0/16       VirtualNetworkGateway  10.2.1.4
Default                Active   0.0.0.0/0         Internet
Default                Active   10.0.0.0/8        None
Default                Active   100.64.0.0/10     None
Default                Active   192.168.0.0/16    None
Default                Active   25.33.80.0/20     None
Default                Active   25.41.3.0/25      None

If we look at the routes that the Route Server in hub2 is learning from the NVA, 1.2.3.0/24 is not there either!

az network routeserver peering list-learned-routes -n hub2nva --vrouter-name $hub2_rs_name -g $rg --query 'RouteServiceRole_IN_0' -o table
LocalAddress    Network      NextHop    SourcePeer    Origin    AsPath       Weight
--------------  -----------  ---------  ------------  --------  -----------  --------
10.2.0.4        10.2.0.0/16  10.2.1.4   10.2.1.4      EBgp      65002        32768
10.2.0.4        10.1.0.0/16  10.2.1.4   10.2.1.4      EBgp      65002-65001  32768

Why? Let’s have a closer look at the 1.2.3.0/24 prefix in the NVA2 in hub2:

bird> show route 1.2.3.0/24 all
1.2.3.0/24         via 192.168.0.1 on vxlan0 [hub1 23:20:43] * (100/0) [AS133937?]
        Type: BGP unicast univ
        BGP.origin: Incomplete
        BGP.as_path: 65001 65515 12076 133937
        BGP.next_hop: 192.168.0.1
        BGP.local_pref: 100
        BGP.community: (65517,65517)

Since this route came from the Route Server in hub1, it includes ASN 65515 in its AS path. But guess what: 65515 is also the ASN of the Route Server in hub2, so that Route Server will drop the route, following the BGP loop prevention mechanism. If we want the Route Server to learn this prefix, we have to remove 65515 from the AS path. In bird (which we are using for BGP in the Linux-based NVAs) there is an easy way to remove certain ASNs from the AS path:

filter TO_RS {
      # Drop long prefixes
      if ( net ~ [ 0.0.0.0/0{30,32} ] ) then { reject; }
      if ( net = 1.2.3.0/24 ) then {
           bgp_path.delete(65515);
           accept;
      }
      else accept;
}
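The effect of this filter on loop prevention can be illustrated with a few lines of Python. This is a simplified model, assuming the NVA in hub2 prepends its own ASN (65002) when exporting to the Route Server and that the Route Server rejects any path containing its own ASN; `export_path` and `accepted_by` are hypothetical helper names.

```python
ROUTE_SERVER_ASN = 65515
NVA2_ASN = 65002

# AS path of 1.2.3.0/24 as seen in the NVA in hub2
path_in_nva2 = [65001, 65515, 12076, 133937]

def export_path(as_path, local_asn, strip_asns=()):
    """Drop every occurrence of the given ASNs, then prepend the local ASN."""
    cleaned = [asn for asn in as_path if asn not in strip_asns]
    return [local_asn] + cleaned

def accepted_by(as_path, receiver_asn):
    """BGP loop prevention: reject paths that already contain our own ASN."""
    return receiver_asn not in as_path

before = export_path(path_in_nva2, NVA2_ASN)
after = export_path(path_in_nva2, NVA2_ASN, strip_asns={ROUTE_SERVER_ASN})

print(before, accepted_by(before, ROUTE_SERVER_ASN))
print(after, accepted_by(after, ROUTE_SERVER_ASN))
```

Keep in mind that stripping an ASN from the path also removes the protection that ASN provided against routing loops, so it is worth restricting this to the prefixes that really need it, as the filter does for 1.2.3.0/24.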

Now, the Route Server does learn the 1.2.3.0/24 route from the NVA:

az network routeserver peering list-learned-routes -n hub2nva --vrouter-name $hub2_rs_name -g $rg --query 'RouteServiceRole_IN_0' -o table
LocalAddress    Network      NextHop    SourcePeer    Origin    AsPath                    Weight
--------------  -----------  ---------  ------------  --------  ------------------------  --------
10.2.0.4        10.2.0.0/16  10.2.1.4   10.2.1.4      EBgp      65002                     32768
10.2.0.4        10.1.0.0/16  10.2.1.4   10.2.1.4      EBgp      65002-65001               32768
10.2.0.4        1.2.3.0/24   10.2.1.4   10.2.1.4      EBgp      65002-65001-12076-133937  32768

And the route gets injected into the spoke’s effective routes, with the next hop being the NVA in hub2:

az network nic show-effective-route-table --ids $spoke21_vm_nic_id -o table
Source                 State    Address Prefix    Next Hop Type          Next Hop IP
---------------------  -------  ----------------  ---------------------  -------------
Default                Active   10.2.16.0/24      VnetLocal
Default                Active   10.2.0.0/20       VNetPeering
VirtualNetworkGateway  Active   10.2.0.0/16       VirtualNetworkGateway  10.2.1.4
VirtualNetworkGateway  Active   10.1.0.0/16       VirtualNetworkGateway  10.2.1.4
VirtualNetworkGateway  Active   1.2.3.0/24        VirtualNetworkGateway  10.2.1.4
Default                Active   0.0.0.0/0         Internet
Default                Active   10.0.0.0/8        None
Default                Active   100.64.0.0/10     None
Default                Active   192.168.0.0/16    None
Default                Active   25.33.80.0/20     None
Default                Active   25.41.3.0/25      None

Summary

In this post we explored how routes are advertised from NVAs to ExpressRoute, for example to provide connectivity to remote spokes, and how to prevent some of those routes from being advertised by using the “no-advertise” community.

We also saw how routes from ExpressRoute gateways can be transported via BGP to indirect spokes (spokes not directly connected to the VNet where the ExpressRoute gateway is), for which the 65515 ASN has to be removed from the AS path.

11 thoughts on “Connecting your NVAs to ExpressRoute with Azure Route Server”

  1. Alberto

    Hi José, thanks for your posts, they are very useful and very well explained. I have a question for you: you mention the new way of announcing the BGP routes of indirect spokes to on-prem, which was difficult before, and in your last article you mentioned the “dummy vnet trick”. Is there any other official and supported solution (from the Azure side), or is the only way to modify the on-prem routers?

    Thanks again!

    1. Thanks, glad you enjoy the posts! The only possibility I know with customer-managed vnets outside of the Route Server is the “dummy vnet” trick. Static routing in the onprem routers will not work, because the MSEE will not know the indirect spoke prefixes. Virtual WAN does support a topology with indirect spokes, since they have something very similar to the Route Server in the Microsoft-managed hubs.

      1. BTW, for such a detailed discussion, the best would probably be opening a FastTrack engagement: https://azure.microsoft.com/en-us/programs/azure-fasttrack/#overview

  2. Alberto

    thank you for your clarification!

  3. […] In some situations customers will combine the role of VPN termination and firewalling in the same NVA. However, I haven’t seen this pattern very often, since achieving an active/active high availibility design in that scenario can be quite challenging. Hence I will not cover it in this post, but it should be similar to the design described in Azure Route Server multi-region design and Connecting your NVAs to ExpressRoute with Azure Route Server. […]

  4. […] time ago I posted a blog commenting on a possible design for interconnecting multiple Azure regions by means of Network […]

  5. Kyle

    Hi Jose! I assume this design requires the VNG to have BGP enabled, correct? I mean, it seems obvious, but wanted to be sure.

    1. Kyle

      Sorry, dumb question. This is an expressroute VNG, not VPN.

      1. Ha ha yes, ER and BGP is like peanut butter and jelly: always together 🙂

  6. Fatih

    Hi Jose,

    Great and deep article on Azure Route Server (ARS).
    I have a quite similar use case on my side, with a multi-region network where each region contains a hub (Azure LB + 3rd party NVA). Each region is connected to the on-premises network (similar to your initial post from 2021-03-03).
    For the initial stage of the project, I don’t require ARS. I declare each spoke’s CIDR in a dedicated route table attached to the ER GW subnet.
    Now, I have to consider the complete failure of one of the hubs (especially the NVA) and still have communication from on-premises to the spokes (of the failed hub) and between the spokes of different regions.

    Questions:
    1- The design you presented can be applied in case of NVA (Hub1) failure? i.e. the spokes in Hub1 can still communicate with the spokes in Hub2? What will be the network path in this case?
    2- If yes, I have also to deal with the costing, so, avoiding using additional resources, such as ARS. Do you know if we can have an alternative of using it?
    3a- In your article dating from the 2021-08-19 you mentioned the way to use Azure Route Server without overlay (not possible to match my scenario as I’m limited in NIC modifications on my NVA). I have another use case, is it possible (still with ARS) to don’t have to activate BGP on NVA (not the case in my current architecture – only static route on NVA)?
    3b- Can we have an alternative of Global VNet peering between Hubs like VPN IPSec? And also find a solution for the NVA configures without BGP routing?
    4- Last question, what about a design with both Azure regions connected to on-premise network with a unique ER circuit? can this scenario simplify the networking configurations and avoid the ARS (still my concern :)))?

    It is a shame that I cannot share my diagrams, which would make it much easier to understand the alternatives I’m trying to present.

    Thanks again for your articles.
    Regards
    Fatih

    1. Wow those are a lot of good questions!
      On 1) you could dual home each spoke to both hubs. The question here is how you configure routing. UDRs are probably not an option because too static, so you could configure a 2nd ARS in each region to inject the routes.
      2) I don’t see how… You cannot use the dummy VNet trick, since it would overlap with the spoke prefixes 😦 An exception is if you don’t have ExpressRoute but only S2S VPN, then you can get away with static routing
      3a) see above…
      3b) sure, that is always an option, but it is typically associated with performance degradation (latency and/or bandwidth)
      4) You would still want a fallback to cover the outage of a whole ER location, such as an IPsec VPN. So at first sight, I don’t see how it would reduce complexity in a significant way…
