ExpressRoute Global Reach under the covers

After some questions on my previous blog post, CLI-based analysis of an ExpressRoute private peering, I decided to write a follow-up showing what ExpressRoute Global Reach looks like for the CLI lover. In essence, Global Reach allows you to use Microsoft’s backbone network for onprem-to-onprem communication. But how exactly does it do it?

I have set up the following test bed with three locations (US, Europe and Australia, from left to right):

Test bed with three locations

Global Reach will allow the three on-premises sites (ASNs 65001, 65002 and 65003) to communicate with each other. To see how routes are propagated, I am injecting a 1.2.3.4/32 test route from one of the sites.

Spoiler alert: there are some things about what is going on between the routers that I am not sure I fully understand, so chances are this post will leave you with some open questions. My apologies in advance!

Setup before Global Reach

Before setting up Global Reach, every MSEE router has three BGP adjacencies: one with the onprem router (the Megaport Cloud Router in the test bed), and two with the connected ExpressRoute Gateway on ASN 65515. For example, for the Sydney circuit (peered to onprem ASN 65003):

$ az network express-route list-route-tables-summary -g $rg -n $er_circuit_name --path primary --peering-name AzurePrivatePeering --query value -o table

Neighbor       V    AsProperty    UpDown    StatePfxRcd
-------------  ---  ------------  --------  -------------
169.254.36.41  4    65003         11:09:15  3
192.168.3.12   4    65515         00:04:26  1
192.168.3.13   4    65515         00:04:27  1
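To read these summary tables at a glance, the neighbors can be labeled by their ASN. Here is a minimal sh/awk sketch, assuming the `-o table` layout shown above (a header line, a dashed separator, then one neighbor per row with the ASN in the third column). The function name `classify_neighbors` and its labels are my own, hypothetical choices. Since the MSEE itself sits in ASN 12076, a neighbor on 12076 means an iBGP session and anything else an eBGP one.

```shell
#!/bin/sh
# Hypothetical helper: label each neighbor from the output of
#   az network express-route list-route-tables-summary ... -o table
# Assumes the ASN is the third whitespace-separated column.
classify_neighbors() {
  awk -v msee_asn=12076 '
    NR <= 2 { next }    # skip the header and the dashed separator
    NF >= 3 {
      if ($3 == msee_asn)   { role = "msee";       session = "iBGP" }
      else if ($3 == 65515) { role = "er-gateway"; session = "eBGP" }
      else                  { role = "onprem";     session = "eBGP" }
      print $1, $3, role, session
    }'
}

# Sydney primary MSEE before Global Reach (table copied from above)
classify_neighbors <<'EOF'
Neighbor       V    AsProperty    UpDown    StatePfxRcd
-------------  ---  ------------  --------  -------------
169.254.36.41  4    65003         11:09:15  3
192.168.3.12   4    65515         00:04:26  1
192.168.3.13   4    65515         00:04:27  1
EOF
```

With a live circuit, the output of the az command itself can be piped into the function instead of the pasted table.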

And the learnt routes will be the local VNet in the same region, the transit networks to the onprem routers, and whatever the onprem routers advertise (none in this case):

$ az network express-route list-route-tables -g $rg --peering-name AzurePrivatePeering --query value -o table --path primary -n er-Sydney

Network           NextHop        LocPrf    Weight    Path
----------------  -------------  --------  --------  -----------
169.254.36.44/30  169.254.36.41            0         65003 ?
192.168.3.0       192.168.3.12             0         65515
192.168.3.0       192.168.3.13*            0         65515
192.168.3.0       169.254.36.41            0         65003 12076

I am not too sure what the last entry means: it seems to be the VNet route (192.168.3.0/24) advertised to onprem and reflected back. The MSEE’s primary connection seems to learn it, even though, as far as I remember BGP’s loop prevention mechanisms, it shouldn’t: the AS path already contains 12076, the MSEE’s own ASN. Interestingly enough, this doesn’t happen in the secondary connection:

$ az network express-route list-route-tables -g $rg --peering-name AzurePrivatePeering --query value -o table --path secondary -n er-Sydney

Network           NextHop        LocPrf    Weight    Path
----------------  -------------  --------  --------  -------
169.254.36.40/30  169.254.36.45            0         65003 ?
192.168.3.0       192.168.3.13             0         65515
192.168.3.0       192.168.3.12*            0         65515

I think this is due to the fact that the two MSEE routers (primary and secondary) do not talk BGP between each other, so when only one of them knows one route, under certain circumstances the other one will learn that route reflected from the onprem routers. But let’s ignore this tidbit of knowledge here, since it is not our main concern for this blog.
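For reference, this is the loop-prevention rule I have in mind: a BGP router normally rejects a received route whose AS path already contains its own ASN. Below is a minimal sh sketch of that check. `path_has_loop` is a hypothetical name, and the path is taken in the format of the Path column above, where a trailing `?` is the ORIGIN code (incomplete) rather than an ASN.

```shell
#!/bin/sh
# Sketch of standard BGP AS-path loop detection: exits 0 (true) if the
# local ASN already appears in the AS path, 1 otherwise. The function body
# runs in a subshell so that "set -f" does not leak to the caller.
path_has_loop() (
  set -f                    # no globbing, so a "?" token stays literal
  local_asn=$1
  for hop in $2; do
    case $hop in
      i|e|'?') continue ;;  # ORIGIN codes: IGP, EGP, incomplete
    esac
    if [ "$hop" = "$local_asn" ]; then
      exit 0
    fi
  done
  exit 1
)

# The reflected VNet route on the Sydney primary MSEE (ASN 12076)
if path_has_loop 12076 "65003 12076"; then
  echo "65003 12076: contains 12076, standard loop prevention would drop it"
fi
```

The same check flags the entries with path "65003 12076 ?" that show up later on the Sydney secondary router; why the MSEEs keep these routes anyway is exactly the open question above.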

In Frankfurt we will have similar tables, only with the additional route 1.2.3.4/32 that is being advertised from onprem on the primary connection (and not the secondary, because of a limitation of my test bed):

$ az network express-route list-route-tables -g $rg --peering-name AzurePrivatePeering --query value -o table --path primary -n er-Frankfurt

Network            NextHop         LocPrf    Weight    Path
-----------------  --------------  --------  --------  -----------
1.2.3.4/32         169.254.33.209            0         65001 ?
169.254.33.212/30  169.254.33.209            0         65001 ?
192.168.1.0        192.168.1.13              0         65515
192.168.1.0        192.168.1.12*             0         65515
192.168.1.0        169.254.33.209            0         65001 12076

$ az network express-route list-route-tables -g $rg --peering-name AzurePrivatePeering --query value -o table --path secondary -n er-Frankfurt

Network            NextHop         LocPrf    Weight    Path
-----------------  --------------  --------  --------  -------
169.254.33.208/30  169.254.33.213            0         65001 ?
192.168.1.0        192.168.1.12              0         65515
192.168.1.0        192.168.1.13*             0         65515

Global Reach connects edge routers in two locations over BGP

After having stood up the individual regions with their respective circuits, I will now create the Global Reach connection between the two circuits, using the IP prefix 172.16.31.0/29 for the link IP addresses. The individual IP addresses for the two new BGP adjacencies will be taken from this prefix. I use this little script:

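# $rg is assumed to hold the resource group containing both circuits
circuit1_name=er-Frankfurt
circuit2_name=er-Sydney
# Global Reach needs the resource ID of the peer circuit
circuit2_id=$(az network express-route show -n "$circuit2_name" -g "$rg" -o tsv --query id)
# /29 from which the link addresses of the MSEE routers will be taken
ip_range=172.16.31.0/29
az network express-route peering connection create -g "$rg" --circuit-name "$circuit1_name" --peering-name AzurePrivatePeering \
    -n "${circuit1_name}-to-${circuit2_name}" --peer-circuit "$circuit2_id" --address-prefix "$ip_range"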

The Global Reach link shows up as a connection attached to the private peering, as expected:

$ az network express-route peering connection list -g $rg -o table --peering-name AzurePrivatePeering --circuit-name $circuit1_name

AddressPrefix    CircuitConnectionStatus    Name                       ProvisioningState    ResourceGroup
---------------  -------------------------  -------------------------  -------------------  ---------------
172.16.31.0/29   Connected                  er-Frankfurt-to-er-Sydney  Succeeded            ertest

This is what the BGP neighbor table looks like in the Frankfurt primary edge router now, which includes the iBGP peering to Sydney (with an IP address out of the supplied range 172.16.31.0/29) on ASN 12076:

$ az network express-route list-route-tables-summary -g $rg --peering-name AzurePrivatePeering --query value -o table --path primary -n er-Frankfurt

Neighbor        V    AsProperty    UpDown    StatePfxRcd
--------------  ---  ------------  --------  -------------
169.254.33.209  4    65001         1d01h     4
172.16.31.3     4    12076         00:02:53  3
192.168.1.12    4    65515         1d00h     1
192.168.1.13    4    65515         1d00h     1

Similarly, the secondary router in Frankfurt is connected to the secondary router in Sydney:

$ az network express-route list-route-tables-summary -g $rg --peering-name AzurePrivatePeering --query value -o table --path secondary -n er-Frankfurt

This command is in preview and under development. Reference and support levels: https://aka.ms/CLI_refstatus
Neighbor        V    AsProperty    UpDown    StatePfxRcd
--------------  ---  ------------  --------  -------------
169.254.33.213  4    65001         1d01h     5
172.16.31.4     4    12076         00:06:20  4
192.168.1.12    4    65515         1d00h     1
192.168.1.13    4    65515         1d00h     1

Of course, the Sydney routers see the other side of the BGP peerings:

$ az network express-route list-route-tables-summary -g $rg --peering-name AzurePrivatePeering --query value -o table --path primary -n er-Sydney

Neighbor       V    AsProperty    UpDown    StatePfxRcd
-------------  ---  ------------  --------  -------------
169.254.36.41  4    65003         11:56:47  3
172.16.31.1    4    12076         00:08:56  4
192.168.3.12   4    65515         00:51:58  1
192.168.3.13   4    65515         00:51:59  1

$ az network express-route list-route-tables-summary -g $rg --peering-name AzurePrivatePeering --query value -o table --path secondary -n er-Sydney

Neighbor       V    AsProperty    UpDown    StatePfxRcd
-------------  ---  ------------  --------  -------------
169.254.36.45  4    65003         11:57:20  6
172.16.31.2    4    12076         00:09:23  3
192.168.3.12   4    65515         00:52:39  1
192.168.3.13   4    65515         00:52:38  1

As you can see, out of the 172.16.31.0/29 range four IP addresses have been configured in the routers: .1 and .2 in Frankfurt (the first circuit in the command that created the Global Reach connection), and .3 and .4 in Sydney (the second circuit).
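The assignment can be summarized in a few lines of sh. This is a sketch of the pattern observed in this test bed only (.1/.2 for the primary/secondary MSEE of the first circuit, .3/.4 for the primary/secondary MSEE of the second one), not documented behavior, and `global_reach_ips` is a hypothetical helper.

```shell
#!/bin/sh
# Reproduces the link addressing observed above for one Global Reach /29.
# The circuit/router mapping is inferred from this test bed, not documented.
global_reach_ips() (
  prefix=$1
  if [ "${prefix#*/}" != 29 ]; then
    echo "expected a /29 prefix, got $prefix" >&2
    exit 1
  fi
  ip=${prefix%/*}
  a=${ip%%.*}; rest=${ip#*.}      # split the dotted quad into a.b.c.d
  b=${rest%%.*}; rest=${rest#*.}
  c=${rest%%.*}; d=${rest#*.}
  if [ $((d % 8)) -ne 0 ]; then
    echo "$prefix is not the network address of a /29" >&2
    exit 1
  fi
  echo "circuit1-primary $a.$b.$c.$((d + 1))"
  echo "circuit1-secondary $a.$b.$c.$((d + 2))"
  echo "circuit2-primary $a.$b.$c.$((d + 3))"
  echo "circuit2-secondary $a.$b.$c.$((d + 4))"
)

# er-Frankfurt was circuit1 and er-Sydney circuit2 in the script above
global_reach_ips 172.16.31.0/29
```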

The edge routers exchange routes over BGP

We can have a look at the prefixes learnt by the Sydney primary router, and you can see there are four new routes coming from the Frankfurt primary router (172.16.31.1): the transit networks of the Frankfurt ER circuit, the VNet connected to the Frankfurt private peering (192.168.1.0/24), and the 1.2.3.4/32 test route. The prefixes coming over Global Reach are easily recognized, since they are marked with a local preference (either 10 for routes the remote router learnt over BGP, or the default 100 for routes local to the advertising router):

$ az network express-route list-route-tables -g $rg --peering-name AzurePrivatePeering --query value -o table --path primary -n er-Sydney

Network            NextHop        LocPrf    Weight    Path
-----------------  -------------  --------  --------  -----------
1.2.3.4/32         172.16.31.1    10        0         65001 ?
169.254.33.208/30  172.16.31.1    100       0         65001 ?
169.254.33.212/30  172.16.31.1    100       0         65001 ?
169.254.36.44/30   169.254.36.41            0         65003 ?
192.168.1.0        172.16.31.1    10        0         65515
192.168.3.0        192.168.3.12             0         65515
192.168.3.0        192.168.3.13*            0         65515
192.168.3.0        169.254.36.41            0         65003 12076

And these are the ones in the secondary router in Sydney. Note that the 1.2.3.4/32 prefix is there, but without a local preference. And coming from onprem (65003)! I think this is again the reflection phenomenon we saw earlier, happening when only one side of the MSEEs knows a specific prefix, so again I will ask you to ignore that bit:

$ az network express-route list-route-tables -g $rg --peering-name AzurePrivatePeering --query value -o table --path secondary -n er-Sydney

Network            NextHop        LocPrf    Weight    Path
-----------------  -------------  --------  --------  -------------
1.2.3.4/32         169.254.36.45            0         65003 12076 ?
169.254.33.208/30  172.16.31.2    100       0         65001 ?
169.254.33.208/30  169.254.36.45            0         65003 12076 ?
169.254.33.212/30  172.16.31.2    100       0         65001 ?
169.254.33.212/30  169.254.36.45            0         65003 12076 ?
169.254.36.40/30   169.254.36.45            0         65003 ?
192.168.1.0        172.16.31.2    10        0         65515
192.168.1.0        169.254.36.45            0         65003 12076
192.168.3.0        192.168.3.13             0         65515
192.168.3.0        192.168.3.12*            0         65515

Local Preference 10 is used for remote networks

We already mentioned that the routes sent over the Global Reach BGP peering have a Local Preference, either 10 or 100:

  • Local routes to the remote router (the /30 transit networks) are marked with the local preference 100
  • Routes learnt over BGP by the remote router, such as VNets propagated from the ExpressRoute gateways (192.168.1.0/24 in Frankfurt) or onprem prefixes (1.2.3.4/32), are advertised with local preference 10 over Global Reach connections
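This convention makes it easy to tell where each route comes from. Below is a minimal sh/awk sketch, assuming the `-o table` layout used throughout this post. Because the LocPrf column is empty for local routes, it reads the column boundaries from the dashed separator line instead of splitting on whitespace. The function name `classify_routes` and its labels are hypothetical.

```shell
#!/bin/sh
# Tags each route in "az network express-route list-route-tables ... -o table"
# output by its LocPrf: empty = local to this MSEE, 100 = transit network of a
# remote MSEE, 10 = prefix a remote MSEE learnt over BGP (per the list above).
classify_routes() {
  awk '
    NR == 1 { next }                       # header line
    NR == 2 {                              # dashed separator: column bounds
      pos = 1; n = 0
      while (match(substr($0, pos), /-+/)) {
        n++; start[n] = pos + RSTART - 1; width[n] = RLENGTH
        pos = start[n] + RLENGTH
      }
      next
    }
    {
      lp = substr($0, start[3], width[3])  # third column is LocPrf
      gsub(/ /, "", lp)
      if (lp == "")         src = "local"
      else if (lp == "100") src = "remote-transit"
      else if (lp == "10")  src = "remote-bgp"
      else                  src = "other"
      shown = (lp == "") ? "-" : lp
      print $1, shown, src
    }'
}

# A few rows of the Sydney primary table shown above
classify_routes <<'EOF'
Network            NextHop        LocPrf    Weight    Path
-----------------  -------------  --------  --------  -----------
1.2.3.4/32         172.16.31.1    10        0         65001 ?
169.254.33.208/30  172.16.31.1    100       0         65001 ?
169.254.36.44/30   169.254.36.41            0         65003 ?
192.168.1.0        172.16.31.1    10        0         65515
192.168.3.0        192.168.3.12             0         65515
EOF
```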

If you don’t remember what Local Preference does, you can have a quick look at Cisco’s doc BGP Best Path Selection Algorithm (many moons ago, when I worked as a network admin, I had a printout of that page hanging on my office wall). In essence, it is the second attribute to be checked (just after the weight, and way before the AS path length); the highest preference wins, and its default value is 100.
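The first tie-breakers of that algorithm can be sketched with `sort`. This is a simplified model, not the full Cisco algorithm: it only compares weight (highest wins), local preference (highest wins, using 100 when the table shows none) and AS path length (shortest wins). The one-line input format, the `best_path` name and the candidate values in the example are hypothetical.

```shell
#!/bin/sh
# Simplified BGP best-path selection over candidate routes, one per line:
#   <weight> <local-pref> <as-path-length> <next-hop>
# Compares weight (higher wins), then local preference (higher wins), then
# AS path length (shorter wins). Later BGP tie-breakers are NOT modeled.
best_path() {
  sort -k1,1nr -k2,2nr -k3,3n | awk 'NR == 1 { print $4 }'
}

# Made-up candidates: local preference is checked before AS path length,
# so the LocPrf 100 route wins even though its AS path is longer.
printf '%s\n' \
  "0 10 1 remote-next-hop" \
  "0 100 3 local-next-hop" | best_path
```

Fed the two 1.2.3.4/32 entries that Sydney sees later in this post ("0 10 1 172.16.31.1" via Global Reach and "0 100 1 169.254.36.41" from ASN 65003), it picks 169.254.36.41.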

Let’s finish the peerings and connect Frankfurt to Dallas, and Dallas to Sydney, so that we complete the triangle. Let’s have a look again at Sydney’s primary connection:

$ az network express-route list-route-tables -g $rg --peering-name AzurePrivatePeering --query value -o table --path primary -n er-Sydney

Network            NextHop        LocPrf    Weight    Path
-----------------  -------------  --------  --------  -----------
1.2.3.4/32         172.16.31.1    10        0         65001 ?
169.254.33.208/30  172.16.31.1    100       0         65001 ?
169.254.33.212/30  172.16.31.1    100       0         65001 ?
169.254.34.128/30  172.16.31.19   100       0         65001 ?
169.254.34.132/30  172.16.31.19   100       0         65001 ?
169.254.36.44/30   169.254.36.41            0         65003 ?
192.168.1.0        172.16.31.1    10        0         65515
192.168.2.0        172.16.31.19   10        0         65515
192.168.3.0        192.168.3.12             0         65515
192.168.3.0        192.168.3.13*            0         65515
192.168.3.0        169.254.36.41            0         65003 12076

You can see that the Dallas network 192.168.2.0/24 now shows up in Sydney too. Again, the routes learnt over Global Reach (except the /30 transit networks) are marked with Local Preference 10, which loses against the default of 100 of locally learnt BGP routes.

Let’s verify this by advertising 1.2.3.4/32 from Sydney as well. Both routes now appear in the BGP route table, but the default local preference of 100 makes the router in Sydney prefer the route advertised locally by ASN 65003 over the remote route (local preference 10):

$ az network express-route list-route-tables -g $rg --peering-name AzurePrivatePeering --query value -o table --path primary -n er-Sydney

Network            NextHop        LocPrf    Weight    Path
-----------------  -------------  --------  --------  -----------
1.2.3.4/32         169.254.36.41            0         65003 ?
1.2.3.4/32         172.16.31.1    10        0         65001 ?
169.254.33.208/30  172.16.31.1    100       0         65001 ?
169.254.33.212/30  172.16.31.1    100       0         65001 ?
169.254.34.128/30  172.16.31.19   100       0         65001 ?
169.254.34.132/30  172.16.31.19   100       0         65001 ?
169.254.36.44/30   169.254.36.41            0         65003 ?
192.168.1.0        172.16.31.1    10        0         65515
192.168.2.0        172.16.31.19   10        0         65515
192.168.3.0        192.168.3.12             0         65515
192.168.3.0        192.168.3.13*            0         65515
192.168.3.0        169.254.36.41            0         65003 12076

There are some weird AS paths I am not sure I understand there (like why would the remote 1.2.3.4/32 route only contain “65001”, and not “65001 12076”?), but I will leave that for a later post.

Conclusion

This was probably my longest post about the simplest technology: Global Reach is just a regular iBGP adjacency between MSEE routers that enables onprem-to-onprem traffic.
