If you have ever used Azure, you probably have used one of these Virtual Network Gateways too: whether it is to connect your branches and headquarters with Azure via IPsec VPN or ExpressRoute, or to provide connectivity to your mobile workers or external partners through Point-to-Site VPNs. In this post I will go deep on how routing in VNGs works, especially when there is a firewall in the way.
You might already know the easy pattern of inserting an Azure Firewall in branch-to-VNet flows in hub and spoke networks, but I will cover it in case you don’t. Additionally, I will cover how to insert the Azure Firewall in certain branch-to-branch flows, both VPN-to-VPN and VPN-to-ExpressRoute, and show you what NOT to do.
Starting point
This is the starting design of a journey: a basic hub and spoke architecture where ExpressRoute, P2S and S2S VPN connections have been deployed:

As you can see, there is a firewall deployed in the hub VNet, but the traffic flows bypass the firewall completely. The reason is that the VPN and ER gateways know how to reach the spokes (since they are directly peered), and the spokes know how to reach on-prem prefixes, since the gateways plumb their routes in the spoke VMs. For example, this is what the effective route table looks like in one of the spokes:
❯ az network nic show-effective-route-table -n spoke1-vmVMNic -g $rg -o table
Source                 State    Address Prefix    Next Hop Type          Next Hop IP
---------------------  -------  ----------------  ---------------------  -------------
Default                Active   10.0.1.0/24       VnetLocal
Default                Active   192.168.0.0/24    VNetPeering
VirtualNetworkGateway  Active   172.19.128.0/17   VirtualNetworkGateway  192.168.0.15
VirtualNetworkGateway  Active   172.19.0.0/17     VirtualNetworkGateway  192.168.0.14
VirtualNetworkGateway  Active   172.17.0.0/16     VirtualNetworkGateway  10.3.129.72
VirtualNetworkGateway  Active   172.18.0.0/16     VirtualNetworkGateway  192.168.0.14
VirtualNetworkGateway  Active   172.18.0.0/16     VirtualNetworkGateway  192.168.0.15
Default                Active   0.0.0.0/0         Internet
The gateway routes show up in the spoke because the spokes have been peered with the hub using the combination of the settings “Allow Gateway Transit” and “Use Remote Gateways”. By the way, you might have noticed that the P2S range (172.19.0.0/16) has been automatically split in two. This is because the gateway is deployed in active/active mode, so each of the gateway instances takes half of the IP range allocated to P2S users.
Coming back to the VNet peerings, the following screenshots show the settings of the two unidirectional peerings between the hub and spoke1, and between spoke1 and the hub:
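For reference, here is a minimal CLI sketch of a peering pair with those settings. The VNet names passed to the function are assumptions for illustration (only $rg comes from the commands in this post), and it is wrapped in a function so that nothing runs until you call it:

```shell
# Sketch only: VNet names are hypothetical, $rg is the resource group
# variable used in the other commands of this post.
peer_hub_and_spoke() {
  local hub_vnet="$1" spoke_vnet="$2"
  # Hub side: offer the hub gateways to the spoke ("Allow Gateway Transit")
  az network vnet peering create -g "$rg" -n "${hub_vnet}-to-${spoke_vnet}" \
    --vnet-name "$hub_vnet" --remote-vnet "$spoke_vnet" \
    --allow-vnet-access --allow-forwarded-traffic --allow-gateway-transit
  # Spoke side: learn routes from the hub gateways ("Use Remote Gateways")
  az network vnet peering create -g "$rg" -n "${spoke_vnet}-to-${hub_vnet}" \
    --vnet-name "$spoke_vnet" --remote-vnet "$hub_vnet" \
    --allow-vnet-access --allow-forwarded-traffic --use-remote-gateways
}
```

You would call it as peer_hub_and_spoke hub spoke1, once per spoke. Note that the spoke-side peering can only use remote gateways if the hub already has a gateway deployed.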

Get me that firewall in the way
To put the firewall in the packet flow we need to modify routing at both the gateways and the spokes. In Azure you modify routing by attaching route tables to the subnets, so this is what we will do:

What we are doing in the GatewaySubnet is overriding the system routes injected by the VNet peering, so that the next hop is the firewall, and in the spokes we just disable gateway propagation, so that our 0.0.0.0/0 static route is the only active route.
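A rough CLI sketch of those two route tables could look like the following. The route table, VNet and subnet names are assumptions for illustration; the firewall IP (192.168.0.68) and the spoke prefixes (10.0.1.0/24 and 10.0.2.0/24) are the ones that appear in the outputs of this post:

```shell
# Sketch only: resource names are hypothetical, $rg as in the rest of the post.
fw_ip="192.168.0.68"   # Azure Firewall private IP seen in the effective routes

route_gateways_via_firewall() {
  local hub_vnet="$1"
  # GatewaySubnet: override the peering system routes, one UDR per spoke prefix.
  # Gateway route propagation stays enabled on this table (the default).
  az network route-table create -g "$rg" -n gateways-rt
  az network route-table route create -g "$rg" --route-table-name gateways-rt \
    -n spoke1 --address-prefix 10.0.1.0/24 \
    --next-hop-type VirtualAppliance --next-hop-ip-address "$fw_ip"
  az network route-table route create -g "$rg" --route-table-name gateways-rt \
    -n spoke2 --address-prefix 10.0.2.0/24 \
    --next-hop-type VirtualAppliance --next-hop-ip-address "$fw_ip"
  az network vnet subnet update -g "$rg" --vnet-name "$hub_vnet" \
    -n GatewaySubnet --route-table gateways-rt
}

route_spoke_via_firewall() {
  local spoke_vnet="$1" spoke_subnet="$2"
  # Spoke: drop the gateway routes and keep a single default route to the firewall
  az network route-table create -g "$rg" -n "${spoke_vnet}-rt" \
    --disable-bgp-route-propagation true
  az network route-table route create -g "$rg" --route-table-name "${spoke_vnet}-rt" \
    -n default --address-prefix 0.0.0.0/0 \
    --next-hop-type VirtualAppliance --next-hop-ip-address "$fw_ip"
  az network vnet subnet update -g "$rg" --vnet-name "$spoke_vnet" \
    -n "$spoke_subnet" --route-table "${spoke_vnet}-rt"
}
```

For example, route_gateways_via_firewall hub followed by route_spoke_via_firewall spoke1 vm, where the VNet and subnet names are again placeholders for your own.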
Now the effective routes in the spoke VMs look like this (notice that the routes coming from the Virtual Network Gateways are gone):
❯ az network nic show-effective-route-table -o table -g $rg -n spoke1-vmVMNic
Source    State    Address Prefix    Next Hop Type     Next Hop IP
--------  -------  ----------------  ----------------  -------------
Default   Active   10.0.1.0/24       VnetLocal
Default   Active   192.168.0.0/24    VNetPeering
User      Active   93.104.181.60/32  Internet
Default   Invalid  0.0.0.0/0         Internet
User      Active   0.0.0.0/0         VirtualAppliance  192.168.0.68
We have a challenge in this scenario though, and that is that VPN users (both P2S and S2S) cannot access the ExpressRoute location. That is because the VPN and ExpressRoute don’t exchange BGP routes. For example, if we look at the route table in the ExpressRoute circuit, we will see that there are no VPN routes there:
❯ az network express-route list-route-tables -g $rg -n $er_circuit_name --path primary --peering-name AzurePrivatePeering --query value -o table
This command is in preview and under development. Reference and support levels: https://aka.ms/CLI_refstatus
LocPrf    Network             NextHop         Path            Weight
--------  ------------------  --------------  --------------  --------
          10.0.1.0/24         192.168.0.13    65515           0
          10.0.1.0/24         192.168.0.12*   65515           0
          10.0.2.0/24         192.168.0.13    65515           0
          10.0.2.0/24         192.168.0.12*   65515           0
          169.254.31.48/29    169.254.147.97  133937 ?        0
          169.254.147.100/30  169.254.147.97  133937 ?        0
          172.17.0.0          169.254.147.97  133937 16550 ?  0
          192.168.0.0         192.168.0.13    65515           0
          192.168.0.0         192.168.0.12*   65515           0
Azure Route Server to the rescue!
Well, if the VPN and ExpressRoute gateways won’t talk to each other, we can get a common friend of theirs: the Azure Route Server. Just by deploying an Azure Route Server into the VNet with the branch-to-branch setting on, it will act as route reflector between both gateways:
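As a small sketch, this is how the branch-to-branch setting could be switched on from the CLI. It assumes that a Route Server has already been deployed in the hub VNet; its name is a placeholder that you pass in:

```shell
# Sketch only: assumes an existing Route Server; its name is hypothetical.
enable_branch_to_branch() {
  local rs_name="$1"
  # Let the Route Server exchange routes between the VPN and ER gateways
  az network routeserver update -g "$rg" -n "$rs_name" --allow-b2b-traffic true
}
```

After calling enable_branch_to_branch with your Route Server name, give the gateways some time to converge before checking the ExpressRoute route table again.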

Let’s have a look now at the routing in the ExpressRoute circuit:
❯ az network express-route list-route-tables -g $rg -n $er_circuit_name --path primary --peering-name AzurePrivatePeering --query value -o table
LocPrf    Network             NextHop         Path            Weight
--------  ------------------  --------------  --------------  --------
          10.0.1.0/24         192.168.0.12    65515           0
          10.0.1.0/24         192.168.0.13*   65515           0
          10.0.2.0/24         192.168.0.12    65515           0
          10.0.2.0/24         192.168.0.13*   65515           0
          169.254.31.48/29    169.254.147.97  133937 ?        0
          169.254.147.100/30  169.254.147.97  133937 ?        0
          172.17.0.0          169.254.147.97  133937 16550 ?  0
          172.18.0.0          169.254.147.97  133937 12076    0
          172.18.0.0          192.168.0.12    65515 65111     0
          172.18.0.0          192.168.0.13    65515 65111     0
          172.19.0.0/17       169.254.147.97  133937 12076    0
          172.19.0.0/17       192.168.0.12    65515           0
          172.19.0.0/17       192.168.0.13*   65515           0
          172.19.128.0/17     169.254.147.97  133937 12076    0
          172.19.128.0/17     192.168.0.12    65515           0
          172.19.128.0/17     192.168.0.13*   65515           0
          192.168.0.0         192.168.0.12    65515           0
          192.168.0.0         192.168.0.13*   65515           0
Magic, we now have all those VPN prefixes (172.18.0.0/16 and the two halves of 172.19.0.0/16). You might notice that for each of these prefixes there is a third entry, which is reflected by the on-premises router (with AS path 133937 12076). This is because the on-premises router is not configured with eBGP multipath, and hence it gives back the prefixes on the non-preferred link. Fun fact: the ExpressRoute routers (called Microsoft Enterprise Edge or MSEEs) are configured with allowas-in, meaning that they will take these advertisements although normally they wouldn’t, since their own ASN 12076 is present in the AS path.
But enough of routing details, now we have connectivity between VPN and onprem! However, you might notice that it doesn’t go through the firewall. How to fix this? Again route tables to the rescue!
Forcing VPN-to-ER through the firewall
This one is easier than it looks. The only thing that we need to do is to configure additional routes in our GatewaySubnet, like this:

Note that in the diagram I configured two route table icons, one next to each gateway. However, the same route table applied to the GatewaySubnet will affect both the VPN and the ExpressRoute gateways.
When the VPN gateway sends any packet towards the ExpressRoute gateway, it will hit the UDR for 172.17.0.0/16 and that packet will be re-routed to the Azure Firewall. The Azure Firewall will then send it on to the ExpressRoute connection. For the return traffic the pattern is similar.
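In CLI terms, adding such a route to the GatewaySubnet route table could be sketched as follows. The route table and route names are assumptions for illustration; 172.17.0.0/16 (the ExpressRoute on-premises range) and the firewall IP 192.168.0.68 are taken from the outputs in this post:

```shell
# Sketch only: route table and route names are hypothetical.
add_gateway_udr() {
  local rt_name="$1" route_name="$2" prefix="$3"
  # Send traffic for an on-premises prefix to the Azure Firewall first
  az network route-table route create -g "$rg" --route-table-name "$rt_name" \
    -n "$route_name" --address-prefix "$prefix" \
    --next-hop-type VirtualAppliance --next-hop-ip-address 192.168.0.68
}
```

For the VPN-to-ExpressRoute direction that would be add_gateway_udr gateways-rt er-onprem 172.17.0.0/16. The return direction needs equivalent routes for the VPN ranges; the exact prefixes are in the diagram rather than in the text, so choose them to match your own S2S and P2S ranges.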
For example, here the firewall logs are showing connections between the S2S branch (172.18.0.68) and ExpressRoute (172.17.0.2), and between a P2S client (172.19.128.2) and ExpressRoute (172.17.0.2):
❯ az monitor log-analytics query -w $logws_customerid --analytics-query $query_netrule -o tsv
PrimaryResult 2023-02-06T11:32:15.727592Z TCP request from 172.18.0.68:58004 to 172.17.0.2:22. Action: Allow. Policy: myazfw-policy. Rule Collection Group: ruleset01. Rule Collection: AllowTraffic. Rule: AllowAll
PrimaryResult 2023-02-06T11:32:30.369918Z TCP request from 172.18.0.68:52450 to 172.17.0.2:22. Action: Allow. Policy: myazfw-policy. Rule Collection Group: ruleset01. Rule Collection: AllowTraffic. Rule: AllowAll
PrimaryResult 2023-02-06T11:33:59.087885Z TCP request from 172.19.128.2:57864 to 172.17.0.2:22. Action: Allow. Policy: myazfw-policy. Rule Collection Group: ruleset01. Rule Collection: AllowTraffic. Rule: AllowAll
There is some additional control plane traffic between the VPN client (172.19.128.2) and the gateways (172.19.128.1 in this case) that will hit the firewall too, so you should make sure to allow that traffic in your rules:
PrimaryResult 2023-02-06T11:34:54.153586Z UDP request from 172.19.128.2:51729 to 172.19.128.1:443. Action: Allow. Policy: myazfw-policy. Rule Collection Group: ruleset01. Rule Collection: AllowTraffic. Rule: AllowAll
However, what if you wanted to filter traffic between the P2S users and the S2S branches? Both connection types terminate on the same gateway, so Azure routing will not help: those packets are routed inside the VPN gateway without ever touching the VNet. We would need different gateways to terminate the P2S and S2S connections, but unfortunately you can only have a single VPN gateway in any given VNet.
Private Endpoints might be an issue
Please be careful when implementing filtering between gateways in the same VNet, since it is not a well-documented design. For example, there might be issues with private endpoint traffic. With the above configuration and an Azure Blob Storage private endpoint in spoke 1 (IP address 10.0.1.68), access to the private endpoint seems to be broken from ExpressRoute.
From my S2S branch everything seems to work fine, both the resolution of the storage account’s FQDN to the private IP address and the access to the blob:
jose@s2svm:~$ nslookup mystorageaccount.blob.core.windows.net
Server:  127.0.0.53
Address: 127.0.0.53#53
Non-authoritative answer:
Name:    mystorageaccount.blob.core.windows.net
Address: 10.0.1.68
jose@s2svm:~$ curl https://mystorageaccount.blob.core.windows.net/test/helloworld.txt?
Hello world!
However, from a machine in ExpressRoute the exact same thing won’t work:
jose@vm:~$ nslookup mystorageaccount.blob.core.windows.net
Server:  127.0.0.53
Address: 127.0.0.53#53
Non-authoritative answer:
Name:    mystorageaccount.blob.core.windows.net
Address: 10.0.1.68
jose@vm:~$ curl https://mystorageaccount.blob.core.windows.net/test/helloworld.txt?
curl: (28) Failed to connect to storage13498.blob.core.windows.net port 443: Connection timed out
I am not sure what the difference between ER and VPN gateways is here, but if you are using UDRs in the GatewaySubnet, make sure you test your private endpoint traffic. If you need to filter traffic between VPN and ER, and you need access to private endpoints from ER, there are a couple of options you might want to consider:
- Enable network policies in your private endpoint subnet. This will force the ER gateway (as well as the VPN gateway) to send traffic from on-premises through the firewall, which will eliminate the issue.
- Deploy your VPN gateways and ER gateways in different VNets, so that you don’t need “weird” routes (meaning here routes that include the onprem ranges) in the GatewaySubnet. See the next section for that!
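For the first option, a minimal sketch of enabling network policies on the private endpoint subnet might look like this. The VNet and subnet names are placeholders, and the flag name is the one I know from the CLI versions I have used, so please double check it with az network vnet subnet update --help on your version:

```shell
# Sketch only: VNet and subnet names are hypothetical; verify the flag
# against the help output of your Azure CLI version.
enable_pe_network_policies() {
  local vnet="$1" subnet="$2"
  # "disable ... false" means that network policies are enabled on the subnet
  az network vnet subnet update -g "$rg" --vnet-name "$vnet" -n "$subnet" \
    --disable-private-endpoint-network-policies false
}
```

You would run it against the subnet that hosts the private endpoint, for example enable_pe_network_policies spoke1 endpoints, and then repeat the curl test from the ExpressRoute side.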
Filtering between P2S and S2S
The solution to the previous problem where we want to filter between two different VPN connections, or when the VPN gateway and the ER gateway should be in separate VNets, is creating another VPN gateway in a different VNet:

We need to inject the new IP pool for P2S users (172.16.0.0/16) into ExpressRoute, and the easiest way of doing that is through our Azure Route Server. An additional route in the ExpressRoute gateway will be required so that traffic goes to our firewall, plus some routes in the new VPN gateway.
You might notice that traffic runs asymmetrically here: while the left VPN gateway sends packets straight to the firewall, the firewall cannot use the VPN gateway as next hop. The reason is that the firewall already has gateways in its own VNet, so it won’t understand that there is a gateway in a peered VNet. Actually, if you tried to configure the VNet peering between the top VNets with those “Allow Gateway Transit” and “Use Remote Gateways” settings, you would get an error message along the lines of “Sorry, I already have my own gateway, I cannot use a remote one”.
And sure enough, we can see logs in our firewall for traffic going between the P2S clients and both the S2S branch and ExpressRoute:
❯ az monitor log-analytics query -w $logws_customerid --analytics-query $query_netrule -o tsv | grep 172.16.0.2 | grep -v 443
PrimaryResult 2023-02-06T12:09:50.781928Z TCP request from 172.16.0.2:61220 to 172.18.0.68:22. Action: Allow. Policy: myazfw-policy. Rule Collection Group: ruleset01. Rule Collection: AllowTraffic. Rule: AllowAll
PrimaryResult 2023-02-06T12:10:07.571775Z TCP request from 172.16.0.2:61233 to 172.17.0.2:22. Action: Allow. Policy: myazfw-policy. Rule Collection Group: ruleset01. Rule Collection: AllowTraffic. Rule: AllowAll
Hence, the traffic from the firewall to the 172.16.0.0/16 range needs to go first through the NVAs. You probably want to deploy at least two of them for redundancy. Whether you include a load balancer or not doesn’t really matter, since these NVAs are completely stateless and won’t bother about the asymmetric routing.
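If the NVAs are also the BGP speakers that advertise the 172.16.0.0/16 pool to the Route Server (which is my reading of the design, the text above does not spell it out), the peering on the Route Server side could be sketched like this. All names, IP addresses and ASNs in the usage example are hypothetical:

```shell
# Sketch only: assumes the NVAs peer over BGP with an existing Route Server.
peer_nva_with_route_server() {
  local rs_name="$1" nva_name="$2" nva_ip="$3" nva_asn="$4"
  # One BGP peering per NVA instance
  az network routeserver peering create -g "$rg" --routeserver "$rs_name" \
    -n "$nva_name" --peer-ip "$nva_ip" --peer-asn "$nva_asn"
}
```

With two NVAs for redundancy you would call it twice, for example peer_nva_with_route_server hub-rs nva1 192.168.0.100 65001 and the same again for the second instance with its own IP address.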
What does the VPN client see?
Azure VPN gateways have a very fancy feature that allows you to advertise custom routes to the P2S client. For example, in the left VPN gateway, in the P2S profile I have configured these routes:
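If you prefer the CLI over the portal, custom routes could be set with something like the sketch below. The gateway name and the prefixes in the usage example are assumptions (I am using the on-premises ranges of this post for illustration, not necessarily the exact routes of my P2S profile):

```shell
# Sketch only: gateway name and prefixes are supplied by the caller.
advertise_p2s_routes() {
  local gw_name="$1"; shift
  # Remaining arguments are the prefixes to push to the P2S clients
  az network vnet-gateway update -g "$rg" -n "$gw_name" --custom-routes "$@"
}
```

An example call would be advertise_p2s_routes p2s-vpngw 172.17.0.0/16 172.18.0.0/16. Clients typically need to download the VPN profile again or reconnect to pick up the new routes.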

When I connect with my OpenVPN client, I will see those routes too, which will be plumbed into my OS to get connectivity. Additionally you see 192.168.1.0/24 (the VNet range) and 172.16.0.0/16 (the P2S IP pool range):

Don’t do this!
Could we make the traffic symmetric though? If we really wanted, we could. We already discussed that the firewall doesn’t know that there is a VPN gateway at the other side, but what if we just tell it to forward it to the VPN gateway’s IP address?
The problem is that this IP address is not documented anywhere (if you try to use the BGP IP address it won’t work), so you would have to guess. An educated guess might conclude that it could be the first IP address in the GatewaySubnet, and indeed if you configured this route, traffic would work fine:

However, I wouldn’t recommend this approach. First, it is not documented or supported. Second, if the IP address of the gateway changed, you would be blackholing your traffic without an easy way to fix it (I tried to scan the GatewaySubnet with nmap, but I couldn’t find any indication of which IP addresses the gateway takes).
Be careful with dummy VNets!
Read the above as “don’t do this either”.
Since we are testing out hacks, why not try the dummy VNet trick? Essentially, this is what we used to do before we got the Route Server: creating a “dummy VNet” (or “ghost VNet”, “summarization VNet”, “advertising VNet”, whatever you want to call it) whose only purpose is forcing the ER gateway to advertise its prefixes. Something like this:

The first bad thing that will happen is that the S2S tunnels will break. I am not sure why, but with the NVA I am using, as soon as I start sending out a prefix from the VPN gateway that overlaps with onprem, the IPsec tunnels go down.
For P2S the tunnels won’t break, but I haven’t been able to make the flows P2S-ExpressRoute work (P2S-Azure works just fine).
And since this solution is not really recommended, opening up a support ticket will not be of much help, since the answer you would get is probably “go and deploy an Azure Route Server”. Which is the answer I would give you as well.
Adding up
We have seen different techniques to bend traffic from on-premises through an Azure Firewall, both for onprem-to-Azure as well as onprem-to-onprem. Depending on your scenario and your requirements, the solution might be very easy, or potentially you might end up having to deal with Azure Route Server or even Network Virtual Appliances.
And whatever you do, my recommendation is to stay away from hacks (or what I call “duct tape networking”).
Did I miss anything? Please let me know in the comments!
Have you thought of moving ER GW and VPN Gateway outside the hub? It would increase the cost of the extra hop to route to on-prem, but would remove the need for adding specific UDR’s for spokes and private endpoints. Basically, as gateway spoke.
So what would stay in the hub? Just the Azure Firewall? How would the ER gateway advertise the spoke prefixes?
Was thinking firewall, plus other common services, such as DNS. But, I overlooked the requirement of the ER GW Vnet being peered to spokes in order for it to advertise spoke routes. Just finding it a pain to have to add routes from the gateway subnet pointing to firewalls to avoid having spoke traffic bypass the firewalls. Thanks for the reply.
Don’t get me wrong, it is a valid design. For VPN GWs with static routing or SDWAN appliances, a couple of UDRs will be enough. For BGP VPN or ER, you will need either Route Server or the “advertising VNet” trick.
Is it possible to do the “Filtering between P2S and S2S” topology without the Route Server? it’s just too expensive…
If so, how would the UDR’s look on both sides of the firewall?
Yes. If you take ER out of the picture, and you put each gw (p2s and s2s) in different VNets, you can certainly do that. Look at the relevant routes for the left P2S and the S2S in the article.