VNet peering settings, those familiar strangers

Hey everybody! In this post I would like to talk about some of the settings that you can configure in VNet Peerings, and how those actually work. Even if you have been using VNet peerings for years now, I bet I have some surprises for you.

TL;DR:

Do not rely on the VirtualNetwork service tag to filter traffic, it can contain more prefixes than you might think
Do not rely on the AllowForwardedTraffic peering flag to drop traffic, in many situations it will just not work

And now let’s get into it. First of all, here is my basic setup, nothing fancy:

Testbed

For simplicity, the diagram does not show the route table associated with the GatewaySubnet, which sends traffic from onprem to the firewall NVA (otherwise the traffic between onprem and Azure would be asymmetric).

And here are the VNet peering setting values:

Peering	AllowFwdedTraffic	AllowVNetAccess	UseRemoteGW	AllowGWTransit
hub-spoke1	False	True	False	True
hub-spoke2	False	True	False	True
spoke1-hub	True	True	True	False
spoke2-hub	True	True	True	False

VNet peering settings

For a detailed description of what each of these settings does, please refer to the Azure doc Create a Virtual Network peering.

Let’s explain why these settings are the most usual ones for a hub and spoke architecture:

UseRemoteGateway in the spoke-side peerings, so that the VPN Gateway or ExpressRoute Gateway advertises the spoke prefixes over BGP. In theory you wouldn’t need this if you were using IPsec tunnels with static routing.
AllowGatewayTransit in the hub-side peerings, for the same reason
AllowVirtualNetworkAccess in all peerings, so that the VMs can reach each other (hub to spoke, spoke to hub)
AllowForwardedTraffic in the spoke-side peerings, so that the spokes accept traffic that did not originate inside the hub VNet (for example, traffic from other spokes).
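The four rules above can be written down as a small consistency check. This is only a sketch: the `Peering` class and the `deviations` helper are hypothetical names made up for illustration, and the values are copied from the table in this post rather than read from Azure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peering:
    # Same column order as the table in this post.
    allow_forwarded_traffic: bool
    allow_vnet_access: bool
    use_remote_gateways: bool
    allow_gateway_transit: bool


# Values copied from the table above, not queried from Azure.
PEERINGS = {
    "hub-spoke1": Peering(False, True, False, True),
    "hub-spoke2": Peering(False, True, False, True),
    "spoke1-hub": Peering(True, True, True, False),
    "spoke2-hub": Peering(True, True, True, False),
}


def deviations(hub_side, spoke_side):
    """List where a hub/spoke peering pair departs from the usual pattern."""
    found = []
    if not (hub_side.allow_vnet_access and spoke_side.allow_vnet_access):
        found.append("AllowVirtualNetworkAccess should be True on both sides")
    if not hub_side.allow_gateway_transit:
        found.append("hub side should set AllowGatewayTransit")
    if not spoke_side.use_remote_gateways:
        found.append("spoke side should set UseRemoteGateways")
    if not spoke_side.allow_forwarded_traffic:
        found.append("spoke side should set AllowForwardedTraffic")
    return found


for spoke in ("spoke1", "spoke2"):
    print(spoke, deviations(PEERINGS[f"hub-{spoke}"], PEERINGS[f"{spoke}-hub"]))
```

Both pairs in the testbed return an empty list, which is just another way of saying that the table follows the usual hub and spoke pattern.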

By the way, I am using the names with which these settings show up in the API, not necessarily how they appear in the Azure portal when you configure VNet peerings. It shouldn’t be too difficult to relate those names to the portal though.

Verifying everything works

The spokes have connectivity to the branch machine (172.16.200.132):

jose@spoke1vm:~$ ping 172.16.200.132 -c 5
PING 172.16.200.132 (172.16.200.132) 56(84) bytes of data.
64 bytes from 172.16.200.132: icmp_seq=1 ttl=63 time=8.58 ms
64 bytes from 172.16.200.132: icmp_seq=2 ttl=63 time=7.60 ms
64 bytes from 172.16.200.132: icmp_seq=3 ttl=63 time=8.60 ms
64 bytes from 172.16.200.132: icmp_seq=4 ttl=63 time=8.12 ms
64 bytes from 172.16.200.132: icmp_seq=5 ttl=63 time=8.23 ms

--- 172.16.200.132 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4005ms
rtt min/avg/max/mdev = 7.608/8.230/8.605/0.373 ms

The branch is seeing the actual IP of the spoke as source (no NAT in between performed by the NVA):

jose@branch:~$ sudo tcpdump -n icmp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
12:24:23.909811 IP 10.11.0.4 > 172.16.200.132: ICMP echo request, id 11949, seq 1, length 64
12:24:23.909860 IP 172.16.200.132 > 10.11.0.4: ICMP echo reply, id 11949, seq 1, length 64
12:24:24.911962 IP 10.11.0.4 > 172.16.200.132: ICMP echo request, id 11949, seq 2, length 64
12:24:24.911995 IP 172.16.200.132 > 10.11.0.4: ICMP echo reply, id 11949, seq 2, length 64
12:24:25.913946 IP 10.11.0.4 > 172.16.200.132: ICMP echo request, id 11949, seq 3, length 64
12:24:25.913978 IP 172.16.200.132 > 10.11.0.4: ICMP echo reply, id 11949, seq 3, length 64
12:24:26.913760 IP 10.11.0.4 > 172.16.200.132: ICMP echo request, id 11949, seq 4, length 64
12:24:26.913793 IP 172.16.200.132 > 10.11.0.4: ICMP echo reply, id 11949, seq 4, length 64
12:24:27.915808 IP 10.11.0.4 > 172.16.200.132: ICMP echo request, id 11949, seq 5, length 64
12:24:27.915841 IP 172.16.200.132 > 10.11.0.4: ICMP echo reply, id 11949, seq 5, length 64

Additionally, when going to the public Internet from the spokes, the traffic goes through the NVA, and is consequently sourced from the public IP address of the NVA:

jose@spoke1vm:~$ curl -s4 ifconfig.co
52.148.241.51

❯ az network public-ip list -o table -g $rg
Name         ResourceGroup    Location    Zones    Address        AddressVersion    AllocationMethod    IdleTimeoutInMinutes    ProvisioningState
-----------  ---------------  ----------  -------  -------------  ----------------  ------------------  ----------------------  -------------------
spoke1-pip   vpntest          westeurope           40.118.3.153   IPv4              Dynamic             4                       Succeeded
spoke2-pip   vpntest          westeurope           52.174.2.58    IPv4              Dynamic             4                       Succeeded
nvavm-pip    vpntest          westeurope           52.148.241.51  IPv4              Dynamic             4                       Succeeded

AllowVirtualNetworkAccess

The first of the options we are going to look at is “AllowVirtualNetworkAccess”. The way this works is by adding the peered VNet’s prefix to the VirtualNetwork service tag, and letting the Network Security Group (NSG) rules that reference that tag decide whether the traffic is allowed.

You can have a look at the actual values behind the service tags using the Azure CLI (az network nic list-effective-nsg) or the portal. Let’s have a look at the effective rules for one of the spokes:

0.0.0.0/0 included in Virtual Network service tag

Oh, so VirtualNetwork is 0.0.0.0/0? Why? The reason is the UDR (User-Defined Route) applied to the VM’s subnet, where all traffic (0.0.0.0/0) is sent to the NVA. The VirtualNetwork service tag inherits prefixes from different sources. As far as I know, these are:

The local VNet prefix
Remote VNet prefixes (if configured with AllowVirtualNetworkAccess = True)
Prefixes injected into the VNet by a local or remote Virtual Network Gateway (VPN or ER), Route Server or Virtual WAN
Prefixes configured in a local UDR pointing to a next-hop of type NVA
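To see why a single UDR can swallow everything else, here is a minimal sketch that models the tag as the collapsed union of the four sources listed above. It is a mental model only, not how Azure computes the tag: `virtual_network_tag` is a hypothetical helper, and the prefixes are the ones from this testbed, with 172.16.200.0/24 assumed to be what the gateway injects.

```python
import ipaddress


def virtual_network_tag(local, peered=(), gateway_routes=(), udr_nva=()):
    """Collapse all contributing prefixes into a minimal list of networks."""
    prefixes = (*local, *peered, *gateway_routes, *udr_nva)
    nets = [ipaddress.ip_network(p) for p in prefixes]
    return list(ipaddress.collapse_addresses(nets))


# Spoke1 without any UDR: local VNet, peered hub, on-premises prefix.
without_udr = virtual_network_tag(
    local=["10.11.0.0/16"],
    peered=["10.1.0.0/16"],
    gateway_routes=["172.16.200.0/24"],
)

# Same spoke with a default route pointing to the NVA.
with_udr = virtual_network_tag(
    local=["10.11.0.0/16"],
    peered=["10.1.0.0/16"],
    gateway_routes=["172.16.200.0/24"],
    udr_nva=["0.0.0.0/0"],
)

print([str(n) for n in without_udr])
print([str(n) for n in with_udr])
```

As soon as 0.0.0.0/0 enters the union, every other prefix is contained in it and the list collapses to that single entry.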

So what would happen if we disable the flag AllowVirtualNetworkAccess? Let’s try and disable it in the peering from spoke1 to the hub:

az network vnet peering update -n spoke1tohub -g $rg --vnet-name spoke1 --set allowVirtualNetworkAccess=false

Now we can check again the effective rules:

10.1.0.0/16 excluded from VirtualNetwork service tag

Wow, what is going on here? It’s simple: Azure removed the remote VNet’s prefix from the VirtualNetwork service tag. Since we modified the spoke side, the remote VNet is the hub, 10.1.0.0/16. If you look closely at the list of prefixes contained in the VirtualNetwork service tag, you will realize that together they cover the entire IPv4 address space except 10.1.0.0/16 (the hub VNet prefix). Let’s see whether it works, that is, whether spoke1 can still reach the hub:

jose@spoke1vm:~$ nc -vz 10.1.1.4 80
nc: connect to 10.1.1.4 port 80 (tcp) failed: Connection timed out
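The "everything except the hub" prefix list is easy to reproduce with Python's standard `ipaddress` module. This is a sketch of the arithmetic only; Azure may well represent the same address space with a different set of prefixes, and `in_tag` is a hypothetical helper.

```python
import ipaddress

everything = ipaddress.ip_network("0.0.0.0/0")
hub = ipaddress.ip_network("10.1.0.0/16")

# All networks that together cover 0.0.0.0/0 minus the hub prefix.
remaining = sorted(everything.address_exclude(hub))


def in_tag(address):
    """True if the address falls inside one of the remaining prefixes."""
    ip = ipaddress.ip_address(address)
    return any(ip in net for net in remaining)


print(len(remaining))
print(in_tag("10.1.1.4"))   # VM in the hub
print(in_tag("10.11.0.4"))  # VM in spoke1
```

The hub VM falls outside the list while the spoke VM is still inside it, which is consistent with the timeout above.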

The NSG is the mechanism that actually dropped the traffic. You can check this in the NSG flow logs, which I inspect using the Python script that I published in the get_nsg_logs repository:

❯ python3 ~/repos/get_nsg_logs/get_nsg_logs.py --account-name $storage_account_name --display-hours 1 --display-direction both --version 2 --display-allowed --display-minutes 20 | grep 10.11.0.4 | grep 10.1.1.4
2021-06-18T12:52:28.4856075Z SPOKE1VMNSG DefaultRule_DenyAllOutBound D O 10.11.0.4 tcp 36432 10.1.1.4 22 B src2dst: / dst2src: /

As you can see, the connection from spoke1 to the hub was blocked by the Network Security Group, where it hit the default DenyAllOutBound rule.
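If you want to filter these lines with something more precise than grep, a small parser helps. The sketch below splits one output line into named fields. The field meanings are my reading of the sample line in this post (D/A for denied/allowed, O/I for outbound/inbound), not a documented format, and `FlowLine` and `parse_flow_line` are hypothetical names.

```python
from typing import NamedTuple


class FlowLine(NamedTuple):
    timestamp: str
    nsg: str
    rule: str
    action: str      # "A" or "D", assumed to mean allowed or denied
    direction: str   # "I" or "O", assumed to mean inbound or outbound
    src_ip: str
    protocol: str
    src_port: int
    dst_ip: str
    dst_port: int


def parse_flow_line(line):
    """Parse one whitespace-separated line as printed in this post."""
    f = line.split()
    return FlowLine(f[0], f[1], f[2], f[3], f[4], f[5], f[6],
                    int(f[7]), f[8], int(f[9]))


sample = ("2021-06-18T12:52:28.4856075Z SPOKE1VMNSG DefaultRule_DenyAllOutBound "
          "D O 10.11.0.4 tcp 36432 10.1.1.4 22 B src2dst: / dst2src: /")
flow = parse_flow_line(sample)
print(flow.rule, flow.action, flow.direction, flow.dst_ip, flow.dst_port)
```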

Let’s try now the opposite direction, and access from the hub to the spoke. And let’s try with two different protocols: HTTP (port 80) and SSH (port 22):

jose@nva:~$ nc -vz 10.11.0.4 80
nc: connect to 10.11.0.4 port 80 (tcp) failed: Connection timed out

jose@nva:~$ nc -vz 10.11.0.4 22
Connection to 10.11.0.4 22 port [tcp/ssh] succeeded!

Alright, as expected port 80 didn’t work (even though I have a web server running on each VM), but why did port 22 work? Let’s have a look at the flow logs again:

❯ python3 ~/repos/get_nsg_logs/get_nsg_logs.py --account-name $storage_account_name --display-hours 2 --display-direction both --version 2 --display-allowed --display-minutes 10 | grep 10.11.0.4 | grep 10.1.1.4
2021-06-18T13:32:28.7193938Z SPOKE1VMNSG DefaultRule_DenyAllInBound D I 10.1.1.4 tcp 44336 10.11.0.4 80 B src2dst: / dst2src: /
2021-06-18T13:32:28.7193938Z SPOKE1VMNSG UserRule_default-allow-ssh A I 10.1.1.4 tcp 35434 10.11.0.4 22 B src2dst: / dst2src: /

As you can see, the access attempt on port 80 was correctly blocked by the NSG, since the packet ended up in the default inbound deny rule. However, there seems to be an explicit rule that allows SSH from any address (never do this at home!). In other words, disabling the AllowVirtualNetworkAccess flag doesn’t help if you have NSG rules that are too broad.
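The interaction between the broad SSH rule and the default rules comes down to priority order: the first matching rule wins. Here is a simplified sketch of that evaluation. The priorities 65000 and 65500 are the ones Azure uses for its default rules, the priority 1000 for default-allow-ssh is an assumption, and the model only looks at source address and destination port (the real rules match on more fields).

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional

ANY = [ipaddress.ip_network("0.0.0.0/0")]
# VirtualNetwork tag with AllowVirtualNetworkAccess disabled: all but the hub.
VNET_TAG = list(ANY[0].address_exclude(ipaddress.ip_network("10.1.0.0/16")))


@dataclass
class Rule:
    name: str
    priority: int
    sources: list
    dst_port: Optional[int]  # None means any port
    allow: bool


INBOUND_RULES = [
    Rule("default-allow-ssh", 1000, ANY, 22, True),  # priority assumed
    Rule("AllowVnetInBound", 65000, VNET_TAG, None, True),
    Rule("DenyAllInBound", 65500, ANY, None, False),
]


def evaluate(src, dst_port, rules=INBOUND_RULES):
    """Return (rule name, allowed) for the first rule that matches."""
    ip = ipaddress.ip_address(src)
    for rule in sorted(rules, key=lambda r: r.priority):
        port_ok = rule.dst_port is None or rule.dst_port == dst_port
        src_ok = any(ip in net for net in rule.sources)
        if port_ok and src_ok:
            return rule.name, rule.allow
    return "no match", False


print(evaluate("10.1.1.4", 80))
print(evaluate("10.1.1.4", 22))
```

Traffic from the hub on port 22 never reaches the VirtualNetwork-based rule, because the user-defined SSH rule matches first.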

If we revert the AllowVirtualNetworkAccess setting to true, we can see that the NSG will allow the traffic on port 80 from the spoke to the hub with one of the default rules:

az network vnet peering update -n spoke1tohub -g $rg --vnet-name spoke1 --set allowVirtualNetworkAccess=true

jose@spoke1vm:~$ nc -vz 10.1.1.4 80
Connection to 10.1.1.4 80 port [tcp/http] succeeded!

❯ python3 ~/repos/get_nsg_logs/get_nsg_logs.py --account-name $storage_account_name --display-hours 2 --display-direction both --version 2 --display-allowed --display-minutes 5 | grep 10.11.0.4 | grep 10.1.1.4
2021-06-18T12:59:28.5441412Z SPOKE1VMNSG DefaultRule_AllowVnetOutBound A O 10.11.0.4 tcp 49604 10.1.1.4 80 B src2dst: / dst2src: /

The Internet Service Tag

We had a look at the VirtualNetwork service tag, but there is another interesting tag used in NSG default rules: Internet. Let’s have a look at what it contains:

Internet service tag (1)

Internet service tag (2)

Internet service tag (3)

As you can see in the previous screenshots, it contains 246 prefixes. You can have a look at the prefix list yourself, but let me point out some interesting facts:

10.0.0.0/8 (blue arrow in the screenshots above) is missing, even though 10.0.0.0/8 as a whole is not configured anywhere (the hub is 10.1.0.0/16, and the spokes are 10.11.0.0/16 and 10.12.0.0/16 respectively)
172.16.0.0/12 is missing too; it should be right above the entry with the green arrow. The only prefix from this range in our testbed is the on-premises subnet 172.16.200.0/24
And by now it is no surprise that 192.168.0.0/16 is not there either (even though no prefix from that range appears in this testbed)
The APIPA range 169.254.0.0/16 is not covered either; it should be before the entry highlighted by the red arrow
I am not showing it in the screenshots, but 100.64.0.0/10 (RFC 6598) is not covered either.
The highest first octet covered is 211. From 212.0.0.0/8 onwards, none of the prefixes (including the multicast ones) are included in the Internet service tag. Refer to https://www.iana.org/assignments/ipv4-address-space/ipv4-address-space.xhtml for more information about the official allocation of each prefix.

There are probably more missing, but the corollary is that the Internet service tag doesn’t seem to take into consideration the VNet peering configuration settings, but is statically configured to cover IANA’s public IP address space.
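A quick way to check whether a given address falls into one of the ranges I found missing is sketched below. The list contains only the five ranges called out above, so it is deliberately incomplete, and `excluded_range` is a hypothetical helper rather than anything Azure exposes.

```python
import ipaddress

# Ranges observed to be absent from the Internet service tag in this post.
NOT_IN_INTERNET_TAG = {
    "10.0.0.0/8": "RFC 1918",
    "172.16.0.0/12": "RFC 1918",
    "192.168.0.0/16": "RFC 1918",
    "169.254.0.0/16": "APIPA / link-local",
    "100.64.0.0/10": "RFC 6598",
}


def excluded_range(address):
    """Return the missing range that contains the address, or None."""
    ip = ipaddress.ip_address(address)
    for prefix in NOT_IN_INTERNET_TAG:
        if ip in ipaddress.ip_network(prefix):
            return prefix
    return None


for addr in ("10.11.0.4", "172.16.200.132", "52.148.241.51"):
    print(addr, excluded_range(addr))
```

The spoke VM and the branch machine land in missing ranges regardless of any peering setting, while the NVA's public IP does not.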

AllowForwardedTraffic

As described by the Azure documentation, this flag serves to “allow traffic forwarded by a network virtual appliance in a virtual network (that didn’t originate from the virtual network) to flow to this virtual network through a peering”. The tool tip in the portal says “This setting allows traffic forwarded from hub (traffic not originated from inside hub) into spoke1”. Pretty clear, isn’t it? Let’s find out.

Since our peerings have this flag set to true, we expect spokes to be able to talk to each other:

jose@spoke1vm:~$ ping 10.12.0.4 -c 5
PING 10.12.0.4 (10.12.0.4) 56(84) bytes of data.
64 bytes from 10.12.0.4: icmp_seq=1 ttl=63 time=4.61 ms
64 bytes from 10.12.0.4: icmp_seq=2 ttl=63 time=4.06 ms
64 bytes from 10.12.0.4: icmp_seq=3 ttl=63 time=3.95 ms
64 bytes from 10.12.0.4: icmp_seq=4 ttl=63 time=5.31 ms
64 bytes from 10.12.0.4: icmp_seq=5 ttl=63 time=4.33 ms

--- 10.12.0.4 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4005ms
rtt min/avg/max/mdev = 3.952/4.456/5.317/0.495 ms

Let’s check that the NVA is not source-NATting, by running tcpdump in the destination VM in spoke 2:

jose@spoke2vm:~$ sudo tcpdump -n icmp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
09:11:16.974155 IP 10.11.0.4 > 10.12.0.4: ICMP echo request, id 27140, seq 1, length 64
09:11:16.974203 IP 10.12.0.4 > 10.11.0.4: ICMP echo reply, id 27140, seq 1, length 64
...

Fantastic! Now we can disable AllowForwardedTraffic in the spoke peerings:

❯ az network vnet peering update -n spoke1tohub -g $rg --vnet-name spoke1 --set AllowForwardedTraffic=false
❯ az network vnet peering update -n spoke2tohub -g $rg --vnet-name spoke2 --set AllowForwardedTraffic=false

And try the ping again:

jose@spoke1vm:~$ ping 10.12.0.4 -c 5
PING 10.12.0.4 (10.12.0.4) 56(84) bytes of data.
64 bytes from 10.12.0.4: icmp_seq=1 ttl=63 time=4.61 ms
64 bytes from 10.12.0.4: icmp_seq=2 ttl=63 time=4.06 ms
64 bytes from 10.12.0.4: icmp_seq=3 ttl=63 time=3.95 ms
64 bytes from 10.12.0.4: icmp_seq=4 ttl=63 time=5.31 ms
64 bytes from 10.12.0.4: icmp_seq=5 ttl=63 time=4.33 ms

--- 10.12.0.4 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4005ms
rtt min/avg/max/mdev = 3.952/4.456/5.317/0.495 ms

Still working! Why? According to the official docs and the Azure portal tool tip, this shouldn’t be happening. The spokes should not see each other: each one receives traffic from the hub NVA that is sourced from a different range (the other spoke) than the directly peered VNet (the hub), and we disabled forwarded traffic.

I am not sure of the underlying reason, but I suspect that there is some dependency on the deployment of Virtual Network Gateways (in this case of VPN type) in the hub. To verify my theory, let’s disable UseRemoteGateways in the peering spoke2tohub (at the spoke side):

az network vnet peering update -n spoke2tohub -g $rg --vnet-name spoke2 --set useRemoteGateways=false

And test again, from each side:

jose@spoke1vm:~$ ping 10.12.0.4 -c 5
PING 10.12.0.4 (10.12.0.4) 56(84) bytes of data.

--- 10.12.0.4 ping statistics ---
5 packets transmitted, 0 received, 100% packet loss, time 4103ms

jose@spoke2vm:~$ ping 10.11.0.4 -c 5
PING 10.11.0.4 (10.11.0.4) 56(84) bytes of data.
64 bytes from 10.11.0.4: icmp_seq=1 ttl=63 time=5.73 ms
64 bytes from 10.11.0.4: icmp_seq=2 ttl=63 time=4.22 ms
64 bytes from 10.11.0.4: icmp_seq=3 ttl=63 time=5.02 ms
64 bytes from 10.11.0.4: icmp_seq=4 ttl=63 time=4.32 ms
64 bytes from 10.11.0.4: icmp_seq=5 ttl=63 time=4.53 ms

--- 10.11.0.4 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4005ms
rtt min/avg/max/mdev = 4.222/4.767/5.734/0.563 ms

Interesting! Disabling UseRemoteGateways in spoke2 stopped the ping from spoke1 to spoke2, but didn’t affect the ping in the opposite direction. If we now disable UseRemoteGateways in spoke1, then the AllowForwardedTraffic=False setting will finally be effective:

az network vnet peering update -n spoke1tohub -g $rg --vnet-name spoke1 --set useRemoteGateways=false

jose@spoke1vm:~$ ping 10.12.0.4 -c 5
PING 10.12.0.4 (10.12.0.4) 56(84) bytes of data.

--- 10.12.0.4 ping statistics ---
5 packets transmitted, 0 received, 100% packet loss, time 4089ms

jose@spoke1vm:~$ nc -vz 10.12.0.4 80
nc: connect to 10.12.0.4 port 80 (tcp) failed: Connection timed out

jose@spoke2vm:~$ ping 10.11.0.4 -c 5
PING 10.11.0.4 (10.11.0.4) 56(84) bytes of data.

--- 10.11.0.4 ping statistics ---
5 packets transmitted, 0 received, 100% packet loss, time 4092ms

jose@spoke2vm:~$ nc -vz 10.11.0.4 80
nc: connect to 10.11.0.4 port 80 (tcp) failed: Connection timed out

So it looks like the AllowForwardedTraffic setting is only effective for dropping traffic if UseRemoteGateways=False. For completeness, here is what we have now:

Peering	AllowFwdedTraffic	AllowVNetAccess	UseRemoteGW	AllowGWTransit
hub-spoke1	False	True	False	True
hub-spoke2	False	True	False	True
spoke1-hub	False	True	False	False
spoke2-hub	False	True	False	False

VNet peering settings
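The three test stages above fit a very small rule: a new forwarded flow is dropped at the destination spoke only when both AllowForwardedTraffic and UseRemoteGateways are False on that spoke's peering. The sketch below replays the stages against that rule. It is an empirical model of what I observed, not documented Azure behavior, and the function names are hypothetical. It also assumes that only the first packet of a flow is checked, which would explain why the echo replies got back into spoke2 in the second stage.

```python
def forwarded_flow_admitted(allow_forwarded_traffic, use_remote_gateways):
    """Empirical rule: dropped only when both flags are False."""
    return allow_forwarded_traffic or use_remote_gateways


def can_initiate(dst, peerings):
    """peerings maps a spoke name to (AllowForwardedTraffic, UseRemoteGateways)."""
    return forwarded_flow_admitted(*peerings[dst])


# Stage 1: AllowForwardedTraffic disabled on both spokes.
stage1 = {"spoke1": (False, True), "spoke2": (False, True)}
# Stage 2: UseRemoteGateways also disabled on spoke2.
stage2 = {"spoke1": (False, True), "spoke2": (False, False)}
# Stage 3: UseRemoteGateways disabled on both spokes.
stage3 = {"spoke1": (False, False), "spoke2": (False, False)}

for name, stage in (("stage1", stage1), ("stage2", stage2), ("stage3", stage3)):
    print(name,
          "spoke1->spoke2:", can_initiate("spoke2", stage),
          "spoke2->spoke1:", can_initiate("spoke1", stage))
```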

Connectivity to on-premises

You might be wondering: by setting UseRemoteGateways=False we lost connectivity to on-premises, right? Well, something you should know is that I am using static routing in my VPN tunnels. Let’s verify:

jose@spoke1vm:~$ ping 172.16.200.132 -c 3
PING 172.16.200.132 (172.16.200.132) 56(84) bytes of data.
64 bytes from 172.16.200.132: icmp_seq=1 ttl=63 time=8.90 ms
64 bytes from 172.16.200.132: icmp_seq=2 ttl=63 time=8.08 ms
64 bytes from 172.16.200.132: icmp_seq=3 ttl=63 time=7.06 ms

--- 172.16.200.132 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2003ms
rtt min/avg/max/mdev = 7.069/8.016/8.901/0.756 ms

jose@onprem:~$ sudo tcpdump -n icmp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
09:32:27.673530 IP 10.11.0.4 > 172.16.200.132: ICMP echo request, id 30797, seq 1, length 64
09:32:27.673566 IP 172.16.200.132 > 10.11.0.4: ICMP echo reply, id 30797, seq 1, length 64
09:32:28.675721 IP 10.11.0.4 > 172.16.200.132: ICMP echo request, id 30797, seq 2, length 64
09:32:28.675758 IP 172.16.200.132 > 10.11.0.4: ICMP echo reply, id 30797, seq 2, length 64
09:32:29.675625 IP 10.11.0.4 > 172.16.200.132: ICMP echo request, id 30797, seq 3, length 64
09:32:29.675674 IP 172.16.200.132 > 10.11.0.4: ICMP echo reply, id 30797, seq 3, length 64

So all is working! But would it still work if using BGP? With VPN gateways you have the luxury of choosing static or dynamic routing, but with ExpressRoute it will always be dynamic. Well, let me configure BGP in my onprem routers (using Cisco CSRs for testing), and see what prefixes I am getting from Azure:

branch-nva1#sh ip bgp neig 10.1.0.254 | i BGP state
  BGP state = Established, up for 00:00:34
branch-nva1#sh ip bgp neig 10.1.0.254 routes
BGP table version is 91, local router ID is 172.16.200.11
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
              x best-external, a additional-path, c RIB-compressed,
              t secondary path, L long-lived-stale,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

     Network          Next Hop            Metric LocPrf Weight Path
 *>   10.1.0.0/16      10.1.0.254                             0 65001 i

Total number of prefixes 1

Mmh, only the hub prefix. To double-check that UseRemoteGateways is what controls the BGP advertisement, let’s enable it in one of the spokes (and remember, this will re-enable spoke-to-spoke communication, overriding the setting AllowForwardedTraffic=False):

az network vnet peering update -n spoke1tohub -g $rg --vnet-name spoke1 --set useRemoteGateways=true

branch-nva1#sh ip bgp neig 10.1.0.254 routes
BGP table version is 92, local router ID is 172.16.200.11
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
              x best-external, a additional-path, c RIB-compressed,
              t secondary path, L long-lived-stale,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

     Network          Next Hop            Metric LocPrf Weight Path
 *>   10.1.0.0/16      10.1.0.254                             0 65001 i
 *>   10.11.0.0/24     10.1.0.254                             0 65001 i

Total number of prefixes 2

You can see that enabling UseRemoteGateways in spoke1 triggered the advertisement of the spoke1 prefix (the 10.11.0.0 entry in the output above). This was a surprise for me, since I thought that BGP advertisement control depended entirely on the AllowGatewayTransit setting in the hub-side peerings (hubtospoke1 and hubtospoke2 in my example), and not on the UseRemoteGateways setting of the spoke-side peerings.

As a consequence, if you want to use AllowForwardedTraffic=False to prevent spoke-to-spoke communication, either you have no connectivity to onprem at all, or you get it without ExpressRoute or BGP-based site-to-site VPN tunnels. In my humble opinion, these are too many restrictions for a “feature” to be usable.

Corollary: do not rely on the AllowForwardedTraffic flag to block spoke-to-spoke communication. Instead, configure some Deny rules in your Network Virtual Appliance in the hub.
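The BGP observations in this section can be summarized in a few lines. In this sketch a spoke is advertised to onprem only when the hub-side peering has AllowGatewayTransit and the spoke-side peering has UseRemoteGateways. I only toggled UseRemoteGateways in my tests, so the AllowGatewayTransit half of the condition is an assumption, and `advertised_vnets` is a hypothetical helper.

```python
def advertised_vnets(hub_transit, spoke_use_remote_gw):
    """Return the VNets whose prefixes the hub gateway would advertise.

    hub_transit: spoke name -> AllowGatewayTransit on the hub-side peering
    spoke_use_remote_gw: spoke name -> UseRemoteGateways on the spoke side
    """
    advertised = ["hub"]  # the hub prefix was advertised in every test
    for spoke in sorted(hub_transit):
        if hub_transit[spoke] and spoke_use_remote_gw.get(spoke, False):
            advertised.append(spoke)
    return advertised


transit = {"spoke1": True, "spoke2": True}

# First BGP table: UseRemoteGateways disabled on both spokes.
print(advertised_vnets(transit, {"spoke1": False, "spoke2": False}))
# Second BGP table: UseRemoteGateways re-enabled on spoke1 only.
print(advertised_vnets(transit, {"spoke1": True, "spoke2": False}))
```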

Conclusion

Hopefully I have managed to explain those two initial points:

Do not rely on the VirtualNetwork service tag to filter traffic. Depending on your UDR configuration and the prefixes you advertise from on-premises, it could be that the “VirtualNetwork” service tag is equivalent to “Any” (0.0.0.0/0)
Do not use the AllowForwardedTraffic peering flag to drop traffic, in many situations involving connectivity to an on-premises network it will just not work. Instead, configure Deny rules in your NVA

And that is all, I hope this post gives you a better understanding of what those familiar settings of VNet peerings actually do. Please let me know if you think I have made a mistake in my thinking process, or if you have any other comment. And in any case, thanks for reading!