What language does the Azure Gateway Load Balancer speak?

As you might have read, one of the new kids on the block in Azure Networking is the Gateway Load Balancer. You can refer to the Microsoft docs for more details on what it does and why it was created; suffice it to say that it is essentially a way to insert an NVA (Network Virtual Appliance) into a network flow without manipulating routing with UDRs (User-Defined Routes). One of the scenarios targeted by this technology is sharing NVAs across environments that are independent from each other, as the diagram in the Gateway Load Balancer documentation suggests.

Why this blog post? Because the official How-To guide only shows how to configure the Azure part, but not the NVA side of the house. In this post I am going to try to bring everything together, using a Linux-based NVA.

The following diagram describes the architecture we are going to have (if you want to deploy this, you can use my Azure CLI script in https://github.com/erjosito/azcli/blob/master/gwlb.azcli):

Test topology

Let’s have a look at how this works:

  • (1): The client (with a public IP 93.104.178.66) accesses the application exposed over a public Azure Load Balancer on the public IP 20.101.109.26.
  • (2): The public load balancer is configured to redirect traffic to the Gateway Load Balancer, which in this example is in a different VNet. Note that the application VNet and the NVA VNet are not peered.
  • (3): The Gateway Load Balancer sends the packet to the NVA over a VXLAN tunnel called “External”.
  • (4): The NVA will decapsulate the packet, do its NVA magic, and for legitimate traffic it will forward the packet back to the Gateway Load Balancer over a second VXLAN tunnel called “Internal”.
  • (5): The Gateway Load Balancer will return the packet to the application's public Load Balancer.
  • (6): The packet will finally arrive at the web server, as if nothing had happened.

The Gateway Load Balancer is transparent

As a web server I am using an API that returns some interesting information such as the source IP of the request and the accessed URL, which makes it possible to discover whether there is any NAT device or reverse proxy in the path. Let’s verify that it is all working by accessing the application from the client:

❯ curl http://20.101.109.26:8080/api/ip
{
  "my_default_gateway": "192.168.1.1",
  "my_dns_servers": "",
  "my_private_ip": "192.168.1.4",
  "my_public_ip": "51.124.253.205",
  "path_accessed": "20.101.109.26:8080/api/ip",
  "x-forwarded-for": null,
  "your_address": "93.104.178.66",
  "your_browser": "None",
  "your_platform": "None"
}

The most important information is that the client IP address (“your_address” in the previous output) is still 93.104.178.66, so no NAT has happened, and the client IP is still visible to the web server.
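
If you want to automate this check, here is a minimal sketch in Python. It assumes the response has the same JSON shape as the output above; the function name and the trimmed sample are mine, purely for illustration.

```python
import json


def client_ip_preserved(api_response: str, expected_client_ip: str) -> bool:
    """True if the server saw the expected client IP and no X-Forwarded-For header."""
    data = json.loads(api_response)
    saw_client = data.get("your_address") == expected_client_ip
    no_proxy_header = data.get("x-forwarded-for") is None
    return saw_client and no_proxy_header


# Trimmed copy of the response shown above (only the fields used here).
SAMPLE_RESPONSE = """
{
  "my_private_ip": "192.168.1.4",
  "x-forwarded-for": null,
  "your_address": "93.104.178.66"
}
"""

if __name__ == "__main__":
    print(client_ip_preserved(SAMPLE_RESPONSE, "93.104.178.66"))
```

A source NAT or a reverse proxy in the path would show up as a different "your_address" or as a populated "x-forwarded-for" field.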

The Load Balancers talk to each other

The glue between the NVA and the application VM is the connection between the two load balancers. It is configured in the frontend of the application Load Balancer, which holds a reference to the Gateway Load Balancer frontend it should forward traffic to:

❯ az network lb frontend-ip list --lb-name client1-alb -g gwlb --query '[].gatewayLoadBalancer'
[
  {
    "id": "/subscriptions/e7da9914-9b05-4891-893c-546cb7b0422e/resourceGroups/gwlb/providers/Microsoft.Network/loadBalancers/nvalb/frontendIPConfigurations/nvafrontend",
    "resourceGroup": "gwlb"
  }
]

So when the inbound traffic from the Internet hits the public frontend of the application load balancer, it knows where to send it: to the frontend configuration of the Gateway Load Balancer.
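
The reference is nothing more than a resource ID, so the Gateway Load Balancer can be identified by subscription, resource group, load balancer name and frontend name. As a small illustration (a sketch of mine, assuming the usual alternating key/value layout of Azure resource IDs), the ID from the output above can be split like this:

```python
def parse_frontend_reference(resource_id: str) -> dict:
    """Split a load balancer frontend resource ID into its named parts."""
    segments = resource_id.strip("/").split("/")
    # Segments alternate between a key and its value:
    # subscriptions/<id>/resourceGroups/<rg>/providers/<namespace>/...
    fields = dict(zip(segments[0::2], segments[1::2]))
    return {
        "subscription": fields["subscriptions"],
        "resource_group": fields["resourceGroups"],
        "load_balancer": fields["loadBalancers"],
        "frontend": fields["frontendIPConfigurations"],
    }


# The ID returned by the az command above.
GWLB_FRONTEND_ID = (
    "/subscriptions/e7da9914-9b05-4891-893c-546cb7b0422e/resourceGroups/gwlb"
    "/providers/Microsoft.Network/loadBalancers/nvalb"
    "/frontendIPConfigurations/nvafrontend"
)

if __name__ == "__main__":
    print(parse_frontend_reference(GWLB_FRONTEND_ID))
```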

Two VXLAN tunnels

The Gateway Load Balancer has two tunnels configured in its backend pool (where the NVA is attached):

❯ az network lb address-pool tunnel-interface list --address-pool nvas --lb-name nvalb -g gwlb
Command group 'network lb address-pool tunnel-interface' is in preview and under development. Reference and support levels: https://aka.ms/CLI_refstatus
[
  {
    "identifier": 900,
    "port": 10800,
    "protocol": "VXLAN",
    "type": "Internal"
  },
  {
    "identifier": 901,
    "port": 10801,
    "protocol": "VXLAN",
    "type": "External"
  }
]

Some interesting things to notice:

  • Each tunnel uses a different UDP port for VXLAN: UDP 10800 in the internal tunnel, and UDP 10801 in the external one. These ports need to be different from each other, and they cannot be the standard VXLAN UDP port 4789.
  • Additionally, each tunnel will use a different VXLAN Network Identifier (VNI): 900 for the internal tunnel, 901 for the external one.
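
To make the role of the VNI more concrete, here is a rough sketch of the 8-byte VXLAN header as defined in RFC 7348: one flags byte with the "VNI valid" bit set, 24 reserved bits, the 24-bit VNI, and 8 more reserved bits. This header travels inside a UDP datagram sent to the tunnel port. The function names and the TUNNELS dictionary are mine; only the VNIs and ports come from the output above.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # the "I" flag from RFC 7348


def build_vxlan_header(vni: int) -> bytes:
    """Return the 8-byte VXLAN header carrying the given VNI."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags in the top byte, followed by 24 reserved bits.
    # Word 2: VNI in the top 24 bits, followed by 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def parse_vni(header: bytes) -> int:
    """Extract the VNI from the first 8 bytes of a VXLAN payload."""
    flags_word, vni_word = struct.unpack("!II", header[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI-valid flag is not set")
    return vni_word >> 8


# Values taken from the tunnel-interface list above.
TUNNELS = {
    "Internal": {"vni": 900, "udp_port": 10800},
    "External": {"vni": 901, "udp_port": 10801},
}

if __name__ == "__main__":
    for name, tunnel in TUNNELS.items():
        header = build_vxlan_header(tunnel["vni"])
        print(name, "UDP", tunnel["udp_port"], header.hex(), parse_vni(header))
```

The NVA can therefore tell the two tunnels apart in two ways, by the UDP destination port and by the VNI in the header.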

The NVA, a Linux machine in this case, has the other side of these VXLAN tunnels configured:

jose@nva1:~$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1600 qdisc mq state UP group default qlen 1000
    link/ether 60:45:bd:8b:5f:ae brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.4/26 brd 192.168.0.63 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::6245:bdff:fe8b:5fae/64 scope link
       valid_lft forever preferred_lft forever
11: vxlan900: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-client1 state UNKNOWN group default qlen 1000
    link/ether 92:b9:9a:16:1c:a1 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::90b9:9aff:fe16:1ca1/64 scope link
       valid_lft forever preferred_lft forever
12: vxlan901: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-client1 state UNKNOWN group default qlen 1000
    link/ether 9e:8a:e2:70:53:a0 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::9c8a:e2ff:fe70:53a0/64 scope link
       valid_lft forever preferred_lft forever
13: br-client1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 92:b9:9a:16:1c:a1 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::90b9:9aff:fe16:1ca1/64 scope link
       valid_lft forever preferred_lft forever

Again some remarks:

  • The VXLAN interfaces (vxlan900 and vxlan901) do not need an IP address.
  • The only goal of the bridge “br-client1” is to send everything that comes from one tunnel to the other. In a real NVA you would want to do some traffic inspection or other operations on the packets.

Let’s have a look at one of the VXLAN interfaces, to see how it is configured:

jose@nva1:~$ ip -d link show vxlan900
11: vxlan900: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-client1 state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 92:b9:9a:16:1c:a1 brd ff:ff:ff:ff:ff:ff promiscuity 1
    vxlan id 900 remote 192.168.0.5 srcport 0 0 dstport 10800 nolearning ttl inherit ageing 300 udpcsum noudp6zerocsumtx noudp6zerocsumrx
    bridge_slave state forwarding priority 32 cost 100 hairpin off guard off root_block off fastleave off learning on flood on port_id 0x8001 port_no 0x1 designated_port 32769 designated_cost 0 designated_bridge 8000.92:b9:9a:16:1c:a1 designated_root 8000.92:b9:9a:16:1c:a1 hold_timer    0.00 message_age_timer    0.00 forward_delay_timer    0.00 topology_change_ack 0 config_pending 0 proxy_arp off proxy_arp_wifi off mcast_router 1 mcast_fast_leave off mcast_flood on neigh_suppress off group_fwd_mask 0x0 group_fwd_mask_str 0x0 vlan_tunnel off addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535

The other VXLAN interface (the external one) would be configured in a similar way. The most important bit is the line that begins with “vxlan id 900”, where we can see the VXLAN configuration for this tunnel:

  • The VNI is 900, as we already saw in the Gateway Load Balancer configuration
  • The remote IP (192.168.0.5) is the private IP address of the Gateway Load Balancer
  • The source port range is left at the default (“srcport 0 0”)
  • And as destination port, UDP 10800, as we saw in the Gateway Load Balancer configuration (instead of the VXLAN default of UDP 4789)
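
As a hedged sketch of how such a configuration could be produced, the following Python snippet takes the tunnel list returned by the Gateway Load Balancer and renders the iproute2 commands for the two VXLAN interfaces and the bridge. It only builds the command strings and does not run them (doing so would require root on the NVA). The function names are hypothetical, the assumption that the external tunnel uses the same remote IP is mine, and this is not necessarily how the script linked earlier does it.

```python
def vxlan_commands(vni: int, udp_port: int, gwlb_ip: str, bridge: str) -> list:
    """Commands to create one VXLAN interface and attach it to the bridge."""
    ifname = f"vxlan{vni}"
    return [
        f"ip link add {ifname} type vxlan id {vni} remote {gwlb_ip} "
        f"dstport {udp_port} nolearning",
        f"ip link set {ifname} master {bridge}",
        f"ip link set {ifname} up",
    ]


def nva_commands(tunnels: list, gwlb_ip: str, bridge: str = "br-client1") -> list:
    """Commands for the bridge plus one VXLAN interface per tunnel."""
    commands = [f"ip link add {bridge} type bridge"]
    for tunnel in tunnels:
        commands += vxlan_commands(
            tunnel["identifier"], tunnel["port"], gwlb_ip, bridge
        )
    commands.append(f"ip link set {bridge} up")
    return commands


# Same fields as in the tunnel-interface output of the Gateway Load Balancer.
TUNNEL_INTERFACES = [
    {"identifier": 900, "port": 10800, "protocol": "VXLAN", "type": "Internal"},
    {"identifier": 901, "port": 10801, "protocol": "VXLAN", "type": "External"},
]

if __name__ == "__main__":
    # Assumption: both tunnels point at the same Gateway Load Balancer IP.
    for command in nva_commands(TUNNEL_INTERFACES, gwlb_ip="192.168.0.5"):
        print(command)
```

Deriving the interface settings from the tunnel list keeps the VNIs and UDP ports on the NVA in sync with what is configured on the Azure side.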

Verifying traffic flows through the NVA

If we send another HTTP request and capture the traffic at the same time in one of the VXLAN tunnel interfaces, we will see the whole communication exchange:

jose@nva1:~$ sudo tcpdump -n -i vxlan900 host 93.104.178.66 and tcp port 8080
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vxlan900, link-type EN10MB (Ethernet), capture size 262144 bytes
21:58:39.951328 IP 93.104.178.66.52078 > 20.101.109.26.8080: Flags [S], seq 2452643535, win 64480, options [mss 1240,sackOK,TS val 4098255373 ecr 0,nop,wscale 7], length 0
21:58:39.954860 IP 20.101.109.26.8080 > 93.104.178.66.52078: Flags [S.], seq 3894921302, ack 2452643536, win 65160, options [mss 1420,sackOK,TS val 1945273291 ecr 4098255373,nop,wscale 7], length 0
21:58:39.981158 IP 93.104.178.66.52078 > 20.101.109.26.8080: Flags [.], ack 1, win 504, options [nop,nop,TS val 4098255400 ecr 1945273291], length 0
21:58:39.981198 IP 93.104.178.66.52078 > 20.101.109.26.8080: Flags [P.], seq 1:89, ack 1, win 504, options [nop,nop,TS val 4098255401 ecr 1945273291], length 88: HTTP: GET /api/ip HTTP/1.1
21:58:39.983158 IP 20.101.109.26.8080 > 93.104.178.66.52078: Flags [.], ack 89, win 509, options [nop,nop,TS val 1945273320 ecr 4098255401], length 0
21:58:41.195276 IP 20.101.109.26.8080 > 93.104.178.66.52078: Flags [P.], seq 1:18, ack 89, win 509, options [nop,nop,TS val 1945274531 ecr 4098255401], length 17: HTTP: HTTP/1.0 200 OK
21:58:41.195276 IP 20.101.109.26.8080 > 93.104.178.66.52078: Flags [FP.], seq 18:454, ack 89, win 509, options [nop,nop,TS val 1945274531 ecr 4098255401], length 436: HTTP
21:58:41.245515 IP 93.104.178.66.52078 > 20.101.109.26.8080: Flags [.], ack 18, win 504, options [nop,nop,TS val 4098256670 ecr 1945274531], length 0
21:58:41.245720 IP 93.104.178.66.52078 > 20.101.109.26.8080: Flags [F.], seq 89, ack 455, win 503, options [nop,nop,TS val 4098256670 ecr 1945274531], length 0
21:58:41.247646 IP 20.101.109.26.8080 > 93.104.178.66.52078: Flags [.], ack 90, win 509, options [nop,nop,TS val 1945274584 ecr 4098256670], length 0

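Note that the whole TCP exchange is here, from the 3-way handshake to the very last FIN/ACK. Something worth noticing is that only the original source and destination IP addresses are present here: since the capture is taken on the VXLAN interface, it shows the inner packets after decapsulation, without the outer VXLAN headers.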
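
If you save a capture like this to a text file, a quick way to confirm that no other address shows up is to extract the endpoints from each line. This is a minimal sketch that assumes the tcpdump text format shown above (IPv4, with the port appended to the address); the regular expression and function name are mine.

```python
import re

# Matches "IP <src-ip>.<src-port> > <dst-ip>.<dst-port>:" in tcpdump text output.
PACKET_RE = re.compile(
    r"\bIP (\d+(?:\.\d+){3})\.(\d+) > (\d+(?:\.\d+){3})\.(\d+):"
)


def ip_addresses_seen(capture_lines) -> set:
    """Return the set of source and destination IPs found in the capture."""
    seen = set()
    for line in capture_lines:
        match = PACKET_RE.search(line)
        if match:
            seen.update((match.group(1), match.group(3)))
    return seen


# First two packets of the capture above, with the TCP options trimmed.
SAMPLE_CAPTURE = [
    "21:58:39.951328 IP 93.104.178.66.52078 > 20.101.109.26.8080: Flags [S], length 0",
    "21:58:39.954860 IP 20.101.109.26.8080 > 93.104.178.66.52078: Flags [S.], length 0",
]

if __name__ == "__main__":
    print(sorted(ip_addresses_seen(SAMPLE_CAPTURE)))
```

If the NVA or the load balancers were translating addresses, a third IP would appear in the resulting set.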

This concludes the post. I hope you now have a clearer understanding of the configuration that is required in an NVA to interact with the Azure Gateway Load Balancer, and of how traffic is forwarded between the different pieces.
