Thanks for the good feedback on this blog series! Here is another set of questions I have been receiving lately: what does outbound connectivity look like for AKS pods? To answer that, we will look at how it works on both Azure CNI and kubenet AKS clusters, deployed in the same virtual network.
After deploying your Azure CNI and kubenet clusters and a test VM (see part 1 and part 2 of this blog series), let us have a look first at Azure CNI.
Other posts in this series:
- Part 1: deep dive in AKS with Azure CNI in your own vnet
- Part 2: deep dive in AKS with kubenet in your own vnet, and ingress controllers
- Part 3 (this post): outbound connectivity from AKS pods
- Part 4: NSGs with Azure CNI cluster
- Part 5: Virtual Node
- Part 6: Network Policy with Azure CNI
Outbound connections for pods in Azure CNI AKS clusters
We will start with the same whereami application that we used in part 1 of this blog series (refer to it for more details):
k apply -f ./whereami.yaml
Let us check that the pods are deployed, and that the LoadBalancer service is provisioned:
$ k get node -o wide
NAME                       STATUS    ROLES     AGE       VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
aks-nodepool1-26711606-0   Ready     agent     18m       v1.11.5   10.13.76.4    <none>        Ubuntu 16.04.5 LTS   4.15.0-1036-azure   docker://3.0.1
$
$ k get pod -o wide
NAME                       READY     STATUS    RESTARTS   AGE       IP            NODE                       NOMINATED NODE
whereami-564765b89-vdkjd   1/1       Running   0          11m       10.13.76.13   aks-nodepool1-26711606-0   <none>
whereami-564765b89-xcvjm   1/1       Running   0          11m       10.13.76.28   aks-nodepool1-26711606-0   <none>
$
$ k get svc
NAME         TYPE           CLUSTER-IP     EXTERNAL-IP     PORT(S)        AGE
kubernetes   ClusterIP      10.0.0.1       <none>          443/TCP        22m
whereami     LoadBalancer   10.0.134.154   104.45.15.171   80:32188/TCP   11m
After your pods are deployed, we can start testing. First, let us look at connectivity inside the vnet by pinging our test VM:
$ k exec whereami-564765b89-vdkjd -- ping 10.13.1.4
PING 10.13.1.4 (10.13.1.4) 56(84) bytes of data.
64 bytes from 10.13.1.4: icmp_seq=1 ttl=64 time=0.676 ms
64 bytes from 10.13.1.4: icmp_seq=2 ttl=64 time=1.49 ms
10.13.1.4 is the IP address of our test VM. If you tcpdump for ICMP packets while the previous ping is running, you will see the IP address the pod is coming from:
jose@testvm:~$ sudo tcpdump -i eth0 icmp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
22:11:58.729044 IP 10.13.76.13 > testvm: ICMP echo request, id 16, seq 33, length 64
22:11:58.729101 IP testvm > 10.13.76.13: ICMP echo reply, id 16, seq 33, length 64
22:11:59.753203 IP 10.13.76.13 > testvm: ICMP echo request, id 16, seq 34, length 64
22:11:59.753241 IP testvm > 10.13.76.13: ICMP echo reply, id 16, seq 34, length 64
As expected, the pod is coming from its own IP address, and not from the node’s IP.
What about our public IP address when going out to the Internet? Let us check with the site ifconfig.co, which returns the visitor's IP address:
$ k exec whereami-564765b89-vdkjd -- curl -s ifconfig.co
104.45.15.171
That IP address should look familiar: it is the one associated with the LoadBalancer service. This is interesting, because the pod is coming from its own Azure IP configuration (the construct where IP addresses for VMs are defined), and yet it still gets the load balancer's IP address as source when browsing to the Internet.
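By the way, you can double-check this by listing the IP configurations of the node's NIC in the node resource group. The NIC name lookup below is an assumption (I am simply taking the first NIC in the group), but the commands themselves are standard Azure CLI:

$ noderg=$(az aks show -g $rg -n $aksname_azure --query nodeResourceGroup -o tsv)
$ nicname=$(az network nic list -g $noderg --query [0].name -o tsv)
$ az network nic ip-config list -g $noderg --nic-name $nicname --query [].privateIpAddress -o tsv

You should see the node's primary IP address plus the secondary IP addresses that the Azure CNI pre-allocates for pods (the pod IPs from above, 10.13.76.13 and 10.13.76.28, should be among them). Speaking of the node resource group, let us also have a look at the load balancer living there: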
$ noderg=$(az aks show -g $rg -n $aksname_azure --query nodeResourceGroup -o tsv)
$ az network lb list -g $noderg --query [].[name,sku.name] -o tsv
kubernetes      Basic
Something else to notice is that the Azure Load Balancer is of the Basic type. This is especially relevant for outgoing connections, since the Basic Azure Load Balancer allocates a limited number of ephemeral ports per node (1,024, as described here). When AKS supports the Standard Azure LB (in the meantime already supported on aks-engine), this limit will disappear.
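You cannot change the load balancer SKU of an existing cluster, so once the Standard SKU is supported the selection will happen at cluster creation time. Just as a sketch (the --load-balancer-sku parameter exists in newer versions of the Azure CLI, and $subnetid is assumed to hold the subnet resource ID from part 1):

$ az aks create -g $rg -n $aksname_azure --network-plugin azure \
      --vnet-subnet-id $subnetid --load-balancer-sku standard \
      --generate-ssh-keys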
What about ingress controllers? Let us enable the HTTP routing addon of AKS, and deploy an app using an ingress:
$ az aks enable-addons -a http_application_routing -g $rg -n $aksname_azure
$ k apply -f ./whereami-ingress.yaml
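In case you are wondering, whereami-ingress.yaml contains a second deployment, a ClusterIP service and an ingress rule pointing at it. The ingress object would look roughly like the sketch below; note that this is not the literal content of the file, and that the host name is a placeholder for a record in the DNS zone that the HTTP application routing addon creates:

k apply -f - <<'EOF'
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: whereami-ingress
  annotations:
    kubernetes.io/ingress.class: addon-http-application-routing
spec:
  rules:
  - host: whereami.example.com    # placeholder: use a host in the addon's DNS zone
    http:
      paths:
      - backend:
          serviceName: whereami-ingress
          servicePort: 80
EOF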
If you look at all the services (since the nginx ingress controllers are deployed in kube-system), you will see two external IP addresses for LoadBalancer-type services:
$ k get svc --all-namespaces
NAMESPACE     NAME                                                   TYPE           CLUSTER-IP     EXTERNAL-IP     PORT(S)                      AGE
default       kubernetes                                             ClusterIP      10.0.0.1       <none>          443/TCP                      41m
default       whereami                                               LoadBalancer   10.0.134.154   104.45.15.171   80:32188/TCP                 29m
default       whereami-ingress                                       ClusterIP      10.0.82.109    <none>          80/TCP                       33s
kube-system   addon-http-application-routing-default-http-backend   ClusterIP      10.0.209.122   <none>          80/TCP                       1m
kube-system   addon-http-application-routing-nginx-ingress          LoadBalancer   10.0.120.136   104.46.56.131   80:31474/TCP,443:30562/TCP   1m
kube-system   heapster                                               ClusterIP      10.0.157.125   <none>          80/TCP                       40m
kube-system   kube-dns                                               ClusterIP      10.0.0.10      <none>          53/UDP,53/TCP                40m
kube-system   kubernetes-dashboard                                   ClusterIP      10.0.42.100    <none>          80/TCP                       40m
kube-system   metrics-server                                         ClusterIP      10.0.28.34     <none>          443/TCP                      40m
And we have two additional pods:
$ k get po
NAME                                READY     STATUS    RESTARTS   AGE
whereami-564765b89-vdkjd            1/1       Running   0          31m
whereami-564765b89-xcvjm            1/1       Running   0          31m
whereami-ingress-7f97d7c96f-6ws82   1/1       Running   0          2m
whereami-ingress-7f97d7c96f-qcf6p   1/1       Running   0          2m
Let’s see which IP address the new pods get:
$ k exec whereami-ingress-7f97d7c96f-qcf6p -- curl -s ifconfig.co
104.45.15.171
They get the public IP address associated with the first service! But that IP has nothing to do with the new pods. Why is that? Again, quoting the Microsoft documentation: “When multiple public IP addresses are associated with Load Balancer Basic, any of these public IP addresses are a candidate for outbound flows, and one is selected at random“.
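If you want to see the candidate public IP addresses for yourself, you can list the frontend IP configurations of the kubernetes load balancer (or simply the public IPs in the node resource group), reusing the $noderg variable from earlier:

$ az network lb frontend-ip list -g $noderg --lb-name kubernetes --query [].name -o tsv
$ az network public-ip list -g $noderg --query [].[name,ipAddress] -o tsv

At this point there should be two public IP addresses there, one per LoadBalancer service, and any of them can be used to SNAT outbound traffic.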
Outbound connectivity in kubenet clusters
What about using kubenet? Let's do a deployment similar to the one we just did for Azure CNI. Refer to part 2 of this blog series for more details on how to deploy a kubenet AKS cluster in your own vnet:
$ k config use-context kubenetcluster
$ k apply -f ./whereami.yaml
$ az aks enable-addons -a http_application_routing -g $rg -n $aksname_kubenet
$ k apply -f ./whereami-ingress.yaml
Let us gather the usual information:
$ k get node -o wide
NAME                       STATUS    ROLES     AGE       VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
aks-nodepool1-35377101-0   Ready     agent     52m       v1.11.5   10.13.77.5    <none>        Ubuntu 16.04.5 LTS   4.15.0-1036-azure   docker://3.0.1
aks-nodepool1-35377101-1   Ready     agent     52m       v1.11.5   10.13.77.4    <none>        Ubuntu 16.04.5 LTS   4.15.0-1036-azure   docker://3.0.1
$
$ k get pod -o wide
NAME                                READY     STATUS    RESTARTS   AGE       IP             NODE                       NOMINATED NODE
whereami-564765b89-d8rdb            1/1       Running   0          4m        192.168.1.2    aks-nodepool1-35377101-1   <none>
whereami-564765b89-z7p82            1/1       Running   0          4m        192.168.0.9    aks-nodepool1-35377101-0   <none>
whereami-ingress-7f97d7c96f-2lqxf   1/1       Running   0          1m        192.168.0.10   aks-nodepool1-35377101-0   <none>
whereami-ingress-7f97d7c96f-66vvv   1/1       Running   0          1m        192.168.1.7    aks-nodepool1-35377101-1   <none>
$
$ k get svc --all-namespaces
NAMESPACE     NAME                                                   TYPE           CLUSTER-IP     EXTERNAL-IP       PORT(S)                      AGE
default       kubernetes                                             ClusterIP      10.0.0.1       <none>            443/TCP                      56m
default       whereami                                               LoadBalancer   10.0.58.118    137.117.164.147   80:30694/TCP                 4m
default       whereami-ingress                                       ClusterIP      10.0.186.51    <none>            80/TCP                       1m
kube-system   addon-http-application-routing-default-http-backend   ClusterIP      10.0.254.32    <none>            80/TCP                       3m
kube-system   addon-http-application-routing-nginx-ingress          LoadBalancer   10.0.128.165   104.214.236.151   80:30058/TCP,443:30733/TCP   3m
kube-system   heapster                                               ClusterIP      10.0.165.163   <none>            80/TCP                       56m
kube-system   kube-dns                                               ClusterIP      10.0.0.10      <none>            53/UDP,53/TCP                56m
kube-system   kubernetes-dashboard                                   ClusterIP      10.0.80.140    <none>            80/TCP                       56m
kube-system   metrics-server                                         ClusterIP      10.0.45.29     <none>            443/TCP                      56m
Now, let’s try the private IP address of the VM in the same virtual network:
$ k exec whereami-564765b89-d8rdb -- ping 10.13.1.4
PING 10.13.1.4 (10.13.1.4) 56(84) bytes of data.
64 bytes from 10.13.1.4: icmp_seq=1 ttl=63 time=0.617 ms
64 bytes from 10.13.1.4: icmp_seq=2 ttl=63 time=1.75 ms
64 bytes from 10.13.1.4: icmp_seq=3 ttl=63 time=0.797 ms
And capture traffic at the VM:
jose@testvm:~$ sudo tcpdump icmp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
22:50:10.358307 IP 10.13.77.4 > testvm: ICMP echo request, id 11, seq 31, length 64
22:50:10.358320 IP testvm > 10.13.77.4: ICMP echo reply, id 11, seq 31, length 64
Interesting: traffic is sourced from 10.13.77.4, the IP address of the node where our pod is running.
Let’s recheck intra-cluster communication and do the same exercise from pod to pod. Let us start the ping from one pod to the other (note that the pods are running on different nodes):
$ k exec whereami-564765b89-z7p82 -- ping 192.168.1.2
PING 192.168.1.2 (192.168.1.2) 56(84) bytes of data.
64 bytes from 192.168.1.2: icmp_seq=1 ttl=62 time=0.949 ms
64 bytes from 192.168.1.2: icmp_seq=2 ttl=62 time=0.547 ms
64 bytes from 192.168.1.2: icmp_seq=3 ttl=62 time=0.779 ms
^C
And now let’s see what we get at the other pod:
$ k exec -it whereami-564765b89-d8rdb /bin/bash
[root@whereami-564765b89-d8rdb /]# yum install -y tcpdump
...
Complete!
[root@whereami-564765b89-d8rdb /]# tcpdump icmp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
23:00:24.579482 IP 192.168.0.9 > whereami-564765b89-d8rdb: ICMP echo request, id 12, seq 1, length 64
23:00:24.579511 IP whereami-564765b89-d8rdb > 192.168.0.9: ICMP echo reply, id 12, seq 1, length 64
23:00:25.599114 IP 192.168.0.9 > whereami-564765b89-d8rdb: ICMP echo request, id 12, seq 2, length 64
23:00:25.599138 IP whereami-564765b89-d8rdb > 192.168.0.9: ICMP echo reply, id 12, seq 2, length 64
As you can see, no SNAT is at play here, and the pods see each other’s IP addresses. How does that work? Let’s have a look at our friend iptables on one of the nodes. You might first need to configure SSH public keys on the nodes with the Azure CLI (`az vm user update`); see the sketch below.
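Just as a hedged example (the VM name is one of the nodes from the previous outputs, and the admin user name is an assumption), adding your public key would look something like this:

$ noderg_kubenet=$(az aks show -g $rg -n $aksname_kubenet --query nodeResourceGroup -o tsv)
$ az vm user update -g $noderg_kubenet -n aks-nodepool1-35377101-0 \
      --username azureuser --ssh-key-value "$(cat ~/.ssh/id_rsa.pub)"

Since the nodes do not have public IP addresses (remember the <none> in the EXTERNAL-IP column), you would typically SSH into them via the test VM in the same vnet. Once on the node, this is what the NAT POSTROUTING chain looks like: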
jose@aks-nodepool1-35377101-0:~$ sudo iptables -t nat -vxnL POSTROUTING
Chain POSTROUTING (policy ACCEPT 1 packets, 52 bytes)
    pkts    bytes target            prot opt in  out       source          destination
  288142 18188250 KUBE-POSTROUTING  all  --  *   *         0.0.0.0/0       0.0.0.0/0        /* kubernetes postrouting rules */
       0        0 MASQUERADE        all  --  *   !docker0  172.17.0.0/16   0.0.0.0/0
  247724 15771644 MASQUERADE        all  --  *   *         0.0.0.0/0       !192.168.0.0/16  /* kubenet: SNAT for outbound traffic from cluster */ ADDRTYPE match dst-type !LOCAL
Mmmh, the last entry is interesting. Translated to English it would read something like “masquerade (SNAT) all traffic coming from anywhere and going outside of the pod network”. This behavior is controlled by a flag you can pass to the local kubelet, called `--non-masquerade-cidr`. kubelet runs in AKS as a process (not as a container), so we can have a look at its parameters with the systemctl command (I pipe the output through grep for better display in my shell; for some reason systemctl status without the grep was not breaking up the lines). I have reformatted the output over multiple lines for easier reading:
jose@aks-nodepool1-35377101-0:/etc/kubernetes$ systemctl status kubelet | grep /usr/local/bin/kubelet
└─3668 /usr/local/bin/kubelet --enable-server
    --node-labels=node-role.kubernetes.io/agent=,kubernetes.io/role=agent,agentpool=nodepool1,storageprofile=managed,storagetier=Premium_LRS,kubernetes.azure.com/cluster=MC_akstest_kubenetcluster_westeurope
    --v=2
    --volume-plugin-dir=/etc/kubernetes/volumeplugins
    --address=0.0.0.0
    --allow-privileged=true
    --anonymous-auth=false
    --authorization-mode=Webhook
    --azure-container-registry-config=/etc/kubernetes/azure.json
    --cadvisor-port=0
    --cgroups-per-qos=true
    --client-ca-file=/etc/kubernetes/certs/ca.crt
    --cloud-config=/etc/kubernetes/azure.json
    --cloud-provider=azure
    --cluster-dns=10.0.0.10
    --cluster-domain=cluster.local
    --enforce-node-allocatable=pods
    --event-qps=0
    --eviction-hard=memory.available<100Mi,nodefs.available<10%,nodefs.inodesFree<5%
    --feature-gates=PodPriority=true
    --image-gc-high-threshold=85
    --image-gc-low-threshold=80
    --image-pull-progress-deadline=30m
    --keep-terminated-pod-volumes=false
    --kube-reserved=cpu=69m,memory=1843Mi
    --kubeconfig=/var/lib/kubelet/kubeconfig
    --max-pods=110
    --network-plugin=kubenet
    --node-status-update-frequency=10s
    --non-masquerade-cidr=192.168.0.0/16
    --pod-infra-container-image=k8s.gcr.io/pause-amd64:3.1
    --pod-manifest-path=/etc/kubernetes/manifests
    --pod-max-pids=100
I was tempted to remove some lines from the previous output, but then I decided against it, since some of the options above are quite interesting (such as the max-pods limit or the kube reservations). But that is a story for another blog.
Finally, let us verify the IP addresses that are visible on the public Internet, although you probably know by now what is coming:
$ k exec whereami-564765b89-d8rdb -- curl -s ifconfig.co
137.117.164.147
$ k exec whereami-564765b89-z7p82 -- curl -s ifconfig.co
137.117.164.147
$ k exec whereami-ingress-7f97d7c96f-2lqxf -- curl -s ifconfig.co
137.117.164.147
$ k exec whereami-ingress-7f97d7c96f-66vvv -- curl -s ifconfig.co
137.117.164.147
As in the case of the Azure CNI cluster, the public IP of the first LoadBalancer service on the Azure Load Balancer is picked up to SNAT the pods for Internet communication.
And with that we conclude this chapter of the blog series. I hope you learnt something today!