A Day in the Life of a Packet in AKS (part 6): Network Policy

This post is a continuation from Part 5: Virtual Node. Other posts in this series:

  • Part 1: deep dive in AKS with Azure CNI in your own vnet
  • Part 2: deep dive in AKS with kubenet in your own vnet, and ingress controllers
  • Part 3: outbound connectivity from AKS pods
  • Part 4: NSGs with Azure CNI cluster
  • Part 5: Virtual Node
  • Part 6 (this one): Network Policy with Azure CNI

In part 5 we had two deployments, one in the default namespace using a LoadBalancer service, and another one in a different namespace using the nginx-based ingress controller provided by the HTTP application routing addon of AKS (again, remember that it is not recommended to use this addon for production workloads).
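
In case you want to reproduce this setup from scratch, note that network policy support needs to be enabled when the cluster is created. A minimal sketch with the Azure CLI (the resource group, cluster name and subnet ID variables are placeholders you would fill in yourself):

$ az aks create -g $rg -n $aksname \
      --network-plugin azure \
      --network-policy calico \
      --vnet-subnet-id $subnetid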

Network Policy

Let’s try to reach pods in the namespace “ingress” from the pods in the namespace “default”:

$ k -n ingress get pod -o wide
NAME                                   READY     STATUS    RESTARTS   AGE       IP            NODE                       NOMINATED NODE
kuard-vnode-ingress-84d8c9586f-5kf2p   1/1       Running   0          20h       10.13.76.52   aks-nodepool1-26711606-1   <none>
kuard-vnode-ingress-84d8c9586f-fgr4j   1/1       Running   0          6m53s     10.13.100.5   virtual-node-aci-linux     <none>
$ k -n default get pod -o wide
NAME                          READY     STATUS    RESTARTS   AGE       IP            NODE                       NOMINATED NODE
kuard-vnode-bd88cbf77-n8dzg   1/1       Running   0          28h       10.13.76.36   aks-nodepool1-26711606-1   <none>
kuard-vnode-bd88cbf77-rmr66   1/1       Running   0          20h       10.13.76.26   aks-nodepool1-26711606-0   <none>
$ k exec kuard-vnode-bd88cbf77-n8dzg -- wget -qO- --timeout=3 --server-response http://10.13.76.52:8080 2>&1 | grep "HTTP\/1.1 "
  HTTP/1.1 200 OK

Now let us restrict that traffic with a network policy. This is the manifest we will use:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: drop-inter-namespace
  namespace: ingress
spec:
  podSelector:
    matchLabels:
      app: kuard-vnode-ingress
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          name: kube-system
    ports:
    - protocol: TCP
      port: 8080
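
One detail that is easy to miss: as written, this policy also blocks traffic between pods inside the ingress namespace itself, because the only "from" clause matches the kube-system namespace. If you wanted to keep intra-namespace traffic open as well, a variation (not applied in this walkthrough) would add an empty podSelector, which in a NetworkPolicy always refers to pods in the policy's own namespace:

  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          name: kube-system
    - podSelector: {}
    ports:
    - protocol: TCP
      port: 8080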

By the way, that namespace label is not created automatically; you need to add it yourself:

$ k get ns --show-labels
NAME          STATUS    AGE       LABELS
default       Active    3d1h      <none>
ingress       Active    2d15h     <none>
kube-public   Active    3d1h      <none>
kube-system   Active    3d1h      <none>
$
$ k label ns/kube-system name=kube-system
namespace/kube-system labeled
$ k get ns --show-labels
NAME          STATUS    AGE       LABELS
default       Active    3d1h      <none>
ingress       Active    2d15h     <none>
kube-public   Active    3d1h      <none>
kube-system   Active    3d1h      name=kube-system

Now we can apply the policy:

$ k apply -f ./isolate_ingress.yaml
networkpolicy.networking.k8s.io/drop-inter-namespace created
$ k exec kuard-vnode-bd88cbf77-n8dzg -- wget -qO- --timeout=3 --server-response http://10.13.76.52:8080 2>&1 | grep "HTTP\/1.1 "
$
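
To double-check the allow side of the policy, you can run the same probe from a pod in the kube-system namespace (hypothetical pod name, and assuming the image ships wget); this one should still come back with a 200:

$ k -n kube-system exec <some-kube-system-pod> -- wget -qO- --timeout=3 --server-response http://10.13.76.52:8080 2>&1 | grep "HTTP\/1.1 "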

So after applying the policy, traffic from the pod in the default namespace is not allowed any more, while traffic from kube-system still gets through. We can have a look at the networkpolicy resource for more information (the short name for "networkpolicy" is "netpol", if you are as lazy as I am):

$ k -n ingress get netpol
NAME                   POD-SELECTOR              AGE
drop-inter-namespace   app=kuard-vnode-ingress   6m17s
$ k -n ingress describe netpol/drop-inter-namespace
Name:         drop-inter-namespace
Namespace:    ingress
Created on:   2019-04-04 14:32:25 +0200 DST
Labels:       <none>
Annotations:  kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"networking.k8s.io/v1","kind":"NetworkPolicy","metadata":{"annotations":{},"name":"drop-inter-namespace","namespace":"ingress"},"spec":{"...
Spec:
  PodSelector:     app=kuard-vnode-ingress
  Allowing ingress traffic:
    To Port: 8080/TCP
    From:
      NamespaceSelector: name=kube-system
  Allowing egress traffic:
    <none> (Selected pods are isolated for egress connectivity)
  Policy Types: Ingress

As you can see, the ingress part of the policy only allows traffic coming from pods in the kube-system namespace, which includes the nginx-based ingress controller.
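
Since the nginx ingress controller lives in kube-system, this also means the application should still be reachable from the outside through the ingress; a quick check against the FQDN created by the HTTP application routing addon in part 5 (placeholder hostname, yours will differ):

$ wget -qO- --timeout=3 --server-response http://<your-app-fqdn>/ 2>&1 | grep "HTTP\/1.1 "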

What does the Network Policy actually do? First of all, it changes a bit how routing inside the node works. If you remember Part 1, routing with Azure CNI was fairly simple: a layer-2 bridge called azure0 interconnected the "physical" interface of the node with the "logical" interfaces of the pods. If we look for that bridge now, it is not there any more (only docker0, the bridge used by the container runtime):

jose@aks-nodepool1-26711606-0:~$ brctl show
bridge name     bridge id               STP enabled     interfaces
docker0         8000.0242e0150395       no
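
And it is not just that azure0 disappeared from the bridge list; the interface itself should be gone altogether, something you can quickly verify yourself:

jose@aks-nodepool1-26711606-0:~$ ip link show azure0
Device "azure0" does not exist.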

So how does routing work now with network policy? Let’s have a look at the routing table:

jose@aks-nodepool1-26711606-0:~$ route -nv
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.13.76.1      0.0.0.0         UG    0      0        0 eth0
10.13.76.0      0.0.0.0         255.255.255.0   U     0      0        0 eth0
10.13.76.6      0.0.0.0         255.255.255.255 UH    0      0        0 caliacdcffac54f
10.13.76.7      0.0.0.0         255.255.255.255 UH    0      0        0 cali56442c847c1
10.13.76.9      0.0.0.0         255.255.255.255 UH    0      0        0 cali291386878c1
10.13.76.10     0.0.0.0         255.255.255.255 UH    0      0        0 calid6217ac985b
10.13.76.12     0.0.0.0         255.255.255.255 UH    0      0        0 cali7b64e095a5a
10.13.76.13     0.0.0.0         255.255.255.255 UH    0      0        0 cali90ac3745b59
10.13.76.14     0.0.0.0         255.255.255.255 UH    0      0        0 cali4fb414dbb9c
10.13.76.15     0.0.0.0         255.255.255.255 UH    0      0        0 cali80d9382b3e8
10.13.76.17     0.0.0.0         255.255.255.255 UH    0      0        0 cali836fd394022
10.13.76.19     0.0.0.0         255.255.255.255 UH    0      0        0 calid3b581beed8
10.13.76.20     0.0.0.0         255.255.255.255 UH    0      0        0 calicabd43ccde9
10.13.76.24     0.0.0.0         255.255.255.255 UH    0      0        0 cali21ea7a92e42
10.13.76.25     0.0.0.0         255.255.255.255 UH    0      0        0 calia9fadc8e1f1
10.13.76.26     0.0.0.0         255.255.255.255 UH    0      0        0 cali17b90f98476
10.13.76.27     0.0.0.0         255.255.255.255 UH    0      0        0 calic2613659d11
10.13.76.29     0.0.0.0         255.255.255.255 UH    0      0        0 cali7d43ea164cd
10.13.76.32     0.0.0.0         255.255.255.255 UH    0      0        0 cali1d965e35ba9
10.13.76.34     0.0.0.0         255.255.255.255 UH    0      0        0 cali2753053547f
168.63.129.16   10.13.76.1      255.255.255.255 UGH   0      0        0 eth0
169.254.169.254 10.13.76.1      255.255.255.255 UGH   0      0        0 eth0
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0

Oh, this is very different from the implementation without Network Policy! Essentially this looks like a /32 host-route model pointing to calixxxx interfaces ("cali" is short for Calico, the network policy engine used by AKS at the time of this writing), as opposed to the layer-2 bridge model of plain Azure CNI.

Let us verify with our nginx pod. As the following commands show, the routes in the previous table point to veth interfaces whose other end sits inside the pods' network namespaces; note how eth0@if21 inside the pod and cali836fd394022@if3 on the node reference each other's interface indexes, which tells you they are the two ends of the same veth pair:

jose@aks-nodepool1-26711606-0:~$ sudo docker ps | grep nginx
ef7a7f64efa1        quayio.azureedge.net/kubernetes-ingress-controller/nginx-ingress-controller   "/entrypoint.sh /ngi…"   2 days ago          Up 2 days                               k8s_addon-http-application-routing-nginx-ingress-controll
er_addon-http-application-routing-nginx-ingress-controller-8fx6v2r_kube-system_aa952aeb-5562-11e9-b161-9a6af760136f_0
eedee9b9618b        k8s.gcr.io/pause-amd64:3.1                                                    "/pause"                 2 days ago          Up 2 days                               k8s_POD_addon-http-application-routing-nginx-ingress-cont
roller-8fx6v2r_kube-system_aa952aeb-5562-11e9-b161-9a6af760136f_0
533ab64405f4        ebe2c7c61055                                                                  "nginx -g 'daemon of…"   3 days ago          Up 3 days                               k8s_azureproxy_kube-svc-redirect-rkhpb_kube-system_20ebfe
15-5517-11e9-b161-9a6af760136f_0
jose@aks-nodepool1-26711606-0:~$ sudo docker inspect --format '{{ .State.Pid }}' ef7a7f64efa1
17233
jose@aks-nodepool1-26711606-0:~$ sudo nsenter -t 17233 -n ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
3: eth0@if21: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether f6:27:71:75:8d:27 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.13.76.17/32 scope global eth0
       valid_lft forever preferred_lft forever
jose@aks-nodepool1-26711606-0:~$ ip a | grep 21:
21: cali836fd394022@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
jose@aks-nodepool1-26711606-0:~$ route -nv | grep 10.13.76.17
10.13.76.17     0.0.0.0         255.255.255.255 UH    0      0        0 cali836fd394022

Wow. But actually I like this, let me explain: it is now very easy to identify the interface of a specific pod, you just need to have a look at the routing table. For example, since we configured a Network Policy affecting the pods in our ingress namespace, let us find the "cali" interface for one of them:

$ k -n ingress get pod -o wide
NAME                                   READY     STATUS    RESTARTS   AGE       IP            NODE                       NOMINATED NODE
kuard-vnode-ingress-7b6868dd49-drldk   1/1       Running   0          29m       10.13.76.19   aks-nodepool1-26711606-0   <none>
kuard-vnode-ingress-7b6868dd49-q9tf4   1/1       Running   0          66m       10.13.76.6    aks-nodepool1-26711606-0   <none>
$ ssh -J $publicip 10.13.76.4
Welcome to Ubuntu 16.04.5 LTS (GNU/Linux 4.15.0-1037-azure x86_64)
[...]
jose@aks-nodepool1-26711606-0:~$ route -nv | grep 10.13.76.19
10.13.76.19     0.0.0.0         255.255.255.255 UH    0      0        0 calid3b581beed8
jose@aks-nodepool1-26711606-0:~$
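
By the way, instead of grepping the whole table you can ask the kernel directly which route it would pick for a given pod IP; the output should look roughly like this:

jose@aks-nodepool1-26711606-0:~$ ip route get 10.13.76.19
10.13.76.19 dev calid3b581beed8 src 10.13.76.4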

Let’s look for iptables rules matching on that interface:

jose@aks-nodepool1-26711606-0:~$ sudo iptables-save | grep calid3b581beed8
:cali-fw-calid3b581beed8 - [0:0]
:cali-tw-calid3b581beed8 - [0:0]
-A cali-from-wl-dispatch-d -i calid3b581beed8 -m comment --comment "cali:sisBAcnsda1tZ1I6" -g cali-fw-calid3b581beed8
-A cali-fw-calid3b581beed8 -m comment --comment "cali:eI0X4A8BlwA0sjko" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A cali-fw-calid3b581beed8 -m comment --comment "cali:qaBPX8HNrel-_TA3" -m conntrack --ctstate INVALID -j DROP
-A cali-fw-calid3b581beed8 -m comment --comment "cali:Jq_5ut2Y35OHAh7u" -j MARK --set-xmark 0x0/0x10000
-A cali-fw-calid3b581beed8 -m comment --comment "cali:zU6Qm0dPM2KuAPnY" -j cali-pro-kns.ingress
-A cali-fw-calid3b581beed8 -m comment --comment "cali:iwM9IW2Gzp_XhmTu" -m comment --comment "Return if profile accepted" -m mark --mark 0x10000/0x10000 -j RETURN
-A cali-fw-calid3b581beed8 -m comment --comment "cali:yTxfdXovOk_HY__Q" -j cali-pro-ksa.ingress.default
-A cali-fw-calid3b581beed8 -m comment --comment "cali:xz0HLpLrFN89BguB" -m comment --comment "Return if profile accepted" -m mark --mark 0x10000/0x10000 -j RETURN
-A cali-fw-calid3b581beed8 -m comment --comment "cali:rTeHOIDtEq3OBgQ-" -m comment --comment "Drop if no profiles matched" -j DROP
-A cali-to-wl-dispatch-d -o calid3b581beed8 -m comment --comment "cali:y3O234FtkeR9ducV" -g cali-tw-calid3b581beed8
-A cali-tw-calid3b581beed8 -m comment --comment "cali:7XHdmNu1zVtvWUdb" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A cali-tw-calid3b581beed8 -m comment --comment "cali:yzbqtmTdvcucYXRk" -m conntrack --ctstate INVALID -j DROP
-A cali-tw-calid3b581beed8 -m comment --comment "cali:hHFW5V6wHLc8mcEK" -j MARK --set-xmark 0x0/0x10000
-A cali-tw-calid3b581beed8 -m comment --comment "cali:W69tA-LPeC-XSNBQ" -m comment --comment "Start of policies" -j MARK --set-xmark 0x0/0x20000
-A cali-tw-calid3b581beed8 -m comment --comment "cali:kE02v2aA1qh8HxxR" -m mark --mark 0x0/0x20000 -j cali-pi-_FSXf14qKl1ZqjRcvUXU
-A cali-tw-calid3b581beed8 -m comment --comment "cali:2HPKw4B7yxk7mqMM" -m comment --comment "Return if policy accepted" -m mark --mark 0x10000/0x10000 -j RETURN
-A cali-tw-calid3b581beed8 -m comment --comment "cali:U1Sl_cfTEErYbbZL" -m comment --comment "Drop if no policies passed packet" -m mark --mark 0x0/0x20000 -j DROP
-A cali-tw-calid3b581beed8 -m comment --comment "cali:l-FpQiC4LxIZJMMQ" -j cali-pri-kns.ingress
-A cali-tw-calid3b581beed8 -m comment --comment "cali:jptOzFu3EdNgVAjj" -m comment --comment "Return if profile accepted" -m mark --mark 0x10000/0x10000 -j RETURN
-A cali-tw-calid3b581beed8 -m comment --comment "cali:dWzr_J7EfPumyCgs" -j cali-pri-ksa.ingress.default
-A cali-tw-calid3b581beed8 -m comment --comment "cali:WxUuMRbkpCDau6q1" -m comment --comment "Return if profile accepted" -m mark --mark 0x10000/0x10000 -j RETURN
-A cali-tw-calid3b581beed8 -m comment --comment "cali:TD9zkf0Ym9GktnsU" -m comment --comment "Drop if no profiles matched" -j DROP

These chain names follow Calico's conventions: "cali-fw-" chains filter traffic coming from a workload, "cali-tw-" chains filter traffic going to a workload, and "cali-pi-" chains contain the ingress ("policy inbound") rules of one specific policy. The most relevant rule here is the one in the "to workload" chain that jumps to the policy chain "cali-pi-_FSXf14qKl1ZqjRcvUXU". Let's have a look at that chain:

jose@aks-nodepool1-26711606-0:~$ sudo iptables --list-rules cali-pi-_FSXf14qKl1ZqjRcvUXU
-N cali-pi-_FSXf14qKl1ZqjRcvUXU
-A cali-pi-_FSXf14qKl1ZqjRcvUXU -p tcp -m comment --comment "cali:grRCxE4BRKEgkFyx" -m set --match-set cali40s:d0vaXDV0OjdKq6czssWe9SI src -m multiport --dports 8080 -j MARK --set-xmark 0x10000/0x10000
-A cali-pi-_FSXf14qKl1ZqjRcvUXU -m comment --comment "cali:hEdW_nNIxpnlEO6A" -m mark --mark 0x10000/0x10000 -j RETURN

Mmmh, this looks interesting, a rule matching on port 8080! It is matching as well on an IP set ("cali40s:d0vaXDV0OjdKq6czssWe9SI"). We can use the ipset tool to proceed with our investigation:

jose@aks-nodepool1-26711606-0:~$ sudo ipset list cali40s:d0vaXDV0OjdKq6czssWe9SI
Name: cali40s:d0vaXDV0OjdKq6czssWe9SI
Type: hash:net
Revision: 6
Header: family inet hashsize 1024 maxelem 1048576
Size in memory: 1304
References: 3
Number of entries: 15
Members:
10.13.76.13
10.13.76.20
10.13.76.32
10.13.76.34
10.13.76.17
10.13.76.25
10.13.76.7
10.13.76.10
10.13.76.9
10.13.76.12
10.13.76.27
10.13.76.24
10.13.76.29
10.13.76.14
10.13.76.15

These happen to be the pods referenced by our policy, that is, all pods running in the kube-system namespace (except those running in the node's network namespace, which share the node's IP address 10.13.76.4):

$ k -n kube-system get pod -o wide
NAME                                                              READY     STATUS    RESTARTS   AGE       IP            NODE                       NOMINATED NODE
aci-connector-linux-7775fbcf5f-mhrjs                              1/1       Running   5          3d        10.13.76.13   aks-nodepool1-26711606-0   <none>
addon-http-application-routing-default-http-backend-8cdc9dzfwkw   1/1       Running   0          2d16h     10.13.76.34   aks-nodepool1-26711606-0   <none>
addon-http-application-routing-external-dns-6f9bb9b4bf-v4tk6      1/1       Running   0          2d16h     10.13.76.14   aks-nodepool1-26711606-0   <none>
addon-http-application-routing-nginx-ingress-controller-8fx6v2r   1/1       Running   0          2d16h     10.13.76.17   aks-nodepool1-26711606-0   <none>
azure-cni-networkmonitor-j2vlw                                    1/1       Running   0          3d1h      10.13.76.4    aks-nodepool1-26711606-0   <none>
azure-ip-masq-agent-8f7gw                                         1/1       Running   0          3d1h      10.13.76.4    aks-nodepool1-26711606-0   <none>
calico-node-snct2                                                 0/2       Pending   0          2d1h      <none>        <none>                     <none>
calico-node-zwt7d                                                 2/2       Running   0          3d1h      10.13.76.4    aks-nodepool1-26711606-0   <none>
calico-typha-74c44c79b5-99r2z                                     1/1       Running   0          3d1h      10.13.76.4    aks-nodepool1-26711606-0   <none>
calico-typha-horizontal-autoscaler-6b69cf54f-665cn                1/1       Running   0          3d1h      10.13.76.29   aks-nodepool1-26711606-0   <none>
coredns-754f947b4-4pl2g                                           1/1       Running   0          3d1h      10.13.76.9    aks-nodepool1-26711606-0   <none>
coredns-754f947b4-qbl4d                                           1/1       Running   0          47m       10.13.76.12   aks-nodepool1-26711606-0   <none>
coredns-754f947b4-tcct9                                           1/1       Running   0          3d1h      10.13.76.7    aks-nodepool1-26711606-0   <none>
coredns-autoscaler-6fcdb7d64-2tqsn                                1/1       Running   0          3d1h      10.13.76.25   aks-nodepool1-26711606-0   <none>
heapster-5fb7488d97-5rznw                                         2/2       Running   0          2d16h     10.13.76.27   aks-nodepool1-26711606-0   <none>
kube-proxy-fr62z                                                  1/1       Running   0          2d16h     10.13.76.4    aks-nodepool1-26711606-0   <none>
kube-svc-redirect-rkhpb                                           2/2       Running   0          3d1h      10.13.76.4    aks-nodepool1-26711606-0   <none>
kubernetes-dashboard-847bb4ddc6-gzddn                             1/1       Running   0          3d1h      10.13.76.15   aks-nodepool1-26711606-0   <none>
metrics-server-7b97f9cd9-92cdw                                    1/1       Running   0          3d1h      10.13.76.32   aks-nodepool1-26711606-0   <none>
omsagent-rs-ccd94f4cf-44df9                                       1/1       Running   0          3d1h      10.13.76.24   aks-nodepool1-26711606-0   <none>
omsagent-xmd6d                                                    1/1       Running   0          3d1h      10.13.76.20   aks-nodepool1-26711606-0   <none>
tunnelfront-6fff97b995-qvzfx                                      1/1       Running   0          3d1h      10.13.76.10   aks-nodepool1-26711606-0   <none>

Let’s have a look again at the relevant rules:

-A cali-tw-calid3b581beed8 -m comment --comment "cali:kE02v2aA1qh8HxxR" -m mark --mark 0x0/0x20000 -j cali-pi-_FSXf14qKl1ZqjRcvUXU
-A cali-tw-calid3b581beed8 -m comment --comment "cali:2HPKw4B7yxk7mqMM" -m comment --comment "Return if policy accepted" -m mark --mark 0x10000/0x10000 -j RETURN
-A cali-tw-calid3b581beed8 -m comment --comment "cali:U1Sl_cfTEErYbbZL" -m comment --comment "Drop if no policies passed packet" -m mark --mark 0x0/0x20000 -j DROP

So essentially the first rule (the one with comment "cali:kE02v2aA1qh8HxxR") jumps to the policy chain we dissected above, which sets the 0x10000 mark on all packets that come from pods in the kube-system namespace and are addressed to TCP port 8080 of the interface connected to our pod.

The second rule matches on that 0x10000 mark and returns to the normal packet flow; in other words, the packet is accepted because it complied with the policy.

The third rule drops everything else, but note that it only matches packets where the 0x20000 bit is still clear ("--mark 0x0/0x20000" means the bit is 0 within that mask). That bit would be set if a policy explicitly passed the packet on to the next stage, and this drop rule is only programmed because a network policy selects the pod: pods without any policy fall straight through to the profile rules instead, which preserves Kubernetes' default permit any any.
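
If you want to watch these rules doing their job, one way (a quick sketch; the counters are cumulative, so look at the deltas) is to list the "to workload" chain verbosely while you repeat the blocked wget from the default namespace; the packet counter of the "Drop if no policies passed packet" rule should increase:

jose@aks-nodepool1-26711606-0:~$ sudo iptables -L cali-tw-calid3b581beed8 -v -n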

And that concludes the walkthrough of Calico network policy for AKS clusters using advanced networking (Azure CNI in your own VNet).
