Cilium Network Policy in AKS

If you are following the Azure Kubernetes Service space, I am sure you have noticed that Azure CNI powered by Cilium is Generally Available. But is this a big thing? What does it mean for you?

Well, yes, it is big indeed. It is like fitting your car with new wheels: Cilium brings an improved network data plane powered by eBPF; you can read more about it at https://cilium.io/.

The first thing you might notice when deploying a cluster with the Cilium data plane is that you need to use the Cilium network policy (the other options, “azure” and “calico”, are not available). And this is a good thing!

To start with, Cilium supports the traditional Kubernetes Network Policies as well as Cilium Network Policies. These Cilium Network Policies (for all lazy people like me, the short name is “cnp”) are similar to k8s network policies, but with advanced functionality.

You can use your usual kubectl commands with CNPs: you can “get” them, “describe” them, etc:

❯ k get cnp
NAME         AGE
api-netpol   43m

But I am getting ahead of myself. What I did was deploy a cluster with Azure CNI powered by Cilium, and deploy a sample application on top.

From then on, you can start configuring your policy.
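For example, the policy used in this post (its full spec shows up in the describe output further down) would look roughly like this as YAML. Note that CNPs use `endpointSelector` where plain k8s network policies use `podSelector`; the `run=api` and `run=web` labels come from my sample application:

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: api-netpol
spec:
  endpointSelector:        # selects the pods this policy applies to
    matchLabels:
      run: api
  ingress:
    - fromEndpoints:       # only allow traffic from the web pods...
        - matchLabels:
            run: web
      toPorts:
        - ports:
            - port: "8080" # ...and only on the app port
              protocol: TCP
```

You apply it with a plain `kubectl apply -f`, like any other Kubernetes object.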

Give me those packet drops!

Something interesting to note with Cilium is that there is a daemon set with some control pods:

❯ k get ds -n kube-system cilium
NAME     DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
cilium   2         2         2       2            2           kubernetes.io/os=linux   30h

These Cilium pods are very interesting. For example, you can see the packets arriving at or leaving a node. Let me describe a not-so-imaginary situation: you configure a network policy, but your app still doesn’t work. You don’t know what traffic is being dropped. Who you wanna call? Cilium!

First, you need to find out which node your pod is running on. You can do it manually, but of course I had to flex my JSONPath skills:

label="run=api"
workload_node_name=$(kubectl get pods -l $label -o jsonpath='{.items[0].spec.nodeName}')
cilium_pod_name=$(kubectl -n kube-system get pods -l k8s-app=cilium -o jsonpath="{.items[?(@.spec.nodeName=='$workload_node_name')].metadata.name}")

The next bit is finding out the Cilium endpoint ID for your pod. Let me introduce you to another Kubernetes object: the Cilium Endpoint or cep:

❯ k get cep
NAME                   ENDPOINT ID   IDENTITY ID   INGRESS ENFORCEMENT   EGRESS ENFORCEMENT   VISIBILITY POLICY   ENDPOINT STATE   IPV4          IPV6
api-747777d77b-p2p9v   644           23553                                                                       ready            10.13.80.36
web-55ff7ffc7d-brldm   567           4383                                                                        ready            10.13.80.9

Of course, let’s put this in a variable, just because we can:

apipod_id=$(k get cep -l run=api -o jsonpath='{.items[0].status.id}')
echo "API pod has ID $apipod_id"

Now that we have everything we need, we can exec into the Cilium pod and run the cilium monitor command:

❯ kubectl -n kube-system exec -ti $cilium_pod_name -- cilium monitor --from $apipod_id --type drop

Defaulted container "cilium-agent" out of: cilium-agent, install-cni-binaries (init), mount-cgroup (init), apply-sysctl-overwrites (init), mount-bpf-fs (init), clean-cilium-state (init), systemd-networkd-overrides (init), block-wireserver (init)
Press Ctrl-C to quit
level=info msg="Initializing dissection cache..." subsys=monitor
xx drop (Policy denied) flow 0x20b20f97 to endpoint 644, file bpf_lxc.c line 2001, , identity world->23553: 168.63.129.16:58305 -> 10.13.80.36:8080 tcp SYN
xx drop (Policy denied) flow 0x41cdc33f to endpoint 644, file bpf_lxc.c line 2001, , identity remote-node->23553: 10.13.76.5:63185 -> 10.13.80.36:8080 tcp SYN
xx drop (Policy denied) flow 0x20b20f97 to endpoint 644, file bpf_lxc.c line 2001, , identity world->23553: 168.63.129.16:58305 -> 10.13.80.36:8080 tcp SYN
xx drop (Policy denied) flow 0x3d9a15e1 to endpoint 644, file bpf_lxc.c line 2001, , identity remote-node->23553: 10.13.76.5:63218 -> 10.13.80.36:8080 tcp SYN
xx drop (Policy denied) flow 0x1df13815 to endpoint 644, file bpf_lxc.c line 2001, , identity world->23553: 168.63.129.16:58332 -> 10.13.80.36:8080 tcp SYN
xx drop (Policy denied) flow 0xa94917ed to endpoint 0, file bpf_lxc.c line 1182, , identity 23553->world: 10.13.80.36:58215 -> 13.71.193.32:1433 tcp SYN
xx drop (Policy denied) flow 0x3d9a15e1 to endpoint 644, file bpf_lxc.c line 2001, , identity remote-node->23553: 10.13.76.5:63218 -> 10.13.80.36:8080 tcp SYN

Great! You can see that we have some traffic coming to our pod on port 8080 that we hadn’t considered, plus the outbound connection to the database that I had completely forgotten in my network policy. Now that I know, I can go and fix this!
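For the forgotten database egress, for instance, the fix could be an extra egress rule like the sketch below. The address and port come straight from the drop logs above (identity 23553 to world on 1433); adjust them to your own environment:

```yaml
  egress:
    - toCIDR:
        - 13.71.193.32/32   # the database IP seen in the drop logs
      toPorts:
        - ports:
            - port: "1433"  # SQL Server port from the dropped SYN
              protocol: TCP
```

Similar rules would cover the ingress drops on port 8080, depending on which of those sources you actually want to allow.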

Without this kind of visibility, working with network policies used to be really hard; now they are much easier to debug.

Logs, logs and logs

You are probably logging stuff from your Kubernetes cluster. “Stuff” meaning two types of logs (see Monitor AKS with Azure Monitor for more details):

  • Logs from the containers running in your cluster
  • Logs from the Kubernetes control plane managed by Microsoft

The fact that the Cilium containers run in the cluster gives us visibility into certain things. For example, imagine I apply a Cilium network policy with these egress rules (straight from the Cilium Network Policy examples):

  egress:
    - toEndpoints:
        - matchLabels:
            "k8s:io.kubernetes.pod.namespace": kube-system
            "k8s:k8s-app": kube-dns
      toPorts:
        - ports:
            - port: "53"
              protocol: ANY
          rules:
            dns:
              - matchPattern: "*"
The policy seems to apply successfully, and even if I describe it, I don’t see anything wrong:

❯ k describe cnp
Name:         api-netpol
Namespace:    default
Labels:       
Annotations:  
API Version:  cilium.io/v2
Kind:         CiliumNetworkPolicy
Metadata:
  Creation Timestamp:  2023-06-15T12:43:34Z
  Generation:          5
  Managed Fields:
    API Version:  cilium.io/v2
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubectl.kubernetes.io/last-applied-configuration:
      f:spec:
        .:
        f:egress:
        f:endpointSelector:
          .:
          f:matchLabels:
            .:
            f:run:
        f:ingress:
    Manager:         kubectl-client-side-apply
    Operation:       Update
    Time:            2023-06-16T09:46:06Z
  Resource Version:  814303
  UID:               ddd4a267-38c1-411a-883f-3f52a176cfc5
Spec:
  Egress:
    To Endpoints:
      Match Labels:
        k8s:io.kubernetes.pod.namespace:  kube-system
        k8s:k8s-app:                      kube-dns
    To Ports:
      Ports:
        Port:      53
        Protocol:  ANY
      Rules:
        Dns:
          Match Pattern:  *
  Endpoint Selector:
    Match Labels:
      Run:  api
  Ingress:
    From Endpoints:
      Match Labels:
        Run:  web
    To Ports:
      Ports:
        Port:      8080
        Protocol:  TCP
Events:            

And yet, I made a mistake, because Cilium network policy in AKS doesn’t support L7 rules, as described in Configure Azure CNI Powered by Cilium in AKS – Limitations. How can I find this out? Easy: let’s have a look at the logs of the Cilium container (I set the $cilium_pod_name variable earlier in this post):

❯ k logs -n kube-system $cilium_pod_name
level=warning msg="Error parsing new CiliumNetworkPolicy rule" error="Invalid CiliumNetworkPolicy spec: L7 policy is not supported since L7 proxy is not enabled"
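
Since the L7 proxy is not enabled in AKS, the fix is to drop the L7 `rules:` section and keep only the L3/L4 part of the egress rule, something like:

```yaml
  egress:
    - toEndpoints:
        - matchLabels:
            "k8s:io.kubernetes.pod.namespace": kube-system
            "k8s:k8s-app": kube-dns
      toPorts:
        - ports:
            - port: "53"   # plain L4 rule: allow DNS, no per-query filtering
              protocol: ANY
```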

After removing those offending lines, this is what a successful policy application looks like:

❯ k logs -n kube-system $cilium_pod_name
level=info msg="Imported CiliumNetworkPolicy" ciliumNetworkPolicyName=api-netpol k8sApiVersion= k8sNamespace=default subsys=k8s-watcher
level=info msg="Policy imported via API, recalculating..." policyAddRequest=52000582-e74b-473f-ba4d-008d29803785 policyRevision=21 subsys=daemon

That’s it for today

I hope this post gave you a quick glimpse of two wins that will make your life easier when working with network policies in AKS.

There are many other interesting topics around Cilium network policy, such as the differences between CiliumNetworkPolicies and k8s plain NetworkPolicies, how to troubleshoot the deployment of the policy, or learning how to write advanced Cilium Network Policies, but I will leave that for another day.