A day in the life of a packet in Azure Red Hat OpenShift (part 1)

I have been wanting to look into this for a while now, and I finally found a good excuse to do it. You might have read my series of posts on AKS networking; the goal here is to do something similar with Azure Red Hat OpenShift (ARO).

This is part 1 of a blog series on networking in Azure Red Hat OpenShift.

First things first, what is this ARO thing? In short, it is a managed OpenShift offering, where you get a fresh OpenShift 4 cluster ready to take your application deployments. The assumption here is that you already have one of those running; otherwise, feel free to check the doc Create an ARO4 cluster.

Alright, we have a cluster, so let us deploy a sample app. To test networking I use a small pod I developed to troubleshoot connectivity to a database. As the database I will use SQL Server, just because I can:

project_name=project1
sql_password=yoursupersecurepassword
# Create project
oc new-project $project_name
# Create pods for DB and API
oc new-app --docker-image erjosito/sqlapi:0.1 -e "SQL_SERVER_FQDN=server.${project_name}.svc.cluster.local" -e "SQL_SERVER_USERNAME=sa" -e "SQL_SERVER_PASSWORD=${sql_password}"
oc new-app --docker-image mcr.microsoft.com/mssql/server:2019-latest -e "ACCEPT_EULA=Y" -e "SA_PASSWORD=${sql_password}"
# Expose API with a svc (internal ALB)
oc expose dc sqlapi --port 8080 --type=LoadBalancer --name=sqlapilb --dry-run -o yaml | awk '1;/metadata:/{ print "  annotations:\n    service.beta.kubernetes.io/azure-load-balancer-internal: \"true\"" }' | oc create -f -
# Expose the sqlapilb svc over a route
oc expose svc sqlapilb

OK, so what have we done here? First we create a project (oc new-project). Then we deploy the pods out of Docker images (oc new-app). Note that we pass a couple of parameters to the API so that it can find the database. Then we create a new service to expose the API internally (we don't really need this, but it will come in handy later on), and finally we expose the service via a route (think of routes as Kubernetes ingresses). Let's have a look at the pods:

kubectl get pod -o wide
NAME              READY   STATUS      RESTARTS   AGE     IP            NODE                                   NOMINATED NODE   READINESS GATES
server-1-deploy   0/1     Completed   0          54m     10.131.0.32   aro2-p8bjm-worker-northeurope1-qt8l7   <none>           <none>
server-1-ppl25    1/1     Running     0          54m     10.128.2.24   aro2-p8bjm-worker-northeurope3-wl4vw   <none>           <none>
sqlapi-1-8jgx8    1/1     Running     0          40m     10.131.0.40   aro2-p8bjm-worker-northeurope1-qt8l7   <none>           <none>
sqlapi-1-deploy   0/1     Completed   0          40m     10.131.0.39   aro2-p8bjm-worker-northeurope1-qt8l7   <none>           <none>
sqlapi-pod        1/1     Running     0          3h30m   192.168.0.6   aro2-p8bjm-worker-northeurope2-rbxzc   <none>           <none>

As you can see we have a bunch of pods, most of them with internal IP addresses (10.x.y.z). We will come back to this in a minute. But now let's check out our services and our ingress, I mean, our route:

kubectl get svc
NAME       TYPE           CLUSTER-IP      EXTERNAL-IP    PORT(S)          AGE
server     ClusterIP      172.30.72.7     <none>         1433/TCP         57m
sqlapi     ClusterIP      172.30.82.94    <none>         8080/TCP         43m
sqlapilb   LoadBalancer   172.30.226.18   192.168.0.11   8080:30039/TCP   6h50m
kubectl get route
NAME       HOST/PORT                                               PATH   SERVICES   PORT   TERMINATION   WILDCARD
sqlapilb   sqlapilb-project1.apps.m50kgrxk.northeurope.aroapp.io          sqlapilb   8080                 None

In the services you might have noticed server and sqlapi, created by the oc new-app command, as well as sqlapilb, which was created with the oc expose command.
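To quickly check that the route works end to end, you can send a request to its hostname (shown in the "oc get route" output above), and also verify that the internal load balancer annotation actually made it into the sqlapilb service. The following is just a sketch; the /api/healthcheck path is an assumption about this particular test image, any endpoint exposed by the API pod would do:

# Call the API through the route (path assumed, use whatever endpoint your app exposes)
curl "http://sqlapilb-project1.apps.m50kgrxk.northeurope.aroapp.io/api/healthcheck"
# Check that the internal load balancer annotation is present on the sqlapilb service
oc get svc sqlapilb -o jsonpath='{.metadata.annotations}{"\n"}'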

But let’s go back to the pods’ IP addresses. Where are these coming from? Which network plugin is OpenShift using? Enter the operators. In OpenShift, everything has its operator. You can think of an operator as the smart colleague you would ask to do something complicated. In this case, let’s ask the network operator (BTW, you can find these commands in the OpenShift documentation for the network operator):

oc describe network.config/cluster
Name:         cluster
Namespace:
Labels:       <none>
Annotations:  <none>
API Version:  config.openshift.io/v1
Kind:         Network
Metadata:
  Creation Timestamp:  2020-05-27T06:09:35Z
  Generation:          2
  Resource Version:    1898
  Self Link:           /apis/config.openshift.io/v1/networks/cluster
  UID:                 16792e9b-cb6f-4ffa-9dee-57d09e05bc92
Spec:
  Cluster Network:
    Cidr:         10.128.0.0/14
    Host Prefix:  23
  External IP:
    Policy:
  Network Type:  OpenShiftSDN
  Service Network:
    172.30.0.0/16
Status:
  Cluster Network:
    Cidr:               10.128.0.0/14
    Host Prefix:        23
  Cluster Network MTU:  1450
  Network Type:         OpenShiftSDN
  Service Network:
    172.30.0.0/16
Events:  <none>

OK, there is quite a bit to unpack there. Let’s start with the technology itself: the network type is set to “OpenShiftSDN”. The OpenShift SDN plugin leverages VXLAN to encapsulate traffic between nodes, which becomes more apparent if we look a bit deeper:

oc get clusternetworks.network.openshift.io -o yaml
apiVersion: v1
items:
- apiVersion: network.openshift.io/v1
  clusterNetworks:
  - CIDR: 10.128.0.0/14
    hostSubnetLength: 9
  hostsubnetlength: 9
  kind: ClusterNetwork
  metadata:
    creationTimestamp: "2020-05-27T06:10:34Z"
    generation: 1
    name: default
    ownerReferences:
    - apiVersion: operator.openshift.io/v1
      blockOwnerDeletion: true
      controller: true
      kind: Network
      name: cluster
      uid: da3cf28f-2ec6-4ccd-9c51-ffc0f5897be2
    resourceVersion: "1774"
    selfLink: /apis/network.openshift.io/v1/clusternetworks/default
    uid: c74b6a66-99ff-492d-90fa-a615a84c337e
  mtu: 1450
  network: 10.128.0.0/14
  pluginName: redhat/openshift-ovs-networkpolicy
  serviceNetwork: 172.30.0.0/16
  vxlanPort: 4789
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

There you can see some interesting information, such as the reduced MTU (1450 bytes, due to the VXLAN encapsulation overhead) and the UDP port (4789) used to encapsulate the packets between nodes. If you are not familiar with VXLAN, it means that the nodes build tunnels between each other, and traffic between the pods flows inside of those tunnels. Hence Azure will only see the tunnel traffic, where source and destination IP addresses are those of the nodes and not of the pods, unlike the case of the kubenet plugin in AKS. As a consequence, there is no route table required like with kubenet.
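If you want to see the encapsulation with your own eyes, you can open a debug shell on one of the nodes. This is only a sketch: it assumes the node's primary NIC is eth0 and that tcpdump is available on the host (on Red Hat CoreOS you may need to run it from a toolbox container):

# Open a debug shell on one of the worker nodes (node name taken from the pod listing above)
oc debug node/aro2-p8bjm-worker-northeurope1-qt8l7
# Inside the debug pod, switch to the host's root filesystem
chroot /host
# The SDN interfaces should show the reduced MTU of 1450 bytes
ip link show
# Pod-to-pod traffic between nodes should show up as VXLAN packets on UDP port 4789
tcpdump -nni eth0 udp port 4789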

Let’s continue with the IP ranges: you can see that the Service Network is 172.30.0.0/16, which is the range where our services got their ClusterIP addresses (see the output of the “oc get svc” command above).
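You can confirm this by pulling the ClusterIP straight out of one of the services, for example the sqlapi one, which should return 172.30.82.94 as per the output above:

# The ClusterIP is carved out of the 172.30.0.0/16 service network
oc get svc sqlapi -o jsonpath='{.spec.clusterIP}{"\n"}'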

That was easy; the next one is a bit trickier: the cluster network is 10.128.0.0/14. Those were indeed the addresses of our pods, but how is this range partitioned across the nodes? If you look at the node definition with “oc describe node” you will not see the allocated subnet, as you might have expected to. But the network operator comes to the rescue again, with a Custom Resource Definition that tells us exactly what we want to know:

kubectl get hostsubnets.network.openshift.io
NAME                                   HOST                                   HOST IP        SUBNET          EGRESS CIDRS   EGRESS IPS
aro2-p8bjm-master-0                    aro2-p8bjm-master-0                    192.168.0.40   10.129.0.0/23
aro2-p8bjm-master-1                    aro2-p8bjm-master-1                    192.168.0.38   10.128.0.0/23
aro2-p8bjm-master-2                    aro2-p8bjm-master-2                    192.168.0.39   10.130.0.0/23
aro2-p8bjm-worker-northeurope1-qt8l7   aro2-p8bjm-worker-northeurope1-qt8l7   192.168.0.4    10.131.0.0/23
aro2-p8bjm-worker-northeurope2-rbxzc   aro2-p8bjm-worker-northeurope2-rbxzc   192.168.0.6    10.129.2.0/23
aro2-p8bjm-worker-northeurope3-wl4vw   aro2-p8bjm-worker-northeurope3-wl4vw   192.168.0.5    10.128.2.0/23  

You can see that each node gets a /23 range (512 addresses, matching the hostSubnetLength of 9 we saw earlier: 2^9 = 512), which should be more than enough for a few pods. We can verify that by having a look at the pods running on one of the nodes. Let us take node 1, which got the range 10.131.0.0/23 (in case you have forgotten your binary, that would be any address like 10.131.0.x or 10.131.1.x):

kubectl get pod -A -o wide | grep aro2-p8bjm-worker-northeurope1-qt8l7
kuard1                                                  kuard-amd64-1-deploy                                              0/1     Completed   0          10h    10.131.0.19    aro2-p8bjm-worker-northeurope1-qt8l7   <none>           <none>
kuard1                                                  kuard-amd64-1-qqdqm                                               1/1     Running     0          10h    10.131.0.20    aro2-p8bjm-worker-northeurope1-qt8l7   <none>           <none>
kuard2                                                  kuard-amd64-1-deploy                                              0/1     Completed   0          10h    10.131.0.21    aro2-p8bjm-worker-northeurope1-qt8l7   <none>           <none>
kuard2                                                  kuard-amd64-1-vvvlw                                               1/1     Running     0          10h    10.131.0.22    aro2-p8bjm-worker-northeurope1-qt8l7   <none>           <none>
openshift-azure-logging                                 mdsd-2qxr2                                                        4/4     Running     0          10h    10.131.0.8     aro2-p8bjm-worker-northeurope1-qt8l7   <none>           <none>
openshift-cluster-node-tuning-operator                  tuned-8vk6r                                                       1/1     Running     0          10h    192.168.0.4    aro2-p8bjm-worker-northeurope1-qt8l7   <none>           <none>
openshift-dns                                           dns-default-m5wkx                                                 2/2     Running     0          10h    10.131.0.12    aro2-p8bjm-worker-northeurope1-qt8l7   <none>           <none>
openshift-image-registry                                image-registry-599645fd69-shsgf                                   1/1     Running     0          10h    10.131.0.11    aro2-p8bjm-worker-northeurope1-qt8l7   <none>           <none>
openshift-image-registry                                node-ca-6xqt6                                                     1/1     Running     0          10h    10.131.0.4     aro2-p8bjm-worker-northeurope1-qt8l7   <none>           <none>
openshift-ingress                                       router-default-cf4d7b6d5-xlkxx                                    1/1     Running     0          10h    10.131.0.18    aro2-p8bjm-worker-northeurope1-qt8l7   <none>           <none>
openshift-machine-config-operator                       machine-config-daemon-4xwjt                                       2/2     Running     0          10h    192.168.0.4    aro2-p8bjm-worker-northeurope1-qt8l7   <none>           <none>
openshift-marketplace                                   certified-operators-f686bcd89-dtvd9                               1/1     Running     0          10h    10.131.0.10    aro2-p8bjm-worker-northeurope1-qt8l7   <none>           <none>
openshift-marketplace                                   community-operators-599d7b4d5f-twnmq                              1/1     Running     0          10h    10.131.0.6     aro2-p8bjm-worker-northeurope1-qt8l7   <none>           <none>
openshift-marketplace                                   redhat-operators-69c6f6bbfd-7nfp9                                 1/1     Running     0          33m    10.131.0.31    aro2-p8bjm-worker-northeurope1-qt8l7   <none>           <none>
openshift-monitoring                                    alertmanager-main-0                                               3/3     Running     0          10h    10.131.0.17    aro2-p8bjm-worker-northeurope1-qt8l7   <none>           <none>
openshift-monitoring                                    kube-state-metrics-55f99f49d7-lm8xb                               3/3     Running     0          10h    10.131.0.2     aro2-p8bjm-worker-northeurope1-qt8l7   <none>           <none>
openshift-monitoring                                    node-exporter-mz26r                                               2/2     Running     0          10h    192.168.0.4    aro2-p8bjm-worker-northeurope1-qt8l7   <none>           <none>
openshift-monitoring                                    openshift-state-metrics-8f859d745-jx84p                           3/3     Running     0          10h    10.131.0.7     aro2-p8bjm-worker-northeurope1-qt8l7   <none>           <none>
openshift-multus                                        multus-v8srq                                                      1/1     Running     0          10h    192.168.0.4    aro2-p8bjm-worker-northeurope1-qt8l7   <none>           <none>
openshift-sdn                                           ovs-frcjl                                                         1/1     Running     0          10h    192.168.0.4    aro2-p8bjm-worker-northeurope1-qt8l7   <none>           <none>
openshift-sdn                                           sdn-rlqgc                                                         1/1     Running     0          10h    192.168.0.4    aro2-p8bjm-worker-northeurope1-qt8l7   <none>           <none>
project1                                                sqlapi-1-4mb78                                                    1/1     Running     0          138m   10.131.0.30    aro2-p8bjm-worker-northeurope1-qt8l7   <none>           <none>

And fair enough, most IP addresses are 10.131.0.x, check! But wait a second, there are other pods with the IP address 192.168.0.4? And all of them have the same one? We will discuss that in a future post, but if you really want to know now, it has to do with an attribute called “hostNetwork” that you can set on each pod, which gives the pod the IP address of the node itself.
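If you cannot wait, a quick way to confirm this is inspecting the pod spec. For example, comparing one of the openshift-sdn pods with our API pod (pod names taken from the listing above):

# Pods that show the node's IP are running with hostNetwork set to true
oc get pod -n openshift-sdn sdn-rlqgc -o jsonpath='{.spec.hostNetwork}{"\n"}'
# A regular pod on the overlay network prints nothing here, since the attribute is not set
oc get pod -n project1 sqlapi-1-4mb78 -o jsonpath='{.spec.hostNetwork}{"\n"}'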

Let us wrap it up here for the first post; in the next one we will dive into how pods talk to each other, and how connectivity flows to and from the Internet.
