I have been wanting to look into this for a while now, and I finally found a good excuse to do it. You might have read my series of posts on AKS networking; the goal of this series is to do something similar with Azure Redhat Openshift (ARO).
This is part 1 of a blog series around networking in Azure Redhat Openshift. Other posts in the series:
- Part 1: Intro and SDN Plugin
- Part 2: Internet and Intra-cluster Communication
- Part 3: Inter-Project and Vnet Communication
- Part 4: Private Link and DNS
- Part 5: Private and Public routers
First things first, what is this ARO thing? In short, it is a managed Openshift offering, where you get a fresh Openshift 4 cluster ready to take your application deployments. The assumption here is that you already have one of those running; otherwise feel free to check the doc Create an ARO4 cluster.
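In case you do not have a cluster yet, the short version looks more or less like this (just a sketch with made-up resource and subnet names; the linked doc has the full story, including the VNet and subnet requirements):

# Hypothetical resource names; the VNet needs dedicated master and worker subnets
az aro create \
  --resource-group my-aro-rg \
  --name my-aro-cluster \
  --vnet my-aro-vnet \
  --master-subnet master-subnet \
  --worker-subnet worker-subnet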
Alright, now that we have a cluster, let us deploy a sample app. To test networking I use a small pod I developed to troubleshoot connectivity to a database. As the database I will use SQL Server, just because I can:
project_name=project1
sql_password=yoursupersecurepassword
# Create project
oc new-project $project_name
# Create pods for DB and API
oc new-app --docker-image erjosito/sqlapi:0.1 -e "SQL_SERVER_FQDN=server.${project_name}.svc.cluster.local" -e "SQL_SERVER_USERNAME=sa" -e "SQL_SERVER_PASSWORD=${sql_password}"
oc new-app --docker-image mcr.microsoft.com/mssql/server:2019-latest -e "ACCEPT_EULA=Y" -e "SA_PASSWORD=${sql_password}"
# Expose API with a svc (internal ALB)
oc expose dc sqlapi --port 8080 --type=LoadBalancer --name=sqlapilb --dry-run -o yaml | awk '1;/metadata:/{ print "  annotations:\n    service.beta.kubernetes.io/azure-load-balancer-internal: \"true\"" }' | oc create -f -
# Exposing ClusterIP Svc over a route
oc expose svc sqlapilb
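By the way, the awk one-liner above is only there to inject the internal load balancer annotation into the Service manifest generated by oc expose. If you find that hard to read, creating the Service from an inline manifest should give you the same result (just a sketch: the selector is an assumption on my side, double-check it against the labels of your sqlapi pods with oc get pod --show-labels):

# Same internal LB Service without the awk trick (selector assumed, verify against your pod labels)
cat <<EOF | oc create -f -
apiVersion: v1
kind: Service
metadata:
  name: sqlapilb
  annotations:
    service.beta.kubernetes.io/azure-load-balancer-internal: "true"
spec:
  type: LoadBalancer
  selector:
    deploymentconfig: sqlapi
  ports:
  - port: 8080
    targetPort: 8080
EOF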
OK, so what have we done here? First we create a project (oc new-project). Then we deploy the pods out of Docker images (oc new-app). Note that we pass a couple of parameters to the API so that it can find the database. Then we create a new service to expose the API internally (we don’t really need this yet, but it will come in handy later on), and finally we expose the service via a route (think of routes as Kubernetes ingresses). Let’s have a look at the pods:
kubectl get pod -o wide
NAME              READY   STATUS      RESTARTS   AGE     IP            NODE                                   NOMINATED NODE   READINESS GATES
server-1-deploy   0/1     Completed   0          54m     10.131.0.32   aro2-p8bjm-worker-northeurope1-qt8l7   <none>           <none>
server-1-ppl25    1/1     Running     0          54m     10.128.2.24   aro2-p8bjm-worker-northeurope3-wl4vw   <none>           <none>
sqlapi-1-8jgx8    1/1     Running     0          40m     10.131.0.40   aro2-p8bjm-worker-northeurope1-qt8l7   <none>           <none>
sqlapi-1-deploy   0/1     Completed   0          40m     10.131.0.39   aro2-p8bjm-worker-northeurope1-qt8l7   <none>           <none>
sqlapi-pod        1/1     Running     0          3h30m   192.168.0.6   aro2-p8bjm-worker-northeurope2-rbxzc   <none>           <none>
As you can see we have a bunch of pods, most of them with internal IP addresses (10.x.y.z). We will come back to this in a minute. But now let’s check out our services and our ingress, I mean, our route:
kubectl get svc
NAME       TYPE           CLUSTER-IP      EXTERNAL-IP    PORT(S)          AGE
server     ClusterIP      172.30.72.7     <none>         1433/TCP         57m
sqlapi     ClusterIP      172.30.82.94    <none>         8080/TCP         43m
sqlapilb   LoadBalancer   172.30.226.18   192.168.0.11   8080:30039/TCP   6h50m
kubectl get route
NAME       HOST/PORT                                               PATH   SERVICES   PORT   TERMINATION   WILDCARD
sqlapilb   sqlapilb-project1.apps.m50kgrxk.northeurope.aroapp.io          sqlapilb   8080                 None
In the services you might have noticed server and sqlapi, created by the oc new-app command, as well as sqlapilb, which was created with the oc expose command.
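If you want to make sure the route actually works end to end, something along these lines should do (a quick sketch: I am assuming here that the API answers on /api/healthcheck, adapt the path to whatever your image exposes):

# Grab the route's hostname and send a test request to it (the URL path is an assumption)
route_host=$(oc get route sqlapilb -o jsonpath='{.spec.host}')
curl "http://${route_host}/api/healthcheck"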
But let’s go back to the pods’ IP addresses. Where are these coming from? Which network plugin is Openshift using? Enter the operators. In Openshift, everything has its operator. You can think of an operator as the smart colleague you would ask to do something complicated. In this case, let’s ask the network operator (BTW you can find these commands in the Openshift documentation for the Networking Operator):
oc describe network.config/cluster
Name:         cluster
Namespace:
Labels:       <none>
Annotations:  <none>
API Version:  config.openshift.io/v1
Kind:         Network
Metadata:
  Creation Timestamp:  2020-05-27T06:09:35Z
  Generation:          2
  Resource Version:    1898
  Self Link:           /apis/config.openshift.io/v1/networks/cluster
  UID:                 16792e9b-cb6f-4ffa-9dee-57d09e05bc92
Spec:
  Cluster Network:
    Cidr:         10.128.0.0/14
    Host Prefix:  23
  External IP:
    Policy:
  Network Type:  OpenShiftSDN
  Service Network:
    172.30.0.0/16
Status:
  Cluster Network:
    Cidr:               10.128.0.0/14
    Host Prefix:        23
  Cluster Network MTU:  1450
  Network Type:         OpenShiftSDN
  Service Network:
    172.30.0.0/16
Events:  <none>
OK, there is quite a bit to unpack there. Let’s start with the technology itself: the network type is defined to be “OpenShiftSDN”. The Openshift SDN plugin leverages VXLAN to encapsulate traffic between nodes, which becomes more apparent if we look a bit deeper:
oc get clusternetworks.network.openshift.io -o yaml
apiVersion: v1
items:
- apiVersion: network.openshift.io/v1
  clusterNetworks:
  - CIDR: 10.128.0.0/14
    hostSubnetLength: 9
  hostsubnetlength: 9
  kind: ClusterNetwork
  metadata:
    creationTimestamp: "2020-05-27T06:10:34Z"
    generation: 1
    name: default
    ownerReferences:
    - apiVersion: operator.openshift.io/v1
      blockOwnerDeletion: true
      controller: true
      kind: Network
      name: cluster
      uid: da3cf28f-2ec6-4ccd-9c51-ffc0f5897be2
    resourceVersion: "1774"
    selfLink: /apis/network.openshift.io/v1/clusternetworks/default
    uid: c74b6a66-99ff-492d-90fa-a615a84c337e
  mtu: 1450
  network: 10.128.0.0/14
  pluginName: redhat/openshift-ovs-networkpolicy
  serviceNetwork: 172.30.0.0/16
  vxlanPort: 4789
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
There you can see some interesting information, such as the reduced MTU size (due to the VXLAN encapsulation overhead) and the UDP port used to encapsulate packets between nodes. If you are not familiar with VXLAN: it means that the nodes build tunnels between each other, and traffic between the pods flows inside of those tunnels. Hence Azure only sees the tunnel traffic, where source and destination IP addresses are those of the nodes and not of the pods, unlike the case of the kubenet plugin in AKS. As a consequence, there is no route table required like with kubenet.
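If you want to see the tunnel with your own eyes, a debug shell on one of the nodes should show the VXLAN interface that Open vSwitch creates (a sketch: the interface name vxlan_sys_4789, derived from the UDP port above, is what Openshift SDN typically uses, but verify on your own nodes):

# Open a debug pod on a worker node and look at the VXLAN interface details
oc debug node/aro2-p8bjm-worker-northeurope1-qt8l7 -- chroot /host ip -d link show vxlan_sys_4789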
Let’s continue with the IP ranges: you can see the Service Network being 172.30.0.0/16, which is the range where our services got their ClusterIP addresses (see the output of the command “oc get svc” above).
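If you only want those ranges without the whole describe output, a jsonpath query against the same object does the trick (field names taken from the output above):

# Print the network type, the pod CIDR and the service CIDR from the cluster Network object
oc get network.config/cluster -o jsonpath='{.status.networkType}{"\n"}{.status.clusterNetwork[0].cidr}{"\n"}{.status.serviceNetwork[0]}{"\n"}'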
That was easy; the next one is a bit trickier: you see the cluster network being 10.128.0.0/14. Those were the addresses of our pods alright, but how is this range partitioned across the nodes? If you look at the node definition with “oc describe node” you will not find the allocated pod range there, as you might have expected. But the network operator comes to the rescue again, with a Custom Resource Definition that tells us exactly what we want to know:
kubectl get hostsubnets.network.openshift.io
NAME                                   HOST                                   HOST IP        SUBNET          EGRESS CIDRS   EGRESS IPS
aro2-p8bjm-master-0                    aro2-p8bjm-master-0                    192.168.0.40   10.129.0.0/23
aro2-p8bjm-master-1                    aro2-p8bjm-master-1                    192.168.0.38   10.128.0.0/23
aro2-p8bjm-master-2                    aro2-p8bjm-master-2                    192.168.0.39   10.130.0.0/23
aro2-p8bjm-worker-northeurope1-qt8l7   aro2-p8bjm-worker-northeurope1-qt8l7   192.168.0.4    10.131.0.0/23
aro2-p8bjm-worker-northeurope2-rbxzc   aro2-p8bjm-worker-northeurope2-rbxzc   192.168.0.6    10.129.2.0/23
aro2-p8bjm-worker-northeurope3-wl4vw   aro2-p8bjm-worker-northeurope3-wl4vw   192.168.0.5    10.128.2.0/23
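Each entry maps a node to its pod range. If you are only after one particular node, jsonpath works here as well (a small sketch, node name obviously from my cluster):

# Print the pod subnet allocated to a single node
oc get hostsubnet aro2-p8bjm-worker-northeurope1-qt8l7 -o jsonpath='{.subnet}{"\n"}'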
You can see that each node gets a /23 range, which should be more than enough for a few pods. We can verify that by having a look at the pods on one of the nodes. Let us take worker node 1, which got the range 10.131.0.0/23 (in case you have forgotten your binary, that would be any address of the form 10.131.0.x or 10.131.1.x):
kubectl get pod -A -o wide | grep aro2-p8bjm-worker-northeurope1-qt8l7
kuard1 kuard-amd64-1-deploy 0/1 Completed 0 10h 10.131.0.19 aro2-p8bjm-worker-northeurope1-qt8l7 <none> <none>
kuard1 kuard-amd64-1-qqdqm 1/1 Running 0 10h 10.131.0.20 aro2-p8bjm-worker-northeurope1-qt8l7 <none> <none>
kuard2 kuard-amd64-1-deploy 0/1 Completed 0 10h 10.131.0.21 aro2-p8bjm-worker-northeurope1-qt8l7 <none> <none>
kuard2 kuard-amd64-1-vvvlw 1/1 Running 0 10h 10.131.0.22 aro2-p8bjm-worker-northeurope1-qt8l7 <none> <none>
openshift-azure-logging mdsd-2qxr2 4/4 Running 0 10h 10.131.0.8 aro2-p8bjm-worker-northeurope1-qt8l7 <none> <none>
openshift-cluster-node-tuning-operator tuned-8vk6r 1/1 Running 0 10h 192.168.0.4 aro2-p8bjm-worker-northeurope1-qt8l7 <none> <none>
openshift-dns dns-default-m5wkx 2/2 Running 0 10h 10.131.0.12 aro2-p8bjm-worker-northeurope1-qt8l7 <none> <none>
openshift-image-registry image-registry-599645fd69-shsgf 1/1 Running 0 10h 10.131.0.11 aro2-p8bjm-worker-northeurope1-qt8l7 <none> <none>
openshift-image-registry node-ca-6xqt6 1/1 Running 0 10h 10.131.0.4 aro2-p8bjm-worker-northeurope1-qt8l7 <none> <none>
openshift-ingress router-default-cf4d7b6d5-xlkxx 1/1 Running 0 10h 10.131.0.18 aro2-p8bjm-worker-northeurope1-qt8l7 <none> <none>
openshift-machine-config-operator machine-config-daemon-4xwjt 2/2 Running 0 10h 192.168.0.4 aro2-p8bjm-worker-northeurope1-qt8l7 <none> <none>
openshift-marketplace certified-operators-f686bcd89-dtvd9 1/1 Running 0 10h 10.131.0.10 aro2-p8bjm-worker-northeurope1-qt8l7 <none> <none>
openshift-marketplace community-operators-599d7b4d5f-twnmq 1/1 Running 0 10h 10.131.0.6 aro2-p8bjm-worker-northeurope1-qt8l7 <none> <none>
openshift-marketplace redhat-operators-69c6f6bbfd-7nfp9 1/1 Running 0 33m 10.131.0.31 aro2-p8bjm-worker-northeurope1-qt8l7 <none> <none>
openshift-monitoring alertmanager-main-0 3/3 Running 0 10h 10.131.0.17 aro2-p8bjm-worker-northeurope1-qt8l7 <none> <none>
openshift-monitoring kube-state-metrics-55f99f49d7-lm8xb 3/3 Running 0 10h 10.131.0.2 aro2-p8bjm-worker-northeurope1-qt8l7 <none> <none>
openshift-monitoring node-exporter-mz26r 2/2 Running 0 10h 192.168.0.4 aro2-p8bjm-worker-northeurope1-qt8l7 <none> <none>
openshift-monitoring openshift-state-metrics-8f859d745-jx84p 3/3 Running 0 10h 10.131.0.7 aro2-p8bjm-worker-northeurope1-qt8l7 <none> <none>
openshift-multus multus-v8srq 1/1 Running 0 10h 192.168.0.4 aro2-p8bjm-worker-northeurope1-qt8l7 <none> <none>
openshift-sdn ovs-frcjl 1/1 Running 0 10h 192.168.0.4 aro2-p8bjm-worker-northeurope1-qt8l7 <none> <none>
openshift-sdn sdn-rlqgc 1/1 Running 0 10h 192.168.0.4 aro2-p8bjm-worker-northeurope1-qt8l7 <none> <none>
project1 sqlapi-1-4mb78 1/1 Running 0 138m 10.131.0.30 aro2-p8bjm-worker-northeurope1-qt8l7 <none> <none>
And fair enough, most IP addresses are 10.131.0.x, check! But wait a second, there are other pods with the IP address 192.168.0.4? And all of them have the same IP address? We will discuss that in a future post, but if you really want to know now: it has to do with an attribute called “hostNetwork” that you can set on each pod, and which assigns to the pod the IP address of the node itself.
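If you are curious you can already check that flag on one of those pods, for example one of the openshift-sdn pods from the list above (a quick sketch):

# Pods running with hostNetwork set to true share the IP address of their node
oc get pod sdn-rlqgc -n openshift-sdn -o jsonpath='{.spec.hostNetwork}{"\n"}'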
Let us wrap it up here for the first post; in the next one we will dive into how pods talk to each other, and how connectivity flows to and from the Internet.
Thank you for the series. It is very useful to get such insights into ARO SDN. Can you please share the sample app that you have used in your series, it will be useful for testing purposes.
Absolutely: https://github.com/erjosito/whoami/tree/master/api