This is part 3 of a blog series about networking in Azure Red Hat OpenShift. In this post we will see how pods talk to each other across projects inside of the cluster, as well as to other systems in the Virtual Network or on-premises. Other posts in the series:
- Part 1: Intro and SDN Plugin
- Part 2: Internet and Intra-cluster Communication
- Part 3: Inter-Project and Vnet Communication
- Part 4: Private Link and DNS
- Part 5: Private and Public routers
In the previous parts of this blog series we have seen how pods can talk to each other. Does that work across project boundaries as well? Let's find out, although the SDN plugin that we saw is used in Azure Red Hat OpenShift can already give us some clues.
Let's start by creating another API server in a different project, project2, that will try to access the SQL Server in project1:
```
# Variables
project_name=project2
# New project
oc new-project $project_name
sql_password=yoursupersecurepassword
# New app
oc new-app --docker-image erjosito/sqlapi:0.1 -e "SQL_SERVER_FQDN=server.project1.svc.cluster.local" -e "SQL_SERVER_USERNAME=sa" -e "SQL_SERVER_PASSWORD=${sql_password}"
# Exposing ClusterIP Svc over a route
oc expose svc sqlapi
```
Note that the FQDN for the SQL Server is still pointing to the one in project1, and that we have not deployed a SQL Server in project2. We can verify that our new API is up and running, and that it has connectivity to the SQL Server in project1:
curl "http://sqlapilb-project2.apps.m50kgrxk.northeurope.aroapp.io/api/healthcheck" { "health": "OK" }
curl "http://sqlapilb-project2.apps.m50kgrxk.northeurope.aroapp.io/api/sqlversion" { "sql_output": "Microsoft SQL Server 2019 (RTM-CU4) (KB4548597) - 15.0.4033.1 (X64) \n\tMar 14 2020 16:10:35 \n\tCopyright (C) 2019 Microsoft Corporation\n\tDeveloper Edition (64-bit) on Linux (Ubuntu 18.04.4 LTS) " }
There are two lessons to be learnt here. The first one is that Azure Red Hat OpenShift uses the SDN plugin in network policy mode. This means that, by default, pods in different projects (aka namespaces) can communicate with each other without any restriction. You can find more information about the different modes in the documentation for the OpenShift SDN. Actually, we already saw a hint for this in part 1 of this series:
```
oc get clusternetworks.network.openshift.io -o yaml
apiVersion: v1
items:
- apiVersion: network.openshift.io/v1
  clusterNetworks:
  - CIDR: 10.128.0.0/14
    hostSubnetLength: 9
  hostsubnetlength: 9
  kind: ClusterNetwork
  metadata:
    creationTimestamp: "2020-05-27T06:10:34Z"
    generation: 1
    name: default
    ownerReferences:
    - apiVersion: operator.openshift.io/v1
      blockOwnerDeletion: true
      controller: true
      kind: Network
      name: cluster
      uid: da3cf28f-2ec6-4ccd-9c51-ffc0f5897be2
    resourceVersion: "1774"
    selfLink: /apis/network.openshift.io/v1/clusternetworks/default
    uid: c74b6a66-99ff-492d-90fa-a615a84c337e
  mtu: 1450
  network: 10.128.0.0/14
  pluginName: redhat/openshift-ovs-networkpolicy
  serviceNetwork: 172.30.0.0/16
  vxlanPort: 4789
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
```
The second important corollary of this inter-project communication is that DNS service discovery works across projects: the API pod in project "project2" could successfully resolve the FQDN "server.project1.svc.cluster.local".
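If you want to verify that resolution directly, here is a quick sketch: it assumes the default app=sqlapi label that oc new-app sets on the pods, and that the container image ships an nslookup binary, so adapt as needed:

```
# Find the API pod in project2 and resolve the project1 service FQDN from it
api_pod=$(oc get pods -n project2 -l app=sqlapi -o jsonpath='{.items[0].metadata.name}')
oc exec -n project2 $api_pod -- nslookup server.project1.svc.cluster.local
```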
But what if we want to restrict communications, and prevent pods in one project from being accessed by pods in other projects? Good old Kubernetes Network Policy to the rescue.
As you can check in the OpenShift documentation for Network Policy, there are many ways of restricting communication between pods. In this example we are going to apply a network policy to the SQL Server so that it only accepts connections from its own namespace (project1):
```
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-project1
spec:
  ingress:
  - from:
    - podSelector: {}
  podSelector:
    matchLabels:
      app: server
  policyTypes:
  - Ingress
```
This policy applies to all pods with the label app=server (like our SQL Server pod). By default, a "podSelector" in the "from" clause is scoped to the policy's own namespace, so the empty selector above allows traffic from any pod in the same namespace, and nothing else.
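To put the policy in place, save the manifest above to a file (the name allow-from-project1.yaml below is just an example) and apply it in project1:

```
# Apply the network policy in project1 and verify that it exists
oc apply -n project1 -f allow-from-project1.yaml
oc get networkpolicy -n project1
```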
If you try again to reach the SQL Server in project1 from project2, it will not work:
curl "http://sqlapilb-project2.apps.m50kgrxk.northeurope.aroapp.io/api/sqlversion"
An additional useful policy that you might want to include is one that configures frontend pods (such as the API pod in our example) to be accessible only from the ingress controller. You can find an example of such a policy in the OpenShift documentation for Network Policy, and a sketch of it follows.
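This sketch mirrors the allow-from-openshift-ingress example in the OpenShift docs; with the SDN plugin it relies on the namespace hosting the router carrying the network.openshift.io/policy-group=ingress label, so treat that as an assumption to validate against your cluster version:

```
cat <<EOF | oc apply -n project2 -f -
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-openshift-ingress
spec:
  # Empty podSelector: the policy covers every pod in the namespace
  podSelector: {}
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          network.openshift.io/policy-group: ingress
  policyTypes:
  - Ingress
EOF
```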
Connectivity to the Virtual Network
So far we have covered traffic flows inside of the cluster and from the Internet. What about the rest of the Virtual Network and on-premises networks? Let's test that next. To do so, we will deploy a Virtual Machine in the same Virtual Network, but in a different subnet. There are many ways of deploying a virtual machine in Azure; my favorite is the Azure CLI:
```
# Variables
vm_name=apivm
vm_nsg_name=${vm_name}-nsg
vm_pip_name=${vm_name}-pip
vm_disk_name=${vm_name}-disk0
vm_sku=Standard_B2ms
publisher=Canonical
offer=UbuntuServer
sku=18.04-LTS
image_urn=$(az vm image list -p $publisher -f $offer -s $sku -l $location --query '[0].urn' -o tsv)
# Subnet variables (names are an assumption; the prefix matches the VM address shown below)
vm_subnet_name=vm
vm_subnet_prefix=192.168.0.96/28
# New subnet and VM
az network vnet subnet create -n $vm_subnet_name --vnet-name $vnet_name -g $rg --address-prefixes $vm_subnet_prefix
az vm create -n $vm_name -g $rg -l $location --image $image_urn --size $vm_sku --generate-ssh-keys \
    --os-disk-name $vm_disk_name --os-disk-size-gb 32 \
    --vnet-name $vnet_name --subnet $vm_subnet_name \
    --nsg $vm_nsg_name --nsg-rule SSH --public-ip-address $vm_pip_name
# Get the allocated public IP and add it to known_hosts
vm_pip_ip=$(az network public-ip show -n $vm_pip_name -g $rg --query ipAddress -o tsv)
ssh-keyscan -H $vm_pip_ip >> ~/.ssh/known_hosts
```
The previous Bash commands get the latest Ubuntu 18.04 image and deploy it in a new subnet of the virtual network. The script gets the allocated public IP address and adds it to the known_hosts file, so that we will be able to send commands to the VM over SSH. To verify that the VM has been deployed successfully, we can check its private IP address by sending a remote command:
ssh $vm_pip_ip "ip a" 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000 link/ether 00:0d:3a:d8:21:fa brd ff:ff:ff:ff:ff:ff inet 192.168.0.101/28 brd 192.168.0.111 scope global eth0 valid_lft forever preferred_lft forever inet6 fe80::20d:3aff:fed8:21fa/64 scope link valid_lft forever preferred_lft forever
Let's start by checking how the API sees us. In previous posts we came in through the public router over the Internet, but if you remember, we had also deployed an internal Load Balancer service for our API:
```
oc get svc
NAME       TYPE           CLUSTER-IP       EXTERNAL-IP    PORT(S)          AGE
server     ClusterIP      172.30.193.92    <none>         1433/TCP         46m
sqlapi     ClusterIP      172.30.3.55      <none>         8080/TCP         46m
sqlapilb   LoadBalancer   172.30.165.130   192.168.0.11   8080:31192/TCP   46m
```
By the way, we might want to have a look at how this IP address is implemented. If you remember, there are some load balancers provisioned in the node resource group:
```
node_rg_id=$(az aro show -n $cluster_name -g $rg --query 'clusterProfile.resourceGroupId' -o tsv)
node_rg_name=$(echo $node_rg_id | cut -d/ -f 5)
az network lb list -g $node_rg_name -o table
Location     Name                    ProvisioningState    ResourceGroup    ResourceGuid
-----------  ----------------------  -------------------  ---------------  ------------------------------------
northeurope  aro2-p8bjm              Succeeded            aro2-resources   b1630a28-0e71-49ee-9e63-9c0d5edeaebc
northeurope  aro2-p8bjm-internal     Succeeded            aro2-resources   7521a4e5-19d4-428e-a2e3-d59370457abf
northeurope  aro2-p8bjm-internal-lb  Succeeded            aro2-resources   b01e0eb0-6035-4a61-9dd1-54642410c7ae
northeurope  aro2-p8bjm-public-lb    Succeeded            aro2-resources   5ec7a5b5-a6b9-4892-ba02-dc89acbe28ee
```
We are interested in the "-internal" one, which is the internal load balancer to which the worker nodes are connected. To double-check, let's verify the frontend IP addresses; we should see the private IP address of our service:
```
az network lb frontend-ip list --lb-name aro2-7lrgj-internal -g $node_rg_name -o table
Name                                   PrivateIpAddress    PrivateIpAddressVersion    PrivateIpAllocationMethod    ProvisioningState    ResourceGroup
-------------------------------------  ------------------  -------------------------  ---------------------------  -------------------  ---------------
a37227ba481534bb6aba9b048186900e       192.168.0.11        IPv4                       Dynamic                      Succeeded            aro2-resources
a22a55112e91348d48c6fcf87f4f1cca-apps  192.168.0.132       IPv4                       Dynamic                      Succeeded            aro2-resources
```
And there it is! One more thing, let us check the health probe that the Azure Load Balancer is using:
```
az network lb probe list --lb-name aro2-nvwgf-internal -g $node_rg_name -o table
IntervalInSeconds    Name                                       NumberOfProbes    Port    Protocol    ProvisioningState    ResourceGroup
-------------------  -----------------------------------------  ----------------  ------  ----------  -------------------  ---------------
5                    af73928d6b8954aac8024b76d833f652-TCP-8080  2                 31192   Tcp         Succeeded            aro2-resources
```
The important thing here is that the probe checks the NodePort TCP port 31192; we will come back to this later. Now we can connect to the internal LB from the VM:
ssh $vm_pip_ip "curl -s http://192.168.0.11:8080/api/ip" { "my_default_gateway": "10.131.0.1", "my_dns_servers": "['172.30.0.10']", "my_private_ip": "10.131.0.40", "my_public_ip": "40.127.221.40", "path_accessed": "192.168.0.11:8080/api/ip", "sql_server_fqdn": "server.project1.svc.cluster.local", "sql_server_ip": "172.30.72.7", "x-forwarded-for": null, "your_address": "10.131.0.1", "your_browser": "None", "your_platform": "None" }
This is interesting: the pod sees us coming from 10.131.0.1, not from the original IP address of the Virtual Machine. But what is 10.131.0.1? If you remember part 1, 10.131.0.0/23 is the IP address range that the OpenShift SDN has allocated to the worker node where our pod runs. Each node internally has a virtual router based on Open vSwitch that acts as default gateway for the pods, and that router performs source NAT on inbound traffic from outside of the cluster. The reason why packets must be SNATted is that the load balancer does not actually know on which node the relevant pod lives, so it will pick any node, and from there the packet will be forwarded to the right pod (possibly on a different node). The OpenShift SDN uses SNAT to guarantee that the return packet follows the same path back.
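If you are curious, you can look at that default gateway on the node itself: with the OpenShift SDN, each node's tun0 interface holds the first IP of the node's pod range. A hedged way to inspect it with oc debug (the node name below is a placeholder):

```
# Find out which node hosts the pod
oc get pods -o wide
# Inspect the OVS-backed tun0 interface on that node (replace the placeholder)
oc debug node/<worker-node-name> -- chroot /host ip address show tun0
```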
Something interesting to note is that the X-Forwarded-For header is empty, since there is no reverse proxy in the path. Hence the client IP information is not visible to the application at all. In some cases this is a serious problem, so what can be done to fix it?
We will explore one solution in this post, modifying the internal load balancer, and leave another one (adding an internal router) for a future post. If my previous explanation was halfway understandable, the root cause of the problem is that the Azure Load Balancer's health probe checks the NodePort TCP port, which is open on every node, and hence the traffic can first hit a node that does not host the pod. Can we reconfigure the load balancer so that it only sends traffic to nodes actually hosting a relevant pod? Yes! We will change the service's External Traffic Policy to "Local":
```
oc edit svc/sqlapilb
spec:
  ...
  externalTrafficPolicy: Local
  ...
```
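If you prefer a non-interactive change, the same edit can be done with a one-liner:

```
# Patch the service in place instead of opening an editor
oc patch svc sqlapilb -p '{"spec":{"externalTrafficPolicy":"Local"}}'
```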
After doing that, let us verify the configuration of the Azure Load Balancer probes:
```
az network lb probe list --lb-name aro2-p8bjm-internal -g $node_rg_name -o table
IntervalInSeconds    Name                                       NumberOfProbes    Port    Protocol    ProvisioningState    RequestPath    ResourceGroup
-------------------  -----------------------------------------  ----------------  ------  ----------  -------------------  -------------  ---------------
5                    af73928d6b8954aac8024b76d833f652-TCP-8080  2                 32352   Http        Succeeded            /healthz       aro2-resources
```
There is an important difference: the probe is no longer TCP but HTTP, and it targets a specific port and path on the OpenShift node that tells whether that node hosts any pod for our service or not. As a consequence, the Azure Load Balancer will only send traffic to nodes containing relevant pods, and source NAT is not required any more. Let's check from our VM again:
ssh $vm_pip_ip "curl -s http://192.168.0.11:8080/api/ip" { "my_default_gateway": "", "my_dns_servers": "['172.30.0.10']", "my_private_ip": "10.129.2.12", "my_public_ip": "51.104.149.59", "path_accessed": "192.168.0.11:8080/api/ip", "sql_server_fqdn": "server.project1.svc.cluster.local", "sql_server_ip": "172.30.105.222", "x-forwarded-for": null, "your_address": "192.168.0.101", "your_browser": "None", "your_platform": "None" }
And bingo! No NAT involved any more: now the application sees the original client's IP address (192.168.0.101, our VM).
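Under the covers, setting externalTrafficPolicy to Local makes Kubernetes allocate a healthCheckNodePort on the service (32352 in the probe output above), where kube-proxy answers on /healthz with the number of local endpoints. A hedged way to poke at it (the node IP below is a placeholder):

```
# The port the Azure LB probes; should match the probe output above
oc get svc sqlapilb -o jsonpath='{.spec.healthCheckNodePort}'
# From the VM, ask a node directly whether it hosts a pod for the service
ssh $vm_pip_ip "curl -s http://<worker-node-ip>:32352/healthz"
```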
Now that we are deep into the internal Azure Load Balancer, let's try one more thing. You might have noticed that the internal Load Balancer's IP addresses come out of the worker nodes' subnet. What if we exhaust the IP addresses there? You do not want to be in a situation where you cannot scale or upgrade your cluster because of a lack of IP addresses. Additionally, in certain situations you might want to whitelist the IP range of your LoadBalancer services while excluding the nodes' IP addresses. ARO has a feature for you: deploying internal LoadBalancer services in a dedicated subnet. This is controlled via an additional annotation. Let's create a new subnet and a LoadBalancer service in that subnet:
```
ilb_subnet_name=apps
ilb_subnet_prefix=192.168.0.128/28
az network vnet subnet create -n $ilb_subnet_name --vnet-name $vnet_name -g $rg --address-prefixes $ilb_subnet_prefix
oc expose dc sqlapi --port 8080 --type=LoadBalancer --name=sqlapisubnet --dry-run -o yaml | \
  awk '1;/metadata:/{ print "  annotations:\n    service.beta.kubernetes.io/azure-load-balancer-internal: \"true\"\n    service.beta.kubernetes.io/azure-load-balancer-internal-subnet: \"'${ilb_subnet_name}'\"" }' | \
  oc create -f -
```
As you can see, the previous command introduces two annotations: "service.beta.kubernetes.io/azure-load-balancer-internal" to signal that the Load Balancer should be internal, and "service.beta.kubernetes.io/azure-load-balancer-internal-subnet" to specify the subnet where the ALB frontend will be deployed. Let's check it out!
```
oc get svc
NAME           TYPE           CLUSTER-IP       EXTERNAL-IP     PORT(S)          AGE
server         ClusterIP      172.30.105.222   <none>          1433/TCP         34m
sqlapi         ClusterIP      172.30.215.232   <none>          8080/TCP         34m
sqlapilb       LoadBalancer   172.30.62.57     192.168.0.11    8080:31011/TCP   24m
sqlapisubnet   LoadBalancer   172.30.171.194   192.168.0.132   8080:31072/TCP   2m9s
```
And that is it: the new service called "sqlapisubnet" has been deployed with the IP address 192.168.0.132, which is the first allocatable IP address in the "apps" subnet 192.168.0.128/28 (Azure reserves the first four addresses of every subnet).
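By the way, if the awk pipeline above feels fragile, an equivalent approach is to apply the Service manifest directly. This is only a sketch: it assumes the pods carry the deploymentconfig=sqlapi label that the DeploymentConfig created by oc new-app sets on them, so check your labels first:

```
cat <<EOF | oc apply -f -
apiVersion: v1
kind: Service
metadata:
  name: sqlapisubnet
  annotations:
    service.beta.kubernetes.io/azure-load-balancer-internal: "true"
    service.beta.kubernetes.io/azure-load-balancer-internal-subnet: "apps"
spec:
  type: LoadBalancer
  # Assumed label; verify with "oc get pods --show-labels"
  selector:
    deploymentconfig: sqlapi
  ports:
  - port: 8080
    targetPort: 8080
EOF
# Verify from the VM that the new frontend answers
ssh $vm_pip_ip "curl -s http://192.168.0.132:8080/api/healthcheck"
```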
This concludes this post. In the next part we will have a look at Azure Private Link and DNS. Thanks for reading!