Is Computer Networking too complex?

This question has been bothering me for quite some time now. Other technology areas constantly look to reduce complexity: take for example one of the most difficult fields out there, data science. Some years ago you needed a degree to even start with it, and now you can build and deploy models while sipping your favorite cocktail at the swimming pool using tools like Azure ML Studio, Google Auto ML or AWS SageMaker, not to mention the advent of Python replacing R (partially because of its simplicity), and the myriad of products with wizards that do Machine Learning for you, such as Splunk, Power BI, etc. Why are similar evolutions not happening in Networking?

You could argue that the computer networking industry has even made a business out of this complexity, with highly valued certifications such as CCIE (Cisco) and JNCIE (Juniper) that take years to prepare for. Others have written about this phenomenon, such as my admired Ivan Pepelnjak in The Ever-Increasing Complexity. And yet, you see newcomers to the industry like the cloud vendors falling into the very same complexity trap, where managing an Azure or an AWS network demands knowledge levels that might warrant an expert-level certification (not that I am trying to give Microsoft or Amazon ideas about an 8-hour exam).

The networking industry is on an eternal crusade to find alternatives to the dreaded command-line interface, denigrated and yet used by all. None of these alternatives has managed to take the market by storm, although each has produced interesting results of its own. Cisco Application Centric Infrastructure has a special place in my heart (I wrote a book about Deploying ACI many moons ago): it is a valiant attempt at separating network infrastructure from network policy, where infrastructure configuration is standardized and hidden away, so that you don’t need to know about VXLANs or EVPN BGP. Hence, the policy model is completely decoupled from infrastructure concepts. Network administrators all over the world had a hard time digesting these concepts, and a common criticism was that it is “too complex” compared with the good old command-line interface.

Fast-forward a couple of years, and you have public cloud networks that have had some time to ripen and mature. They are not the bare-bones VPCs from 5 years ago, and they offer much richer (and harder to configure) functionality. Cloud vendors naturally hide their physical infrastructure from their users, and yet all of them have networking abstractions based on traditional concepts such as subnets or ACLs. Networks defined this way are easier for traditional network administrators to understand, but by trying not to alienate them (too much), these models don’t manage to leave behind the pain points associated with legacy networking.

Should cloud networks move closer to concepts such as the ones introduced by Cisco with ACI? Should other paradigms such as the Kubernetes network abstractions be adopted? Or would these novel approaches be put under the “too complex” umbrella by users?

Evaluating complexity of network models

I am proud to have published my very first research paper: Complexity Evaluation of Network Configurations and Abstractions. It is quite light reading (for a research paper), but let me give you a quick summary: the end goal is comparing different abstractions with objective metrics that give an idea of their relative complexity. The metrics are meaningless in isolation; they are only useful when comparing two or more abstractions.

I took 4 representative network configurations of the same topology: 12 endpoints (VMs, physical endpoints or pods) with a policy dictating who can talk to whom:

Cisco CLI: the baseline for many of us.
Azure: as a representative of the public cloud models. I could have picked any other public cloud (they are not too different from each other, and AWS and Azure in particular are really close), but Azure happens to be the one I know best.
Cisco ACI: for the reasons I mention above, it was the first significant attempt that I know of to decouple connectivity from policy.
Kubernetes: as the abstraction that has gained the most adoption in my lifetime.

The first step is modeling the different configurations as graphs. I used the networkx Python library, although the choice of tool is not important. The reason for using graphs is that they can be studied, analyzed and summarized. For example, the following four diagrams show the summary graphs for my four configurations:

And lastly, I derive my metrics, which I summarize in this table:

L-Edges: loose couplings between model objects
T-Edges: tight couplings between model objects
I-Types: model object types directly related to infrastructure
P-Types: model object types directly related to policy
IP-ED: excess degree, or in other words, whether network administrators have to type IP addresses more than once
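
To give a feel for what counting these metrics on a graph can look like, here is a minimal sketch. It is not the code from the paper: the toy model (two subnets and one ACL), the attribute names `kind`, `type` and `coupling`, and the decision to mark the ACL-to-subnet edges as tight are all assumptions made for illustration. It uses plain Python structures to stay dependency-free; a library such as networkx would hold the same data as node and edge attributes.

```python
from collections import Counter

# Hypothetical toy model: two subnets and one ACL.
# "kind" separates infrastructure objects from policy objects (assumed naming).
nodes = {
    "subnet-a": {"kind": "infrastructure", "type": "subnet"},
    "subnet-b": {"kind": "infrastructure", "type": "subnet"},
    "acl-1": {"kind": "policy", "type": "acl"},
}

# Each edge is (object, object, coupling). Calling these edges "tight" is an
# assumption for this sketch, not the paper's classification.
edges = [
    ("acl-1", "subnet-a", "tight"),
    ("acl-1", "subnet-b", "tight"),
]


def summarize(nodes, edges):
    """Count edges per coupling and distinct object types per kind."""
    couplings = Counter(coupling for _, _, coupling in edges)
    infra_types = {n["type"] for n in nodes.values() if n["kind"] == "infrastructure"}
    policy_types = {n["type"] for n in nodes.values() if n["kind"] == "policy"}
    return {
        "L-Edges": couplings["loose"],
        "T-Edges": couplings["tight"],
        "I-Types": len(infra_types),
        "P-Types": len(policy_types),
    }


summary = summarize(nodes, edges)
print(summary)
```

As with the real metrics, the numbers this produces only mean something when compared against the same counts for another model of the same topology.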

The “Excess Degree”

The last metric is the most critical one. Let me explain it with an example: imagine you need to configure two subnets and an ACL that allows traffic between them. What would you do?

Create subnet A, say 10.0.1.0/24
Create subnet B, say 10.0.2.0/24
Create an ACL that allows traffic between 10.0.1.0/24 and 10.0.2.0/24

Do you see the problem? You had to type the subnet IP addresses twice: the first time when creating the subnet, the second one when defining the policy (ACL). This is the main source of configuration errors, typos, and ultimately of complexity. Not to mention situations where somebody changes the subnet definition but forgets to update the ACL, or the lack of portability of the policy itself, since it is tightly linked to the infrastructure where it is deployed.
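
The duplication in the three steps above can be counted mechanically. The sketch below is only an illustrative reading of "typing an IP address more than once", not the exact definition used in the paper; the `config` structure and the `ip_excess` helper are hypothetical names.

```python
from collections import Counter

# Hypothetical configuration: each object lists the IP prefixes typed into it.
config = [
    {"name": "subnet-a", "type": "subnet", "prefixes": ["10.0.1.0/24"]},
    {"name": "subnet-b", "type": "subnet", "prefixes": ["10.0.2.0/24"]},
    {"name": "acl-a-to-b", "type": "acl", "prefixes": ["10.0.1.0/24", "10.0.2.0/24"]},
]


def ip_excess(objects):
    """Return, per prefix, how many times it was typed beyond the first time."""
    counts = Counter(p for obj in objects for p in obj["prefixes"])
    return {prefix: n - 1 for prefix, n in counts.items() if n > 1}


excess = ip_excess(config)
print(excess)  # {'10.0.1.0/24': 1, '10.0.2.0/24': 1}
```

Every non-zero entry is a place where a typo or a forgotten update can make the policy and the infrastructure drift apart.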

If you look at the table with the complexity metrics above, you can see that both Cisco ACI and Kubernetes reduce this complexity vector (the last column, IP-ED) to zero:

Kubernetes gets rid of the subnets, and end points are instead loosely grouped by labels that can be used when defining network policy.
Cisco ACI takes a similar approach, with the only difference being that ACI Endpoint Groups (EPG) are groups of endpoints more tightly defined, instead of using free-text labels.
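
The label-based grouping described above can be sketched in a few lines. This is a simplified illustration and not how Kubernetes or ACI evaluate policy internally: the `endpoints` and `policy` structures and the `matches` and `allowed` helpers are hypothetical. The point is that the policy refers to groups by label, so no IP address is typed a second time.

```python
# Hypothetical endpoints carrying free-text labels, Kubernetes style.
endpoints = [
    {"name": "web-1", "ip": "10.0.1.4", "labels": {"app": "web"}},
    {"name": "web-2", "ip": "10.0.7.9", "labels": {"app": "web"}},
    {"name": "db-1", "ip": "10.0.2.5", "labels": {"app": "db"}},
]

# The policy names groups by label only; it contains no IP addresses.
policy = [{"from": {"app": "web"}, "to": {"app": "db"}}]


def matches(endpoint, selector):
    """True if the endpoint carries every label in the selector."""
    return all(endpoint["labels"].get(k) == v for k, v in selector.items())


def allowed(src_name, dst_name):
    """True if any policy rule allows traffic from src to dst."""
    by_name = {e["name"]: e for e in endpoints}
    src, dst = by_name[src_name], by_name[dst_name]
    return any(matches(src, r["from"]) and matches(dst, r["to"]) for r in policy)


print(allowed("web-2", "db-1"))  # True
print(allowed("db-1", "web-1"))  # False
```

Note that `web-2` sits in a different address range than `web-1` and is still covered by the same rule, which is what makes the policy portable across infrastructures.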

The pattern in this simple example with IP addresses in ACLs shows up everywhere: route-maps matching on IP prefixes defined somewhere else, static routes that have to contain the exact VNet prefixes, network statements in routing protocols matching the mask of the interface, etc.

What is the future?

One of my favorite quotes, often attributed to Mark Twain, is “Prediction is difficult – particularly when it involves the future”. Network models in the cloud and on-premises need to evolve to modern abstractions. Kubernetes is proving that if network administrators are not able to evolve along with them, their jobs will move somewhere else: today many DevOps and Platform Engineering teams manage their Kubernetes networks themselves.

Public cloud vendors in particular have a great opportunity and responsibility. They might move to Kubernetes-inspired abstractions as suggested by Hedgehog (led by Mike Dvorkin, the main brain behind Cisco ACI’s object model, although they focus more on on-premises networking with SONiC devices) or evolve their models in directions such as Azure Virtual WAN or AWS Cloud WAN, although it is debatable whether just adding abstraction levels to existing technologies is the right way to go.

In any case, by investing the same simplification efforts in Networking as these vendors do in other areas such as Machine Learning, they will be setting the groundwork for more stable computer networks in the future.

4 thoughts on “Is Computer Networking too complex?”

petru

August 11, 2023 at 6:29 pm

Very interesting article. Kubernetes as a whole is still hard to digest. Hopefully the cloud vendors or the networking vendors will find a way to simplify the complexity and it will be easier for all of us to learn new technologies and debug them during incidents.


1. erjosito
  
  August 15, 2023 at 7:35 am
  
  🤞
  
  
DRY Terraform code for Private Link and DNS – Cloudtrooper

August 19, 2023 at 1:06 pm

[…] last week’s almost-philosophical post on network complexity, let’s move on to more mundane tasks. Today I will focus on how to write efficient Terraform […]


[FI] Tietoliikennealan katsaus 2023-08 – loopback1.net

September 8, 2023 at 1:37 pm

[…] Has the network become too complex, asks Jose Moreno. The evolution has gone from CLI management, through ACI's object-model abstraction, to public cloud networks and Kubernetes clusters. Moreno has evaluated the complexity of these network models in his research. The configuration is modeled as graphs and dimensions. ACI and Kubernetes manage to remove repetition from configurations, although they may otherwise be harder to handle. Azure and CLI are complex in their own ways. We can probably agree that network abstraction is needed to keep manageability under control. But there are many ways to implement it, and not all of them are user-friendly and understandable, nor do they necessarily serve the purpose in the best possible way. […]
