
Connecting non-Kubernetes nodes to Calico overlay network

Kubernetes networking has a few basic rules; in short, every pod must be able to communicate with every other pod. Selecting the right network plugin is a critical decision when planning and architecting a new cluster. Luckily, there are great presentations and blog posts about Kubernetes cluster networking on the internet, but the available sources on how to connect external resources that aren't part of the cluster into the mesh are very limited. It all depends on what we would like to achieve, so in the end we have to glue the solutions together ourselves.

In this post, I would like to tell our story @IBM about converting an existing node into a full member of our Kubernetes + Calico network.
First of all, we had to specify the main goals:
  • Make the node a full member of the overlay network
  • The external node needs a pod IP so it can be reached like any regular pod in the system
  • Services on the external node must be able to listen on the pod IP
  • Service discovery is mandatory in both directions: pods have to resolve the external node's hostname to the pod IP, and the node has to reach Kubernetes services as well
  • On node restart, the same pod IP must be reused for the node
As I mentioned, we are using the Calico overlay network for many reasons which I don't want to cover now. If you are interested, you can find more about the available network plugins here and here. Let's jump into the implementation details.

Make the node a full member of the overlay network

This part was the easiest one. In the case of a Calico network in Kubernetes, there is a Deployment (calico-kube-controllers) and a DaemonSet (calico-node). The only thing to do here is to run a well-configured calico-node service on the external node (both container and native service installations are supported), and all the magic happens behind the scenes.
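For illustration, here is a minimal sketch of running calico-node as a container on the external node, assuming an etcd-backed Calico datastore; the node name, etcd endpoint, and image tag are placeholders you would replace with your own values:

    # Sketch: run calico-node as a container on the external node
    # (assumes an etcd-backed datastore; all values below are placeholders)
    docker run -d --name=calico-node \
      --net=host --privileged \
      -e NODENAME=external-node-1 \
      -e IP=autodetect \
      -e CALICO_NETWORKING_BACKEND=bird \
      -e ETCD_ENDPOINTS=https://etcd.example.com:2379 \
      -v /lib/modules:/lib/modules \
      -v /var/run/calico:/var/run/calico \
      -v /var/lib/calico:/var/lib/calico \
      calico/node:v3.8.0

Once the container is up, the node registers itself in the Calico datastore and starts exchanging routes with the rest of the mesh.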

The external node needs a pod IP so it can be reached like any regular pod in the system

Kubernetes does not include a network layer solution itself. I can agree with this engineering decision: networking is hard to do and also hard to generalize. Every company/service/whatever has its own special requirements; one is latency-sensitive, another throughput-sensitive. Others have strict company policies or extra security regulations. And last but not least, some teams want to connect non-Kubernetes workloads to the mesh.

Kubernetes uses network plugins, and it is the plugin's responsibility to manage the network itself. There are two kinds of them, namely kubenet and CNI. In our case, Calico is configured as a CNI plugin.
After the node has been registered in the Calico datastore and has become a full member of the network, we can use the Calico CNI plugin to create a WorkloadEndpoint, which allocates a pod IP address from the IP pool assigned to the node by the calico-node service.
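A CNI plugin is just a binary that reads its network configuration from stdin and a handful of CNI_* environment variables, so it can be driven by hand. A rough sketch of such an invocation follows; the namespace name and etcd endpoint are placeholders, and the exact config depends on your datastore and Calico version:

    # Sketch: invoke the Calico CNI plugin manually to allocate a pod IP
    # (CNI_* variables follow the CNI spec; values are placeholders)
    ip netns add external-workload

    CNI_COMMAND=ADD \
    CNI_CONTAINERID=external-workload \
    CNI_NETNS=/var/run/netns/external-workload \
    CNI_IFNAME=eth0 \
    CNI_PATH=/opt/cni/bin \
    /opt/cni/bin/calico <<'EOF'
    {
      "cniVersion": "0.3.1",
      "name": "k8s-pod-network",
      "type": "calico",
      "etcd_endpoints": "https://etcd.example.com:2379",
      "ipam": { "type": "calico-ipam" }
    }
    EOF

The ADD command creates the WorkloadEndpoint, allocates an address from the node's IP pool, and prints the result as JSON on stdout.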

Services on the external node must be able to listen on the pod IP

The tricky part. By default, the CNI plugin only supports container execution (what a surprise in the Kubernetes world). This means the previously created network interface and IP address live in a separate network namespace, hidden from other processes. One option is to run the services inside this new network namespace. I suggest this only for network experts, and only for new installations where you want to make a non-containerized service available exclusively to Kubernetes pods. The other option is to move the network interface into the default namespace. This solution is a bit tricky, but it covers more common use cases and makes the interface available to regular services.
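A minimal sketch of the second option, continuing the placeholder names from the previous step; note that moving an interface between namespaces clears its addresses, so the pod IP has to be re-applied afterwards:

    # Sketch: move the CNI-created interface into the default namespace
    # (interface, namespace, and IP are placeholders from the earlier step)
    ip netns exec external-workload ip link set eth0 down
    ip netns exec external-workload ip link set eth0 name pod0  # avoid clashing with the host's eth0
    ip netns exec external-workload ip link set pod0 netns 1    # netns 1 = PID 1's (default) namespace
    ip addr add 10.233.64.5/32 dev pod0                         # re-apply the allocated pod IP
    ip link set pod0 up

Any routes the CNI plugin set up inside the namespace are lost as well; what needs to be re-created depends on your routing setup, so it is left out of this sketch.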

Service discovery is mandatory in both directions

Service discovery is one of the key components of the solution, because this is the piece that application developers actually use. There are two directions of service discovery:
  • Pods need to reach node by hostname:
    • During provisioning, the configuration tool creates a headless Service in Kubernetes which points to the node's pod IP (see the sketch below)
  • External node has to reach services in Kubernetes:
    • The node can reach the ClusterIPs of the Kubernetes cluster via the network interface created by CNI, so it can communicate with CoreDNS. During provisioning, the configuration tool sets CoreDNS as the node's name resolver, as shown below
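As an illustration of the first direction, a headless Service backed by a manually managed Endpoints object could look like the following; the names, port, and pod IP are placeholders:

    # Sketch: headless Service + Endpoints resolving the external node's
    # hostname to its pod IP (names and addresses are placeholders)
    kubectl apply -f - <<'EOF'
    apiVersion: v1
    kind: Service
    metadata:
      name: external-node-1
    spec:
      clusterIP: None        # headless: DNS returns the endpoint IP directly
      ports:
      - port: 8080
    ---
    apiVersion: v1
    kind: Endpoints
    metadata:
      name: external-node-1  # must match the Service name
    subsets:
    - addresses:
      - ip: 10.233.64.5      # the node's pod IP
      ports:
      - port: 8080
    EOF

    # For the other direction, point the node's resolver at CoreDNS,
    # e.g. (assuming the cluster DNS ClusterIP is 10.96.0.10):
    # echo "nameserver 10.96.0.10" > /etc/resolv.conf

With this in place, pods can resolve external-node-1.default.svc.cluster.local to the node's pod IP, and the node resolves cluster Service names through CoreDNS.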

On node restart, the same pod IP must be reused for the node

In the case of a node restart, our target was to restore the same state of the Calico network. But the node was already registered, the IP pool was already associated, and the pod IP was already allocated at start time. We chose the simplest possible solution: we changed the services to clean up the previously created Calico node and WorkloadEndpoint before starting.
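A minimal sketch of such a cleanup step, assuming calicoctl v3 and placeholder resource names; in our setup something like this runs before the calico-node service starts:

    # Sketch: remove stale Calico resources before re-registering the node
    # (resource names are placeholders; errors are ignored on first boot,
    # when there is nothing to clean up yet)
    calicoctl delete workloadendpoint <endpoint-name> || true
    calicoctl delete node external-node-1 || true
    # ...then start calico-node and re-run the CNI ADD as shown earlier,
    # which re-allocates the same pod IP from the node's pool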

This is the end of part one. I hope it helped you get a better understanding of how Kubernetes and Calico networking works and how to extend the Calico network with non-Kubernetes workers. If you have any comments, please feel free to discuss them.

Next time we will try this in practice. To be continued...
