
Autoscaling Calico Route Reflector topology in Kubernetes

Kubernetes is a great tool to organize your workloads at low or high scale. It has many nice features in different areas, but it completely outsources the complexity of networking. The network is one of the key layers of any success story, and happily there are many solutions available on the market. Calico is one of them, and I think it is the most widely used network provider. Big players in the public cloud space use it too, and it has a great community that works day by day to make Calico better.

Installing Kubernetes and Calico nowadays is as easy as a flick if you are happy with the default configuration. Otherwise, life gets tricky very quickly: there are so many options, configurations, topologies, automation tools, etc. Surprise or not, networking is one of the hard parts at high scale, and it requires thorough design from the beginning. By default, Calico uses IPIP encapsulation and a full BGP mesh to share routing information within the cluster. This means every single node in the cluster peers with every other node, which quickly becomes a bottleneck in large clusters.
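To put the bottleneck in numbers: in a full mesh every pair of nodes keeps its own BGP session, so the total grows quadratically with the cluster size. The small bash sketch below only does that arithmetic. It does not query Calico, and the helper names are made up for illustration.

```shell
#!/usr/bin/env bash
# Full node-to-node mesh: every pair of nodes has exactly one BGP session,
# so n nodes need n*(n-1)/2 sessions in total.
full_mesh_sessions() {
  local n=$1
  echo $(( n * (n - 1) / 2 ))
}

# Each node maintains a session with every other node: n-1 per node.
per_node_sessions() {
  local n=$1
  echo $(( n - 1 ))
}

for nodes in 10 100 500 1000; do
  printf '%5d nodes: %7d sessions in total, %4d per node\n' \
    "$nodes" "$(full_mesh_sessions "$nodes")" "$(per_node_sessions "$nodes")"
done
```

At 1,000 nodes that is 499,500 sessions, and every node has to keep 999 of them up to date.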

If you are interested in the details, please watch my talk on the topic:

Long story short, Calico uses the same technique as regular network solutions, namely the Route Reflector concept. There are many types of Route Reflector topologies, but they all have one thing in common: Route Reflectors are dedicated nodes that collect routing information and advertise it to the others.
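With route reflectors the picture changes. The sketch below uses a deliberately simplified model, which is my assumption rather than Calico's exact peering rules: r of the n nodes are route reflectors, every other node peers only with the reflectors, and the reflectors peer with each other in a full mesh. For a 1,000-node cluster this gives a few thousand sessions instead of the 499,500 a full mesh needs (1000 × 999 / 2). The helper name rr_sessions is hypothetical.

```shell
#!/usr/bin/env bash
# Simplified model (an assumption, not Calico's exact peering rules):
#   - r of the n nodes are route reflectors
#   - every other node peers only with each route reflector
#   - the route reflectors peer with each other in a full mesh
rr_sessions() {
  local n=$1 r=$2
  # client-to-reflector sessions + reflector-to-reflector mesh
  echo $(( (n - r) * r + r * (r - 1) / 2 ))
}

n=1000
for r in 2 3 5; do
  # a regular (non-reflector) node keeps only r sessions instead of n-1
  printf '%d nodes, %d route reflectors: %d sessions in total, %d per regular node\n' \
    "$n" "$r" "$(rr_sessions "$n" "$r")" "$r"
done
```

The trade-off is that the reflectors become critical: in this model each one still holds a session to every other node (n-1), which is why the number and placement of reflectors has to be planned.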

So if you want to operate at high scale, you have to design your own Route Reflector topology. I collected some of them here; please follow the link for more info.

Well done! Really?

If you have some basic experience with distributed systems, you may know that change is the only constant. There are many moving parts: nodes come and go, and the network can fail in a million ways. With a hand-made topology, each time the cluster changes an engineer has to check the current topology, re-calculate a new one, and apply various labels to get the best topology for the current cluster state. It is easy to admit that this approach is less than optimal :D.

My team at IBM and I started working on a solution to save our customers from re-designing the topology every time the cluster scales. Our plan was to open-source it and merge it into Calico as a core feature. So first, we and some members of the Calico community wrote a proposal document, which you can find here. Then we implemented a POC, and now I have opened the official pull requests against the kube-controllers and libcalico-go projects.

The autoscaling feature is currently under review, but you can take it for a ride if you want.

So first, bring your own cluster or use my Kind template:
kind create cluster --config cluster.yaml
Then apply the Calico manifests I prepared (diff):
kubectl apply -f https://raw.githubusercontent.com/mhmxs/calico-manifests-dev/main/routereflector/calico-3.17-kdd.yaml
This feature is a technical preview; use it at your own risk! The calico-kube-controllers image was built on my computer, so you have to trust me (or build your own).
If you get the error no matches for kind "BGPConfiguration" in version "crd.projectcalico.org/v1", please re-apply the manifest: the CRDs were not registered yet when the resources using them were created.

You can follow the logs of the controller:
kubectl logs -n kube-system -f -l k8s-app=calico-kube-controllers
And check the BGP peer configuration:
kubectl get bgppeers
Optionally, you can change the configuration:
# kubectl edit kubecontrollersconfigurations default
spec:
  controllers:
    routereflector:
      min: 2
      ratio: 0.5
      zoneLabel: kubernetes.io/arch
More options are available here.
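The post does not spell out how min and ratio are combined, so the following is only a guess to build intuition. It is a hypothetical bash helper, desired_route_reflectors, that assumes the target is nodes × ratio rounded up, never below min and never above the node count. The real controller may round differently, apply further limits, or use zoneLabel to spread reflectors across zones; check the pull request for the actual logic.

```shell
#!/usr/bin/env bash
# Hypothetical helper: how many route reflectors a cluster of a given size
# would get, ASSUMING target = ceil(nodes * ratio), but never fewer than min
# and never more than the number of nodes. The real controller may round,
# cap or place reflectors differently (e.g. per zoneLabel).
desired_route_reflectors() {
  local nodes=$1 min=$2 ratio=$3
  awk -v n="$nodes" -v min="$min" -v ratio="$ratio" 'BEGIN {
    target = n * ratio
    # round up to the next whole reflector
    if (target > int(target)) target = int(target) + 1
    if (target < min) target = min
    if (target > n) target = n
    print target
  }'
}

# Values from the example configuration above: min: 2, ratio: 0.5
for nodes in 1 3 4 7 10; do
  echo "$nodes nodes -> $(desired_route_reflectors "$nodes" 2 0.5) route reflectors"
done
```

Under these assumptions, the example values give 1, 2, 2, 4 and 5 reflectors for 1, 3, 4, 7 and 10 nodes.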

Once the topology becomes stable (give it some time), you can test the autoscaler by scaling your cluster up or down. A cheaper option is to change the kubecontrollersconfigurations resource instead.

The autoscaler is far from perfect and supports only the multi-cluster topology at the moment, but it is ready for wider testing and opens the door to further discussion.
So please, please, please ...
  • feel free to share your experiences and ideas!
  • keep an eye on the source code; reviews are welcome!
  • send us more use cases!
  • join our community Slack channel!
