Story of deploying pods to every node and preventing their termination

During the development of one of our new features, I faced an interesting challenge. The requirements are simple and clear: there is an existing DaemonSet (a workload that runs on "every" node) on the target Kubernetes cluster, and we have to deploy another workload next to each of its instances and prevent that workload from being terminated under certain conditions. Let's split the problem into two parts: deployment and prevention.

From the deployment perspective, a second DaemonSet makes a lot of sense. If it uses the same node selectors as the existing one, Kubernetes will schedule its pods onto the same nodes. In our case a custom operator runs in the background, so we are able to keep the node selectors in sync, but for other kinds of deployments this could be a tricky piece.

For prevention, PodDisruptionBudget (PDB) comes into the picture. Without going into the details, a PDB lets us define how many of the target pods Kubernetes is allowed to take down at once during voluntary disruptions, such as node drains. It has a maxUnavailable fi