PodDisruptionBudget “Gotchas”

JeremyBest Practices, DevOpsLeave a Comment

To allow the Kubernetes Cluster Autoscaler to move pods around, pods (sometimes) need PodDistruptionBudgets which can specify either minAvailable or maxUnavailable, but not both. There are a bunch of “gotchas” to look out for.

You cannot set minAvailable: 0
It's not that you can't set minAvailable to zero, but since you are not allowed to set both minAvailable and maxUnavailable, most Helm charts have code like:

{{- if .Values.podDisruptionBudget.minAvailable }}
minAvailable: {{ .Values.podDisruptionBudget.minAvailable }}
{{- end  }}
{{- if .Values.podDisruptionBudget.maxUnavailable }}
maxUnavailable: {{ .Values.podDisruptionBudget.maxUnavailable }}
{{- end  }}

If you set minAvailable: 0 that is the same as not setting it. That results in neither value getting set, which is effectively the same as setting maxUnavailable: 0 (which, of course, you also cannot do).

The fix: set minAvailable: "0%". So you may wonder, “why bother setting a PodDisruptionBudget at all if you are going to allow all the pods to be deleted?” Well, the reason is that this gives the Autoscaler explicit permission to evict pods that it might otherwise be too cautious to evict. For example, anything that uses emptyDir for anything. The contents of emptyDir will be lost when the pod is evicted, and so the Autoscaler will not evict the pods without explicit permission, in order to avoid deleting something important.

It is dangerous to set minAvailable: 1
It may seem innocuous to set minAvailable: 1, but frequently we scale deployments down to 1 instance, and if you combine that with minAvailable: 1 then you are stuck with a single pod that cannot be evicted, which in turn will prevent the instance from being taken out of service.

Not a fix: minAvailable: 25%. Kubernetes does not round these percentages to nearest, it always rounds up. So with 1 replica, it will round 25% up to 1, so this is not a fix at all.

The fix: set maxUnavailable: 50%. That will avoid knocking out the service when there are replicas to spare but not prevent the service from being evicted.

Sometimes you have to set both minAvailable and maxUnavailable
While you are not allowed to set both minAvailable and maxUnavailable in the actual PodDisruptionBudget resource, if the Helm chart provides a default value for one of them and you want to use the other one, you have to set both in the helmfile. Set the one you do not want to "" to hint to readers that you are unsetting a default.