Jenkins, plugins and environment report
Jenkins: 2.541.3 (via Helm chart jenkins/jenkins 5.9.12)
Kubernetes plugin: 5.28
Java: 21.0.2 - OpenJDK (OpenJDK 64-Bit Server VM)
OS: Fedora Linux 44 - 6.19.10-300.fc44.x86_64
Kubernetes cluster: K3s v1.34.6+k3s1
Reproduction steps
- Configure a Kubernetes cloud in Jenkins.
- Run builds that create pods in ephemeral namespaces (one namespace per build, for example with the Hierarchical Namespace Controller).
- After the builds complete, the namespaces are deleted by the external namespace manager.
- Wait a few minutes and observe the Jenkins logs.
Expected Results
After a build completes and the ephemeral namespace is deleted, the SharedIndexInformer created by registerPodInformer() for that namespace should be closed and removed from memory.
Actual Results
The informer is never closed. It retries indefinitely against the deleted namespace, receiving 403 Forbidden on every reconnection attempt.
Observed symptoms (growing linearly with the number of builds):
- Thread leak: one thread per zombie informer (389 JVM threads observed after a few hours vs ~135 after a restart)
- Log flood: thousands of "KubernetesClientException: Received 403 on websocket" messages per minute
- CPU waste: journald/rsyslogd processing the error messages
The only current workaround is setting -Dorg.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher.disableDiagnosticLogs=true, which disables the entire pod event diagnostic feature introduced in #1627.
Root cause
// KubernetesCloud.java
private transient volatile Map<String, SharedIndexInformer<Pod>> informers = new ConcurrentHashMap<>();

public void registerPodInformer(KubernetesSlave node) {
    informers.computeIfAbsent(node.getNamespace(), (n) -> {
        // Creates a SharedIndexInformer - never removed from the map
        // ...
    });
}
There is no corresponding cleanup path:
$ grep -r "informers\.remove\|informer.*close\|informer.*stop" src/
# (no results)
KubernetesSlave._terminate() deletes the pod but does not touch the informer map.
Anything else?
Proposed fix
Add unregisterPodInformer(namespace) in KubernetesCloud and call it from KubernetesSlave._terminate() when no other pod from the same cloud remains in the namespace.
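A minimal sketch of what this could look like (the helper name follows the proposal above; the namespace-still-in-use check and the KubernetesSlave accessors used below are assumptions about the plugin's internals, not its current API):

// KubernetesCloud.java - sketch of the proposed cleanup path
public void unregisterPodInformer(String namespace) {
    SharedIndexInformer<Pod> informer = informers.remove(namespace);
    if (informer != null) {
        informer.stop(); // shuts down the watch and the thread that keeps reconnecting
    }
}

// KubernetesSlave.java - sketch of the call site in _terminate(), after the pod is deleted
boolean namespaceStillInUse = Jenkins.get().getNodes().stream()
        .filter(KubernetesSlave.class::isInstance)
        .map(KubernetesSlave.class::cast)
        .anyMatch(s -> s != this
                && getCloudName().equals(s.getCloudName())
                && getNamespace().equals(s.getNamespace()));
if (!namespaceStillInUse) {
    getKubernetesCloud().unregisterPodInformer(getNamespace());
}

Removing the map entry before stopping keeps registerPodInformer() free to build a fresh informer if the same namespace is ever reused by a later build.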
A structural alternative would be a single cluster-wide informer (client.pods().inAnyNamespace().withLabels(...)) instead of per-namespace informers, but this requires broader RBAC (ClusterRole).
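For illustration, a rough sketch of that alternative with the fabric8 client API (the label selector and handler bodies are placeholders, and wiring it into the existing diagnostics is left out):

// Single cluster-wide informer - one watch and one thread regardless of namespace count.
// Uses io.fabric8.kubernetes.client.informers.{SharedIndexInformer, ResourceEventHandler}
// and io.fabric8.kubernetes.api.model.Pod.
SharedIndexInformer<Pod> informer = client.pods()
        .inAnyNamespace()
        .withLabel("jenkins/label")            // placeholder selector matching agent pods
        .inform(new ResourceEventHandler<Pod>() {
            @Override public void onAdd(Pod pod) { /* feed existing pod diagnostics */ }
            @Override public void onUpdate(Pod oldPod, Pod newPod) { /* same */ }
            @Override public void onDelete(Pod pod, boolean deletedFinalStateUnknown) { /* same */ }
        });

Deleting an ephemeral namespace then simply produces delete events; no per-namespace watch is left behind to retry against a missing namespace.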
Related
KubernetesCloud.registerPodInformer(): the code that creates informers without a cleanup path, i.e. the direct origin of this bug.