
registerPodInformer leaks SharedIndexInformers when pods use ephemeral namespaces #2820

@vquemener

Description


Jenkins, plugins and environment report

Jenkins: 2.541.3 (via Helm chart jenkins/jenkins 5.9.12)
Kubernetes plugin: 5.28
Java: 21.0.2 - OpenJDK (OpenJDK 64-Bit Server VM)
OS: Fedora Linux 44 - 6.19.10-300.fc44.x86_64
Kubernetes cluster: K3s v1.34.6+k3s1

Reproduction steps

  1. Configure a Kubernetes cloud in Jenkins.
  2. Run builds that create pods in ephemeral namespaces (one namespace per build, for example with the Hierarchical Namespace Controller).
  3. After the builds complete, the namespaces are deleted by the external namespace manager.
  4. Wait a few minutes and observe the Jenkins logs.

Expected Results

After a build completes and the ephemeral namespace is deleted, the SharedIndexInformer created by registerPodInformer() for that namespace should be closed and removed from memory.

Actual Results

The informer is never closed. It retries indefinitely against the deleted namespace, receiving 403 Forbidden on every reconnection attempt.

Observed symptoms (growing linearly with the number of builds):

  • Thread leak: one thread per zombie informer (389 JVM threads observed after a few hours vs ~135 after a restart)
  • Log flood: thousands of "KubernetesClientException: Received 403 on websocket" entries per minute
  • CPU waste: journald/rsyslogd processing the error messages

The only current workaround is setting
-Dorg.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher.disableDiagnosticLogs=true, which disables the entire pod event diagnostic feature introduced in #1627.

Root cause

// KubernetesCloud.java
private transient volatile Map<String, SharedIndexInformer<Pod>> informers = new ConcurrentHashMap<>();

public void registerPodInformer(KubernetesSlave node) {
    informers.computeIfAbsent(node.getNamespace(), (n) -> {
        // Creates a SharedIndexInformer — never removed from the map
        // ...
    });
}
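
For context, here is a minimal sketch of what such a per-namespace informer registration looks like with the fabric8 kubernetes-client (illustrative code, not the plugin's exact implementation). Once inform() has been called, the informer owns its own watch connection and worker thread and keeps reconnecting on failure until it is explicitly stopped, which is why the 403 errors repeat forever after the namespace is deleted:

// Illustrative sketch only (fabric8 kubernetes-client); not the plugin's actual code.
import io.fabric8.kubernetes.api.model.Pod;
import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.informers.ResourceEventHandler;
import io.fabric8.kubernetes.client.informers.SharedIndexInformer;

class InformerLifecycleSketch {
    static SharedIndexInformer<Pod> register(KubernetesClient client, String namespace) {
        // inform() starts the informer immediately: it opens a watch on the namespace,
        // runs a worker thread, and reconnects on every failure (including the 403s
        // once the namespace no longer exists) until stop()/close() is called.
        return client.pods().inNamespace(namespace).inform(new ResourceEventHandler<Pod>() {
            @Override public void onAdd(Pod pod) { /* diagnostic logging */ }
            @Override public void onUpdate(Pod oldPod, Pod newPod) { /* diagnostic logging */ }
            @Override public void onDelete(Pod pod, boolean unknownFinalState) { /* diagnostic logging */ }
        });
    }

    static void unregister(SharedIndexInformer<Pod> informer) {
        informer.stop(); // the call that is currently never made
    }
}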

There is no corresponding cleanup path:

$ grep -r "informers\.remove\|informer.*close\|informer.*stop" src/
# (no results)

KubernetesSlave._terminate() deletes the pod but does not touch the informer map.

Anything else?

Proposed fix

Add unregisterPodInformer(namespace) to KubernetesCloud and call it from KubernetesSlave._terminate() when no other pod from the same cloud remains in the namespace.
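
A rough sketch of what that could look like (unregisterPodInformer and maybeUnregisterPodInformer are assumptions, not existing plugin API; the sketch relies on KubernetesSlave's existing getNamespace(), getCloudName() and getKubernetesCloud() accessors):

// KubernetesCloud.java (sketch)
public void unregisterPodInformer(String namespace) {
    SharedIndexInformer<Pod> informer = informers.remove(namespace);
    if (informer != null) {
        informer.stop(); // or close(), depending on the kubernetes-client version
    }
}

// KubernetesSlave.java (sketch), called at the end of _terminate() after the pod is deleted
private void maybeUnregisterPodInformer() {
    boolean namespaceStillInUse = Jenkins.get().getNodes().stream()
            .filter(KubernetesSlave.class::isInstance)
            .map(KubernetesSlave.class::cast)
            .anyMatch(other -> other != this
                    && getCloudName().equals(other.getCloudName())
                    && getNamespace().equals(other.getNamespace()));
    if (!namespaceStillInUse) {
        getKubernetesCloud().unregisterPodInformer(getNamespace());
    }
}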

A structural alternative would be a single cluster-wide informer (client.pods().inAnyNamespace().withLabels(...)) instead of per-namespace informers, but this requires broader RBAC (ClusterRole).
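
For comparison, a sketch of that cluster-wide variant (the label selector and variable names are illustrative; it needs a ClusterRole granting list/watch on pods across all namespaces):

// Single cluster-scoped informer per cloud instead of one per namespace (sketch).
SharedIndexInformer<Pod> informer = client.pods()
        .inAnyNamespace()
        .withLabels(Map.of("jenkins/cloud", cloudName)) // illustrative selector for this cloud's agent pods
        .inform(podEventHandler);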
