
monit fails on 1.1183: monit wrapper confused with cgroupv2 originated from bosh release workload #585

@gberche-orange

Observed symptom

When running bosh-stemcell-1.1183-vsphere-esxi-ubuntu-jammy-go_agent.tgz together with a bosh release that manipulates cgroupv2 entries (https://github.com/orange-cloudfoundry/k3s-wrapper-boshrelease), we see an unresponsive agent.

Interactive use of monit fails

    server/11aa3763-7a95-47ad-8089-7399a97a2d0d:~# monit summary
    mkdir: cannot create directory ‘/sys/fs/cgroup/unified\n’: Read-only file system

It seems that the monit wrapper shipped in 1.1183 with #563 is not yet robust to additional cgroupv2 filesystem mounts:
https://github.com/cloudfoundry/bosh-linux-stemcell-builder/blame/ubuntu-jammy/v1.1183/stemcell_builder/stages/bosh_monit/assets/monit-access-helper.sh#L17-L20

Analysis

Analysis from @ogrand: permit_monit_access() fails when additional cgroupv2 mount entries are present on the host as a result of the bosh release workload (cilium in our case).

https://github.com/cloudfoundry/bosh-linux-stemcell-builder/blob/ubuntu-jammy/v1.1183/stemcell_builder/stages/bosh_monit/assets/monit-access-helper.sh#L17-L20

        # cgroupv2 (unified hierarchy)
        # Create a sub-cgroup under the current process's cgroup and move into it.
        # The iptables rules match on this cgroup path.
        cgroup_mount="$(awk '$3 == "cgroup2" { print $2 }' /proc/self/mounts)"

Comparing the output of this command across bosh deployments

On a deployment not manipulating cgroupv2 fs type mounts

    dns-recursor/1cf11aec-d00e-4121-b406-8fca86240e7e:~# cgroup_mount="$(awk '$3 == "cgroup2" { print $2 }' /proc/self/mounts)" ; echo "${cgroup_mount}"
    /sys/fs/cgroup/unified

On a k3s deployment with a cilium workload, two lines (mount points) are returned instead of one, which breaks the rest of the script's logic

    server/11aa3763-7a95-47ad-8089-7399a97a2d0d:~# cgroup_mount="$(awk '$3 == "cgroup2" { print $2 }' /proc/self/mounts)" ; echo "${cgroup_mount}"
    /sys/fs/cgroup/unified
    /run/cilium/cgroupv2
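
The two-line result can be reproduced without a cilium host by running the same awk filter against a sample file. This is only a sketch: `sample_mounts` is a hypothetical temporary file standing in for /proc/self/mounts, and its two entries are copied from the affected host.

```shell
#!/usr/bin/env bash
# Hypothetical reproduction: sample_mounts stands in for /proc/self/mounts.
sample_mounts="$(mktemp)"
cat > "${sample_mounts}" <<'EOF'
cgroup2 /sys/fs/cgroup/unified cgroup2 rw,nosuid,nodev,noexec,relatime 0 0
none /run/cilium/cgroupv2 cgroup2 rw,relatime 0 0
EOF

# Same awk filter as the helper script, reading the sample file instead.
cgroup_mount="$(awk '$3 == "cgroup2" { print $2 }' "${sample_mounts}")"

# Count the captured lines; anything other than 1 means the variable
# holds several paths joined by newlines rather than a single path.
mount_count="$(printf '%s\n' "${cgroup_mount}" | wc -l | tr -d ' ')"
echo "matched ${mount_count} cgroup2 mount point(s)"

# printf %q (bash builtin) makes the embedded newline visible.
printf 'cgroup_mount=%q\n' "${cgroup_mount}"
rm -f "${sample_mounts}"
```

If the helper then appends a sub-cgroup name to this value and calls mkdir on it (an assumption about the rest of the script), the embedded newline would end up in the path, which is consistent with the `unified\n` shown in the mkdir error.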

Annotated listing of the cgroup2 entries in /proc/self/mounts

    server/11aa3763-7a95-47ad-8089-7399a97a2d0d:~# cat /proc/self/mounts | grep cgroup2
    # columns: device, mount_point, fs_type, options, then two dummy fields
    cgroup2 /sys/fs/cgroup/unified cgroup2 rw,nosuid,nodev,noexec,relatime 0 0
    none    /run/cilium/cgroupv2   cgroup2 rw,relatime 0 0

Details on cilium cgroupv2 fs mount

https://github.com/cilium/cilium/blob/e53e1341f483c722bdaaf56a72cc01334d8aaac2/Documentation/network/kubernetes/kubeproxy-free.rst#L114

Cilium will automatically mount cgroup v2 filesystem required to attach BPF cgroup programs by default at the path /run/cilium/cgroupv2. To do that, it needs to mount the host /proc inside an init container launched by the DaemonSet temporarily.

Background on /proc/self/mounts and its format

https://manpages.ubuntu.com/manpages/noble/man5/proc_pid_mounts.5.html
/proc/mounts

Before Linux 2.4.19, this file was a list of all the filesystems currently mounted on the system. With the introduction of per-process mount namespaces in Linux 2.4.19 (see [mount_namespaces(7)](https://manpages.ubuntu.com/manpages/noble/man7/mount_namespaces.7.html)), this file became a link to /proc/self/mounts, which lists the mounts of the process's own mount namespace. The format of this file is documented in [fstab(5)](https://manpages.ubuntu.com/manpages/noble/man5/fstab.5.html).

https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/4/html/reference_guide/s2-proc-mounts

the first column specifies the device that is mounted, the second column reveals the mount point, and the third column tells the file system type, and the fourth column tells you if it is mounted read-only (ro) or read-write (rw). The fifth and sixth columns are dummy values designed to match the format used in /etc/mtab.
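
As an illustration of this column layout, the sketch below labels the first four fields of every cgroup2 entry. `describe_cgroup2_mounts` and `sample_mounts` are hypothetical names introduced here, not part of the stemcell; the sample entries are the two lines seen on the affected host.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each cgroup2 entry of a mounts file with
# its columns labelled. Defaults to /proc/self/mounts when no file is given.
describe_cgroup2_mounts() {
  local mounts_file="${1:-/proc/self/mounts}"
  awk '$3 == "cgroup2" {
    # $1 = device, $2 = mount point, $3 = filesystem type, $4 = mount options
    printf "device=%s mount_point=%s fs_type=%s options=%s\n", $1, $2, $3, $4
  }' "${mounts_file}"
}

# Sample data standing in for /proc/self/mounts on the affected host.
sample_mounts="$(mktemp)"
trap 'rm -f "${sample_mounts}"' EXIT
cat > "${sample_mounts}" <<'EOF'
cgroup2 /sys/fs/cgroup/unified cgroup2 rw,nosuid,nodev,noexec,relatime 0 0
none /run/cilium/cgroupv2 cgroup2 rw,relatime 0 0
EOF

describe_cgroup2_mounts "${sample_mounts}"
```

The labelled output makes the difference between the two entries visible: both have the filesystem type cgroup2, but only the first also has cgroup2 as its device name.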

https://man7.org/linux/man-pages/man7/cgroups.7.html

Note that on many modern systems, systemd(1) automatically mounts
the cgroup2 filesystem at /sys/fs/cgroup/unified during the boot
process.

Workaround

We're currently testing an (ugly) workaround that patches the monit helper script (using https://github.com/orange-cloudfoundry/generic-scripting-release/), changing the line from

    cgroup_mount="$(awk '$3 == "cgroup2" { print $2 }' /proc/self/mounts)"

to

    cgroup_mount="$(awk '$1 == "cgroup2" && $3 == "cgroup2" { print $2 }' /proc/self/mounts)"

using

    sed -i 's:cgroup_mount=.*:cgroup_mount="$(awk '\''\$1 == "cgroup2" \&\& $3 == "cgroup2" { print $2 }'\'' \/proc\/self\/mounts)":g' /var/vcap/bosh/etc/monit-access-helper.sh
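
To compare the workaround's filter with one possible alternative, here is a minimal sketch run against sample data. Both function names and `sample_mounts` are hypothetical and not part of monit-access-helper.sh; the first-match variant is our own suggestion, not a fix confirmed upstream.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; neither is part of the stemcell's monit-access-helper.sh.

# Filter used by the workaround: device name and filesystem type
# must both be "cgroup2".
cgroup2_mount_by_device() {
  awk '$1 == "cgroup2" && $3 == "cgroup2" { print $2 }' "${1:-/proc/self/mounts}"
}

# Alternative: stop after the first cgroup2 entry, so at most one
# line is ever printed.
first_cgroup2_mount() {
  awk '$3 == "cgroup2" { print $2; exit }' "${1:-/proc/self/mounts}"
}

# Sample data standing in for /proc/self/mounts on the affected host.
sample_mounts="$(mktemp)"
trap 'rm -f "${sample_mounts}"' EXIT
cat > "${sample_mounts}" <<'EOF'
cgroup2 /sys/fs/cgroup/unified cgroup2 rw,nosuid,nodev,noexec,relatime 0 0
none /run/cilium/cgroupv2 cgroup2 rw,relatime 0 0
EOF

echo "by device:   $(cgroup2_mount_by_device "${sample_mounts}")"
echo "first match: $(first_cgroup2_mount "${sample_mounts}")"
```

Both variants have limits. The device filter still returns several lines if another workload mounts cgroup2 with the device name cgroup2, and the first-match variant assumes the system's own mount is listed before any mount added later by a workload.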
