Observed symptom
When running bosh-stemcell-1.1183-vsphere-esxi-ubuntu-jammy-go_agent.tgz together with a BOSH release that manipulates cgroup v2 entries (https://github.com/orange-cloudfoundry/k3s-wrapper-boshrelease), the agent becomes unresponsive.
Interactive use of monit fails:
server/11aa3763-7a95-47ad-8089-7399a97a2d0d:~# monit summary
mkdir: cannot create directory ‘/sys/fs/cgroup/unified\n’: Read-only file system
It seems that the monit wrapper in 1.1183 with #563 is not yet robust to additional cgroup v2 filesystem mounts:
https://github.com/cloudfoundry/bosh-linux-stemcell-builder/blame/ubuntu-jammy/v1.1183/stemcell_builder/stages/bosh_monit/assets/monit-access-helper.sh#L17-L20
Analysis
Analysis from @ogrand: permit_monit_access() fails when additional cgroup2 filesystem mounts are present on the host as a result of the BOSH release (Cilium in our case).
https://github.com/cloudfoundry/bosh-linux-stemcell-builder/blob/ubuntu-jammy/v1.1183/stemcell_builder/stages/bosh_monit/assets/monit-access-helper.sh#L17-L20
# cgroupv2 (unified hierarchy)
# Create a sub-cgroup under the current process's cgroup and move into it.
# The iptables rules match on this cgroup path.
cgroup_mount="$(awk '$3 == "cgroup2" { print $2 }' /proc/self/mounts)"
Comparing the output of this command across BOSH deployments:
On a deployment that does not add cgroup2 filesystem mounts, a single mount point is returned:
dns-recursor/1cf11aec-d00e-4121-b406-8fca86240e7e:~# cgroup_mount="$(awk '$3 == "cgroup2" { print $2 }' /proc/self/mounts)" ; echo "${cgroup_mount}"
/sys/fs/cgroup/unified
On a k3s deployment with a Cilium workload, two lines (mount points) are returned instead, which breaks the rest of the script logic:
server/11aa3763-7a95-47ad-8089-7399a97a2d0d:~# cgroup_mount="$(awk '$3 == "cgroup2" { print $2 }' /proc/self/mounts)" ; echo "${cgroup_mount}"
/sys/fs/cgroup/unified
/run/cilium/cgroupv2
Annotated version of the filesystem mounts matched by the grep command:
server/11aa3763-7a95-47ad-8089-7399a97a2d0d:~# cat /proc/self/mounts | grep cgroup2
# device mount_point fs_type options dummy dummy
cgroup2 /sys/fs/cgroup/unified cgroup2 rw,nosuid,nodev,noexec,relatime 0 0
none /run/cilium/cgroupv2 cgroup2 rw,relatime 0 0
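The two-line result can be reproduced off-host. The following is a minimal sketch, assuming a temporary file stands in for /proc/self/mounts; `sample_mounts` and `mount_count` are hypothetical names used only for illustration. It shows that the captured variable holds two paths separated by a newline, which is consistent with the `unified\n` seen in the mkdir error above.

```shell
# Sample file standing in for /proc/self/mounts (hypothetical name;
# the two entries are copied from the output reported in this issue).
sample_mounts="$(mktemp)"
cat > "$sample_mounts" <<'EOF'
cgroup2 /sys/fs/cgroup/unified cgroup2 rw,nosuid,nodev,noexec,relatime 0 0
none /run/cilium/cgroupv2 cgroup2 rw,relatime 0 0
EOF

# Same filter as in monit-access-helper.sh: print the mount point of
# every entry whose filesystem type is cgroup2.
cgroup_mount="$(awk '$3 == "cgroup2" { print $2 }' "$sample_mounts")"

# Count how many mount points ended up in the variable.
mount_count="$(printf '%s\n' "$cgroup_mount" | wc -l | tr -d ' ')"
echo "cgroup2 mount points found: ${mount_count}"

rm -f "$sample_mounts"
```

Any later use of `${cgroup_mount}` as a single directory path would then receive both paths joined by a newline.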
Details on the Cilium cgroup v2 filesystem mount
https://github.com/cilium/cilium/blob/e53e1341f483c722bdaaf56a72cc01334d8aaac2/Documentation/network/kubernetes/kubeproxy-free.rst#L114
Cilium will automatically mount cgroup v2 filesystem required to attach BPF cgroup programs by default at the path /run/cilium/cgroupv2. To do that, it needs to mount the host /proc inside an init container launched by the DaemonSet temporarily.
Background on /proc/self/mounts and its format
https://manpages.ubuntu.com/manpages/noble/man5/proc_pid_mounts.5.html
/proc/mounts
Before Linux 2.4.19, this file was a list of all the filesystems currently mounted on the system. With the introduction of per-process mount namespaces in Linux 2.4.19 (see [mount_namespaces(7)](https://manpages.ubuntu.com/manpages/noble/man7/mount_namespaces.7.html)), this file became a link to /proc/self/mounts, which lists the mounts of the process's own mount namespace. The format of this file is documented in [fstab(5)](https://manpages.ubuntu.com/manpages/noble/man5/fstab.5.html).
https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/4/html/reference_guide/s2-proc-mounts
the first column specifies the device that is mounted, the second column reveals the mount point, and the third column tells the file system type, and the fourth column tells you if it is mounted read-only (ro) or read-write (rw). The fifth and sixth columns are dummy values designed to match the format used in /etc/mtab.
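To make the column layout concrete, here is a small sketch that splits one of the entries from this issue into its six fields. The variable names (`device`, `mount_point`, `fs_type`, `options`, `dump`, `pass`) are illustrative labels only; the kernel does not name the columns.

```shell
# One /proc/self/mounts entry, copied from the k3s deployment output in this issue.
line='none /run/cilium/cgroupv2 cgroup2 rw,relatime 0 0'

# Split the six whitespace-separated fields into named variables.
# The last two fields are the dummy values mentioned above.
read -r device mount_point fs_type options dump pass <<EOF
$line
EOF

echo "device=${device} mount_point=${mount_point} fs_type=${fs_type} options=${options}"
```

For the Cilium mount the device column is `none`, whereas the entry under /sys/fs/cgroup/unified shown earlier has `cgroup2` in both the device and filesystem type columns.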
https://man7.org/linux/man-pages/man7/cgroups.7.html
Note that on many modern systems, systemd(1) automatically mounts
the cgroup2 filesystem at /sys/fs/cgroup/unified during the boot
process.
Workaround
We're currently testing an (ugly) workaround that patches the monit script (using https://github.com/orange-cloudfoundry/generic-scripting-release/)
from
cgroup_mount="$(awk '$3 == "cgroup2" { print $2 }' /proc/self/mounts)"
to
cgroup_mount="$(awk '$1 == "cgroup2" && $3 == "cgroup2" { print $2 }' /proc/self/mounts)"
using
sed -i 's:cgroup_mount=.*:cgroup_mount="$(awk '\''\$1 == "cgroup2" \&\& $3 == "cgroup2" { print $2 }'\'' \/proc\/self\/mounts)":g' /var/vcap/bosh/etc/monit-access-helper.sh
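The patched filter relies on the device column being literally `cgroup2`. That holds for the mounts shown in this issue but is not guaranteed in general. The sketch below checks the patched filter against the two sample entries. It also shows one possible alternative that stops at the first cgroup2 entry: this is our assumption, not something the stemcell ships. `sample_mounts`, `patched` and `first_only` are hypothetical names.

```shell
# Sample file standing in for /proc/self/mounts (entries copied from this issue).
sample_mounts="$(mktemp)"
cat > "$sample_mounts" <<'EOF'
cgroup2 /sys/fs/cgroup/unified cgroup2 rw,nosuid,nodev,noexec,relatime 0 0
none /run/cilium/cgroupv2 cgroup2 rw,relatime 0 0
EOF

# Workaround filter from this issue: device name and fs type must both be cgroup2.
patched="$(awk '$1 == "cgroup2" && $3 == "cgroup2" { print $2 }' "$sample_mounts")"

# Alternative (assumption, not the upstream fix): keep only the first
# cgroup2 entry, whatever its device name is.
first_only="$(awk '$3 == "cgroup2" { print $2; exit }' "$sample_mounts")"

echo "patched:    ${patched}"
echo "first_only: ${first_only}"

rm -f "$sample_mounts"
```

Both filters yield a single mount point for these sample entries. The first-match variant depends on the order of entries in the mounts file, so it would pick the wrong path if another cgroup2 mount were listed first.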