fix(cmd): add ps command for containerd recovery#601

Merged
cmainas merged 1 commit into urunc-dev:main-pr601 from sidneychang:add-ps-command
May 7, 2026
@sidneychang
Contributor

@sidneychang sidneychang commented May 1, 2026

Description

Add a runc-compatible ps --format json command to urunc.

containerd-shim-urunc-v2 reuses the runc shim manager and task service. During containerd restart, the recovery path calls Pids(), which eventually invokes the runtime binary as:

urunc ps --format json <container-id>

Without this command, containerd cannot obtain the host-visible PID associated with the urunc task and may treat the shim as leaked.

Return the VMM / sandbox monitor PID stored in state.json as a []int, matching the JSON format expected by containerd/go-runc and runc's ps implementation. Keep the result as a slice so it can be extended later if urunc supports multiple VMM or unikernel processes for one container.
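
The output shape described above can be sketched as a round trip: a bare JSON array of integers is written by the runtime and decoded back into a `[]int` by the caller. This is a minimal illustration using only the Go standard library, not the code from this PR; `encodePids` and `decodePids` are hypothetical helper names, and the current process PID stands in for the monitor PID that urunc reads from state.json.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// encodePids renders PIDs as a bare JSON array of integers, the shape
// the description gives for `ps --format json`. Hypothetical helper.
func encodePids(pids []int) (string, error) {
	out, err := json.Marshal(pids)
	if err != nil {
		return "", err
	}
	return string(out), nil
}

// decodePids parses that array back into a slice, as a caller of the
// runtime binary would have to do. Hypothetical helper.
func decodePids(data []byte) ([]int, error) {
	var pids []int
	if err := json.Unmarshal(data, &pids); err != nil {
		return nil, err
	}
	return pids, nil
}

func main() {
	// Assumption: os.Getpid() stands in for the persisted monitor PID.
	out, err := encodePids([]int{os.Getpid()})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(out)

	pids, err := decodePids([]byte(out))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(len(pids), "pid(s) decoded")
}
```

One detail worth noting with this approach: `json.Marshal` encodes a nil `[]int` as `null` and an empty non-nil slice as `[]`, so building the slice with a literal such as `[]int{pid}` keeps the output an array.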

Related issues

How was this tested?

Before the fix

The urunc container was running before restarting containerd:

root@sev:~/zxd/urunc-per-container# nerdctl ps -a
CONTAINER ID    IMAGE                                                         COMMAND                   CREATED          STATUS    PORTS    NAMES
ae3a6ebb87fe    harbor.nbfc.io/nubificus/urunc/nginx-qemu-linux-raw:latest    "/urunit /usr/sbin/n…"    5 seconds ago    Up                 urunc-view-check-1

After restarting containerd:

root@sev:~/zxd/urunc-per-container# systemctl restart containerd

the same container became Created:

root@sev:~/zxd/urunc-per-container# nerdctl ps -a
CONTAINER ID    IMAGE                                                         COMMAND                   CREATED           STATUS     PORTS    NAMES
ae3a6ebb87fe    harbor.nbfc.io/nubificus/urunc/nginx-qemu-linux-raw:latest    "/urunit /usr/sbin/n…"    14 seconds ago    Created             urunc-view-check-1

The state stayed Created on the next check:

root@sev:~/zxd/urunc-per-container# nerdctl ps -a
CONTAINER ID    IMAGE                                                         COMMAND                   CREATED           STATUS     PORTS    NAMES
ae3a6ebb87fe    harbor.nbfc.io/nubificus/urunc/nginx-qemu-linux-raw:latest    "/urunit /usr/sbin/n…"    26 seconds ago    Created             urunc-view-check-1

The urunc shim process was still alive:

root@sev:~/zxd/urunc-per-container# ps aux | grep shim
root      512463  0.0  0.0 1233708 4704 ?        Sl   04:39   0:00 /usr/local/bin/containerd-shim-urunc-v2 -namespace default -id ae3a6ebb87fea672711de7b7ab75315eb7e39605330b71236d607657b1707c9a -address /run/containerd/containerd.sock -debug
root      513223  0.0  0.0   6548  1544 pts/1    S+   04:40   0:00 grep --color=auto shim

After stopping and removing the container:

root@sev:~/zxd/urunc-per-container# nerdctl stop ae3
ae3
root@sev:~/zxd/urunc-per-container# nerdctl rm ae3
ae3

the shim was still left behind:

root@sev:~/zxd/urunc-per-container# ps aux | grep shim
root      512463  0.0  0.0 1233708 4704 ?        Sl   04:39   0:00 /usr/local/bin/containerd-shim-urunc-v2 -namespace default -id ae3a6ebb87fea672711de7b7ab75315eb7e39605330b71236d607657b1707c9a -address /run/containerd/containerd.sock -debug
root      513492  0.0  0.0   6548  1548 pts/1    S+   04:40   0:00 grep --color=auto shim

At that point, the container metadata had already disappeared:

root@sev:~/zxd/urunc-per-container# nerdctl ps -a
CONTAINER ID    IMAGE    COMMAND    CREATED    STATUS    PORTS    NAMES

This shows the failure mode before the fix: containerd restart caused the urunc container to fall from Up to Created, and cleanup left a live containerd-shim-urunc-v2 process behind.

After the fix

After replacing the binaries with the fixed version, a new urunc container was started:

root@sev:~/zxd/urunc-per-container# nerdctl -n default --snapshotter devmapper run -d --name urunc-view-check-1 --runtime io.containerd.urunc.v2 harbor.nbfc.io/nubificus/urunc/nginx-qemu-linux-raw:latest
4426490c6afd310ebe8653126f0479efe25ed60c32b48e585d9f95e9b028fd17

The container was Up before restarting containerd:

root@sev:~/zxd/urunc-per-container# nerdctl ps -a
CONTAINER ID    IMAGE                                                         COMMAND                   CREATED          STATUS    PORTS    NAMES
4426490c6afd    harbor.nbfc.io/nubificus/urunc/nginx-qemu-linux-raw:latest    "/urunit /usr/sbin/n…"    5 seconds ago    Up                 urunc-view-check-1

After restarting containerd:

root@sev:~/zxd/urunc-per-container# systemctl restart containerd

the container remained Up:

root@sev:~/zxd/urunc-per-container# nerdctl ps -a
CONTAINER ID    IMAGE                                                         COMMAND                   CREATED           STATUS    PORTS    NAMES
4426490c6afd    harbor.nbfc.io/nubificus/urunc/nginx-qemu-linux-raw:latest    "/urunit /usr/sbin/n…"    17 seconds ago    Up                 urunc-view-check-1

The container was then stopped and removed normally:

root@sev:~/zxd/urunc-per-container# nerdctl stop 4426
4426
root@sev:~/zxd/urunc-per-container# nerdctl rm 4426
4426

After removal, no urunc shim was left behind; only the grep process appeared:

root@sev:~/zxd/urunc-per-container# ps aux | grep shim
root      515037  0.0  0.0   6548  1544 pts/1    S+   04:42   0:00 grep --color=auto shim

LLM usage

ChatGPT

Checklist

  • I have read the contribution guide.
  • The linter passes locally (make lint).
  • The e2e tests of at least one tool pass locally (make test_ctr, make test_nerdctl, make test_docker, make test_crictl).
  • If LLMs were used: I have read the llm policy.

@netlify

netlify Bot commented May 1, 2026

Deploy Preview for urunc canceled.

Name Link
🔨 Latest commit 6d84825
🔍 Latest deploy log https://app.netlify.com/projects/urunc/deploys/69fb3ddf82d4c5000846f69f

@sidneychang sidneychang marked this pull request as ready for review May 1, 2026 21:26
@cmainas
Contributor

cmainas commented May 5, 2026

Hello @sidneychang, let's first discuss the root cause in the issue thread.

Contributor

@cmainas cmainas left a comment

Hello @sidneychang ,

overall the PR looks good. A few comments:

  • Please use "sandbox monitor" instead of "VMM", so that non-VM-based monitors are covered as well.
  • In some cases we spawn virtiofsd for a container. That process is tightly coupled to the monitor. A monitor might also spawn sub-processes. I think we should return these PIDs as well in those cases. However, this will not be straightforward without cgroups.

Would you like to work on the extra PIDs task as well in this PR or should we address it in a different iteration?

Comment thread cmd/urunc/ps.go Outdated

var psCommand = &cli.Command{
Name: "ps",
Usage: "displays the host-visible VMM processes associated with a container",
Contributor

VMM -> monitor. urunc supports non-VM monitors too.

Comment thread cmd/urunc/ps.go Outdated
Usage: "displays the host-visible VMM processes associated with a container",
ArgsUsage: `<container-id>`,
Description: `The ps command displays the host-visible process IDs associated
with a urunc container. For VM-based urunc containers this returns the VMM /
Contributor

We can remove "For VM-based urunc containers" so that non-VM-based monitors are not excluded.

Comment thread cmd/urunc/ps.go
//
// Keep the return value as []int to match runc's ps implementation
// and containerd/go-runc's expectation for `ps --format json`.
pids := []int{unikontainer.State.Pid}
Contributor

There is also the case of virtiofsd, another process that starts for some urunc containers. We should include its PID here as well. However, this will require better bookkeeping of the PIDs.
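
As a rough illustration of what such bookkeeping could look like, the sketch below keeps the monitor PID first and appends any recorded helper PIDs. Everything here is hypothetical: `trackedState`, its `HelperPids` field, and `collectPids` are invented for the example and are not part of urunc's state format.

```go
package main

import "fmt"

// trackedState is a hypothetical stand-in for persisted container state.
// Only a monitor PID is described as being stored today; HelperPids is
// an assumption made for this sketch.
type trackedState struct {
	Pid        int   // monitor PID
	HelperPids []int // hypothetical: helpers such as virtiofsd
}

// collectPids returns the monitor PID followed by valid helper PIDs,
// skipping non-positive values and duplicates of the monitor PID.
func collectPids(s trackedState) []int {
	pids := []int{s.Pid}
	for _, p := range s.HelperPids {
		if p > 0 && p != s.Pid {
			pids = append(pids, p)
		}
	}
	return pids
}

func main() {
	s := trackedState{Pid: 100, HelperPids: []int{0, 101, 100}}
	fmt.Println(collectPids(s))
}
```

Keeping the monitor PID at index 0 would let a caller that only cares about the main task keep reading the first element, though that ordering is a design assumption rather than anything the runc `ps` format specifies.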

@sidneychang sidneychang force-pushed the add-ps-command branch 2 times, most recently from 5e03afe to 3b8d82f Compare May 6, 2026 08:12
@sidneychang
Contributor Author

Thanks for the review @cmainas.
I have updated the code.

Regarding whether we should also return virtiofsd and subprocesses spawned by the monitor, I agree this is a valid scenario to consider. However, I would prefer to discuss and handle it separately as a follow-up.

My main concern is that urunc currently only persists the main monitor PID. For example, virtiofsd is started before execing the monitor, but its PID is not currently recorded separately, nor is it strongly tied to the monitor through cgroups or a similar mechanism. Therefore, I am not sure whether we can guarantee in all abnormal cases that helper processes will always exit once the main monitor exits.

If we include these helper/subprocess PIDs directly in the ps result now, I think we first need to clearly define their semantics relative to the main monitor PID, and decide how to handle the case where the main monitor no longer exists but some helper processes may still be alive. Otherwise, upper layers may see PIDs that do not really represent the liveness of the main monitor/container task.

So I think it would be better to treat this as a follow-up item.

@cmainas
Contributor

cmainas commented May 6, 2026

> Thanks for the review @cmainas. I have updated the code.
>
> Regarding whether we should also return virtiofsd and subprocesses spawned by the monitor, I agree this is a valid scenario to consider. However, I would prefer to discuss and handle it separately as a follow-up.

Ok let's do it that way.

> My main concern is that urunc currently only persists the main monitor PID. For example, virtiofsd is started before execing the monitor, but its PID is not currently recorded separately, nor is it strongly tied to the monitor through cgroups or a similar mechanism. Therefore, I am not sure whether we can guarantee in all abnormal cases that helper processes will always exit once the main monitor exits.
>
> If we include these helper/subprocess PIDs directly in the ps result now, I think we first need to clearly define their semantics relative to the main monitor PID, and decide how to handle the case where the main monitor no longer exists but some helper processes may still be alive. Otherwise, upper layers may see PIDs that do not really represent the liveness of the main monitor/container task.

This is one of the reasons that, in urunc's execution model, the monitor process has PID 1 in its PID namespace and acts as the init. If it exits, all other processes in that namespace exit too. See https://man7.org/linux/man-pages/man7/pid_namespaces.7.html

       If the "init" process of a PID namespace terminates, the kernel
       terminates all of the processes in the namespace via a SIGKILL
       signal.
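
One way to observe this from the host is the `NSpid:` line in `/proc/<pid>/status`, which lists a process's PID in each nested PID namespace, outermost first. The sketch below parses that line and checks whether the innermost PID is 1. It is an illustration under assumptions, not urunc code: `parseNSpid` and `isNamespaceInit` are hypothetical helpers, and the `main` function inspects the current process only because no monitor PID is available here.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseNSpid extracts the PIDs on the "NSpid:" line of a
// /proc/<pid>/status file. It returns nil if the line is missing or
// malformed. Hypothetical helper.
func parseNSpid(status string) []int {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, "NSpid:") {
			continue
		}
		var pids []int
		for _, f := range strings.Fields(strings.TrimPrefix(line, "NSpid:")) {
			n, err := strconv.Atoi(f)
			if err != nil {
				return nil
			}
			pids = append(pids, n)
		}
		return pids
	}
	return nil
}

// isNamespaceInit reports whether the process is PID 1 of a nested PID
// namespace: more than one entry, with the innermost equal to 1.
func isNamespaceInit(nspid []int) bool {
	return len(nspid) > 1 && nspid[len(nspid)-1] == 1
}

func main() {
	data, err := os.ReadFile("/proc/self/status")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	ns := parseNSpid(string(data))
	fmt.Println("NSpid:", ns, "namespace init:", isNamespaceInit(ns))
}
```

The check requires more than one entry so that the host's own PID 1 is not mistaken for the init of a nested namespace.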

@sidneychang
Contributor Author

@cmainas Thanks, that makes sense.

@cmainas
Contributor

cmainas commented May 6, 2026

Hello @sidneychang . Could you rebase over main so we can merge this?

@sidneychang
Contributor Author

Hello @cmainas, I’ve rebased it over main.

Add a runc-compatible `ps --format json` command to urunc.

containerd-shim-urunc-v2 reuses the runc shim manager and task
service. The Pids() path eventually invokes the runtime binary as:

    urunc ps --format json <container-id>

Without this command, containerd cannot obtain the host-visible PID
associated with the urunc task.

Return the monitor PID stored in state.json as a []int, matching the
JSON format expected by containerd/go-runc and runc's `ps`
implementation. Keep the result as a slice so it can be extended later
if urunc needs to report additional monitor-related PIDs for a
container.

Signed-off-by: sidneychang <2190206983@qq.com>
@urunc-bot urunc-bot Bot changed the base branch from main to main-pr601 May 7, 2026 06:45
Contributor

@cmainas cmainas left a comment

Thank you @sidneychang for this.

@cmainas cmainas merged commit ebd8913 into urunc-dev:main-pr601 May 7, 2026
34 checks passed
github-actions Bot pushed a commit that referenced this pull request May 7, 2026
Add a runc-compatible `ps --format json` command to urunc.

containerd-shim-urunc-v2 reuses the runc shim manager and task
service. The Pids() path eventually invokes the runtime binary as:

    urunc ps --format json <container-id>

Without this command, containerd cannot obtain the host-visible PID
associated with the urunc task.

Return the monitor PID stored in state.json as a []int, matching the
JSON format expected by containerd/go-runc and runc's `ps`
implementation. Keep the result as a slice so it can be extended later
if urunc needs to report additional monitor-related PIDs for a
container.

PR: #601
Signed-off-by: sidneychang <2190206983@qq.com>
Reviewed-by: Charalampos Mainas <cmainas@nubificus.co.uk>
Approved-by: Charalampos Mainas <cmainas@nubificus.co.uk>
urunc-bot Bot pushed a commit that referenced this pull request May 7, 2026
Development

Successfully merging this pull request may close these issues.

urunc containers fall back to Created after containerd restart

2 participants