Skip to content

saveContainerState() should use atomic writes to prevent state.json corruption #598

@MayankSharmaCSE

Description

@MayankSharmaCSE

Problem

saveContainerState() in pkg/unikontainers/unikontainers.go (lines 694–709) writes
state.json using os.WriteFile() directly. If the urunc process is killed or the
system crashes mid-write, state.json ends up partially written — and every subsequent
urunc start, urunc kill, or urunc delete fails with a JSON parse error.

The real pain here is what happens after the corruption: the VMM process (QEMU,
Firecracker, etc.) is already running, but urunc can no longer read the state it needs
to manage it. The VM becomes an orphan — still consuming resources, still holding the
TAP device and network config, but invisible to the container lifecycle. You have to
find and kill it manually.

Root Cause

The current implementation does a direct overwrite:

stateName := filepath.Join(u.BaseDir, stateFilename) return os.WriteFile(stateName, data, 0o644) 

`os.WriteFile()` truncates the file and writes in-place. Any interruption between truncate and write-completion leaves 
the file in a corrupt state causing either empty, partially written, or containing a mix of old and new data.

What the fix should look like:

The codebase already has the correct pattern ,writePidFile() in
pkg/unikontainers/utils.go (lines 110–125) does exactly this:

1.Write to a temporary file in the same directory (ensures same filesystem for rename)
2.Sync / close the temp file
3.os.Rename() the temp file to the target path

os.Rename() is atomic on POSIX filesystems when source and destination are on the
same mount. Either the old state.json is there or the new one is — never a half-written
file.

The fix would be straightforward: extract a small atomicWriteFile() helper (or just
inline the temp+rename pattern in saveContainerState()), and ideally refactor
writePidFile() to use the same helper so both paths stay consistent.

Impact

  • Severity: High because corruption leaves running VMs unmanageable
  • Likelihood: Low under normal operation, but real under OOM kills, kill -9, or system crashes lead exactly the scenarios where you most need state recovery to work
  • Blast radius: All VMM backends (QEMU, Firecracker, Solo5, Cloud Hypervisor) are affected since they all depend on state.json for lifecycle management

Reproduction scenario

  • Start a unikernel container via urunc
  • Kill the urunc process (e.g., kill -9) during a state transition more specifically while saveContainerState() is writing
  • Attempt any lifecycle operation (urunc kill, urunc delete)
  • Observe JSON parse error; the VMM process is now orphaned

(Timing the kill precisely is tricky in practice, but the window exists on every
state write — and the consequence is severe enough to warrant the fix regardless.)

Additional context

This was found during a code audit of the state management paths. The fact that
writePidFile() already implements the atomic pattern suggests this was a known
concern at some point — saveContainerState() just wasn't updated to match.


Metadata

Metadata

Assignees

No one assigned

    Labels

    CoreRelated to urunc's internals

    Type

    Projects

    Status

    No status

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions