Chapter 9: runc Lifecycle
runc is containerd's default Linux runtime and the most common implementation of the OCI runtime contract. It takes the bundle from Chapter 8, turns config.json into libcontainer configuration, creates the requested Linux environment, and eventually calls execve for the configured process.
runc coordinates a parent process, an init process, namespace entry, cgroup placement, mount setup, hooks, seccomp, capabilities, labels, IO, state files, and cleanup. It is one implementation of the OCI runtime spec; crun, youki, runsc, and Kata are others.
create builds processes, namespaces, mounts, and cgroups but stops at the FIFO gate before execve; start releases the gate.
CLI Verbs
runc exposes the OCI lifecycle directly:
- run creates and starts a container in one command.
- create creates the container environment and leaves the configured program gated.
- start starts a container in the created state.
- state emits OCI state JSON.
- kill sends a signal.
- delete tears down a stopped or forcibly killed container.
The run command is convenient for humans. Container managers such as containerd use the two-phase create/start shape because it gives them a setup point between environment creation and process execution.
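The two-phase shape can be pictured as a small state machine over the OCI status values. The sketch below is a hypothetical model, not runc's code. Status and transition are illustrative names, and the model ignores runc's additional paused state. It encodes only the spec rules: start requires created, kill requires created or running, and delete requires stopped unless forced.

```go
package main

import "fmt"

// Status mirrors the OCI runtime-spec status strings.
type Status string

const (
	Creating Status = "creating"
	Created  Status = "created"
	Running  Status = "running"
	Stopped  Status = "stopped"
)

// transition (hypothetical) returns the status a verb produces from cur,
// or an error when the verb is not allowed in that state.
// An empty Status means the container no longer exists.
func transition(cur Status, verb string, force bool) (Status, error) {
	switch verb {
	case "start":
		// start only operates on a created container.
		if cur == Created {
			return Running, nil
		}
	case "kill":
		// Signals are valid only while the container is created or running;
		// the move to stopped happens later, when the exit is observed.
		if cur == Created || cur == Running {
			return cur, nil
		}
	case "delete":
		// delete needs a stopped container unless --force kills it first.
		if cur == Stopped || force {
			return "", nil
		}
	}
	return cur, fmt.Errorf("cannot %s a container in state %q", verb, cur)
}

func main() {
	s := Created
	s, _ = transition(s, "start", false)
	fmt.Println(s) // running
	_, err := transition(s, "delete", false)
	fmt.Println(err) // cannot delete a container in state "running"
}
```

The create/start split is visible here: only created accepts start. That state is where a manager like containerd gets its setup window.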
Reading The Bundle
runc starts by entering the bundle directory, opening config.json, decoding it into the OCI Go Spec, validating the process section, and converting the spec into libcontainer configuration.
The handoff from JSON to libcontainer is small:
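```go
// cf is the opened config.json; spec is the OCI runtime-spec Go type.
// Decode first, then validate the process section before any conversion.
if err = json.NewDecoder(cf).Decode(&spec); err != nil {
	return nil, err
}
return spec, validateProcessSpec(spec.Process)
```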
From there runc translates the spec into libcontainer's model: namespaces, mounts, cgroups, devices, process settings, hooks, and runtime flags such as systemd cgroup mode, rootless mode, and no-pivot behavior.
The Parent And The Gate
When starting a container's init process (as opposed to a process added with runc exec), runc creates an exec FIFO before launching it:
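```go
// Only the container's init process is gated by the exec FIFO;
// processes added later with runc exec are not.
if process.Init {
	if err := c.createExecFifo(); err != nil {
		return err
	}
}
```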
The parent process then starts a cloned copy of /proc/self/exe running runc init. That init path receives pipes, bootstrap data, the init config, logging descriptors, and namespace setup instructions. runc marks unrelated file descriptors close-on-exec before starting runc init, a hardening detail added because leaked descriptors have produced container escapes in the past (CVE-2024-21626).
The FIFO is the gate. During runc create, the init process prepares the environment and waits. During runc start, runc releases that gate so the init path can continue to the final user process.
That gate is why created and running are different states: in created, namespaces, mounts, and cgroups exist and the init process is parked; the user-specified execve has not happened yet.
Namespace Entry
runc includes C namespace-entry code because namespace operations are sensitive to process and thread state. setns(2), PID namespace creation, and clone ordering do not fit cleanly into an already-running multithreaded Go runtime.
The parent starts runc init, the C nsenter code handles low-level namespace entry and clone staging, and the Go init code reads _LIBCONTAINER_* environment variables and init config from pipes. From there the init path chooses standard init for a new container or setns init for runc exec.
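The environment-variable handoff can be sketched as two halves. The parent names each inherited descriptor by its number. In Go's os/exec, Cmd.ExtraFiles[i] arrives in the child as fd 3+i. The child parses those numbers back and dispatches on an init type. The specific variable names (_LIBCONTAINER_INITPIPE, _LIBCONTAINER_LOGPIPE, _LIBCONTAINER_INITTYPE) and the "standard"/"setns" labels are illustrative assumptions. The helpers are hypothetical and not runc's code.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// childEnv is the parent side: advertise each inherited pipe's fd number.
func childEnv(pipeNames []string, initType string) []string {
	env := []string{"_LIBCONTAINER_INITTYPE=" + initType}
	for i, name := range pipeNames {
		// exec.Cmd.ExtraFiles[i] is inherited as fd 3+i in the child.
		env = append(env, fmt.Sprintf("_LIBCONTAINER_%s=%d", name, 3+i))
	}
	return env
}

// lookupIn builds an os.LookupEnv-style lookup over a KEY=VALUE slice.
func lookupIn(env []string) func(string) (string, bool) {
	return func(key string) (string, bool) {
		for _, kv := range env {
			if k, v, ok := strings.Cut(kv, "="); ok && k == key {
				return v, true
			}
		}
		return "", false
	}
}

// fdFromEnv is the child side: find the variable and parse the fd number.
func fdFromEnv(lookup func(string) (string, bool), name string) (int, error) {
	v, ok := lookup("_LIBCONTAINER_" + name)
	if !ok {
		return -1, fmt.Errorf("_LIBCONTAINER_%s not set", name)
	}
	fd, err := strconv.Atoi(v)
	if err != nil || fd < 3 {
		return -1, fmt.Errorf("bad fd %q for %s", v, name)
	}
	return fd, nil
}

// dispatch picks the init path: a new container or joining one (runc exec).
func dispatch(initType string) (string, error) {
	switch initType {
	case "standard":
		return "standard init: new container", nil
	case "setns":
		return "setns init: exec into existing container", nil
	}
	return "", fmt.Errorf("unknown init type %q", initType)
}

func main() {
	env := childEnv([]string{"INITPIPE", "LOGPIPE"}, "standard")
	fmt.Println(env) // [_LIBCONTAINER_INITTYPE=standard _LIBCONTAINER_INITPIPE=3 _LIBCONTAINER_LOGPIPE=4]
	fd, err := fdFromEnv(lookupIn(env), "LOGPIPE")
	fmt.Println(fd, err) // 4 <nil>
	path, _ := dispatch("standard")
	fmt.Println(path)
}
```

A real child would wrap each parsed number with os.NewFile before reading the init config from it.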
crun and youki use different internal structures to satisfy the same OCI contract.
Root Filesystem Setup
The standard Linux init path prepares networking and routes, initializes labeling state, then prepares the root filesystem:
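```go
// Configure the container's network interfaces from the init config.
if err := setupNetwork(l.config); err != nil {
	return err
}
// l.pipe syncs with the parent when rootfs setup must pause for parent-side work.
err := prepareRootfs(l.pipe, l.config)
```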
prepareRootfs is where the bundle's root and mount declarations become a mount table inside the container's mount namespace. runc opens the rootfs, iterates configured mounts, creates device nodes when needed, sets up /dev/ptmx and /dev symlinks, runs parent-side hooks at the correct point, and switches root:
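```go
// Mount each configured entry under the rootfs, in config.json order.
for _, m := range config.Mounts {
	if err := setupAndMountToRootfs(pipe, config, mountConfig, m); err != nil {
		return err
	}
}
// ... device nodes, /dev/ptmx, /dev symlinks, and hooks elided ...
// Swap the prepared tree in as the container's "/".
err = pivotRoot(rootFd)
```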
The full code path adds hardening around /proc, /sys, user-namespace device behavior, and read-only remounts. The mount list in config.json becomes kernel mount state, then pivot_root(2), MS_MOVE, or chroot(2) swaps the prepared tree in as /.
Setup Order
runc's parent and init processes synchronize because setup order matters. The parent can place the child in its cgroups before the child can fork anything that would escape that placement, move configured network interfaces into the container once it knows the child's PID, run prestart and createRuntime hooks from the parent side, and pass file descriptors to the child. The child prepares the rootfs, runs container-side hooks, and applies user and group settings, labels, capabilities, noNewPrivileges, seccomp, scheduler settings, I/O priority, and cwd checks close to the final exec.
That order is security-sensitive. A seccomp filter installed too early can block setup calls. A capability dropped too late gives more privilege to setup code than intended. A cwd outside the container root can become a host filesystem exposure.
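The seccomp-versus-capabilities tension can be made concrete. Loading a seccomp filter requires either CAP_SYS_ADMIN or the no_new_privs bit. Without noNewPrivileges, the filter therefore has to go in before capabilities are dropped. With it, the filter can be loaded last, just before execve. childSteps is a hypothetical, simplified model with illustrative step names, and it does not reproduce runc's exact step list.

```go
package main

import "fmt"

// childSteps models the child's final setup order. The only rule it
// encodes is where seccomp can go relative to the capability drop.
func childSteps(noNewPrivileges bool) []string {
	steps := []string{"prepare-rootfs", "apply-labels"}
	if !noNewPrivileges {
		// Without no_new_privs, loading the filter needs CAP_SYS_ADMIN,
		// so it must happen while that capability is still held.
		steps = append(steps, "load-seccomp")
	}
	steps = append(steps, "set-user-and-drop-capabilities", "check-cwd")
	if noNewPrivileges {
		// With no_new_privs set, the filter can be loaded as late as
		// possible, so fewer setup syscalls run under it.
		steps = append(steps, "set-no-new-privs", "load-seccomp")
	}
	return append(steps, "execve")
}

// indexOf returns the position of name in steps, or -1.
func indexOf(steps []string, name string) int {
	for i, s := range steps {
		if s == name {
			return i
		}
	}
	return -1
}

func main() {
	for _, nnp := range []bool{false, true} {
		fmt.Println("noNewPrivileges =", nnp, childSteps(nnp))
	}
}
```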
Start, State, Kill, Delete
start only operates on a created container. It releases the exec FIFO so the init path can call execve for the configured program. state reports the OCI state fields from runc's stored state and live process information.
kill has more policy than a raw kill(2). runc has special handling for SIGKILL, stopped and running states, cgroup process killing, and cases where the container does not have a private PID namespace. delete --force kills before teardown and must handle processes that may remain in the cgroup after the init process exits, especially when PID namespaces are shared.
A runtime owns the lifecycle state and teardown rules, not just the start path. delete --force must catch stragglers left in the cgroup. Without a private PID namespace, killing init does not take the other processes with it, so every process in the cgroup has to be signaled.
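The force-kill decision can be sketched from one kernel rule. When the init process of a PID namespace dies, the kernel kills every other process in that namespace. Without a private namespace, there is no such cascade. parseCgroupProcs and forceKillTargets are hypothetical helpers, and the sketch leaves out runc's other special cases.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCgroupProcs parses cgroup.procs content (one PID per line).
func parseCgroupProcs(content string) ([]int, error) {
	var pids []int
	for _, field := range strings.Fields(content) {
		pid, err := strconv.Atoi(field)
		if err != nil {
			return nil, err
		}
		pids = append(pids, pid)
	}
	return pids, nil
}

// forceKillTargets chooses which PIDs receive SIGKILL. With a private PID
// namespace, killing its init makes the kernel kill the rest. Otherwise
// each process in the cgroup must be signaled individually.
func forceKillTargets(initPID int, privatePIDNS bool, cgroupPids []int) []int {
	if privatePIDNS {
		return []int{initPID}
	}
	return cgroupPids
}

func main() {
	pids, err := parseCgroupProcs("100\n105\n112\n")
	if err != nil {
		panic(err)
	}
	fmt.Println(forceKillTargets(100, true, pids))  // [100]
	fmt.Println(forceKillTargets(100, false, pids)) // [100 105 112]
	// A real implementation would typically send syscall.SIGKILL to each
	// target, then re-read cgroup.procs until it is empty before removal.
}
```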
Other Runtime Answers
runc is the book's main implementation path, but the OCI contract allows different answers:
| Runtime | What stays the same | What changes |
|---|---|---|
| crun | OCI bundle and lifecycle | C implementation and libcrun library-oriented design |
| youki | OCI bundle and lifecycle | Rust implementation and Rust abstractions for process, rootfs, cgroups, seccomp |
| gVisor runsc | OCI-facing command shape | Workload syscalls go through gVisor's userspace Sentry |
| Kata Containers | containerd/OCI-facing manager boundary | Workload runs inside a lightweight VM and guest agent |
Part IV moves back up the stack to containerd, where image content, snapshots, container metadata, tasks, shims, and CRI all meet before the runtime ever receives a bundle.
Sources And Further Reading
- runc repository at checked commit: https://github.com/opencontainers/runc/tree/eb7eaf19b6eec5d1143b257057899e4a7b738c81
- runc CLI commands: https://github.com/opencontainers/runc/tree/eb7eaf19b6eec5d1143b257057899e4a7b738c81
- runc config.json loading: https://github.com/opencontainers/runc/blob/eb7eaf19b6eec5d1143b257057899e4a7b738c81/spec.go
- runc libcontainer setup: https://github.com/opencontainers/runc/blob/eb7eaf19b6eec5d1143b257057899e4a7b738c81/libcontainer/container_linux.go
- runc parent process sync: https://github.com/opencontainers/runc/blob/eb7eaf19b6eec5d1143b257057899e4a7b738c81/libcontainer/process_linux.go
- runc init dispatch: https://github.com/opencontainers/runc/blob/eb7eaf19b6eec5d1143b257057899e4a7b738c81/libcontainer/init_linux.go
- runc standard init: https://github.com/opencontainers/runc/blob/eb7eaf19b6eec5d1143b257057899e4a7b738c81/libcontainer/standard_init_linux.go
- runc rootfs setup: https://github.com/opencontainers/runc/blob/eb7eaf19b6eec5d1143b257057899e4a7b738c81/libcontainer/rootfs_linux.go
- runc nsenter C source: https://github.com/opencontainers/runc/blob/eb7eaf19b6eec5d1143b257057899e4a7b738c81/libcontainer/nsenter/nsexec.c
- OCI runtime lifecycle: https://github.com/opencontainers/runtime-spec/blob/6999a89a76a0329f440d5740497bedb9dd431297/runtime.md