Chapter 8: OCI Runtime Bundles

An OCI runtime bundle is the local handoff between a higher-level manager and a low-level runtime: a directory containing config.json and the root filesystem the runtime will use as /.

By the time runc sees a bundle, containerd or another manager has already done the image work. Registry resolution, content download, layer unpacking, snapshot preparation, and mount calculation happen above the OCI runtime boundary. The runtime receives the result as local filesystem state plus a JSON spec.

flowchart LR
  manager[container manager] --> image[image and snapshot work]
  image --> bundle[OCI bundle]
  bundle --> runtime[OCI runtime]
  runtime --> kernel[Linux kernel primitives]

containerd, CRI-O, Docker, and other managers prepare filesystem and metadata state in their own way, then hand a common bundle to runc, crun, youki, runsc, Kata, or another compatible implementation.

Bundle Layout

The OCI runtime spec defines a bundle as a directory with config.json at the root. The root filesystem is referenced by root.path; the spec does not require the rootfs directory to have one universal name, though rootfs is conventional.

The practical shape is:

bundle/
  config.json
  rootfs/
    bin/
    etc/
    proc/
    ...

config.json is the contract. It describes the process, root filesystem, mounts, namespaces, cgroups, hooks, annotations, and platform-specific settings. The root filesystem is the tree that should become / inside the container after the runtime sets up the mount namespace and switches root.

The top-level Go type in the spec's specs-go package is a useful map of that contract (abridged here to the fields this chapter covers):

type Spec struct {
    Version string `json:"ociVersion"`
    Process *Process `json:"process,omitempty"`
    Root *Root `json:"root,omitempty"`
    Mounts []Mount `json:"mounts,omitempty"`
    Hooks *Hooks `json:"hooks,omitempty" platform:"linux,solaris,zos"`
    Annotations map[string]string `json:"annotations,omitempty"`
    Linux *Linux `json:"linux,omitempty" platform:"linux"`
}

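The OCI runtime spec is the authoritative document. Go-based tools such as runc, containerd, and CRI-O import the runtime-spec specs-go types directly, while runtimes written in other languages, such as crun (C) and youki (Rust), carry their own bindings for the same config.json format. Configs that round-trip through one tool therefore tend to round-trip through the others.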
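As a minimal sketch of reading that contract, the following decodes a config.json string using only the standard library. The miniSpec type is a hypothetical, trimmed-down mirror of a few fields, introduced here for illustration; a real tool would more likely import the specs-go types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// miniSpec is a hypothetical, trimmed mirror of a few config.json fields.
// It is not the real specs-go type; it only illustrates the JSON shape.
type miniSpec struct {
	Version string `json:"ociVersion"`
	Process *struct {
		Args []string `json:"args"`
		Cwd  string   `json:"cwd"`
	} `json:"process,omitempty"`
	Root *struct {
		Path     string `json:"path"`
		Readonly bool   `json:"readonly,omitempty"`
	} `json:"root,omitempty"`
}

// decodeSpec parses config.json content and checks that ociVersion is set.
func decodeSpec(data string) (*miniSpec, error) {
	var s miniSpec
	if err := json.Unmarshal([]byte(data), &s); err != nil {
		return nil, fmt.Errorf("decode config.json: %w", err)
	}
	if s.Version == "" {
		return nil, fmt.Errorf("ociVersion is required")
	}
	return &s, nil
}

func main() {
	// Sample document; the values are illustrative only.
	const cfg = `{"ociVersion":"1.0.2","process":{"args":["/bin/sh"],"cwd":"/"},"root":{"path":"rootfs"}}`
	s, err := decodeSpec(cfg)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(s.Version, s.Process.Args, s.Root.Path)
}
```

Process and Root are pointers so that an absent object stays distinguishable from an empty one after decoding.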

Process And Root

The process object describes the program to execute. It includes args, env, cwd, user and group settings, capabilities, rlimits, terminal settings, noNewPrivileges, AppArmor and SELinux labels, scheduler fields, I/O priority, and CPU affinity. A minimal one looks like this:

"process": {
  "args": ["/bin/sh", "-c", "echo hello"],
  "env": ["PATH=/usr/bin"],
  "cwd": "/",
  "noNewPrivileges": true
}

The root object names the container root filesystem. root.path is interpreted relative to the bundle unless it is absolute. root.readonly asks the runtime to make the root filesystem read-only after setup; individual mounts still carry their own options.

"root": { "path": "rootfs", "readonly": false }

Both objects are declarations. The runtime creates the namespace and mount state that make them true for the container process.
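The root.path rule can be sketched in a few lines. The function name resolveRoot and the example paths are assumptions made for illustration, not anything a particular runtime exposes.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// resolveRoot returns the rootfs location for a bundle directory and a
// root.path value: absolute values are used as-is, relative values are
// joined onto the bundle directory.
func resolveRoot(bundleDir, rootPath string) string {
	if filepath.IsAbs(rootPath) {
		return filepath.Clean(rootPath)
	}
	return filepath.Join(bundleDir, rootPath)
}

func main() {
	// Both paths below are hypothetical.
	fmt.Println(resolveRoot("/run/bundles/demo", "rootfs"))
	fmt.Println(resolveRoot("/run/bundles/demo", "/var/lib/snapshots/42/fs"))
}
```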

Mounts

The mounts array is ordered. That matters because later mounts can cover earlier paths. Each mount has a destination, type, source, and options:

"mounts": [
  { "destination": "/proc", "type": "proc", "source": "proc" },
  { "destination": "/data", "type": "bind", "source": "/var/data",
    "options": ["rbind", "ro"] },
  { "destination": "/tmp",  "type": "tmpfs", "source": "tmpfs",
    "options": ["mode=1777", "size=64m"] }
]

The three entries share one JSON shape and produce three very different kernel calls: a proc mount, a host-path bind, and a fresh tmpfs. A bind mount source can be absolute or relative to the bundle.
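To make the ordering point concrete, here is a simplified model of which entry ends up serving a given path once every mount has been applied in list order. The mount type and the visibleMount helper are hypothetical, and the model deliberately ignores symlinks, mount propagation, and recursive binds.

```go
package main

import (
	"fmt"
	"strings"
)

// mount is a hypothetical, reduced form of one mounts[] entry.
type mount struct {
	Destination string
	Type        string
}

// covers reports whether path is dest itself or lies beneath it.
func covers(dest, path string) bool {
	if dest == "/" {
		return true
	}
	return path == dest || strings.HasPrefix(path, dest+"/")
}

// visibleMount returns the entry that serves path after all entries have
// been applied in order: the last entry whose destination covers path.
func visibleMount(mounts []mount, path string) (mount, bool) {
	for i := len(mounts) - 1; i >= 0; i-- {
		if covers(mounts[i].Destination, path) {
			return mounts[i], true
		}
	}
	return mount{}, false
}

func main() {
	mounts := []mount{
		{Destination: "/data/cache", Type: "tmpfs"},
		{Destination: "/data", Type: "bind"}, // later entry hides /data/cache
	}
	m, _ := visibleMount(mounts, "/data/cache/x")
	fmt.Println(m.Type)
}
```

Swapping the two entries makes the tmpfs visible at /data/cache again, which is why a manager that generates mounts has to emit parents before children.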

The spec defines the desired mount table without prescribing how the runtime achieves it. A runtime may use classic mount(2), the file-descriptor mount API (fsopen(2), fsmount(2), move_mount(2)), idmapped mounts via mount_setattr(2), or bind-mount fallbacks, depending on kernel support and user-namespace mode.

Linux Namespaces

The Linux-specific namespaces list names namespace types such as mount, PID, network, UTS, IPC, user, cgroup, and time. For each entry, whether path is set decides what the runtime does:

OCI namespace entry          Runtime behavior
Type present with no path    Create a new namespace of that type.
Type present with path       Join the namespace at that path, usually via setns(2).
Type absent                  Inherit the runtime's namespace of that type.

Duplicate namespace types are invalid.
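The three cases and the duplicate rule fit in one small function. This is a sketch of the decision only, not of how any runtime implements it; nsEntry, planNamespaces, and the example path are hypothetical names.

```go
package main

import "fmt"

// nsEntry is a hypothetical, reduced form of one linux.namespaces entry.
type nsEntry struct {
	Type string
	Path string
}

// planNamespaces maps each listed namespace type to "create" or
// "join:<path>". Types missing from the result are inherited from the
// runtime. A duplicated type is rejected.
func planNamespaces(entries []nsEntry) (map[string]string, error) {
	plan := make(map[string]string, len(entries))
	for _, e := range entries {
		if _, dup := plan[e.Type]; dup {
			return nil, fmt.Errorf("duplicate namespace type %q", e.Type)
		}
		if e.Path == "" {
			plan[e.Type] = "create"
		} else {
			plan[e.Type] = "join:" + e.Path
		}
	}
	return plan, nil
}

func main() {
	plan, err := planNamespaces([]nsEntry{
		{Type: "pid"},
		{Type: "network", Path: "/var/run/netns/demo"}, // hypothetical path
		{Type: "mount"},
	})
	fmt.Println(plan, err)
}
```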

linux.namespaces does not isolate anything by itself. It instructs the runtime which namespaces to create or join when it builds the container process — the kernel calls from Part II happen at that step, not at JSON-decode time.

Cgroups And Resources

linux.cgroupsPath describes where the container should live in the cgroup hierarchy. linux.resources describes controller settings: CPU, memory, block I/O, pids, hugepages, RDMA, device rules, and cgroup v2 unified settings not otherwise modeled by the spec.

The spec does not mandate one cgroup manager. A runtime can write cgroupfs directly, use systemd, or combine approaches depending on host configuration. For cgroup v2, delegation rules matter: ownership and writable files are constrained by the kernel's delegation model.

If a limit is expressed correctly in config.json but not visible in /sys/fs/cgroup, the failure is in the runtime's cgroup application path or the host manager interaction, not in the bundle format itself.
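One way to run that check is to compare the configured memory limit with the memory.max file. The sketch below rests on several assumptions: a cgroup v2 host, a cgroupfs-style cgroupsPath (not the systemd slice:prefix:name form), a page-aligned limit, and the common runtime convention that a negative limit means unlimited. The helper names and the /demo path are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// expectedMemoryMax renders a memory limit the way the cgroup v2
// memory.max file presents it: a byte count, or "max" for no limit.
// Treating a negative value as unlimited is an assumption.
func expectedMemoryMax(limit int64) string {
	if limit < 0 {
		return "max"
	}
	return strconv.FormatInt(limit, 10)
}

// checkMemoryMax compares the limit from config.json with the value
// found under cgroupRoot/cgroupsPath/memory.max.
func checkMemoryMax(cgroupRoot, cgroupsPath string, limit int64) error {
	file := filepath.Join(cgroupRoot, cgroupsPath, "memory.max")
	raw, err := os.ReadFile(file)
	if err != nil {
		return fmt.Errorf("read %s: %w", file, err)
	}
	got, want := strings.TrimSpace(string(raw)), expectedMemoryMax(limit)
	if got != want {
		return fmt.Errorf("%s: got %q, want %q", file, got, want)
	}
	return nil
}

func main() {
	// "/demo" is a hypothetical cgroupsPath; on most hosts this reports
	// a read error because no such cgroup exists.
	fmt.Println(checkMemoryMax("/sys/fs/cgroup", "/demo", 64<<20))
}
```

A mismatch here points at the runtime or the cgroup manager, since the bundle already carries the intended value.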

Hooks

Hooks run at defined points around container setup, each in a specific namespace context.

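The current runtime spec defines:

Hook             Namespace   When it runs
prestart         runtime     Deprecated in favor of createRuntime, createContainer, and startContainer.
createRuntime    runtime     During create, after the runtime environment exists and before pivot_root.
createContainer  container   During create, after the runtime environment exists and before pivot_root.
startContainer   container   During start, before the user process executes.
poststart        runtime     After the user process has started, before start returns.
poststop         runtime     After the container is deleted, before delete returns.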

A hook entry is a process invocation, nothing more:

"hooks": {
  "createRuntime": [
    { "path": "/usr/local/bin/setup-net",
      "args": ["setup-net", "--bridge", "br0"],
      "timeout": 5 }
  ]
}

prestart is deprecated but still appears in implementations for compatibility. Hooks are a common place for device injection and runtime extensions: the NVIDIA Container Toolkit, in its legacy mode, has its runtime wrapper add a prestart hook that injects GPU device nodes and library bind mounts before the container process starts. Each hook is a single exec at a defined lifecycle point, with no plugin discovery and no shared state.
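The runtime passes the container's state JSON to each hook on stdin, so a hook program usually starts by decoding it. The following is a minimal sketch of that first step; hookState and parseState are hypothetical names, and the struct mirrors only a few state fields.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
)

// hookState is a hypothetical mirror of a few fields of the state JSON
// that the runtime writes to a hook's stdin.
type hookState struct {
	Version string `json:"ociVersion"`
	ID      string `json:"id"`
	Status  string `json:"status"`
	Pid     int    `json:"pid,omitempty"`
	Bundle  string `json:"bundle"`
}

// parseState decodes the container state and rejects input without an id.
func parseState(data []byte) (hookState, error) {
	var st hookState
	if err := json.Unmarshal(data, &st); err != nil {
		return hookState{}, fmt.Errorf("decode state: %w", err)
	}
	if st.ID == "" {
		return hookState{}, fmt.Errorf("state has no container id")
	}
	return st, nil
}

func main() {
	data, err := io.ReadAll(os.Stdin)
	if err == nil {
		var st hookState
		if st, err = parseState(data); err == nil {
			// Log to stderr; a real hook would do its setup work here.
			fmt.Fprintf(os.Stderr, "hook: container %s is %s (pid %d)\n", st.ID, st.Status, st.Pid)
			return
		}
	}
	fmt.Fprintln(os.Stderr, "hook:", err)
	os.Exit(1) // a failing create-time hook aborts container creation
}
```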

Runtime State

The OCI runtime also has a state JSON format. state reports fields such as ociVersion, id, status, pid, bundle, and annotations. The core statuses are creating, created, running, and stopped.

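The lifecycle verbs use that state model:

Verb     Effect on state
state    Reports the current state; changes nothing.
create   Builds the container from the bundle without running the user process; status goes from creating to created.
start    Runs the user process in a created container; status becomes running.
kill     Sends a signal to a created or running container.
delete   Removes the resources of a stopped container.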

The create/start split exists so a caller can do work between setup and execution: attach IO, pass file descriptors, run hooks, or coordinate external namespace setup.
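The spec requires an error when a verb is applied in the wrong status: start needs a created container, kill needs one that is created or running, and delete needs one that is stopped. A small sketch of that precondition check follows; allowedFrom and checkOp are hypothetical names, and create and state are left out because they do not depend on a prior status in the same way.

```go
package main

import "fmt"

// allowedFrom lists, per lifecycle verb, the statuses in which the
// operation may proceed; any other status must produce an error.
var allowedFrom = map[string][]string{
	"start":  {"created"},
	"kill":   {"created", "running"},
	"delete": {"stopped"},
}

// checkOp reports whether verb may run against a container in status.
func checkOp(verb, status string) error {
	states, ok := allowedFrom[verb]
	if !ok {
		return fmt.Errorf("unknown verb %q", verb)
	}
	for _, s := range states {
		if s == status {
			return nil
		}
	}
	return fmt.Errorf("cannot %s a container in status %q", verb, status)
}

func main() {
	fmt.Println(checkOp("start", "created"))
	fmt.Println(checkOp("start", "running"))
	fmt.Println(checkOp("delete", "running"))
}
```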

What The Spec Leaves Open

The OCI runtime spec defines the bundle and lifecycle, not a particular process tree. runc, crun, youki, gVisor, and Kata can all keep an OCI-facing shape while making different implementation choices.

Those choices include:

Chapter 9 follows runc's choices through this list.

Sources And Further Reading