Chapter 13: CRI And Kubernetes
Kubelet does not talk to runc. It speaks the Container Runtime Interface, and in containerd, CRI is an in-daemon plugin that translates Kubernetes runtime and image RPCs into containerd service calls.
Kubernetes has pods, pod sandboxes, runtime handlers, image pulls, container creates, exec, attach, port-forward, status, stats, and logs. containerd has namespaces, image metadata, content, snapshotters, containers, tasks, shims, leases, events, and runtime options.
Registration
The CRI plugin registers itself as a containerd gRPC plugin and pulls in a long dependency list — runtime, image, sandbox, NRI, event, service, lease, transfer, sandbox-store, and warning plugins. During init it builds an in-memory containerd client pinned to the k8s.io namespace, constructs the CRI service, and registers Kubernetes runtime and image servers on containerd's existing gRPC server.
The registration call in plugins/cri/cri.go is two lines:
runtime.RegisterRuntimeServiceServer(s, instrumented)
runtime.RegisterImageServiceServer(s, instrumented)
Everything kubelet ever sees of containerd flows through those two servers.
The k8s.io namespace is the second half of the boundary. Kubernetes-managed images, containers, sandboxes, leases, and snapshots all live in that one metadata partition. It does not isolate Linux processes — it keeps containerd records from colliding with whatever a developer creates by hand through ctr in the default namespace.
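The partition is easy to picture as a compound key. The sketch below is a hypothetical model, not containerd's actual metadata types: records keyed by (namespace, name) simply cannot collide across namespaces.

```go
package main

import "fmt"

// Hypothetical model of namespace-partitioned metadata. The types here are
// illustrative stand-ins, not containerd's real metadata store.
type key struct{ namespace, name string }

type metadataStore struct{ images map[key]string }

func newStore() *metadataStore { return &metadataStore{images: map[key]string{}} }

func (s *metadataStore) put(ns, name, digest string) { s.images[key{ns, name}] = digest }

func (s *metadataStore) get(ns, name string) (string, bool) {
	d, ok := s.images[key{ns, name}]
	return d, ok
}

func main() {
	s := newStore()
	// kubelet's pull lands in k8s.io; a manual `ctr pull` lands in default.
	s.put("k8s.io", "registry.k8s.io/pause:3.10", "sha256:aaa")
	s.put("default", "docker.io/library/alpine:latest", "sha256:bbb")

	// The default-namespace image is invisible from the k8s.io partition.
	_, seen := s.get("k8s.io", "docker.io/library/alpine:latest")
	fmt.Println(seen) // prints "false"
}
```

No Linux isolation is involved: both records live in the same daemon, on the same host, and only the key prefix keeps them apart.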
CRI Service State
CRI keeps a Kubernetes-shaped index over containerd state — sandbox stores, container name indexes, CNI state, stats — so kubelet can answer its own questions without round-tripping through containerd. containerd remains the source of truth for images, snapshots, and tasks; CRI keeps Kubernetes bookkeeping next to it. That is why CRI source can look larger than expected: it is preserving Kubernetes semantics on top of containerd objects that have their own.
The CRI API Shape
Kubernetes splits CRI into a runtime service and an image service. The runtime service covers pod sandbox operations, container operations, exec, attach, port-forward, status, stats, checkpointing, and runtime config. The image service covers image listing, status, pulls, removals, and filesystem usage.
The shape is visible in kubelet's own behavior. An image pull is a CRI image-service call. A workload container create is a runtime-service call. Starting that container is another runtime-service call. Inside containerd, those three calls land on the same image, snapshot, container, task, and shim machinery from chapters 11 and 12.
The same split is why CreateContainer does not start the workload. CRI inherits the OCI and containerd convention of keeping create and start as separate lifecycle steps; the API never had a "run" verb to begin with.
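A minimal sketch of that two-step lifecycle, using hypothetical types rather than CRI's actual ones, shows why create alone never runs anything:

```go
package main

import (
	"errors"
	"fmt"
)

// Illustrative sketch of the create/start split CRI inherits from OCI.
// States and names are hypothetical, not CRI's real types.
type containerState int

const (
	created containerState = iota
	running
)

type container struct {
	state   containerState
	started bool
}

// CreateContainer-style step: allocates the record, starts nothing.
func create() *container { return &container{state: created} }

// StartContainer-style step: only valid on a created container.
func (c *container) start() error {
	if c.state != created {
		return errors.New("start: container is not in the created state")
	}
	c.state = running
	c.started = true
	return nil
}

func main() {
	c := create()
	fmt.Println(c.started) // prints "false": create alone runs no workload
	if err := c.start(); err != nil {
		panic(err)
	}
	fmt.Println(c.state == running) // prints "true"
}
```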
Pod Sandboxes
The first runtime object Kubernetes asks for is a pod sandbox. The pause container is the most visible piece; the sandbox is the metadata, network namespace, CNI result, runtime endpoint, labels, monitor state, and (on the default Linux path) the pause container itself.
RunPodSandbox does the Kubernetes-facing setup. In containerd v2.3.0, the source path runs through thirteen steps:
- generate and reserve a sandbox name;
- create a lease;
- resolve the runtime handler;
- store sandbox metadata;
- create a network namespace unless host networking is requested;
- run CNI setup;
- create sandbox metadata through the sandbox service;
- ensure the pause image exists;
- start the sandbox;
- save the sandbox endpoint, labels, and spec;
- run NRI hooks;
- mark the sandbox ready;
- store the sandbox and start an exit monitor.
CNI gets its own chapter in Part V; for now it is enough that CRI runs CNI as part of RunPodSandbox and turns the result into containerd service calls.
The default pod-sandbox controller still creates a real containerd container and task for the sandbox image. It builds a sandbox container spec, prepares a snapshot, creates a container with runtime options, creates a task with null IO, waits on it, starts it, and records the PID. That sequence is the bridge between a Kubernetes pod sandbox and the container/task/shim model from chapter 12.
Image Pulls Through CRI
CRI PullImage starts as a Kubernetes image request, not a raw containerd pull. The CRI code normalizes the reference, picks a snapshotter from the pod sandbox or runtime handler context, and then drops into either the local client pull path or the transfer service.
On the local pull path, CRI passes a set of options that all show up again later:
- WithPullUnpack, so the image is unpacked into the chosen snapshotter;
- the snapshotter selection itself;
- labels for indexing and GC;
- download concurrency and rate limits;
- unpack-duplication suppression;
- optional layer-discard behavior.
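Those options follow containerd's functional-options style. The sketch below models that pattern with hypothetical stand-ins; pullConfig and the with* constructors are illustrative, not containerd's real RemoteOpt API:

```go
package main

import "fmt"

// Illustrative model of a functional-options pull path. All names here are
// stand-ins for the real containerd client options the list above describes.
type pullConfig struct {
	unpack         bool
	snapshotter    string
	labels         map[string]string
	maxConcurrency int
}

type pullOpt func(*pullConfig)

func withPullUnpack() pullOpt                  { return func(c *pullConfig) { c.unpack = true } }
func withSnapshotter(name string) pullOpt      { return func(c *pullConfig) { c.snapshotter = name } }
func withLabels(l map[string]string) pullOpt   { return func(c *pullConfig) { c.labels = l } }
func withMaxConcurrentDownloads(n int) pullOpt { return func(c *pullConfig) { c.maxConcurrency = n } }

func pull(ref string, opts ...pullOpt) pullConfig {
	c := pullConfig{snapshotter: "overlayfs", maxConcurrency: 3} // defaults
	for _, o := range opts {
		o(&c)
	}
	fmt.Printf("pulling %s into snapshotter %q (unpack=%v)\n", ref, c.snapshotter, c.unpack)
	return c
}

func main() {
	// CRI-style pull: unpack into the snapshotter the runtime handler chose.
	pull("registry.k8s.io/pause:3.10",
		withPullUnpack(),
		withSnapshotter("devmapper"),
		withLabels(map[string]string{"io.cri-containerd.image": "managed"}),
		withMaxConcurrentDownloads(6),
	)
}
```

The point of the pattern is that the caller, here CRI, decides the snapshotter and unpack behavior at pull time, and the pull machinery stays generic.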
That handover is why runtime handlers can affect image pulls. If a handler points at a non-default snapshotter, the image service needs the choice at pull time, because unpack has to land in the same filesystem backend the workload will eventually mount. Pulling for one snapshotter and starting on another is a common operational mistake and a hard one to debug after the fact.
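The invariant can be stated as a check: unpacked layers exist only in the snapshotter they were unpacked into. A hypothetical model of that check, not CRI's actual code:

```go
package main

import "fmt"

// Hypothetical sketch of the pull-time/start-time snapshotter invariant
// described above. The image type is an illustrative stand-in.
type image struct {
	ref        string
	unpackedIn map[string]bool // snapshotters holding this image's layers
}

func canStart(img image, handlerSnapshotter string) error {
	if !img.unpackedIn[handlerSnapshotter] {
		return fmt.Errorf("image %s has no unpacked layers in snapshotter %q",
			img.ref, handlerSnapshotter)
	}
	return nil
}

func main() {
	img := image{
		ref:        "docker.io/library/nginx:latest",
		unpackedIn: map[string]bool{"overlayfs": true},
	}
	fmt.Println(canStart(img, "overlayfs")) // prints "<nil>"
	// A handler naming devmapper fails here, at start, not at pull.
	fmt.Println(canStart(img, "devmapper"))
}
```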
Workload Container Creation
CreateContainer takes a pod sandbox ID and a container config. It checks the sandbox exists, resolves the already-pulled image, generates CRI metadata, builds an OCI spec using the sandbox's PID and network namespace state, prepares a new writable snapshot, attaches runtime and sandbox metadata, creates the containerd container, records CRI container state, and emits a container-created event.
It does not start the workload. CreateContainer prepares metadata, spec, snapshot state, and CRI bookkeeping; the user's process does not exist until StartContainer runs.
The snapshot step is where chapter 11's image work meets the workload. The image has already been pulled and unpacked into committed snapshots. CreateContainer calls Prepare for an active writable snapshot on top of that chain, and that active snapshot is what task creation will eventually hand to the shim as the root filesystem.
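A toy model of that chain, with stand-ins for the snapshotter's Commit and Prepare verbs, shows the shape: read-only committed parents from the unpacked image, one active writable snapshot on top.

```go
package main

import "fmt"

// Illustrative model of a snapshot chain. Names are stand-ins for
// containerd's snapshotter Prepare/Commit API, not the real interface.
type snapshot struct {
	key       string
	parent    string // "" for the base layer
	committed bool
}

type snapshotter struct{ snaps map[string]*snapshot }

// commit records a read-only layer, as unpack does for each image layer.
func (s *snapshotter) commit(key, parent string) {
	s.snaps[key] = &snapshot{key: key, parent: parent, committed: true}
}

// prepare creates the active writable layer CreateContainer asks for.
func (s *snapshotter) prepare(key, parent string) (*snapshot, error) {
	p, ok := s.snaps[parent]
	if !ok || !p.committed {
		return nil, fmt.Errorf("parent %q is not a committed snapshot", parent)
	}
	sn := &snapshot{key: key, parent: parent}
	s.snaps[key] = sn
	return sn, nil
}

func main() {
	s := &snapshotter{snaps: map[string]*snapshot{}}
	// unpack committed the image layers bottom-up
	s.commit("layer-1", "")
	s.commit("layer-2", "layer-1")
	// CreateContainer prepares the container's writable layer on top
	rw, err := s.prepare("container-abc", "layer-2")
	if err != nil {
		panic(err)
	}
	fmt.Println(rw.committed) // prints "false": active and writable
}
```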
Workload Start
StartContainer is the live-execution step. It verifies the sandbox is ready, creates loggers and IO, carries the sandbox endpoint into task options when one exists, creates a containerd task, waits on it in the background, runs NRI start hooks, starts the task, records the PID and start time, launches an exit monitor, and emits a container-started event.
The single call crosses every layer Part IV has introduced:
- kubelet calls CRI StartContainer;
- CRI looks up its container and sandbox state;
- CRI asks containerd to create a task for the container;
- containerd runtime v2 builds the bundle and dials the shim;
- the shim asks the OCI runtime to create and start the process;
- CRI records the PID, monitors exit, and reports status back to kubelet.
kubelet never has to know how io.containerd.runc.v2 becomes containerd-shim-runc-v2, where the bundle is written, or how snapshots are mounted into the rootfs.
Runtime Handlers
A runtime handler is the Kubernetes-facing name for a configured slice of containerd runtime behavior. Inside containerd it selects the runtime type, runtime options, snapshotter, sandboxer, runtime binary path, and IO mode, and it carries snapshotter information into the image service so pulls land in the right place.
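As a hedged illustration, a handler in a containerd 2.x config looks roughly like the fragment below; the plugin ID and available fields vary by release and config version, so verify against your containerd version's documentation before using it:

```toml
# Illustrative fragment only; field names and the plugin ID are
# version-dependent.
[plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.kata]
  runtime_type = 'io.containerd.kata.v2'
  snapshotter = 'devmapper'
```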
Kubernetes should be able to ask for a runtime class — default, a VM-backed runtime such as Kata, an alternative such as crun — without constructing containerd runtime options by hand. The handler is the indirection that makes RuntimeClass a first-class Kubernetes object instead of an opaque config string.
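On the Kubernetes side, the indirection is a small object plus one pod field; the names here are illustrative, and the handler value must match a runtime configured in containerd's CRI config:

```yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: kata
handler: kata        # the runtime handler name CRI receives
---
apiVersion: v1
kind: Pod
metadata:
  name: kata-example
spec:
  runtimeClassName: kata   # kubelet passes the handler through CRI
  containers:
  - name: app
    image: docker.io/library/nginx:latest
```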
The same handler is where most operational mistakes surface. A handler that names one snapshotter while the image was pulled and unpacked into another fails at container start with a filesystem error, even though the root cause is configuration. A handler that selects a sandbox-aware runtime pulls shim grouping and sandbox endpoints into task startup. Every field on a runtime handler changes a concrete containerd behavior: snapshotter selection, runtime options, shim grouping for sandbox-aware runtimes.
Where This Goes
The stack reads cleanly in both directions. From kubelet down: CRI request → containerd services → runtime v2 shim → OCI runtime → kernel. From the kernel up: process and namespaces → OCI runtime → shim → containerd task → CRI container → Kubernetes pod. Part V picks up the pod's network namespace and shows how it gets wired into the host.
Sources And Further Reading
- Kubernetes CRI docs: https://kubernetes.io/docs/concepts/containers/cri/
- CRI API proto v0.36.0: https://github.com/kubernetes/cri-api/blob/v0.36.0/pkg/apis/runtime/v1/api.proto
- containerd CRI architecture docs: https://github.com/containerd/containerd/blob/2976f38ccbfcda5ef1364d63d60b0a304e4bf94a/docs/cri/architecture.md
- CRI plugin: https://github.com/containerd/containerd/blob/2976f38ccbfcda5ef1364d63d60b0a304e4bf94a/plugins/cri/cri.go
- CRI runtime service: https://github.com/containerd/containerd/blob/2976f38ccbfcda5ef1364d63d60b0a304e4bf94a/internal/cri/server/service.go
- CRI sandbox run path: https://github.com/containerd/containerd/blob/2976f38ccbfcda5ef1364d63d60b0a304e4bf94a/internal/cri/server/sandbox_run.go
- Pod sandbox controller: https://github.com/containerd/containerd/blob/2976f38ccbfcda5ef1364d63d60b0a304e4bf94a/internal/cri/server/podsandbox/sandbox_run.go
- CRI image pull path: https://github.com/containerd/containerd/blob/2976f38ccbfcda5ef1364d63d60b0a304e4bf94a/internal/cri/server/images/image_pull.go
- CRI container create path: https://github.com/containerd/containerd/blob/2976f38ccbfcda5ef1364d63d60b0a304e4bf94a/internal/cri/server/container_create.go
- CRI container start path: https://github.com/containerd/containerd/blob/2976f38ccbfcda5ef1364d63d60b0a304e4bf94a/internal/cri/server/container_start.go