Chapter 16: Pod Networking Model
Kubernetes promises pod networking, not "a CNI network." Each pod gets an IP address. Containers in the same pod share the pod's network namespace and port space. Pods can communicate with pods on other nodes without NAT at the Kubernetes layer, and node agents can communicate with pods on their node.
A plugin can satisfy that model with bridges and routes, overlays, eBPF, cloud provider routes, direct device attachment, or a mix of those techniques. In the containerd path, kubelet does not run the plugin itself. kubelet calls CRI, containerd CRI creates or receives the pod sandbox network namespace, and containerd invokes CNI for that namespace.
The pod IP belongs to the shared pod network namespace. It is not assigned independently to each workload container.
The Kubernetes Contract
The Kubernetes networking model has three details that shape runtime behavior. Pods on a node can communicate with all pods on all nodes without NAT at the Kubernetes layer. Agents on a node, such as kubelet or system daemons, can communicate with pods on that node. Containers in one pod share the pod IP and port space because they share one network namespace.
NetworkPolicy sits beside that model rather than inside CNI core. Kubernetes defines the policy API, but enforcement is plugin-specific. A bridge-only local setup, an eBPF plugin, and a cloud CNI can all attach namespaces through CNI while providing very different policy behavior.
RunPodSandbox Creates The Namespace
In containerd v2.3.0, the CRI path handles networking inside RunPodSandbox. If the pod requests host networking, CRI skips pod network namespace creation. Otherwise it creates a namespace mount, stores the path in sandbox metadata, and passes that path into network setup.
The relevant lines in the source are:
sandbox.NetNS, err = netns.NewNetNS(netnsMountDir)
sandbox.NetNSPath = sandbox.NetNS.GetPath()
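The trick behind a persistent namespace is a bind mount: a network namespace normally dies with the last thread inside it, so the runtime pins it to a file first. A minimal sketch of that idea, assuming root and a scratch mount directory (the function name and paths are illustrative, not containerd's):

package main

import (
	"fmt"
	"os"
	"path/filepath"
	"runtime"

	"golang.org/x/sys/unix"
)

// newPersistentNetNS creates a fresh network namespace on a locked OS
// thread and bind-mounts it to a file so it outlives that thread,
// roughly the shape of containerd's netns.NewNetNS.
func newPersistentNetNS(mountDir string) (string, error) {
	if err := os.MkdirAll(mountDir, 0o755); err != nil {
		return "", err
	}
	path := filepath.Join(mountDir, "demo-netns")
	f, err := os.Create(path) // the bind-mount target must exist
	if err != nil {
		return "", err
	}
	f.Close()

	errCh := make(chan error, 1)
	go func() {
		// Unshare affects only the calling thread, so pin the goroutine
		// to one and never unlock: Go discards the tainted thread when
		// the goroutine exits.
		runtime.LockOSThread()
		if err := unix.Unshare(unix.CLONE_NEWNET); err != nil {
			errCh <- err
			return
		}
		// Bind-mounting /proc/<tid>/ns/net keeps the namespace alive
		// after the thread is gone.
		src := fmt.Sprintf("/proc/self/task/%d/ns/net", unix.Gettid())
		errCh <- unix.Mount(src, path, "bind", unix.MS_BIND, "")
	}()
	return path, <-errCh
}

func main() {
	path, err := newPersistentNetNS("/run/demo-netns")
	if err != nil {
		panic(err)
	}
	fmt.Println("network namespace pinned at", path)
}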
When pod-level user namespaces are enabled, containerd has to create the network namespace after entering the user namespace context. The Linux helper path uses CLONE_NEWNET and then mounts the namespace from the helper process PID:
syscall.CLONE_NEWNET,                              // included in the helper's clone flags
netns.NewNetNSFromPID(netnsMountDir, uint32(pid))  // pins the helper's netns by PID
User namespaces change ownership and capability rules around namespace creation, so the runtime has to create the network namespace inside the right user-namespace context before CNI runs against it.
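A sketch of the PID-based variant, reusing the imports from the sketch above; the function name is illustrative, and it assumes the helper process already created its namespace inside the right user-namespace context:

// pinNetNSFromPID bind-mounts an existing process's network namespace,
// the same shape as containerd's netns.NewNetNSFromPID.
func pinNetNSFromPID(mountDir string, pid uint32) (string, error) {
	path := filepath.Join(mountDir, fmt.Sprintf("netns-%d", pid))
	f, err := os.Create(path)
	if err != nil {
		return "", err
	}
	f.Close()
	// The helper was created with CLONE_NEWNET; mounting its
	// /proc/<pid>/ns/net makes the namespace persistent.
	src := fmt.Sprintf("/proc/%d/ns/net", pid)
	return path, unix.Mount(src, path, "bind", unix.MS_BIND, "")
}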
CRI Calls CNI
After namespace creation, containerd CRI calls setupPodNetwork. That function chooses the CNI plugin set for the runtime handler, prepares Kubernetes labels and runtime capabilities, calls the selected network plugin, and stores the result on the sandbox:
result, err = netPlugin.Setup(ctx, id, path, opts...)
sandbox.CNIResult = result
The id is the sandbox ID. The path is the pod network namespace path. The options carry Kubernetes metadata and runtime data that plugins may need. Labels include pod namespace, pod name, pod UID, and the sandbox container ID:
"K8S_POD_NAMESPACE": config.GetMetadata().GetNamespace(),
"K8S_POD_INFRA_CONTAINER_ID": id,
Capabilities can include annotations, port mappings, bandwidth, DNS, and cgroup path. A plugin can ignore unsupported fields, but a chained plugin such as portmap depends on runtime-provided capability data to install host-port rules.
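As a sketch, the same call shape looks like this with go-cni used directly; the pod metadata and port numbers here are invented:

import (
	"context"
	"fmt"

	gocni "github.com/containerd/go-cni"
)

// attachPod mirrors the Setup call shape: labels carry Kubernetes
// metadata, capability args feed chained plugins such as portmap.
func attachPod(ctx context.Context, netPlugin gocni.CNI, id, netnsPath string) error {
	result, err := netPlugin.Setup(ctx, id, netnsPath,
		gocni.WithLabels(map[string]string{
			"K8S_POD_NAMESPACE":          "default",
			"K8S_POD_NAME":               "demo",
			"K8S_POD_INFRA_CONTAINER_ID": id,
		}),
		gocni.WithCapabilityPortMap([]gocni.PortMapping{{
			HostPort:      8080,
			ContainerPort: 80,
			Protocol:      "tcp",
		}}),
	)
	if err != nil {
		return err
	}
	// go-cni's default interface prefix is "eth", so "eth0" normally
	// carries the pod IP in the returned result.
	fmt.Println("pod IP:", result.Interfaces["eth0"].IPConfigs[0].IP)
	return nil
}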
containerd can also select CNI configuration by runtime handler. During Linux service initialization, it builds go-cni instances with plugin configuration directories and binary directories:
cni.WithPluginConfDir(dir),
cni.WithPluginDir(c.config.NetworkPluginBinDirs)
That gives operators a way to pair a Kubernetes runtime class or handler with a particular network configuration.
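A minimal initialization sketch with go-cni, matching the excerpt above; the directories are the conventional defaults, not values read from a real containerd config:

import gocni "github.com/containerd/go-cni"

func newNetPlugin() (gocni.CNI, error) {
	l, err := gocni.New(
		gocni.WithMinNetworkCount(2), // loopback plus one pod network
		gocni.WithPluginConfDir("/etc/cni/net.d"),
		gocni.WithPluginDir([]string{"/opt/cni/bin"}),
	)
	if err != nil {
		return nil, err
	}
	// Load the loopback network and the default conf file found on disk.
	if err := l.Load(gocni.WithLoNetwork, gocni.WithDefaultConf); err != nil {
		return nil, err
	}
	return l, nil
}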
The Sandbox Still Starts
CNI setup does not replace the pod sandbox task. After networking is configured, the default pod sandbox controller still prepares a sandbox container from the pause image. It builds an OCI spec, prepares a snapshot, creates a containerd container, creates a task with null IO, starts it, records the PID, and marks the sandbox ready.
The sandbox process is the stable namespace anchor for the pod: the network namespace was created before CNI ran, and the sandbox task keeps the pod's namespaces alive for workload containers that start later.
The pause process is one piece of the sandbox. The rest is CRI metadata, the network namespace path, the CNI result, labels, runtime endpoint data, and monitor state, all of which containerd needs to answer kubelet later.
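A compressed sketch of that sequence using containerd's public Go client rather than the internal podsandbox controller; the socket path, containerd namespace, and image ref are assumptions, and error handling is reduced to panics:

package main

import (
	"context"
	"fmt"

	"github.com/containerd/containerd/v2/client"
	"github.com/containerd/containerd/v2/pkg/cio"
	"github.com/containerd/containerd/v2/pkg/namespaces"
	"github.com/containerd/containerd/v2/pkg/oci"
)

func main() {
	cli, err := client.New("/run/containerd/containerd.sock")
	must(err)
	defer cli.Close()
	ctx := namespaces.WithNamespace(context.Background(), "demo")

	image, err := cli.Pull(ctx, "registry.k8s.io/pause:3.10", client.WithPullUnpack)
	must(err)

	// Spec, snapshot, container: the same sequence the pod sandbox
	// controller walks through.
	c, err := cli.NewContainer(ctx, "sandbox-demo",
		client.WithNewSnapshot("sandbox-demo-snap", image),
		client.WithNewSpec(oci.WithImageConfig(image)),
	)
	must(err)

	// Null IO: the pause process has nothing to say to the runtime.
	task, err := c.NewTask(ctx, cio.NullIO)
	must(err)
	must(task.Start(ctx))
	fmt.Println("sandbox anchor pid:", task.Pid())
}

func must(err error) {
	if err != nil {
		panic(err)
	}
}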
Workload Containers Join The Namespace
When kubelet asks CRI to create a workload container, the request names a pod sandbox. containerd looks up that sandbox, reads the sandbox PID and network namespace path, and builds the workload's OCI spec so it joins the pod namespaces.
The spec option that does the work is WithPodNamespaces. For the network namespace, it writes an OCI Linux namespace entry that points at the sandbox process namespace:
oci.WithLinuxNamespace(runtimespec.LinuxNamespace{
	Type: runtimespec.NetworkNamespace,
	Path: GetNetworkNamespace(sandboxPid),
})
The same option joins IPC and UTS namespaces and handles pod-level user namespace settings. The workload process does not get a new network stack: it joins the pod sandbox's network namespace, sees the pod interfaces and routes, and shares the port space with the other containers in the pod.
That is why two containers in one pod can collide on a TCP port even when they have different root filesystems and processes. Their network namespace is the same object.
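As a sketch, a workload container that joins an existing namespace path can be built with the public client; cli, ctx, and image continue from the sandbox sketch above, and sandboxPid is assumed to be the recorded pause PID:

import (
	specs "github.com/opencontainers/runtime-spec/specs-go"

	"github.com/containerd/containerd/v2/pkg/oci"
)

// Joining by path: the OCI runtime enters the sandbox's netns via
// setns instead of creating a new network stack.
workload, err := cli.NewContainer(ctx, "workload-demo",
	client.WithNewSnapshot("workload-demo-snap", image),
	client.WithNewSpec(
		oci.WithImageConfig(image),
		oci.WithLinuxNamespace(specs.LinuxNamespace{
			Type: specs.NetworkNamespace,
			Path: fmt.Sprintf("/proc/%d/ns/net", sandboxPid),
		}),
	),
)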
DNS And Pod Files
Pod DNS is configured around the network setup, not by it. Kubernetes defines DNS behavior for Services and Pods, and containerd mounts sandbox /etc/hosts, hostname, and /etc/resolv.conf files into workload containers when those files exist or are delegated to a sandbox controller.
CNI can return DNS fields, and plugins can participate in resolver configuration. But the resolver file a process reads, the search domains Kubernetes chooses, and the records served by the cluster DNS add-on are not created by CLONE_NEWNET. A working pod network needs both sides: an attached namespace and the pod files that make names resolve the way Kubernetes promises.
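As a sketch, the file side of that contract is an ordinary read-only bind mount in the workload's OCI spec, passed as one more option to WithNewSpec; the source path here is invented, since containerd derives the real one from sandbox state:

// Bind the sandbox's resolver file over the container's /etc/resolv.conf.
oci.WithMounts([]specs.Mount{{
	Destination: "/etc/resolv.conf",
	Type:        "bind",
	Source:      "/var/lib/demo/sandboxes/sandbox-demo/resolv.conf",
	Options:     []string{"rbind", "ro"},
}})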
Cleanup
Teardown follows the ownership chain in reverse. CRI owns sandbox lifecycle state. CNI DEL removes the network attachment. libcni deletes chained plugins in reverse order, which lets decorators such as port mapping clean their rules before the base interface disappears. The runtime can then remove the sandbox network namespace after network teardown has run.
Cleanup is where partial failures matter. A failed ADD can leave an IPAM allocation, a host-side veth, or a firewall rule. A failed DEL can make the next pod fail for reasons that look unrelated.
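A teardown sketch with go-cni, continuing the attachPod example; passing the same labels on DEL matters because chained plugins may need them to locate the rules they installed:

func detachPod(ctx context.Context, netPlugin gocni.CNI, id, netnsPath string) error {
	// DEL walks the chain in reverse: portmap-style decorators clean up
	// before the base interface plugin removes the attachment.
	if err := netPlugin.Remove(ctx, id, netnsPath,
		gocni.WithLabels(map[string]string{
			"K8S_POD_INFRA_CONTAINER_ID": id,
		}),
	); err != nil {
		// Surface a failed DEL instead of swallowing it: leaked IPAM
		// allocations and firewall rules fail the next pod instead.
		return err
	}
	return nil
}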
Where This Goes
Part VI turns these relationships into experiments inside a disposable VM.
Sources And Further Reading
- Kubernetes networking model: https://kubernetes.io/docs/concepts/services-networking/
- Kubernetes Pods: https://kubernetes.io/docs/concepts/workloads/pods/
- Kubernetes DNS for Services and Pods: https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/
- containerd CRI sandbox run path: https://github.com/containerd/containerd/blob/2976f38ccbfcda5ef1364d63d60b0a304e4bf94a/internal/cri/server/sandbox_run.go
- containerd CRI Linux sandbox network namespace helper: https://github.com/containerd/containerd/blob/2976f38ccbfcda5ef1364d63d60b0a304e4bf94a/internal/cri/server/sandbox_run_linux.go
- containerd CRI CNI initialization: https://github.com/containerd/containerd/blob/2976f38ccbfcda5ef1364d63d60b0a304e4bf94a/internal/cri/server/service_linux.go
- containerd pod sandbox controller: https://github.com/containerd/containerd/blob/2976f38ccbfcda5ef1364d63d60b0a304e4bf94a/internal/cri/server/podsandbox/sandbox_run.go
- containerd CRI container create path: https://github.com/containerd/containerd/blob/2976f38ccbfcda5ef1364d63d60b0a304e4bf94a/internal/cri/server/container_create.go
- containerd CRI pod namespace spec opts: https://github.com/containerd/containerd/blob/2976f38ccbfcda5ef1364d63d60b0a304e4bf94a/internal/cri/opts/spec_opts.go
- containerd go-cni: https://github.com/containerd/go-cni/tree/v1.1.13