Chapter 18: Linux Primitive Experiments

These labs look at kernel objects directly (namespace links, mount table rows, cgroup files, process IDs) before any runtime appears.

These commands are sketches for a disposable Linux VM. Do not run the mutating sections on a normal workstation.

Namespace Membership

Question: what changes when a process enters new namespaces?

Scope: VM-only mutation, with an optional user-namespace variant later.

Start by recording the current namespace links:

readlink /proc/self/ns/{mnt,pid,uts,ipc,net,user,cgroup}

Then run a shell in new PID, mount, UTS, and IPC namespaces:

sudo unshare --fork --pid --mount --mount-proc --uts --ipc bash

Inside that shell, inspect the same links:

echo "inside pid: $$"
readlink /proc/self/ns/{mnt,pid,uts,ipc,net,user,cgroup}
ps -ef
exit

The PID namespace needs --fork because unshare(2) does not move the calling process into the new PID namespace; only its children are created there, and the first child becomes PID 1 (which is what the echo line should print). --mount-proc mounts a fresh procfs for that PID namespace, which is why ps becomes meaningful inside the lab shell. The network, user, and cgroup namespace links should not change in this exact command; the point is to see that namespaces are independent choices.

Cleanup is the shell exit. Verification is another read of the original shell's namespace links.
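
Comparing seven links by eye is error-prone, so the before/after comparison can be scripted. The following is a minimal sketch, not part of the original lab: ns_snapshot is a hypothetical helper name, and the scratch files come from mktemp. Run as written it compares the shell with itself, so diff prints nothing; in the lab, the second snapshot would be taken inside the unshare shell (where the function has to be pasted or sourced again).

```shell
# ns_snapshot PID: print one "name link" line per namespace of PID.
# (Hypothetical helper; the name is not from any tool.)
ns_snapshot() {
    for ns in mnt pid uts ipc net user cgroup; do
        printf '%s %s\n' "$ns" "$(readlink "/proc/$1/ns/$ns" 2>/dev/null)"
    done
}

# Scratch files for the two snapshots.
before=$(mktemp)
after=$(mktemp)

ns_snapshot "$$" > "$before"
# In the lab, take this second snapshot inside the unshare shell instead.
ns_snapshot "$$" > "$after"

# Show only the namespaces whose link differs between the two snapshots.
diff "$before" "$after" || true
rm -f "$before" "$after"
```

With the lab's unshare command, the expectation from the text above is that the mnt, pid, uts, and ipc rows differ while net, user, and cgroup stay the same.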

Entering A Target Namespace

Question: how does another process enter an existing namespace?

Scope: VM-only mutation.

In one shell, keep a namespaced process alive:

sudo unshare --fork --pid --mount --mount-proc --uts bash -c 'hostname cdb-lab; sleep 300'

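In another shell, find the sleep process itself and inspect its namespace links. The pattern is anchored because the sudo, unshare, and bash -c command lines also contain "sleep 300", and those wrappers may not be inside the new namespaces:

pid=$(pgrep -f '^sleep 300$' | head -n 1)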
sudo readlink /proc/"$pid"/ns/{pid,mnt,uts}
sudo nsenter --target "$pid" --pid --mount --uts hostname

nsenter(1) is the user-space wrapper for setns(2). The test is whether a command run through nsenter reports the same namespace links as /proc/$pid/ns/*, and different ones from the second shell's own /proc/self/ns/* links; the hostname output is only a sanity check.
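
The link comparison can be reduced to a yes/no check per namespace. This is a sketch under assumptions: same_ns and TARGET_PID are hypothetical names, and reading another user's /proc/PID/ns links needs root, so in the lab it would run from a root shell with TARGET_PID set to the lab's $pid. Without TARGET_PID it compares the shell with itself.

```shell
# same_ns PID_A PID_B NS: succeed if both processes share namespace NS.
# Returns 2 if either link cannot be read (no such PID, or no permission).
same_ns() {
    local a b
    a=$(readlink "/proc/$1/ns/$3" 2>/dev/null) || return 2
    b=$(readlink "/proc/$2/ns/$3" 2>/dev/null) || return 2
    [ "$a" = "$b" ]
}

# TARGET_PID is a hypothetical variable; set it to the lab's $pid.
target="${TARGET_PID:-$$}"
for ns in pid mnt uts; do
    if same_ns "$$" "$target" "$ns"; then
        echo "$ns: same namespace as this shell"
    else
        echo "$ns: different namespace, or link unreadable"
    fi
done
```

Comparing link text is enough here because the link target encodes the namespace type and inode number; two processes with the same string are in the same namespace.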

Clean up the sleeping process:

sudo kill "$pid"

Mount Namespace

Question: does a mount namespace give a process its own mount table?

Scope: VM-only mutation.

Run a shell with a new mount namespace, then make propagation private before creating test mounts:

sudo unshare --mount bash
mount --make-rprivate /
mkdir -p /tmp/cdb-mnt/source /tmp/cdb-mnt/target
touch /tmp/cdb-mnt/source/inside-source
mount --bind /tmp/cdb-mnt/source /tmp/cdb-mnt/target
findmnt /tmp/cdb-mnt/target
grep /tmp/cdb-mnt /proc/self/mountinfo
umount /tmp/cdb-mnt/target
exit

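The bind mount exists only in the new mount namespace, so the findmnt and mountinfo rows appear inside the lab shell but not in the original shell. mount --make-rprivate / is there because shared mount propagation can make mount experiments surprise the host. Recent util-linux versions of unshare(1) already set private propagation by default in a new mount namespace; the explicit command makes that step visible and keeps the lab independent of the default.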
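
Propagation state is visible in the optional fields of each mountinfo row (shared:N, master:N, unbindable), which sit between the mount options and the "-" separator. The sketch below prints each mount point with those tags; mount_propagation is a hypothetical helper name, and a row with no tags is reported as private on the assumption that absence of a tag means private propagation.

```shell
# mount_propagation [FILE]: print "mountpoint tags" for each mountinfo row.
# Field 5 is the mount point; optional fields start at field 7 and end at "-".
mount_propagation() {
    awk '{
        tags = ""
        for (i = 7; i <= NF && $i != "-"; i++)
            tags = tags (tags ? "," : "") $i
        print $5, (tags ? tags : "private")
    }' "${1:-/proc/self/mountinfo}"
}

# Inside the lab shell, after mount --make-rprivate /, the test mount
# should show up as private. Outside the lab this prints nothing.
mount_propagation 2>/dev/null | grep '^/tmp/cdb-mnt' || true
```

Running it once in the original shell and once in the lab shell shows which host mounts are shared and confirms that the lab's copies are not.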

Verify cleanup from the original shell:

findmnt /tmp/cdb-mnt/target || true
sudo rm -rf /tmp/cdb-mnt

Root Filesystem Boundary

Question: what does a root filesystem need before a process can run inside it?

Scope: VM-only mutation.

A directory is not automatically runnable: a dynamic binary needs its loader and shared libraries, /proc does not exist unless mounted, device nodes do not appear unless created or mounted, and a shell does not exist unless it is in the tree.

Whichever rootfs the lab uses — a static busybox tarball, an exported image, an extracted layer — the inspection checkpoints are the same:

find rootfs -maxdepth 2 \( -type f -o -type l \) | sort
file rootfs/bin/sh
ldd rootfs/bin/sh || true

rootfs/ is a directory tree, not an image or a snapshot; it can become / for a process once the mount namespace and root switch are set up. pivot_root(2) is the runtime's call (chapter 20 uses it inside an OCI bundle); chroot(2) is enough to demonstrate pathname resolution but does not produce container filesystem setup.
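
The prerequisites listed above can be turned into a quick presence check before any root switch is attempted. This is a sketch, not a complete validator: rootfs_check and the ROOTFS variable are hypothetical names, and the three paths checked (a shell, a /proc mount point, a /dev directory) are an assumed minimum drawn from the list above.

```shell
# rootfs_check DIR: report whether DIR has a shell and the usual mount points.
# Returns non-zero if anything is missing.
rootfs_check() {
    local root="$1" missing=0 p
    for p in bin/sh proc dev; do
        # -L also accepts symlinks that dangle when viewed from the host,
        # such as bin/sh -> /bin/busybox, which only resolve inside the rootfs.
        if [ -e "$root/$p" ] || [ -L "$root/$p" ]; then
            echo "ok       $p"
        else
            echo "missing  $p"
            missing=1
        fi
    done
    return "$missing"
}

# ROOTFS is a hypothetical variable; it defaults to the lab's rootfs/ directory.
rootfs_check "${ROOTFS:-rootfs}" || echo "rootfs is not runnable yet"
```

The check only proves that the paths exist. Whether bin/sh can actually start still depends on its loader and libraries being present inside the tree, which is what the file and ldd checkpoints are for.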

cgroup v2

Question: how does cgroup v2 attach a process to resource-control files?

Scope: VM-only mutation. On a systemd VM, prefer a delegated subtree. Do not write arbitrary cgroup files under host-managed services.

Start with inspection:

findmnt -no TARGET,FSTYPE,OPTIONS /sys/fs/cgroup
cat /sys/fs/cgroup/cgroup.controllers
cat /proc/self/cgroup
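
A child cgroup only gets a controller's files when that controller is listed in the parent's cgroup.subtree_control, so it can be worth checking that before creating anything. The sketch below is read-only; has_controller is a hypothetical helper name.

```shell
# has_controller NAME FILE: succeed if NAME is one of the space-separated
# controllers listed in FILE. Returns 2 if FILE cannot be read.
has_controller() {
    [ -r "$2" ] || return 2
    tr ' ' '\n' < "$2" | grep -qxF -- "$1"
}

ctl=/sys/fs/cgroup/cgroup.subtree_control
if has_controller pids "$ctl"; then
    echo "pids is enabled for children of the root cgroup"
else
    echo "pids is not enabled in $ctl (or the file is unreadable)"
fi
```

If the check fails on a cgroup v2 host, the controller may still appear in cgroup.controllers without being enabled for children; enabling it is a mutating step and belongs in the VM.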

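The mutating lab should create a named lab cgroup only under a subtree the lab owns; on a disposable VM that can be a directory directly under /sys/fs/cgroup, as below. The smallest useful controller experiment is pids.max, because it can be observed without a benchmark. The pids.max file appears in the new cgroup only if pids is listed in the parent's cgroup.subtree_control: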

sudo mkdir /sys/fs/cgroup/cdb-lab
echo $$ | sudo tee /sys/fs/cgroup/cdb-lab/cgroup.procs
cat /sys/fs/cgroup/cdb-lab/cgroup.procs
echo 20 | sudo tee /sys/fs/cgroup/cdb-lab/pids.max
cat /sys/fs/cgroup/cdb-lab/pids.current

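Cleanup requires moving the shell out before removing the cgroup. This sketch moves it to the root cgroup rather than back to the cgroup it started in, which is acceptable on a disposable VM: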

echo $$ | sudo tee /sys/fs/cgroup/cgroup.procs
sudo rmdir /sys/fs/cgroup/cdb-lab

If rmdir fails (typically with EBUSY), a process still belongs to the cgroup, the cgroup has child cgroups, or the hierarchy does not allow the lab to remove it.
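
To return the shell to the cgroup it started in, the original path has to be recorded before the move. On cgroup v2 that path is the "0::" row of /proc/PID/cgroup. The sketch below only reads it; cgroup2_path is a hypothetical helper name, and the restore command is shown as a comment because it needs root and belongs in the VM.

```shell
# cgroup2_path [FILE]: print the cgroup v2 path from a /proc/PID/cgroup file.
# Prints nothing if the file has no "0::" row (no v2 hierarchy).
cgroup2_path() {
    sed -n 's/^0:://p' "${1:-/proc/$$/cgroup}" 2>/dev/null
}

# Record this before moving the shell into the lab cgroup.
orig=$(cgroup2_path)
echo "original cgroup: ${orig:-<no cgroup v2 row found>}"

# During cleanup, in the VM (needs root):
#   echo $$ | sudo tee "/sys/fs/cgroup${orig}/cgroup.procs"
```

The recorded path is relative to the root of the process's cgroup namespace, so the commented restore line assumes the shell is not inside a separate cgroup namespace.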

Sources And Further Reading

namespaces(7): https://man7.org/linux/man-pages/man7/namespaces.7.html
unshare(1): https://man7.org/linux/man-pages/man1/unshare.1.html
nsenter(1): https://man7.org/linux/man-pages/man1/nsenter.1.html
mount_namespaces(7): https://man7.org/linux/man-pages/man7/mount_namespaces.7.html
mount(2): https://man7.org/linux/man-pages/man2/mount.2.html
pivot_root(2): https://man7.org/linux/man-pages/man2/pivot_root.2.html
Linux cgroup v2 docs: https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html
Linux shared subtree docs: https://github.com/torvalds/linux/blob/57b8e2d666a31fa201432d58f5fe3469a0dd83ba/Documentation/filesystems/sharedsubtree.rst