How Docker works

The Problem Statement

Interviewer: "Walk me through what actually happens when you run docker run nginx. What system calls fire, what kernel features are used, and how is the container isolated from the host? How is this different from a virtual machine?"

This question tests three things: whether you understand that containers are not VMs, whether you know the specific Linux kernel primitives (namespaces, cgroups, OverlayFS) that make isolation work, and whether you can articulate the role Docker plays on top of those primitives.

Most candidates say "containers are lightweight VMs." That is wrong. A container is a regular Linux process with restricted visibility and limited resources. There is no hypervisor, no guest kernel, no hardware emulation. The difference between "lightweight VM" and "isolated process using kernel namespaces" is the difference between a surface-level answer and a senior one.

I like this question because it separates people who have used Docker from people who understand Docker. Everyone can type docker run. Far fewer can explain what happens between pressing Enter and the container serving its first request.

Clarifying the Scenario

You: "Great question. Let me clarify scope before I dive in."

You: "When you say 'how Docker works,' do you want me to focus on the kernel primitives (namespaces, cgroups, filesystem layers) or the Docker daemon architecture (containerd, runc, the image registry protocol)?"

Interviewer: "Both. Start with the kernel primitives, then explain what Docker adds on top."

You: "Got it. Should I also cover networking? Docker has multiple network modes that behave very differently."

Interviewer: "Yes, cover networking. I want the full picture."

You: "One more thing. Should I compare containers to VMs at the start to set the baseline?"

Interviewer: "Briefly, yes."

You: "OK. I will structure this in four parts: how containers differ from VMs at the kernel level, the three kernel primitives that make isolation work, how Docker orchestrates these primitives through its daemon architecture, and how container networking works across the different modes."

My Approach

I break this into five parts:

Containers vs VMs: The fundamental architectural difference at the kernel boundary
Namespaces: How Linux gives each container its own view of PID, network, filesystem, and more
Cgroups: How the kernel enforces CPU, memory, and I/O limits per container
OverlayFS and image layers: How Docker builds images as stacked read-only layers with a writable top
Container networking: Bridge, host, overlay, and macvlan modes and when each applies

The key insight most people miss: Docker itself does not isolate anything. The Linux kernel does all the isolation. Docker is a user-friendly toolchain (CLI, daemon, image format, registry protocol) that calls the right kernel APIs in the right order. If you wanted to, you could create a "container" with raw unshare and cgroup commands. Docker just makes it reproducible and portable.
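To make the "raw unshare" point concrete, here is a minimal sketch in Python. It only assembles a command line for the util-linux unshare tool and does not execute it. The helper name build_unshare_argv and the path /tmp/rootfs are illustrative assumptions, and a real runtime does far more than this (pivot_root, capabilities, seccomp, cgroups).

```python
import shlex


def build_unshare_argv(rootfs, command):
    """Return an argv that would run `command` in fresh namespaces, chrooted into `rootfs`.

    `rootfs` is a hypothetical directory holding an unpacked root filesystem.
    """
    return [
        "unshare",
        "--pid", "--fork",  # new PID namespace; fork so the child becomes PID 1
        "--mount",          # new mount namespace
        "--uts",            # own hostname
        "--ipc",            # own System V IPC and POSIX message queues
        "--net",            # own, initially empty, network stack
        "chroot", rootfs,   # switch the root directory to the unpacked image
        *command,
    ]


argv = build_unshare_argv("/tmp/rootfs", ["/bin/sh"])
print(shlex.join(argv))
# prints: unshare --pid --fork --mount --uts --ipc --net chroot /tmp/rootfs /bin/sh
```

Running the printed command would need root privileges and a populated root filesystem at that path. The point of the sketch is only that each flag maps to one of the namespaces discussed later.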

Here is the mental model I use. Think of a VM as renting a separate apartment in a building. You get your own walls, your own plumbing, your own electrical panel. A container is more like renting a desk in a coworking space. You share the building's walls, plumbing, and electrical (the kernel), but you get a divider around your desk (namespaces), a cap on how much electricity your desk can use (cgroups), and your own set of files in a locked drawer (OverlayFS).

Aspect	Virtual Machine	Container
Kernel	Own guest kernel	Shares host kernel
Boot time	30-60 seconds	Milliseconds
Memory overhead	512 MB - 4 GB per VM	5 - 50 MB per container
Isolation strength	Hardware-level (hypervisor)	Process-level (kernel features)
Density	10-20 per host	100-1000+ per host
Filesystem	Full disk image	Layered filesystem (OverlayFS)
Security boundary	Strong (separate kernel)	Weaker (shared kernel attack surface)

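Containers share the host kernel. This means a kernel vulnerability can affect every container on the host. This is why stronger sandboxes exist for multi-tenant workloads where you do not trust the container code: microVM-based isolation such as Firecracker and Kata Containers, which gives each workload its own guest kernel, and gVisor, which is not a VM but interposes a user-space kernel between the container and the host kernel.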

The Architecture

Here is what happens when you run docker run nginx, from the CLI command down to the kernel, step by step:

CLI to daemon: docker run nginx sends a REST API call to the Docker daemon (dockerd). The daemon validates the image reference, pull policy, and runtime flags.
Daemon to containerd: dockerd calls containerd over gRPC. containerd is the actual container lifecycle manager. It handles image pulling, creating OverlayFS snapshots, and managing container tasks.
Snapshot creation: containerd creates an OverlayFS mount for the container. The nginx image layers become read-only lower directories, and a new writable upper directory is created for this specific container instance.
Shim spawn: containerd spawns a containerd-shim process for this container. The shim is the parent process of the actual container. If containerd crashes or restarts, the shim keeps the container alive. This is part of how you can upgrade Docker without killing running containers (for dockerd itself, this also requires the live-restore setting).
runc execution: The shim forks and execs runc, the OCI-compliant runtime. runc calls clone() with namespace flags (CLONE_NEWPID, CLONE_NEWNET, CLONE_NEWNS, CLONE_NEWUTS, CLONE_NEWIPC). The resulting child process is the container's PID 1.
cgroup setup: runc creates a new cgroup hierarchy under /sys/fs/cgroup/ and writes the CPU, memory, and I/O limits from the docker run flags (--memory, --cpus).
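Network setup: runc does not configure networking itself. With the default bridge driver, dockerd's networking library (libnetwork) creates a veth pair (virtual ethernet), moves one end into the container's network namespace, and attaches the other to the Docker bridge (docker0). The container gets its own IP address from the bridge's subnet.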
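The snapshot, runc, and cgroup steps above each reduce to a small piece of data handed to the kernel. The sketch below computes those three pieces in Python without touching the system. The function names and the example paths are hypothetical, the flag values are copied from <linux/sched.h>, and the cgroup file names assume cgroup v2 (cgroup v1 uses different files).

```python
# Namespace flag values as defined in <linux/sched.h>.
CLONE_NEWNS = 0x00020000   # mount namespace
CLONE_NEWUTS = 0x04000000  # hostname and domain name
CLONE_NEWIPC = 0x08000000  # System V IPC, POSIX message queues
CLONE_NEWPID = 0x20000000  # process IDs
CLONE_NEWNET = 0x40000000  # network stack

# The five flags named in the runc step, OR-ed into one clone() argument.
CONTAINER_NAMESPACE_FLAGS = (
    CLONE_NEWPID | CLONE_NEWNET | CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC
)


def overlay_mount_options(image_layers, upper, work):
    """Build the options string for an overlay mount.

    `image_layers` is ordered base layer first. overlayfs expects the
    topmost lower layer first, so the list is reversed.
    """
    lowerdir = ":".join(reversed(image_layers))
    return f"lowerdir={lowerdir},upperdir={upper},workdir={work}"


def cgroup_v2_limits(memory_bytes, cpus, period_us=100_000):
    """Translate --memory and --cpus into cgroup v2 file contents.

    cpu.max holds "<quota> <period>" in microseconds: 1.5 CPUs means
    150000 us of CPU time per 100000 us period.
    """
    quota_us = round(cpus * period_us)
    return {"memory.max": str(memory_bytes), "cpu.max": f"{quota_us} {period_us}"}


# Hypothetical layer and container paths, for illustration only.
print(overlay_mount_options(["/layers/base", "/layers/nginx"], "/ctr/upper", "/ctr/work"))
print(hex(CONTAINER_NAMESPACE_FLAGS))
print(cgroup_v2_limits(512 * 1024 * 1024, 1.5))
```

In a real runtime the options string would be passed to mount(2) with filesystem type overlay, the flag mask to clone(2), and the two values written into files in the container's cgroup directory.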

For your interview: the critical insight is this layered architecture. Docker is not one monolithic program. It is a stack: CLI calls daemon, daemon calls containerd, containerd calls runc, runc calls the kernel. Each layer has a clear responsibility and can be swapped independently. Because the bottom of the stack follows the OCI specifications, you can replace runc with another OCI runtime such as gVisor's runsc without changing the layers above it, and a tool like Podman can replace the Docker CLI and daemon entirely while still driving an OCI runtime underneath.

Namespaces and Cgroups: The Isolation Primitives

This is where the real isolation happens. Docker does not isolate anything. The kernel does, through two mechanisms: namespaces (what the process can see) and cgroups (what the process can use).
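One way to observe the "what the process can see" half is through /proc: each process exposes its namespaces as symlinks under /proc/<pid>/ns, and two processes share a namespace exactly when the IDs in those links match. The following Python sketch reads and compares those IDs. The function names are my own, and the directory argument exists only so the parsing can be exercised against any directory of symlinks.

```python
import os
import re


def namespace_ids(ns_dir="/proc/self/ns"):
    """Map each entry in ns_dir to the namespace ID its symlink points at."""
    ids = {}
    for name in sorted(os.listdir(ns_dir)):
        target = os.readlink(os.path.join(ns_dir, name))  # e.g. "pid:[4026531836]"
        match = re.fullmatch(r"(\w+):\[(\d+)\]", target)
        if match:
            ids[name] = int(match.group(2))
    return ids


def shared_namespaces(a, b):
    """Return the namespace types for which two processes have the same ID."""
    return sorted(name for name in a.keys() & b.keys() if a[name] == b[name])


# On Linux, show the namespaces of the current process.
if os.path.isdir("/proc/self/ns"):
    for name, ns_id in namespace_ids().items():
        print(f"{name}: {ns_id}")
```

Comparing namespace_ids() for a shell on the host with the result for a containerized process's /proc/<pid>/ns (which needs sufficient privileges to read) should show different IDs for the pid, net, mnt, uts, and ipc entries, and typically the same ID for user unless user namespaces are enabled.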
