Unveiling Docker's Secrets: Exploring Namespaces, Cgroups, and Union Filesystems

INTRODUCTION
Imagine a world where applications run seamlessly across environments—development, testing, and production—without the familiar headache of "it works on my machine." Docker made this dream a reality, revolutionizing how software is built, shipped, and deployed.

In today’s blog, we’ll dive deep into the mechanics of how Docker orchestrates and manages several containers seamlessly. Have you ever wondered how Docker allows multiple containers to run side by side without stepping on each other’s toes? Or how it ensures that a single container doesn’t hog all the system’s resources? Or even how it magically layers file systems to create lightweight, efficient container images?

The answers lie in three fundamental building blocks of Docker’s architecture: Namespaces, Control Groups (cgroups), and Union Filesystems. By the end of this blog, you’ll have a clear understanding of how Docker brings the promise of containerization to life. Whether you're a developer curious about Docker's inner workings or a systems enthusiast seeking to deepen your technical knowledge, this blog is for you.

So, let’s dive in and uncover how Docker transforms your system into a powerhouse of isolated, resource-efficient, and lightning-fast containers!

NAMESPACES

A namespace is a key feature of the Linux kernel that creates isolated environments, restricting what processes can see or access. This is essential for technologies like Docker, which use namespaces to ensure secure and isolated execution of applications. It's like putting processes in their own "private room" where they can only see and interact with their own resources, even though they are sharing the same physical computer with other processes.

Example: Let’s assume that you have a host system where the init process (the first process that the Linux kernel starts after it has initialized itself) runs with PID 1 and manages processes at the system level. On this host you start two Docker containers: Container A and Container B.

When you start Container A, it gets its own PID namespace. Within this namespace:

The first process inside Container A (let’s say a web server) is assigned PID 1. Additional processes started in Container A get PIDs like 2, 3, and so on, but these PIDs are visible only within Container A.

Similarly, when you start Container B, it gets its own PID namespace. Within this namespace:

The first process inside Container B (let’s say a database server) is also assigned PID 1. Additional processes started in Container B get PIDs like 2, 3, and so on, but these PIDs are visible only within Container B.

From the host system’s perspective, all container processes are visible but with different PIDs. For example:

The web server in Container A may appear as PID 201 on the host, while the database server in Container B may appear as PID 301 on the host.
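To make the "same process, two PIDs" idea concrete, here is a minimal Python sketch. On Linux 4.1 and later, the file /proc/<pid>/status contains an NSpid line that lists a process's PID in every PID namespace it belongs to, outermost first. The sample text below is a hypothetical excerpt that reuses the numbers from the example above (and assumes the web server is named nginx); it is not output captured from a real system.

```python
from __future__ import annotations

from pathlib import Path


def parse_nspid(status_text: str) -> list[int]:
    """Return the PIDs from the NSpid line of a /proc/<pid>/status dump.

    The kernel lists one PID per nested PID namespace, outermost first.
    """
    for line in status_text.splitlines():
        if line.startswith("NSpid:"):
            return [int(field) for field in line.split()[1:]]
    return []  # kernels older than 4.1 do not report NSpid


# Hypothetical status excerpt for the web server in Container A as seen
# from the host: PID 201 on the host, PID 1 inside the container.
SAMPLE_STATUS = "Name:\tnginx\nPid:\t201\nNSpid:\t201\t1\n"

if __name__ == "__main__":
    print(parse_nspid(SAMPLE_STATUS))  # prints [201, 1]

    status_file = Path("/proc/self/status")
    if status_file.exists():  # only present on Linux
        print(parse_nspid(status_file.read_text()))
```

A process that is not inside any nested PID namespace shows a single number on that line, because it only has one PID.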

I hope the concept of namespaces is clear now. Let’s look at the various types of namespaces:

  1. PID (Process ID) Namespace: Isolates the process ID number space, allowing containers to have their own independent set of process IDs. This prevents processes inside a container from being aware of processes outside the container.

  2. NET (Network) Namespace: Isolates network resources such as interfaces, IP addresses, routing tables, and port numbers. Each container gets its own virtual network stack, ensuring network traffic and configurations are isolated.

  3. USER Namespace: Isolates user and group ID numbers, so a process can have one UID inside the container and a different one on the host. For example, root (UID 0) inside a container can be mapped to an unprivileged UID on the host, which limits the damage if a process escapes the container. Note that Docker does not remap users by default; this has to be enabled in the daemon configuration (the userns-remap option).

  4. MNT Namespace: The MNT (Mount) namespace isolates the set of filesystem mount points, allowing each container to have its own independent view of the filesystem hierarchy. This means that containers can mount and unmount filesystems without affecting the host or other containers.

  5. UTS Namespace: The UTS (UNIX Time-Sharing) namespace allows each container to have its own hostname and domain name, providing isolation for system identification attributes. Changing the hostname inside a container does not affect the host or other containers.

  6. IPC Namespace: The IPC (Inter-Process Communication) namespace isolates the communication mechanisms that processes use to exchange data, such as shared memory segments, semaphores, and message queues.
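You can see which namespaces a process belongs to by looking at the symbolic links under /proc/<pid>/ns. Each link target looks like pid:[4026531836], and two processes are in the same namespace exactly when the number in brackets matches. The sketch below is one way to read those links in Python; the helper names are made up for this post, and it only reports real values on Linux.

```python
from __future__ import annotations

import os
import re

# The six namespace types described in the list above.
NAMESPACE_TYPES = ("pid", "net", "user", "mnt", "uts", "ipc")

_LINK_PATTERN = re.compile(r"^(\w+):\[(\d+)\]$")


def parse_ns_link(target: str) -> tuple[str, int]:
    """Split a link target such as 'pid:[4026531836]' into (type, id)."""
    match = _LINK_PATTERN.match(target)
    if match is None:
        raise ValueError(f"not a namespace link: {target!r}")
    return match.group(1), int(match.group(2))


def namespaces_of(pid: str = "self") -> dict[str, int]:
    """Map namespace type to namespace id for the given process."""
    result: dict[str, int] = {}
    for ns_type in NAMESPACE_TYPES:
        try:
            target = os.readlink(f"/proc/{pid}/ns/{ns_type}")
        except OSError:
            continue  # not Linux, no such process, or no permission
        result[ns_type] = parse_ns_link(target)[1]
    return result


if __name__ == "__main__":
    for ns_type, ns_id in namespaces_of().items():
        print(f"{ns_type}: {ns_id}")
```

Running this once on the host and once inside a container should show different ids for the namespaces that Docker created for that container.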

CGROUPS (CONTROL GROUPS)

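Control Groups (cgroups) are a Linux kernel feature that allows you to allocate, limit, and monitor the resources (such as CPU, memory, and disk I/O) used by a group of processes. Network bandwidth is usually not capped by cgroups alone; cgroups can tag or prioritize a group's traffic, and the actual shaping is done by the kernel's traffic control tools.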

Control groups allocate a specific amount of a resource (e.g., CPU time or memory) to a group of processes. For example, a container running a database can be limited to 2 GB of memory. Cgroups prevent the processes in the group from exceeding what they have been allocated.
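Here is a small sketch of how such limits look from the inside. It assumes cgroup v2 mounted at /sys/fs/cgroup, where the memory limit lives in a file called memory.max and the CPU limit in cpu.max. Under that assumption, a container started with something like docker run --memory=2g --cpus=2 would typically show 2147483648 in memory.max and "200000 100000" (quota and period in microseconds) in cpu.max. The function names are my own, not part of any Docker API.

```python
from __future__ import annotations

from pathlib import Path

# Assumption: cgroup v2 unified hierarchy mounted at the usual location.
CGROUP_ROOT = Path("/sys/fs/cgroup")


def parse_memory_max(text: str) -> int | None:
    """Return the memory limit in bytes, or None when the file says 'max'."""
    value = text.strip()
    return None if value == "max" else int(value)


def parse_cpu_max(text: str) -> float | None:
    """Return the CPU limit as a number of CPUs, or None when unlimited.

    cpu.max holds '<quota> <period>': the group may run for <quota>
    microseconds of CPU time in every <period> microseconds.
    """
    quota, period = text.split()
    if quota == "max":
        return None
    return int(quota) / int(period)


if __name__ == "__main__":
    parsers = (("memory.max", parse_memory_max), ("cpu.max", parse_cpu_max))
    for name, parser in parsers:
        try:
            raw = (CGROUP_ROOT / name).read_text()
        except OSError:
            print(f"{name}: not available here")
            continue
        print(f"{name}: {parser(raw)}")
```

A result of None means no limit has been set, which is what you would expect for a container started without any resource flags.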

UNION FILE SYSTEM

Docker uses a union file system to efficiently manage container images and layers. It leverages OverlayFS, a union mount filesystem implementation for Linux, through the overlay2 storage driver. Conceptually, OverlayFS works with three layers (the names in parentheses are the ones OverlayFS itself uses):

  1. Base Layer (lowerdir)

  2. Overlay Layer (merged)

  3. Diff Layer (upperdir)

Let’s look at each layer in detail.

Base Layer

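The base layer contains the foundational files for the file system and is read-only. When you pull an image, such as Ubuntu, the contents of that image form the base layer. An image can itself be made up of several read-only layers stacked on top of each other, but together they play the same role. Simply put, you can think of the base layer as the starting point or foundation for your container.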

Overlay Layer

The overlay layer is the primary workspace where users interact. It initially displays the base layer's content but allows users to make changes, including modifying files. When modifications occur, the changes are not stored in the overlay layer itself but in the diff layer. The overlay layer presents a unified view by combining the base layer with any updates from the diff layer, prioritizing files from the diff layer over those in the base layer. If you're familiar with Docker, this layer is the filesystem you see when you run a container.

Diff Layer

The diff layer stores all modifications made in the overlay layer.

You might wonder what happens when a file in the base layer is modified. The solution lies in the concept of "copy-on-write." When a change is made to a file that already exists in the base layer, the overlay filesystem first copies the file into the diff layer, and the updates are applied to this copy. The base layer itself is never modified, which is why many containers can share the same image while each one keeps its own changes.
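The three layers and copy-on-write can be sketched in a few lines of Python. This is a toy in-memory model written for this post, not how OverlayFS is implemented in the kernel: files are plain strings in dictionaries, and the class name ToyOverlay and its methods are made up for illustration.

```python
from __future__ import annotations

from types import MappingProxyType


class ToyOverlay:
    """In-memory model of a base layer, a diff layer, and their merged view."""

    def __init__(self, base_files: dict[str, str]) -> None:
        # Base layer: wrapped in a read-only proxy so it can never be changed.
        self.base = MappingProxyType(dict(base_files))
        # Diff layer: starts empty and receives every modification.
        self.diff: dict[str, str] = {}

    def read(self, path: str) -> str:
        """Overlay (merged) view: the diff layer wins over the base layer."""
        if path in self.diff:
            return self.diff[path]
        if path in self.base:
            return self.base[path]
        raise FileNotFoundError(path)

    def append(self, path: str, text: str) -> None:
        """Modify a file, copying it up from the base layer on first write."""
        if path not in self.diff:
            # Copy-on-write: duplicate the base file (or start a new, empty
            # file) in the diff layer before applying the change.
            self.diff[path] = self.base.get(path, "")
        self.diff[path] += text


if __name__ == "__main__":
    fs = ToyOverlay({"/etc/motd": "hello\n"})
    fs.append("/etc/motd", "world\n")
    print(repr(fs.read("/etc/motd")))   # prints 'hello\nworld\n'
    print(repr(fs.base["/etc/motd"]))   # prints 'hello\n'
```

After the write, the merged view shows the updated file while the base copy is untouched, which mirrors what happens when you edit a file from the image inside a running container.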