Containers
2026-03-04
Edited: 2026-03-04
Containers isolate processes from the host system. The isolation can cover the filesystem, network, process tree, and resources. Containers can be used for security, but a virtual machine (VM) provides stronger isolation at the cost of performance and overhead. Containers, on the other hand, avoid that overhead by sharing the host's kernel and resources while using native OS features to restrict access.
Chroot
Chroot, short for change root, changes a process's root directory so that the process only sees a chosen subtree of the filesystem. Files and directories from outside that subtree can be injected through read-only mounts.
See: `chroot`
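As a rough illustration of what the `chroot` command does underneath, here is a minimal Python sketch built on `os.chroot`. The helper names `enter_chroot` and `run_in_chroot` are made up for this example, and the call only succeeds when the process holds `CAP_SYS_CHROOT` (typically root) and the new root contains the program to run.

```python
import os


def enter_chroot(new_root: str) -> None:
    """Confine the calling process to new_root (needs CAP_SYS_CHROOT)."""
    os.chroot(new_root)  # "/" now refers to new_root for this process
    os.chdir("/")        # make sure the working directory is inside the new root


def run_in_chroot(new_root: str, argv: list[str]) -> int:
    """Fork, chroot the child, exec argv there, and return its exit code."""
    pid = os.fork()
    if pid == 0:  # child process
        try:
            enter_chroot(new_root)
            os.execvp(argv[0], argv)  # only returns by raising on failure
        except OSError:
            os._exit(127)  # chroot or exec failed
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)
```

The fork keeps the parent outside the new root, so only the child is confined. Note that a chroot by itself is not a security boundary, which is what the following sections address.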
Namespaces
However, processes inside a chroot can still see and manipulate processes outside it. Namespaces provide isolation for the process tree, network, and mounts. Multiple namespaces can be created and joined, which allows, for example, sharing some resources with the host.
For example, there are `PID` and `network` namespaces to isolate the process tree and network stack respectively.
See: `unshare`, `nsenter`
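One way to see which namespaces a process belongs to is to read the symlinks under `/proc/<pid>/ns/`, whose targets look like `pid:[4026531836]`. Two processes that show the same number for an entry share that namespace. The sketch below assumes a Linux `/proc`; the function names are invented for this example.

```python
import os
import re

# Link targets under /proc/<pid>/ns/ have the form "type:[inode]".
_NS_LINK = re.compile(r"^(\w+):\[(\d+)\]$")


def parse_ns_link(target: str) -> tuple[str, int]:
    """Split a namespace link target into (type, inode number)."""
    match = _NS_LINK.match(target)
    if match is None:
        raise ValueError(f"unexpected namespace link: {target!r}")
    return match.group(1), int(match.group(2))


def namespaces_of(pid: str = "self") -> dict[str, int]:
    """Map each entry in /proc/<pid>/ns to its namespace inode number."""
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    for name in os.listdir(ns_dir):
        _, inode = parse_ns_link(os.readlink(os.path.join(ns_dir, name)))
        result[name] = inode
    return result


def shared_namespaces(a: dict[str, int], b: dict[str, int]) -> set[str]:
    """Return the entries for which both processes are in the same namespace."""
    return {name for name in a.keys() & b.keys() if a[name] == b[name]}
```

For instance, `shared_namespaces(namespaces_of(), namespaces_of("1"))` would list what the current process shares with PID 1, although reading another process's `ns` directory may require extra privileges.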
User Namespaces
User namespaces serve a different purpose from the namespaces in the previous section. Common in the context of rootless containers, they map a range of UIDs and GIDs on the host to a different range of UIDs and GIDs inside the container.
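The mapping is exposed in `/proc/<pid>/uid_map` and `/proc/<pid>/gid_map`, where each line holds three numbers: the first ID inside the namespace, the first ID outside it, and the length of the range. Here is a small sketch of how a container UID translates to a host UID; the function names and the sample ranges in the usage note are made up for illustration.

```python
from typing import Optional


def parse_id_map(text: str) -> list[tuple[int, int, int]]:
    """Parse uid_map/gid_map content into (inside, outside, length) tuples."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        inside, outside, length = (int(field) for field in line.split())
        entries.append((inside, outside, length))
    return entries


def to_host_id(entries: list[tuple[int, int, int]], container_id: int) -> Optional[int]:
    """Translate an ID inside the namespace to the ID outside, or None if unmapped."""
    for inside, outside, length in entries:
        if inside <= container_id < inside + length:
            return outside + (container_id - inside)
    return None
```

With a map of `0 1000 1` and `1 100000 65536`, root in the container (UID 0) is the unprivileged UID 1000 on the host, and container UID 1000 becomes host UID 100999.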
Cgroups
Cgroups, short for control groups, restrict the resources, such as memory or CPU, that a container can use. The kernel kills processes as needed, for example when a memory limit is exceeded.
Cgroups and their configuration can be found under `/sys/fs/cgroup/`. The PIDs to which a cgroup applies are listed in a file inside that cgroup's directory: `tasks` under cgroup v1, or `cgroup.procs` under cgroup v2.
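Going the other way, `/proc/<pid>/cgroup` says which cgroups a process belongs to, one `hierarchy-ID:controller-list:path` entry per line. The sketch below parses that format and builds the path of the matching `cgroup.procs` file. It assumes cgroup v2 is mounted at `/sys/fs/cgroup`, and the function names are hypothetical.

```python
from pathlib import Path

CGROUP_ROOT = Path("/sys/fs/cgroup")  # assumed cgroup v2 mount point


def parse_proc_cgroup(text: str) -> list[tuple[int, list[str], str]]:
    """Parse /proc/<pid>/cgroup into (hierarchy, controllers, path) tuples."""
    entries = []
    for line in text.splitlines():
        if not line:
            continue
        hierarchy, controllers, path = line.split(":", 2)
        names = controllers.split(",") if controllers else []
        entries.append((int(hierarchy), names, path))
    return entries


def v2_procs_file(entries):
    """Return the cgroup.procs path for the v2 entry (hierarchy 0, no controllers)."""
    for hierarchy, controllers, path in entries:
        if hierarchy == 0 and not controllers:
            return CGROUP_ROOT / path.lstrip("/") / "cgroup.procs"
    return None  # only cgroup v1 hierarchies were listed
```

On a v2 system, a line such as `0::/user.slice/demo.scope` maps to `/sys/fs/cgroup/user.slice/demo.scope/cgroup.procs`, and limits such as `memory.max` sit next to it in the same directory.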
Capabilities
Capabilities are split-off pieces of root's power and provide fine-grained control over what a process running as root can do. It goes without saying that for containers, the fewer dangerous capabilities are granted, the better.
See: `capsh`
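The kernel reports a process's capability sets as hexadecimal bitmasks in `/proc/<pid>/status` (the `CapEff` line holds the effective set). Here is a minimal sketch of decoding such a mask. The `CAP_NAMES` table is deliberately partial and only covers a few well-known bit numbers; the function names are invented for this example.

```python
# Partial table of capability bit numbers; unknown bits are shown as cap_<n>.
CAP_NAMES = {
    0: "CAP_CHOWN",
    1: "CAP_DAC_OVERRIDE",
    5: "CAP_KILL",
    7: "CAP_SETUID",
    10: "CAP_NET_BIND_SERVICE",
    12: "CAP_NET_ADMIN",
    13: "CAP_NET_RAW",
    18: "CAP_SYS_CHROOT",
    19: "CAP_SYS_PTRACE",
    21: "CAP_SYS_ADMIN",
}


def effective_mask(status_text: str) -> int:
    """Extract the CapEff bitmask from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line found")


def decode_caps(mask: int) -> list[str]:
    """List the capability names for every bit set in mask, lowest bit first."""
    names = []
    bit = 0
    while mask >> bit:
        if mask & (1 << bit):
            names.append(CAP_NAMES.get(bit, f"cap_{bit}"))
        bit += 1
    return names
```

A mask of `0000000000200021`, for example, decodes to `CAP_CHOWN`, `CAP_KILL`, and `CAP_SYS_ADMIN`. The last of these is one a container should almost never hold.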
Seccomp
Seccomp, short for secure computing, restricts which system calls a process can make. This is even more restrictive than capabilities.
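Whether a process is running under seccomp can be checked through the `Seccomp` field of `/proc/<pid>/status`, where 0 means disabled, 1 means strict mode, and 2 means filter mode. A small sketch that reads this field, assuming the status text has already been loaded; the function name is made up for illustration.

```python
SECCOMP_MODES = {0: "disabled", 1: "strict", 2: "filter"}


def seccomp_mode(status_text: str) -> str:
    """Return the seccomp mode named in the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        # Match "Seccomp:" exactly so that "Seccomp_filters:" is skipped.
        if line.startswith("Seccomp:"):
            return SECCOMP_MODES.get(int(line.split()[1]), "unknown")
    return "unknown"  # field missing, e.g. kernel built without seccomp
```

Container runtimes generally use filter mode, where a BPF program decides for each system call whether it is allowed.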
Linux Security Modules and Mandatory Access Controls
This section has less to do with containers than with the host system in general, but it is still important.
Linux Security Modules, abbreviated as LSMs, are, as the name implies, a series of modules that let Linux support different security policies. They apply sweeping changes to the entire system, unlike the methods above, which are generally applied on a file-by-file or application-by-application basis.
Mandatory access control, MAC for short, restricts what a process, program, or thread can access. The restrictions can apply to files, directories, networks, memory, or devices, and can carry attributes such as read-only or write-only.
Under Linux, `tomoyo`, `apparmor`, and `selinux` are the prominent MAC implementations. AppArmor operates on paths, while SELinux operates on inodes through the security labels attached to them. TOMOYO is similar to AppArmor, but includes a learning stage in which it attempts to create the restrictions itself.
See also: `landlock`, `lockdown`, `yama`, `integrity`, `bpf`
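To check which of these modules are active on a host, the kernel lists them as a comma-separated string in `/sys/kernel/security/lsm` (this assumes securityfs is mounted). A minimal sketch that parses that string and picks out the MAC modules named above; the function names are hypothetical.

```python
MAC_MODULES = {"selinux", "apparmor", "tomoyo"}  # the MACs discussed above


def parse_lsm_list(text: str) -> list[str]:
    """Split the content of /sys/kernel/security/lsm into module names."""
    return [name for name in text.strip().split(",") if name]


def active_macs(lsm_names: list[str]) -> list[str]:
    """Keep only the modules that are MAC implementations, in original order."""
    return [name for name in lsm_names if name in MAC_MODULES]
```

On a host where the file reads `lockdown,capability,landlock,yama,apparmor`, `active_macs` returns `["apparmor"]`.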
References
Containers from scratch
barco: Linux Containers From Scratch in C
7 Ways to Escape a Container
user namespaces