Containers
2025-01-13
Containers isolate processes from the host system. The isolation can cover the filesystem, network, process tree, and resources. Containers can also serve as a security boundary, but a VM isolates more strongly, at the cost of performance and overhead. Containers, on the other hand, avoid that overhead by relying on the host for resources while using native OS features to restrict access.
Chroot
Chroot, short for change root, changes the root directory of a process so that it sees only a chosen subtree of the filesystem. Files and directories from the host can be injected into that view through read-only mounts.
See: chroot
Namespaces
Chroot only restricts the filesystem view: processes inside the chroot can still access and manipulate processes outside it. Namespaces provide isolation for the process tree, network, and mounts. Multiple namespaces can be created and joined independently, which makes it possible, for example, to keep sharing some resources with the host.
For example, there are PID and network namespaces to isolate the process tree and network stack, respectively.
See: unshare, nsenter
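One way to see which namespaces a process belongs to is to read the symlinks under /proc/PID/ns, each of which names a namespace type and an identifier. The sketch below assumes a Linux host with /proc mounted; the function names are illustrative only.

```python
import os


def list_namespaces(pid="self"):
    """Map namespace name (pid, net, mnt, ...) to its identifier string."""
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    try:
        names = sorted(os.listdir(ns_dir))
    except OSError:
        return result  # no such process, or /proc is not available
    for name in names:
        try:
            result[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            continue  # not permitted to inspect this entry
    return result


def shared_namespaces(pid_a, pid_b):
    """Return the namespace names in which both processes are members."""
    a, b = list_namespaces(pid_a), list_namespaces(pid_b)
    return {name for name in a if a[name] == b.get(name)}
```

Two processes share a namespace when the identifiers match. A process started through unshare should show a different identifier for each namespace it left, while one entered with nsenter should match its target.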
Cgroups
Cgroups, short for control groups, restrict the resources, such as memory or CPU, that a container may use, and the kernel kills processes in the group when needed, for example when a memory limit is exceeded.
Cgroups and their configuration can be found under /sys/fs/cgroup/. The PIDs to which a cgroup applies are listed in a file inside that cgroup's directory: /sys/fs/cgroup/cgroup_name/cgroup.procs on cgroup v2, or /sys/fs/cgroup/controller_name/cgroup_name/tasks on cgroup v1.
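To find out which cgroup a process is in, /proc/PID/cgroup can be parsed; each line has the form hierarchy-ID:controller-list:path. The following is a rough sketch assuming Linux, with cgroup v2 mounted at /sys/fs/cgroup for the second helper. Both function names are invented for this example.

```python
from pathlib import Path


def parse_proc_cgroup(text):
    """Parse /proc/PID/cgroup content into (hierarchy_id, controllers, path) tuples."""
    entries = []
    for line in text.splitlines():
        if not line:
            continue
        hierarchy_id, controllers, path = line.split(":", 2)
        entries.append(
            (int(hierarchy_id), controllers.split(",") if controllers else [], path)
        )
    return entries


def cgroup_v2_setting(name, pid="self", root="/sys/fs/cgroup"):
    """Read one control file (for example memory.max) of the process's v2 cgroup.

    Returns None when the file or the v2 hierarchy is not available.
    """
    try:
        text = Path(f"/proc/{pid}/cgroup").read_text()
    except OSError:
        return None
    for hierarchy_id, controllers, path in parse_proc_cgroup(text):
        if hierarchy_id == 0 and not controllers:  # the v2 entry looks like "0::/path"
            try:
                return (Path(root) / path.lstrip("/") / name).read_text().strip()
            except OSError:
                return None
    return None
```

Calling cgroup_v2_setting("memory.max") inside a memory-limited container should return the limit in bytes as a string, or "max" when no limit is set.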
Capabilities
Capabilities split the powers of root into separate units and provide fine-grained control over what a process running as root can do. It goes without saying that for containers, the fewer dangerous capabilities are granted, the better.
See: capsh
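The capabilities a process currently holds appear as hexadecimal bitmasks in /proc/PID/status (the CapEff line is the effective set). Below is a small sketch that decodes such a mask. The table covers only a handful of capabilities, and the helper names are assumptions for illustration.

```python
# Partial table of capability bit numbers from linux/capability.h.
CAP_NAMES = {
    0: "CAP_CHOWN",
    5: "CAP_KILL",
    10: "CAP_NET_BIND_SERVICE",
    12: "CAP_NET_ADMIN",
    13: "CAP_NET_RAW",
    18: "CAP_SYS_CHROOT",
    19: "CAP_SYS_PTRACE",
    21: "CAP_SYS_ADMIN",
}


def decode_caps(mask):
    """Turn a capability bitmask into a list of names, lowest bit first."""
    names = []
    bit = 0
    while mask >> bit:
        if (mask >> bit) & 1:
            # Bits missing from the partial table are reported by number.
            names.append(CAP_NAMES.get(bit, f"cap_{bit}"))
        bit += 1
    return names


def effective_caps(status_text):
    """Extract and decode the CapEff line from /proc/PID/status content."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return decode_caps(int(line.split()[1], 16))
    return []
```

Running effective_caps(open("/proc/self/status").read()) inside and outside a container is a quick way to compare what was dropped.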
Seccomp
Seccomp, short for secure computing, restricts which system calls a process can make. This is even more restrictive than capabilities, since it filters individual system calls.
Linux Security Modules
Linux Security Modules, abbreviated as LSMs, are, as their name implies, a series of modules that let Linux enforce security policies.
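On kernels that expose securityfs, the active LSMs are listed as a comma-separated string in /sys/kernel/security/lsm. Here is a short sketch for reading it, assuming that file is present and readable; the function names are illustrative.

```python
from pathlib import Path


def parse_lsm_list(text):
    """Split the comma-separated LSM list into individual names."""
    return [name for name in text.strip().split(",") if name]


def active_lsms(path="/sys/kernel/security/lsm"):
    """Return the LSMs the running kernel reports, or [] if unavailable."""
    try:
        return parse_lsm_list(Path(path).read_text())
    except OSError:
        return []
```

The order of the returned names follows the order in the file.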
Mandatory Access Controls
Mandatory access control, MAC for short, restricts what a process, program, or thread can access. The restriction can be applied to files, directories, networks, memory, or devices. Attributes can be assigned to the restriction, such as read-only or write-only.
Under Linux, TOMOYO, AppArmor, and SELinux are the prominent MAC implementations. AppArmor operates on paths, while SELinux operates on labels attached to inodes. TOMOYO is similar to AppArmor, but includes a self-learning stage in which it attempts to create the restrictions itself.
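The label that AppArmor or SELinux has assigned to a process can usually be read from /proc/PID/attr/current. The sketch below tells the two common formats apart, assuming AppArmor labels look like "profile (mode)" and SELinux contexts look like "user:role:type:level". The function names and the returned dictionary layout are made up for illustration.

```python
from pathlib import Path


def parse_security_label(raw):
    """Classify a label string read from /proc/PID/attr/current."""
    label = raw.strip("\x00\n ")
    # AppArmor style: "docker-default (enforce)"
    if label.endswith(")") and " (" in label:
        profile, mode = label[:-1].rsplit(" (", 1)
        return {"style": "apparmor", "profile": profile, "mode": mode}
    # SELinux style: "system_u:system_r:container_t:s0:c1,c2"
    parts = label.split(":", 3)
    if len(parts) >= 3:
        return {"style": "selinux", **dict(zip(("user", "role", "type", "level"), parts))}
    # Anything else, such as AppArmor's "unconfined"
    return {"style": "plain", "label": label}


def current_label(pid="self"):
    """Return the parsed label of a process, or None if it cannot be read."""
    try:
        return parse_security_label(Path(f"/proc/{pid}/attr/current").read_text())
    except OSError:
        return None
```

A container confined by AppArmor would typically report a profile name and mode here, while one confined by SELinux reports a full context.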
More
landlock, lockdown, yama, integrity, bpf
References
Containers from scratch
barco: Linux Containers From Scratch in C
7 Ways to Escape a Container