Container technologies have been gaining a lot of traction in the DevOps world, especially with the arrival of Docker.io. But these technologies have been around for more than 10 years. In this talk, we will dive through the history of Linux containers and how they differ from traditional virtualization technologies: from the Linux UML and VServer days in the early 2000s, through OpenVZ and the rise of Linux namespaces and cgroups, to LXC, the new kid on the block.
A Decade of Linux Containers
Simon Boulet
Consultant, Deployment and [email protected]
An Introduction to Containers
“[...] should it be possible for the operating system to ensure that excessive resource usage by one group of processes doesn't interfere with another group of processes? Should it be possible for a single kernel to provide resource-usage statistics for a logical group of processes? Likewise, should the kernel be able to allow multiple processes to transparently use port 80?”
Glauber Costa, Parallels (SWSoft / company behind OpenVZ)
http://lwn.net/Articles/524952/
Containers (vs virtualization)
● Group processes together to create secure, isolated virtual environments
● Share the host kernel / operating system
● Generally perform better than traditional virtualization
● Often have limitations with kernel features (VPN, loopback devices, iptables, FUSE, NFS, etc.)
User-Mode Linux (UML)
● Kernel patch to compile the Linux kernel as a “regular” binary. Run Linux inside Linux: ./linux
● First paper in August 2000, Linux 2.2.x [1]
● In the mainline kernel since 2.6.0 (December 2003)
● No root access needed (network requires TUN/TAP)
● Linode initially offered UML containers and switched to Xen on March 28, 2008 [2]
● Works out of the box with all recent kernels [3]
[1] http://user-mode-linux.sourceforge.net/old/als2000/index.html
[2] https://blog.linode.com/2008/03/28/linodes-in-xen/
[3] http://uml.devloop.org.uk/
Linux-VServer
● Created by Jacques Gelinas, a Montrealer
● First public announcement in October 2001 [1]
● Uses a “security context” concept to isolate processes (similar to Linux namespaces)
● Still alive (latest patch for Linux 3.10.21)
● Dreamhost (the company behind Ceph) still uses Linux-VServer for its VPS offering [2]
[1] http://www.cs.helsinki.fi/linux/linux-kernel/2001-40/1065.html
[2] http://www.dreamhost.com/servers/vps/
OpenVZ
● Patch based on latest RHEL kernel (currently 2.6.32; 40MB gzip patch). Extends Linux Cgroups/Namespaces features
● Mature (initial release in 2005); the open-source project behind Parallels Virtuozzo (commercial)
● The future of OpenVZ lies within Linux Cgroups/Namespaces. Recent versions of the OpenVZ tools work partially with recent mainline kernels
● OpenVZ developers very active in Linux kernel/Namespaces community
OpenVZ Contributions to Linux Kernel
http://openvz.org/Development_portal
OpenVZ: LXC/Namespaces older brother
“OpenVZ is great, and it has been around for longer than LXC, so some people consider it to be more stable and secure. However, one has to keep in mind that LXC and OpenVZ share many developers in common, and that LXC is nothing else than “OpenVZ redesigned to be able to be merged into the mainline kernel”. Therefore, OpenVZ will eventually sunset, to be fully replaced by LXC.”
Jérôme Petazzoni, Senior Engineer at dotCloud (company behind Docker)
http://blog.docker.io/2013/08/containers-docker-how-secure-are-they/
LXC
● Docker uses LXC for creating containers
● First release of LXC in September 2008
● Set of userspace tools to create containers on top of Linux Cgroups and Namespaces
● LXC containers are not fully secure yet. It’s possible for root inside a container to escape and gain root on the host. AppArmor/SELinux is needed for now; the future lies in the user namespace.
Linux Namespaces
Different namespaces = different “views” of the kernel

Linux 2.4.19 (3 Aug 2002)
  Mount namespace: mount points
Linux 2.6.19 (29 Nov 2006)
  UTS namespace: hostname
  IPC namespace: interprocess communication
Linux 2.6.24 (24 Jan 2008)
  PID namespace: processes in different PID namespaces can have the same PID
  Network namespace: network devices, IP addresses, routing tables, iptables entries
Linux 3.8 (18 Feb 2013)
  User namespace: root privileges for operations inside a user namespace, but unprivileged outside the namespace. A number of Linux filesystems are not yet user-namespace aware.

http://lwn.net/Articles/531114/
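
One way to see these “views” in practice is through the /proc/<pid>/ns directory, where (on Linux 3.8 and later) each namespace a process belongs to appears as a symbolic link whose target encodes the namespace type and an identifier. The sketch below is only an illustration and is not part of the original talk: the helper names list_namespaces and same_namespace are made up for this example. Two processes are in the same namespace when the corresponding links point to the same target.

```shell
# Print "name target" for every namespace link in a /proc/<pid>/ns directory.
# The directory defaults to the namespaces of the calling shell.
# (list_namespaces is a hypothetical helper name, not an existing tool.)
list_namespaces() (
    ns_dir="${1:-/proc/self/ns}"
    for link in "$ns_dir"/*; do
        [ -L "$link" ] || continue      # skip anything that is not a symlink
        printf '%s %s\n' "$(basename "$link")" "$(readlink "$link")"
    done
)

# Exit 0 when both ns directories point at the same namespace of the given
# type, 1 when they differ, 2 when a link cannot be read.
# (same_namespace is a hypothetical helper name as well.)
same_namespace() (
    ns_type="$1"; dir_a="$2"; dir_b="$3"
    id_a=$(readlink "$dir_a/$ns_type") || exit 2
    id_b=$(readlink "$dir_b/$ns_type") || exit 2
    [ "$id_a" = "$id_b" ]
)
```

For instance, `same_namespace net /proc/$$/ns /proc/1/ns` would tell whether the current shell shares its network namespace with PID 1; reading another user's /proc/<pid>/ns entries may require root.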
Linux Cgroups
● Virtually group processes together, apply limits, priority, accounting, etc.
● Divided in subsystems, each subsystem representing a resource (CPU, memory, etc.)

  blkio     Limits input/output access to and from block devices
  cpu       Uses the scheduler to provide access to the CPU
  devices   Allows or denies access to devices
  freezer   Suspends or resumes tasks in a cgroup
  memory    Sets limits on memory use by tasks in a cgroup, and generates automatic reports on memory resources used by those tasks
  ...

https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/ch01.html
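
To check which cgroup a process currently sits in for a given subsystem, the kernel exposes /proc/<pid>/cgroup, with one line per hierarchy in the form ID:subsystem-list:path. The following is a minimal sketch of reading that file; cgroup_path is a hypothetical helper name introduced here for illustration, and the optional second argument exists only so the parser can be pointed at a different file.

```shell
# Print the cgroup path of a process for one subsystem.
# Each input line looks like "4:blkio:/1mbsec" or "5:cpu,cpuacct:/".
# (cgroup_path is a hypothetical helper name, not an existing tool.)
cgroup_path() (
    subsystem="$1"
    file="${2:-/proc/self/cgroup}"
    awk -F: -v want="$subsystem" '
        {
            # Field 2 may list several co-mounted subsystems.
            n = split($2, ctrls, ",")
            for (i = 1; i <= n; i++) {
                if (ctrls[i] == want) {
                    # Re-join fields 3..NF in case the path contains ":".
                    path = $3
                    for (j = 4; j <= NF; j++) path = path ":" $j
                    print path
                    exit
                }
            }
        }' "$file"
)
```

Calling `cgroup_path blkio` from a shell prints that shell's blkio cgroup, since child processes inherit the cgroup of their parent; nothing is printed when the subsystem is not listed.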
Playing with Cgroups
● Cgroups are configured through the cgroup virtual file system (similar to /proc)
● Mount the cgroup virtual filesystem for the desired subsystem (e.g. blkio):
    sudo mkdir -p /sys/fs/cgroup/blkio
    sudo mount -t cgroup -o blkio blkio /sys/fs/cgroup/blkio
● Create a new cgroup named “1mbsec” in the blkio subsystem:
    sudo mkdir /sys/fs/cgroup/blkio/1mbsec
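
Before mounting a subsystem, it can be useful to confirm that the running kernel knows about it and has it enabled. The kernel lists its subsystems in /proc/cgroups (columns: subsys_name, hierarchy, num_cgroups, enabled). The sketch below is an illustrative addition; subsystem_enabled is a hypothetical helper name, and the optional file argument is there only to make the parser easy to try on a saved copy.

```shell
# Exit 0 if the subsystem is listed in /proc/cgroups with enabled == 1,
# exit 1 otherwise.
# (subsystem_enabled is a hypothetical helper name, not an existing tool.)
subsystem_enabled() (
    subsystem="$1"
    file="${2:-/proc/cgroups}"
    awk -v want="$subsystem" '
        /^#/ { next }                               # skip the header line
        $1 == want { found = 1; enabled = $4 }      # 4th column is "enabled"
        END {
            if (found && enabled == 1) exit 0
            exit 1
        }
    ' "$file"
)
```

A typical use would be `subsystem_enabled blkio && echo "blkio available"` ahead of the mount step.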
Playing with Cgroups (cont.)
● Set a limit of 1MB/sec on this cgroup:
    echo '253:2 '$((1024*1024)) | sudo tee /sys/fs/cgroup/blkio/1mbsec/blkio.throttle.write_bps_device
● Attach the current process (shell) to the 1mbsec cgroup:
    echo $$ | sudo tee /sys/fs/cgroup/blkio/1mbsec/tasks
● Writes are now throttled to 1MB/sec:
    dd if=/dev/zero of=100mbtest.bin bs=1M count=100 conv=fdatasync
    100+0 records in
    100+0 records out
    104857600 bytes (105 MB) copied, 100.055 s, 1.0 MB/s
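
The value written to blkio.throttle.write_bps_device is a line of the form MAJOR:MINOR BYTES_PER_SECOND, where 253:2 above is the device number of the block device on the presenter's machine (yours will likely differ). The arithmetic can be sketched as follows; throttle_rule and expected_seconds are hypothetical helper names introduced for this illustration only.

```shell
# Build the "MAJOR:MINOR BYTES_PER_SEC" line for a limit given in MB/s.
# (throttle_rule is a hypothetical helper name, not an existing tool.)
throttle_rule() (
    device="$1"         # device number, e.g. 253:2
    mb_per_sec="$2"     # limit in MB/s (1 MB = 1024*1024 bytes here)
    printf '%s %s\n' "$device" "$((mb_per_sec * 1024 * 1024))"
)

# Whole seconds a throttled write should take at least, rounded up.
# (expected_seconds is a hypothetical helper name as well.)
expected_seconds() (
    total_mb="$1"; mb_per_sec="$2"
    echo "$(( (total_mb + mb_per_sec - 1) / mb_per_sec ))"
)
```

With a 1 MB/s limit, `throttle_rule 253:2 1` yields the same line as the echo command above, and `expected_seconds 100 1` gives 100, which matches the roughly 100 seconds reported by dd.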
My Personal Experience
● OpenVZ is generally the “go-to” for public / production containers (unless you need some of the recent kernel features)
● LXC is gaining a lot of interest, especially with tools like Docker. Escaping LXC containers is a major security issue; you will need to learn AppArmor/SELinux to secure LXC
● User-Mode Linux is a very well kept secret. It’s a great way to quickly run containers, especially in non-root environments, and works out of the box with all recent kernels.