Linux Plumbers Conference 2014, Düsseldorf
Docker and the Linux kernel
Cristian S., Docker Inc.
firstname.lastname@example.org
What is Docker?
The Matrix From Hell
Another Matrix From Hell
Solution: the intermodal shipping container
Solution to the deployment problem: the Linux container
High level overview • Uses namespaces & cgroups • Runs on mainline kernels • Lower overhead than VMs • Can run a full system with an init or a single service/process • Snapshot based approach to build one image on top of another
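As a rough illustration of the namespaces and cgroups mentioned above, the following sketch lists the namespaces a process belongs to and parses its cgroup membership from procfs. The function names are made up for this example, and it assumes a Linux system with /proc mounted; it is not how Docker itself does this.

```python
import os


def list_namespaces(pid="self"):
    """Map namespace name -> link target read from /proc/<pid>/ns.

    Returns an empty dict when /proc/<pid>/ns is not available
    (for example on a non-Linux system).
    """
    ns_dir = "/proc/%s/ns" % pid
    result = {}
    try:
        names = os.listdir(ns_dir)
    except OSError:
        return result
    for name in names:
        try:
            result[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            # Entry is not a readable symlink; skip it.
            continue
    return result


def parse_cgroup(text):
    """Parse /proc/<pid>/cgroup text into (hierarchy_id, controllers, path).

    Each line has the form 'hierarchy-ID:controller-list:cgroup-path'.
    """
    entries = []
    for line in text.splitlines():
        parts = line.split(":", 2)
        if len(parts) == 3:
            hierarchy_id, controllers, path = parts
            names = controllers.split(",") if controllers else []
            entries.append((int(hierarchy_id), names, path))
    return entries
```

Comparing the output of `list_namespaces()` on the host and inside a container should show different link targets for the namespaces that are not shared.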
Docker's code • Licensed under the Apache 2 license • There's no paid/premium/commercial version • Docker and its code are free and will be free • Lives at: https://github.com/docker/docker/
Use cases • Application development, testing, packaging & deployment • PaaS/SaaS/cloud infrastructure • Application & service isolation • Stress testing & benchmarking (including the Linux kernel)
Under the hood • Exec drivers provide the execution environment (virtualization/container tech) – native (libcontainer based) and LXC – are platform dependent – native is the default • Graph drivers are the storage providers – aufs, devicemapper, btrfs and vfs – vfs shouldn't be used (only used by the tests by default) – PRs open on GitHub for ZFS and OverlayFS • Existing graph & exec drivers only support Linux
Kernel requirements • Kernel 3.8 is the absolute minimum (except RHEL's 2.6.32) • Stable & supported kernels >= 3.10 are recommended • BTRFS has special requirements • Kernel configuration can be checked using https://raw.githubusercontent.com/docker/docker/master/contrib/check-config.sh
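The version thresholds above can be expressed as a small check. This is a minimal sketch with hypothetical function names; it only compares the major and minor numbers of a release string and does not know about vendor backports, so the RHEL 2.6.32 exception is reported as too old.

```python
import re

MINIMUM = (3, 8)       # absolute minimum stated on the slide
RECOMMENDED = (3, 10)  # recommended baseline stated on the slide


def parse_release(release):
    """Extract (major, minor) from a string such as '3.13.0-35-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return int(match.group(1)), int(match.group(2))


def classify_kernel(release):
    """Return 'too old', 'minimum' or 'recommended' for a release string."""
    version = parse_release(release)
    if version < MINIMUM:
        return "too old"
    if version < RECOMMENDED:
        return "minimum"
    return "recommended"
```

On a running system the release string can be obtained with `platform.release()` from the Python standard library and passed to `classify_kernel`. The check-config.sh script remains the tool for checking the kernel configuration itself.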
AUFS graph driver stability & performance • Operations with containers are faster than devicemapper & (sometimes) btrfs • Known problems – Stale NFS file handle, cap_set_file, invalid argument on mount – Causes trouble on btrfs, remote file systems and many file systems which aren't ext3/ext4 – Direct IO problems & poor performance for IO intensive workloads • Limitations – Requires aufs-tools for auplink to dereference hard links – No support for hard links across layers – Limit of 127 layers – Can't be used on Fedora, RHEL or any other system which doesn't apply the AUFS patches and doesn't ship aufs-tools • AUFS is developed outside of the mainline kernel tree • Update the kernel using distro updates
Devicemapper graph driver stability & performance • Uses loopback mounted block devices by default • Allows EXT4 or XFS to be used • Known problems – EBUSY errors (fix to be tested) – Potential file system corruption bug with ext4 ● Might be caused by loopback mounted block devices – Older kernels: space is not reclaimed when it is freed on the file system, various kernel errors and problems • Limitations – The storage must be configured explicitly to avoid the use of loopback mounted block devices • Update the kernel using distro updates
btrfs graph driver stability & performance
• BTRFS bugs are a problem for Docker • Using the RAID-like features of BTRFS is likely to cause data loss • Known problems – Data can be corrupted if exotic mount options are used – Kernel 3.8 and kernels older than the latest stable releases can cause data loss & corruption – The file system becomes slower the more data is written & stored on it – Performance degrades quickly & fragmentation is a problem – Balancing the file system to fix fragmentation could trigger some bugs • BTRFS is used automatically if Docker's root folder is on BTRFS • Update the kernel using distro updates • Using the latest minor version of supported kernels is recommended
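Since the btrfs driver is picked up automatically when Docker's root folder is on BTRFS, it can help to check which file system backs that folder. The sketch below finds the file system type for a path from /proc/mounts-style text. The function names are hypothetical, the path /var/lib/docker is assumed as the root folder (it is not named in the slides), and escaped characters in mount points are not handled.

```python
import posixpath


def parse_mounts(mounts_text):
    """Parse /proc/mounts-style text into (mount_point, fs_type) pairs."""
    entries = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            entries.append((fields[1], fields[2]))
    return entries


def fs_type_for(path, mounts_text):
    """Return the fs type of the longest mount point that contains path."""
    path = posixpath.normpath(path)
    best_point, best_type = "", None
    for mount_point, fs_type in parse_mounts(mounts_text):
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or (path + "/").startswith(prefix):
            # Prefer the deepest mount; later entries win on ties.
            if len(mount_point) >= len(best_point):
                best_point, best_type = mount_point, fs_type
    return best_type
```

On a Linux host this could be called as `fs_type_for("/var/lib/docker", open("/proc/mounts").read())`; a result of `"btrfs"` would suggest that the btrfs graph driver will be selected.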
Performance & stability
• Generally speaking, the kernel has been stable – Fixes have been made to file systems, namespaces, cgroups, netfilter, aufs, btrfs, devicemapper and other kernel components – Recent kernels and the newest minor versions of LTS kernels have fixed a lot of problems, including devicemapper and btrfs problems – Some PID 1 issues are still being discussed – Changes have been made to cgroups and namespaces, which has also improved stability for containers • Performance is actively being studied – Docker's code has been improved to use less memory & be faster – Performance needs to be studied on the kernel side to achieve better scalability by a) making fewer syscalls in Docker where possible, b) making those syscalls faster in the kernel
• Avoid running kernels no longer supported by your Linux distribution (for example, the 3.8 lts-raring kernel from Ubuntu 12.04.x) • When you encounter errors, open Docker issues on GitHub and include the details (full kernel panics, btrfs check output, Docker daemon logs, `docker info`, `docker version` and `uname -a` output) • File bug reports with your distribution
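Gathering the command output requested above can be scripted. This is a minimal sketch, assuming the `docker` and `uname` binaries are on the PATH; the helper names are invented for this example, and kernel panics, btrfs check output and daemon logs still have to be collected separately.

```python
import subprocess

# Commands named on the slide; each is run as-is and its output captured.
COMMANDS = [
    ["docker", "info"],
    ["docker", "version"],
    ["uname", "-a"],
]


def run_command(argv):
    """Run one command and return its output, or a note if it cannot start."""
    try:
        result = subprocess.run(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
        )
    except OSError as exc:
        return "could not run: %s" % exc
    return result.stdout


def collect_report(commands=COMMANDS, runner=run_command):
    """Build a plain-text report: each command line followed by its output."""
    sections = []
    for argv in commands:
        sections.append("$ " + " ".join(argv))
        sections.append(runner(argv).rstrip())
    return "\n".join(sections)
```

Calling `print(collect_report())` produces text that can be pasted into an issue. The `runner` parameter exists only so the report format can be exercised without Docker installed.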
What can Docker do for the kernel?
• Can be used to test the kernel's stability and evaluate its performance • Changes made to the kernel can be tested easily against Docker • Can be used to test the running kernel to avoid breaking user space compatibility • Makes it simple to mix workloads for stress testing and performance testing • Exposes some hard to trigger kernel problems
What can a kernel developer do with Docker?
• Find bottlenecks in the kernel's code • Run a KVM VM in a container • Stress testing for file systems, network, namespaces and the kernel in general • Large scale testing with containers • Hardware testing in containers • Network testing with advanced topologies
How can someone contribute to Docker? What about kernel developers?
• You don't necessarily have to write code • Providing feedback, doing code review or asking the right kernel developer to do so are also helpful • Suggestions on how to debug specific kernel bugs and bugs in general are welcome • Contributing to Docker also helps the kernel because Docker relies on the kernel