This year, LinuxCon and ContainerCon attendees will have the opportunity to hear Jerome Petazzoni speak on Docker, Containers & Security: State Of Union. Jerome works at Docker Inc., where he helps others to containerize all the things. Jerome has worked in miscellaneous technical fields, including VoIP, embedded systems, web hosting, virtualization and cloud computing.
In this interview, he provides insight into Docker and explains how one can manage storage for Docker containers and how to make data move with Docker containers.
What are containers? What various container technologies exist, and how is Docker different from other container technologies?
From a high-level point of view, containers look like lightweight virtual machines. You can install whatever you want in a container, independently from (and without affecting!) other containers or the host environment. Each container has its own network stack, process (PID) space, file system, etc. Their footprint is also significantly smaller than that of VMs: containers start faster, and they require less memory and disk space. This is because, from a low-level point of view, containers are just regular processes on the host machine, using kernel features like namespaces and control groups to provide the isolation. Starting a container is just starting a regular UNIX process; creating a container is just cloning a snapshot of a copy-on-write filesystem (which is extremely cheap nowadays, both in time and disk usage).
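To make the "containers are just regular processes" point concrete, here is a minimal sketch in Python that lists the kernel namespaces a process belongs to by reading the Linux /proc filesystem. The function name list_namespaces is made up for this illustration, and the sketch assumes a Linux host; on other systems it simply returns an empty result.

```python
import os


def list_namespaces(pid="self"):
    """Map each namespace type (e.g. 'pid', 'net', 'mnt') to its kernel identifier.

    Hypothetical helper for illustration; it only reads /proc/<pid>/ns on Linux.
    """
    ns_dir = f"/proc/{pid}/ns"
    namespaces = {}
    if not os.path.isdir(ns_dir):
        # Not on Linux, or /proc is unavailable: nothing to report.
        return namespaces
    for name in sorted(os.listdir(ns_dir)):
        try:
            # Each entry is a symlink whose target names the namespace instance.
            namespaces[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            # Some entries may be unreadable without sufficient privileges.
            continue
    return namespaces


if __name__ == "__main__":
    for name, ident in list_namespaces().items():
        print(f"{name:10s} {ident}")
```

Running this on the host and again inside a container should generally show different identifiers for the isolated namespaces, while two processes in the same container share them.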
Docker is different from other container technologies because it is way more than a mere container engine. Docker is a platform that encompasses the Docker Engine (to run containers), the Docker Hub (a public library of ready-to-use container images, which can also be used to store users' custom images), and a vast ecosystem of tools like Docker Compose, Docker Machine, Docker Swarm, and many others; all gravitating around public and open APIs.
How is Docker different from hypervisor-based virtualization technologies?
One could say that Docker is a "hypervisor for containers," but many purists would probably frown upon this metaphor, since hypervisors generally manage VMs, and Docker manages containers. The deep technical details are very different. When a hypervisor starts a VM, it creates virtual hardware and leverages specific CPU instructions or features, like VT-x, AMD-V, or privilege levels. When Docker creates a container, it leverages kernel features like namespaces and control groups, without relying on specific hardware features.
This means that containers are more portable in one sense, since they can run on top of physical and virtual machines all the same; but they are also less portable in another sense, since a container will use its host kernel. For example, you cannot run a Windows container on a Linux kernel (unless your Linux kernel can execute Windows binaries).
How do you suggest managing storage for Docker containers? How can data be linked with Docker containers?
Docker has the notion of "volumes," which are directories shared between the container and its host. Volumes are conceptually similar to "shared folders" on virtual machines, except that they don't require any particular setup in the container, and they have zero overhead, because they are implemented using bind mounts.
When you have data lying on a disk (whether it's an actual disk, or a RAID pool, or something mounted over the network, or anything else) the easiest option is to mount that data on the container host, then expose it to the container through this "volume" mechanism.
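As a rough illustration of the volume mechanism described above, the following Python sketch assembles a "docker run" command line that exposes a host directory to a container with the -v flag. The helper name docker_run_with_volume, the image name, and the paths are assumptions chosen for the example; the sketch only builds the command and does not execute it.

```python
import shlex


def docker_run_with_volume(image, host_dir, container_dir, command=(), read_only=False):
    """Build a 'docker run' argument list that shares host_dir as container_dir.

    Hypothetical helper for illustration; it does not invoke Docker.
    """
    spec = f"{host_dir}:{container_dir}"
    if read_only:
        # The ':ro' suffix asks Docker to mount the volume read-only.
        spec += ":ro"
    return ["docker", "run", "--rm", "-v", spec, image, *command]


if __name__ == "__main__":
    # Example values only: /srv/data on the host appears as /data in the container.
    args = docker_run_with_volume("ubuntu", "/srv/data", "/data", ["ls", "/data"])
    print(shlex.join(args))
```

Because the data is mounted on the host first, the container only ever sees /data, whatever the underlying storage happens to be.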
Docker also has a (brand new, still experimental) plugin mechanism allowing a container to provide storage for other containers. This means that a container could be responsible for the heavy lifting associated with being a member of a Ceph, Gluster, or other storage cluster, and expose block devices and mount points to other containers.
How do you make data move with Docker containers when a container is started on a different machine?
Exactly as we did before containers: network storage, distributed file systems, data transfer or synchronization with rsync, unison, etc. There are, however, two advantages when using containers. First, the way we access the data is abstracted away from the container. If I switch, for example, from DRBD to Ceph, my container doesn't have to know about it; in fact, the same container will run identically on local storage, or on distributed storage. The other advantage comes from those new storage plugins. They will make data access simpler, by cleanly separating the application container and the storage container.
How can you ensure that changes made to running Docker containers are propagated back to the base container image?
Docker provides API calls to compare a container to its original image, and to create a new image from an existing container. The CLI counterparts to those API calls are "docker diff" and "docker commit."
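As a small sketch of how those two commands might be used together, the Python below groups the output of "docker diff" (lines prefixed with A for added, C for changed, D for deleted) and builds a matching "docker commit" command. The helper names, container name, and image tag are hypothetical, and nothing here actually calls Docker; the sample text stands in for real command output.

```python
def parse_docker_diff(output):
    """Group 'docker diff' output lines by change type.

    Assumes each line looks like '<code> <path>' with code A, C, or D.
    """
    kinds = {"A": "added", "C": "changed", "D": "deleted"}
    changes = {"added": [], "changed": [], "deleted": []}
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first space only, so paths containing spaces survive.
        code, _, path = line.partition(" ")
        if code in kinds and path:
            changes[kinds[code]].append(path)
    return changes


def commit_command(container, image_tag):
    """Build the 'docker commit' argument list that saves a container as an image."""
    return ["docker", "commit", container, image_tag]


if __name__ == "__main__":
    # Sample text standing in for the output of 'docker diff <container>'.
    sample = "C /etc\nA /etc/app.conf\nD /tmp/old.log\n"
    print(parse_docker_diff(sample))
    print(" ".join(commit_command("my-container", "my-image:v2")))
```

Reviewing the diff before committing is one way to check that only the intended changes end up in the new image.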
How can Docker containers help build highly available solutions?
When building a highly available system, you usually go through a pretty long checklist of things to do. Docker will make some of those steps easier. For instance, it helps ensure that you deploy consistent versions of your software on the relevant machines. Docker won't magically solve problems (the use of magic in systems operations is generally frowned upon) but it will make many things easier, faster, and more reliable, just like using a package manager is generally more convenient than compiling everything from source.
How do you see adoption of Docker growing among enterprise customers in production environments?
The general picture looks like this: Docker starts as a development tool to achieve consistent, repeatable development environments, similar to HashiCorp's Vagrant. Then, it graduates to CI/CD, where it helps cut testing times in half (or even more). From there, it gets used in staging or pre-production, where the risk is low. Eventually, once the operational team has gained enough experience and confidence with running Docker, it goes on to serve production traffic.
This article is part of the Speaker Interview Series for LinuxCon, CloudOpen, and ContainerCon North America 2015. LinuxCon North America is an event where "developers, sysadmins, architects and all levels of technical talent gather together under one roof for education, collaboration and problem-solving to further the Linux platform."