A critical view on Docker

TL;DR Before you start reading this, I want to make it clear that I absolutely don’t hate Docker or the application container idea in general, at all! I really see containers becoming a new way of doing things in addition to the existing technologies. In fact, I use containers myself more and more.

Currently I’m using Docker for local development because it’s so easy to get your environment up and running in just a few seconds. But of course, that is “local” development. Things start to get interesting when you want to deploy over multiple Docker hosts in a production environment.

At the “Pragmatic Docker Day” a lot of people showed up who were using Docker (some even in production) or experimenting with it. Other people were completely new to Docker, so there was a good mix.

During the Open Spaces in the afternoon we had a group of people who decided to stay outside (the weather was really too nice to stay inside) and started discussing the talks that were given in the morning sessions. This evolved into a rather good discussion about everyone’s personal view on the current state of containers and what they might bring in the future. People chimed in and added their opinions to the conversation.

That inspired me to write about the following items, which are a combination of the things that came up during those conversations and my own view on the current state of Docker.

The Dockerfile

A lot of people are now using some configuration management tool and have invested quite some time in their tool of choice to deploy and manage the state of their infrastructure. Docker provides the Dockerfile to build/configure your container images, and compared to the features those config management tools provide, it feels a bit like a “dirty” hack.

Quite some people are using their config management tool to build their container images. I, for instance, upload my Ansible playbooks into the image (during build) and then run them. This allows me to reuse existing work that I know works, and I can use it for both containers and non-containers.
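
As a rough sketch of that approach (the base image, playbook path and image name below are just placeholders), the build boils down to baking the playbooks into the image and running them against localhost at build time:

    # Sketch only: bake Ansible playbooks into the image and run them at
    # build time. Base image, paths and image name are placeholders.
    cat > Dockerfile <<'EOF'
    FROM centos:7
    RUN yum -y install epel-release && yum -y install ansible
    COPY playbooks/ /opt/playbooks/
    # Same playbook you would run against a regular host, but executed
    # locally inside the image being built.
    RUN ansible-playbook -i "localhost," -c local /opt/playbooks/site.yml
    EOF
    docker build -t myorg/myapp .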

It would have been nice if Docker somehow provided a way to integrate the existing configuration management tools a bit better. Vagrant does a better job here.

As far as I know you also can’t use variables (think Puppet Hiera or Ansible Inventory) inside your Dockerfile. Something configuration management tools happen to be very good at.
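
The usual workaround (a sketch, not a Docker feature; the template file and variable names are made up) is to template the Dockerfile yourself before the build, for example with envsubst from gettext:

    # Workaround sketch: substitute variables into a Dockerfile template
    # before building. Dockerfile.tmpl and the variable names are made up.
    export APP_VERSION=1.2.3
    export HTTP_PROXY=http://proxy.example.com:3128
    envsubst '${APP_VERSION} ${HTTP_PROXY}' < Dockerfile.tmpl > Dockerfile
    docker build -t myorg/myapp:"${APP_VERSION}" .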

Bash scripting

When building more complex Docker images you notice that a lot of Bash scripting is used to prep the image and make it do what you want. Things like passing variables into configuration files, creating users, preparing storage, configuring and starting services, etc. While Bash is not necessarily a bad thing, it all feels like a workaround for things that are so simple when not using containers.
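
A typical entrypoint script ends up looking something like the sketch below (variable names, paths and the service are made up), which is exactly the kind of glue a config management tool would otherwise handle:

    #!/bin/bash
    # Sketch of a typical entrypoint script: template a config file from
    # environment variables, create a user, prepare a data directory and
    # finally start the service in the foreground.
    set -e

    : "${DB_HOST:=localhost}"
    : "${DB_PORT:=5432}"

    sed -e "s/@DB_HOST@/${DB_HOST}/" -e "s/@DB_PORT@/${DB_PORT}/" \
        /etc/myapp/myapp.conf.tmpl > /etc/myapp/myapp.conf

    id -u myapp >/dev/null 2>&1 || useradd --system myapp
    mkdir -p /var/lib/myapp && chown myapp:myapp /var/lib/myapp

    exec su -s /bin/bash -c '/usr/bin/myapp --config /etc/myapp/myapp.conf' myapp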

Dev vs Ops all over again?

The people I talked to agreed that Docker is rather developer focused and that it allows them to build images containing a lot of stuff over which you might have no control. It abstracts away possible issues. The container works, so all is well... right?

I believe that when you start building and using containers, the DevOps aspect is more important than ever. If, for instance, a CVE is found in a library/service that has been included in the container image, you’ll need to update it in your base image and then roll it out through your deployment chain. To make this possible, all stakeholders must know what is included in which version of the Docker image. Needless to say, this needs both ops and devs working together. I don’t think there’s a need for the “separation of concerns” that Docker likes to advocate. Haven’t we learned that creating silos isn’t the best idea?

More complexity

Everything in the way you used to work becomes different once you start using containers. The fact that you can’t SSH into something or let your configuration management tool make some changes just feels awkward.

Networking

By default Docker creates a Linux bridge on the host, where it creates an interface for each container that gets started. It then adjusts the iptables NAT table to pass traffic entering a port on the host to the exposed port inside the container.
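
You can see that plumbing for yourself on a Docker host; the commands below are just read-only inspection (the container name and ports are examples):

    # Start a container with a published port, then look at what Docker
    # set up on the host (bridge + NAT rule). Name and ports are examples.
    docker run -d --name web -p 8080:80 nginx

    brctl show docker0              # the Linux bridge with one veth per container
    ip addr show docker0            # the bridge has its own subnet on the host
    iptables -t nat -L DOCKER -n    # the DNAT rule mapping host port 8080 to the container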

To have a more advanced network configuration you need to look at tools like Weave, flannel, etc., which require more research to see what fits your specific use case best.

Recently I was wondering if it was possible to have multiple NICs inside your container, because I wanted this to test Ansible playbooks that configure multiple NICs. Currently it’s not possible, but there’s a ticket open on GitHub https://github.com/docker/docker/issues/1824 which doesn’t give me much hope.
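
The closest workaround I’m aware of is jpetazzo’s pipework script (https://github.com/jpetazzo/pipework), which injects an extra interface into a running container from the host; the bridge, container name and address below are examples:

    # Sketch: add a second interface (eth1) to a running container using
    # pipework. Bridge, container name and IP address are examples.
    docker run -d --name nic-test centos:7 sleep infinity
    pipework br1 -i eth1 nic-test 192.168.50.10/24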

Service discovery

Once you go beyond playing with containers on your laptop and start using multiple Docker hosts to scale your applications, you need a way to know where the specific service you want to connect to is running and on which port. You probably don’t want to manually define ports per container on each host, because that will become tedious quite fast. This is where tools like Consul, etcd, etc. come in. Again, some extra tooling/complexity.
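
As a small illustration of what that extra tooling looks like, this is roughly how you would register and look up a service with a local Consul agent (the agent is assumed to be running already; service name and port are examples):

    # Register a service with the local Consul agent and look it up again.
    curl -X PUT -d '{"Name": "redis", "Port": 6379}' \
        http://127.0.0.1:8500/v1/agent/service/register

    # Query via the HTTP API...
    curl http://127.0.0.1:8500/v1/catalog/service/redis

    # ...or via Consul's DNS interface (SRV records include the port).
    dig @127.0.0.1 -p 8600 redis.service.consul SRV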

Storage

You will always have something that needs persistence, and when you do, you’ll need storage. Now, when using containers the Docker way, you are expected to put as much as possible inside the container image. But some things, like log files, configuration files, application generated data, etc., are a moving target.

Docker provides volumes to pass storage from the host into a container. Basically you map a path on the host to a path inside the container. But this poses some questions: how do I make sure the data is available wherever the container gets started? How can I make sure it is secure? How do I manage all these volumes? What is the best way to share them among different hosts? …
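
In its simplest form that is a single flag (paths and the image name are examples), and every one of the questions above applies to that one line:

    # Map a directory on the host into the container. Who creates
    # /srv/myapp/data on every host, with which permissions, and how does
    # it follow the container to another host? Paths/image are examples.
    docker run -d --name myapp -v /srv/myapp/data:/var/lib/myapp myorg/myapp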

One way to consolidate your volumes is to use “data-only” containers. This means that you run a container with some volumes attached to it and then link to them from other containers, so they all use a central place to store data. This works, but has some drawbacks imho.

This container just needs to exist (it doesn’t even need to be running), and as long as this container or a container that links to it exists, the volumes are kept on the system. Now, if you accidentally delete the container holding the volumes, or you delete the last container linking to them, you lose all your data. With containers coming and going, it can become tricky to keep track of this, and making mistakes at this level has some serious consequences.
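
The pattern itself is only a few commands (image names are examples), which also shows how easy it is to throw the data away by mistake:

    # Create a data-only container; it only has to exist, not run.
    docker run --name appdata -v /var/lib/myapp busybox true

    # Other containers reference its volumes.
    docker run -d --name myapp --volumes-from appdata myorg/myapp

    # Danger zone: once the last container referencing the volume is
    # removed with -v, the data is gone too.
    docker stop myapp
    docker rm -v myapp appdata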

Security

Docker images

One of the “advantages” that Docker brings is the fact that you can pull images from the Docker Hub, and from what I have read this is in most cases encouraged. Now, everyone I know who runs a virtualization platform would never pull a Virtual Appliance and run it without feeling dirty. When using a cloud platform, chances are that you are using prebuilt images to deploy new instances from. This is analogous to Docker images, with the difference that people who care about their infrastructure build their own images. Now, most Linux distributions provide an “official” Docker image. These are the so-called “trusted” images, which I think are fine to use as a base image for everything else. But when I search the Docker Hub for Redis I get 1546 results. Do you trust all of them, and would you use them in your environment?

What can go wrong with pulling an OpenVPN container, right?
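
The only signals you get out of the box are the OFFICIAL and STARS columns in the search output:

    # The OFFICIAL and STARS columns are about the only built-in hints
    # for which of those images you might want to trust.
    docker search redis | head -n 5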

This is also an interesting read: https://titanous.com/posts/docker-insecurity

User namespacing

Currently there’s no user namespacing, which means that if a UID inside the Docker container matches the UID of a user on the host, processes in the container effectively have that user’s permissions on the host. This is one of the reasons why you should not run processes as the root user inside containers (or outside). But even then you need to be careful with what you’re doing.
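
A quick (and deliberately scary) illustration, assuming you bind-mount something from the host: root inside the container is the same UID 0 as root on the host.

    # Demonstration of the missing user namespacing: root (UID 0) inside
    # the container is root on the host as well, so a bind-mounted host
    # directory is fully writable by the container.
    docker run --rm -v /etc:/host-etc centos:7 \
        touch /host-etc/owned-by-container-root
    ls -l /etc/owned-by-container-root   # owned by root:root on the host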

Containers, containers, containers..

When you run more and more stuff in containers, you’ll end up with a few hundred, a few thousand or even more containers. If you’re lucky, they all share the same base image. And even if they do, you still need to update them with fixes and security patches, which results in newer base images. At that point all your existing containers should be rebuilt and redeployed. Welcome to the immutable world...

So the “problem” just shifts up a layer. A layer where the developers have more control over what gets added. What do you do when the next OpenSSL bug pops up? Do you know which containers have which OpenSSL version?
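
Answering that question today means brute-forcing it, something along these lines (assuming RPM-based images; adjust the package query for your distribution):

    # Poor man's audit: ask every running container which OpenSSL version
    # it carries. Assumes RPM-based images; use dpkg -s openssl for
    # Debian/Ubuntu based ones.
    for id in $(docker ps -q); do
        echo -n "$id $(docker inspect -f '{{.Config.Image}}' "$id"): "
        docker exec "$id" rpm -q openssl 2>/dev/null || echo "unknown"
    done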

Minimal OSes

Everyone seems to be building these mini OSes these days: CoreOS, Project Atomic, RancherOS, etc. The idea is that updating the base OS is a breeze (reboot, A/B partitions, etc.) and all the services we need run inside containers.

That’s all nice, but people with a sysadmin background will quickly start asking questions like: can I do software RAID? Can I add my own monitoring on this host? Can I integrate with my storage setup? Etc.

Recap

What I wanted to point out is that when you decide to start using containers, keep in mind that this means you’ll need to change your mindset and be ready to learn quite a few new ways of doing things.

While Docker is still young and has some shortcomings, I really enjoy working with it on my laptop and using it for testing/CI purposes. It’s also exciting (and scary at the same time) to see how fast all of this evolves.

I’ve been writing this post on and off for some weeks, and some recent announcements at DockerCon might address some of the above issues. Anyway, if you’ve read until here, I want to thank you, and good luck with all your container endeavors.