EDIT: incredibly, I'm seeing people use this benchmark to argue both 1) that Docker is bad for performance and 2) that Docker is magically faster (and probably does something stupid to be faster). Talk about an illustration of what's wrong with benchmarks! Neither of these statements is correct. It's possible to configure Docker for negligible performance overhead, and Docker definitely does not do magic tricks to make things faster. Mostly it sets up the storage backend of your choice and then gets out of the way. Whatever you're seeing is probably a result of your particular host configuration combined with your particular storage backend configuration.
My initial comment below is a detailed response to the "Docker is terrible for performance" camp.
TLDR: "read the Docker manual before trying to benchmark it".
It looks like the guy didn't bother to mark /var/lib/postgres as a volume, which is the recommended way of running a database in Docker. It can be fixed with the following line in his Dockerfile:
VOLUME /var/lib/postgresql
This will make sure his persistent (and IO-critical) data is bind-mounted from the underlying host filesystem, instead of depending on Docker's copy-on-write backend (which could be btrfs, aufs or lvm/devmapper depending on his configuration).
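If you'd rather not touch the Dockerfile, the same effect is available at run time with an explicit bind mount; the host path and image name below are just examples:

docker run -v /srv/pgdata:/var/lib/postgresql my_postgres_image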
Read the docs, fix your configuration, and try again. Hint: the IO overhead will be negligible. Also: it will end up being a benchmark of your underlying host storage, not of Docker.
It's worth mentioning that when you benchmark Docker, you are never really benchmarking Docker itself, unless what you are measuring is the setup/teardown of containers.
What you -are- benchmarking is namespaces, cgroups, CoW filesystems and your underlying hardware.
As Solomon has already pointed out, as soon as you use a bind-mounted volume (in Docker parlance) you are simply benchmarking your host Linux kernel + host filesystem + host hardware, nothing more. I'm not sure whether Docker configures the cgroups cfq or blkio controllers yet, but that could also play a part.
The TLDR is this: Docker doesn't have an impact on performance because it's just glue code; none of Docker itself sits in any of the hot paths. You are better off benchmarking the individual components in the actual hot path. Also worth noting that compute benchmarks will be absolutely worthless, as they -will- be 1:1 native performance (because there are actually no additional moving parts).
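As a rough sketch of what a meaningful comparison looks like, run the same raw write test on the host and then inside a container whose data directory is bind-mounted from that same host path. The paths, image and sizes here are placeholders, and dd is only a crude stand-in for a real tool like fio:

# on the host
dd if=/dev/zero of=/srv/pgdata/testfile bs=1M count=1024 oflag=direct

# inside a container, with the same host directory bind-mounted in
docker run -v /srv/pgdata:/data ubuntu dd if=/dev/zero of=/data/testfile bs=1M count=1024 oflag=direct

If those two numbers differ meaningfully, the interesting question is which kernel feature or storage backend is responsible, not "Docker".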
This comment deserves its own post - and shouting. This is a rather profound point which is lost on almost everyone and makes 99% of docker performance discussions a waste of time for people who should know better.
Blog posts like this one don't start a discussion, don't help anyone, they just get used in random FUD discussions from any and all points of view.
Well, Docker is big news and people are interested in what the performance impact of "using docker" is.
Whether it's accurate or not, people associate Docker with the functionality it facilitates (i.e., cgroups, namespaces, etc.), so I think it's valid to show the performance impact of running an application from within Docker.
Based on my data, they're not 1:1 native performance. I suggest you consider the data before dismissing it.
Thanks, I have to admit I didn't think of that (saw VOLUME commented out in your Dockerfile, but didn't think that you might have moved it to a run script).
My point remains that Docker benchmarks are especially easy to spin any way you want, since usually they are really benchmarks of the underlying system configuration (with some parts glued together by Docker, but most parts outside of its control).
The big thing to take away from this is not "docker causes a performance change", it's that "there is some way to configure docker that affects performance". Obviously the expected behavior is that docker should have marginal performance impact, and any deviation from that is undesirable.
The numbers shown here are about what I would expect if something in the storage stack is doing more caching than normal. This likely has hidden costs, either in memory usage or in robustness. It's worth digging down to find out what is happening, and how to validate Docker configurations.
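A first step in that digging is simply confirming which storage driver the daemon is using, since that (not Docker's glue code) is what sits in the write path for anything not on a volume:

docker info | grep -i 'storage driver'
# e.g. "Storage Driver: aufs" or "Storage Driver: devicemapper"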
@shykes - what is the recommended way to manage logs? In this case, postgres would generate logs that need to be rotated (with a SIGHUP to the parent process). I'm not sure if I can use the host's logrotate to do that.
Currently I build containers with supervisord as PID 1 and logrotate running inside the container, but with logs being saved to a bind-mounted volume.
Is this correct?
P.S. there are no docs, blog posts or articles on this topic. I'm a little puzzled if people are living with ephemeral logs.
There are actually a lot of blog posts and docs on this subject. Previous posters already explained some of the solutions, but I was just searching on this topic this morning so I thought I would share what I found:
It seems that the recently recommended setup is to create a container specifically to host volumes for both logs and other persistent data (ie database files). You then connect each container that needs to write to those volumes using the volumes-from directive. This is explained in blog posts and included in the documentation.
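A minimal sketch of that pattern (the container and image names are just examples):

# data-only container that exists solely to hold the volumes
docker run --name pg_data -v /var/lib/postgresql -v /var/log/postgresql busybox true

# application container reusing those volumes
docker run --volumes-from pg_data --name pg my_postgres_image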
You can run your docker container supervised with tools like systemd, runit etc - and have stderr/stdout (and signals) forwarded on from the container process to the supervisor - from then on you do what you would normally do if it wasn't running in docker.
Other approaches are to have the app(s) in the container log to a VOLUME that is bind-mounted in from the underlying host (where you can access them directly). Yet another approach is to bind mount syslog or other tools into the container and allow the process(es) inside the container to log to it. All work well.
Docker captures stderr; you can configure Postgres to simply use stderr logging, and then capture it in the host system.
Another option is syslog, which Postgres also supports, with which you would log to the host's syslog daemon. (You can mount the host's /dev/log if you don't want to deal with setting up the networking.)
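As a sketch, assuming a stock postgresql.conf and a host rsyslog listening on /dev/log (the image name is a placeholder):

# postgresql.conf: log via syslog instead of files
log_destination = 'syslog'
syslog_facility = 'LOCAL0'

# run the container with the host's /dev/log bind-mounted in
docker run -v /dev/log:/dev/log my_postgres_image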
If you're sending logs out to the host's syslog, then the log creator is your syslog daemon.
Also, processes in containers exist as processes on the host (they have different PIDs inside and outside of the container due to PID namespacing); so logrotate should be able to send signals to containerized processes.
Hey, thanks for that comment - could you talk about it in slightly more detail?
I have rsyslog on the host and I'm sending logs to the host syslog by bind mounting /dev/log. Is this what you meant by sending logs out to the host? I'm having a lot of trouble figuring out how to do it any other way.
Could you also confirm that rsyslog on the host will be able to send appropriate signals to the container's syslog, which forwards them to the containerized process? I'm unable to find documentation in rsyslog (or syslog-ng) that talks about this behavior, so I'm not sure.
Sighup is needed when processes (such as Postgres) write directly to log files. These files are open, and when you want to rotate the file, you have to tell the process to close the files so that it will start writing to a new file.
If you tell Postgres to log to rsyslog (or any other syslog daemon), the log data will be sent to rsyslog via UDP, TCP or a Unix socket. Postgres itself will not have any files open, so there is no need to sighup it.
You will have to sighup rsyslog in the host system, though.
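A host-side logrotate rule for that would look roughly like this (the log file path and the rsyslogd pid file location are assumptions; both vary by distro and by your rsyslog rules):

# /etc/logrotate.d/postgresql-syslog (on the host)
/var/log/postgresql.log {
    weekly
    rotate 8
    compress
    postrotate
        kill -HUP `cat /var/run/rsyslogd.pid`
    endscript
}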
If you bind mount /dev/log of the host into your container, you only run rsyslogd on the host. There's no need to send any SIGHUP to anything in the container.
Even ignoring performance, putting persistent data (like databases) on volumes is a best practice since the normal filesystem inside the container is ephemeral.