Tuning Docker with the newest security enhancements

It has been a while since I wrote the first two articles in my series on Docker security. This article will give an update on what has been added to Docker since then and cover new functionality that is going through the merge process with upstream Docker.

Adjusting Capabilities

In the previous articles, I covered container separation based on Linux Capabilities.

Linux capabilities let you break the power of root into smaller groups of privileges. By default, Docker containers currently get only the following capabilities:

CHOWN, DAC_OVERRIDE, FSETID, FOWNER, MKNOD, NET_RAW, SETGID, SETUID, SETFCAP, SETPCAP, NET_BIND_SERVICE, SYS_CHROOT, KILL, AUDIT_WRITE
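If you want to check which capabilities a process actually holds, the kernel reports them as a hexadecimal bitmask in the CapEff field of /proc/<pid>/status. The following is a minimal sketch, not part of Docker, that decodes such a mask into names. The decode_caps function name is made up for illustration, the bit numbers come from linux/capability.h and cover only bits 0 through 31, and the mask in the usage comment is simply what you get by setting the bits for the fourteen capabilities listed above.

```shell
# Capability names in bit order (bit 0 first), per linux/capability.h.
CAP_NAMES="chown dac_override dac_read_search fowner fsetid kill setgid setuid
setpcap linux_immutable net_bind_service net_broadcast net_admin net_raw
ipc_lock ipc_owner sys_module sys_rawio sys_chroot sys_ptrace sys_pacct
sys_admin sys_boot sys_nice sys_resource sys_time sys_tty_config mknod lease
audit_write audit_control setfcap"

# decode_caps HEXMASK
# Prints a comma-separated list of the capabilities whose bits are set.
# Hypothetical helper: bits above 31 are ignored.
decode_caps() {
    mask=$((0x$1))
    bit=0
    out=""
    for name in $CAP_NAMES; do
        if [ $(( (mask >> bit) & 1 )) -eq 1 ]; then
            out="$out${out:+,}$name"
        fi
        bit=$((bit + 1))
    done
    printf '%s\n' "$out"
}

# Usage: decode the mask built from the default list above.
#   decode_caps 00000000a80425fb
# Or decode the current shell's own effective set:
#   decode_caps "$(awk '/^CapEff/ {print $2}' /proc/self/status)"
```

Running the first usage line prints the same fourteen names as the list above, in bit order rather than in the order shown.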

In some cases you might want to adjust this list. For example, suppose you were building a container to run ntpd or chronyd, which needs to be able to modify the host system time. With the default list, the container would not run, because it requires CAP_SYS_TIME. In older versions of Docker, the container would have to run in --privileged mode, which turns off all security.

In docker-1.3, the --cap-add and --cap-drop options were added. Now, in order to run an ntpd container, you could just run:

docker run -d --cap-add SYS_TIME ntpd

This adds only the SYS_TIME capability to your container.

As another example, if your container does not change the UID/GID of any processes, you could drop those capabilities from your container, making it more secure.

docker run --cap-drop SETUID --cap-drop SETGID --cap-drop FOWNER fedora /bin/sh

# pscap | grep 2912
5417 2912 root sh chown, dac_override, fsetid, kill, setpcap, net_bind_service, net_raw, sys_chroot, mknod, audit_write, setfcap

Or you could drop all capabilities and add one back.

docker run --cap-drop ALL --cap-add SYS_TIME ntpd /bin/sh

# pscap | grep 2382
5417 2382 root sh sys_time
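The drop-everything-then-add-back pattern is easy to wrap in a small helper. This is only a sketch: cap_flags is a hypothetical function name, and it does nothing more than print the --cap-drop and --cap-add options shown above for whatever capability names you pass in. It does not check that the names are valid.

```shell
# cap_flags [CAP ...]
# Prints docker run options that drop all capabilities and then add back
# only the ones named on the command line. Hypothetical helper.
cap_flags() {
    flags="--cap-drop ALL"
    for cap in "$@"; do
        flags="$flags --cap-add $cap"
    done
    printf '%s\n' "$flags"
}

# Usage (the unquoted substitution is deliberate, so the shell splits
# the output into separate options):
#   docker run -d $(cap_flags SYS_TIME) ntpd
```

Starting from an empty set and listing what you need makes the container's privileges explicit, rather than relying on whatever the default list happens to be.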

Adjusting SELinux labels

Similar to capabilities, we have added the ability to adjust the SELinux labels on the fly.

If you have seen the SELinux coloring book, you know that we can separate processes by types and by MCS/MLS levels. We use types to protect the host from the container. But we could also adjust the types to control what network ports are allowed into and out of the container. Currently, we run all containers with the svirt_lxc_net_t type. This type is allowed to listen on all network ports and to connect out on all network ports. We could tighten the security on the container by adjusting the SELinux type label.

With regular SELinux and Apache httpd, by default we allow the Apache process to listen only on the Apache ports (http_port_t).

# sudo sepolicy network -t http_port_t

http_port_t: tcp: 80,81,443,488,8008,8009,8443,9000
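When deciding which port a confined service may use, it can help to test a port number against that list. Here is a minimal sketch, assuming the ports are given as a plain comma-separated list like the one above. port_in_list is a made-up helper, and it does not handle port ranges, which the real tools may print for other types.

```shell
# port_in_list PORT LIST
# Succeeds if PORT appears in the comma-separated LIST.
# Hypothetical helper: ranges such as 8000-8010 are not understood.
port_in_list() {
    case ",$2," in
        *",$1,"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Usage with the list printed above:
HTTP_PORTS="80,81,443,488,8008,8009,8443,9000"
if port_in_list 8443 "$HTTP_PORTS"; then
    echo "8443 is labeled http_port_t"
fi
```

Wrapping both the list and the port in commas keeps a short number such as 8 from matching inside a longer one such as 80.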

We also block all outgoing port connections. This helps us lock down the Apache process: even if a hacker were to subvert an application through a security vulnerability like ShellShock, we could stop the application from becoming a spam bot or initiating attacks on other systems. It is like a variation on Hotel California: "You can check in any time you want, but you can never leave."

With containers, however, if you were running an Apache server application within a container, and the application were subverted, the Apache process would be able to connect to any network ports and become a spam bot, or attack other hosts/containers via the network.

It is fairly simple to create a new policy type to run with your containers using SELinux. First, you could create an SELinux TE (Type Enforcement) file.

# cat > docker_apache.te << _EOF

policy_module(docker_apache,1.0)

# This template interface creates the docker_apache_t type as a
# type which can be run as a docker container. The template
# gives the domain the least privileges required to run.
virt_sandbox_domain_template(docker_apache)

# I know that the apache daemon within the container will require
# some capabilities to run. Luckily I already have policy for
# Apache and I can query SELinux for the capabilities.
# sesearch -AC -s httpd_t -c capability
allow docker_apache_t self: capability { chown dac_override kill setgid setuid net_bind_service sys_chroot sys_nice sys_tty_config } ;

# These are the rules required to allow the container to listen
# to Apache ports on the network.

allow docker_apache_t self:tcp_socket create_stream_socket_perms;
allow docker_apache_t self:udp_socket create_socket_perms;
corenet_tcp_bind_all_nodes(docker_apache_t)
corenet_tcp_bind_http_port(docker_apache_t)
corenet_udp_bind_all_nodes(docker_apache_t)
corenet_udp_bind_http_port(docker_apache_t)

# Apache needs to resolve names against a DNS server
sysnet_dns_name_resolve(docker_apache_t)

# Permissive domains allow processes to not be blocked by SELinux
# While developing and testing your policy you probably want to
# run the container in permissive mode.
# You want to remove this rule, when you are confident in the
# policy.
permissive docker_apache_t;
_EOF

# make -f /usr/share/selinux/devel/Makefile docker_apache.pp
# semodule -i docker_apache.pp

Now run the container with the new type:

# docker run -d --security-opt label:type:docker_apache_t httpd

This container would now run with much tighter SELinux security than a normal container. Note that you would probably need to watch the audit logs to see if your app needs additional SELinux allow rules.

You could add these rules by using the audit2allow command to append them to the existing .te file, then recompiling and reinstalling the module.

# grep docker_apache_t /var/log/audit/audit.log | audit2allow >> docker_apache.te
# make -f /usr/share/selinux/devel/Makefile docker_apache.pp
# semodule -i docker_apache.pp
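Before feeding denials to audit2allow, it is worth looking at what was actually denied, since blindly allowing everything defeats the purpose of a tight policy. The sketch below is one way to summarize AVC denials for a single type. summarize_denials is a hypothetical helper, and it assumes the usual audit.log layout in which each AVC line carries a "denied { ... }" permission list plus scontext= and tclass= fields.

```shell
# summarize_denials TYPE
# Reads audit log lines on stdin and prints one "class: permissions" line
# for each distinct denial whose source context has the given SELinux type.
# Hypothetical helper; assumes the standard AVC message layout.
summarize_denials() {
    grep 'avc: *denied' |
        grep "scontext=[^ ]*:$1:" |
        sed -n 's/.*denied *{ *\([^}]*[^ }]\) *}.*tclass=\([^ ]*\).*/\2: \1/p' |
        sort -u
}

# Usage (as root, on the host):
#   summarize_denials docker_apache_t < /var/log/audit/audit.log
```

If the summary shows something the container has no business doing, such as connecting out to an unexpected port, that is a rule to leave out of the policy rather than add.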

Multi Level Security mode

Currently, we use MCS separation to make sure our containers are not allowed to interfere or interact with other containers, except through the network. Certain government systems require a different type of policy, MLS (Multi Level Security). With MLS, you label the processes based on the level of the data they will be seeing. MLS says that if your container is going to be processing TopSecret data, then it should run at TopSecret. We have added options to Docker to allow admins to set up containers to run at a specific level, which should satisfy the needs of MLS systems.

docker run -d --security-opt label:level:TopSecret --security-opt label:type:docker_apache_t httpd

This would enable the Docker container to run with both the alternate type and level, and would prevent the container from using data that is not at the same level. This has not gone through accreditation at this point, but we would be willing to help third parties build solutions for MLS users.
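An SELinux label has the form user:role:type:level, and the two --security-opt options above change its last two fields. To confirm that a container process really got the type and level you asked for, you can pull those fields out of the label that a tool such as ps -eZ prints. A minimal sketch follows. label_type and label_level are made-up helper names, and the label in the usage comment is only an example value.

```shell
# label_type LABEL
# Prints the type field (third field) of an SELinux label.
label_type() {
    printf '%s\n' "$1" | cut -d: -f3
}

# label_level LABEL
# Prints the level, which is everything after the type. The level may
# itself contain a colon, as in s0:c1,c2, so take field 4 onward.
label_level() {
    printf '%s\n' "$1" | cut -d: -f4-
}

# Usage with an example label:
#   label_type  system_u:system_r:docker_apache_t:s0:c1,c2
#   label_level system_u:system_r:docker_apache_t:s0:c1,c2
```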

Adjusting namespaces

In my other security talks, I have discussed how namespaces can be considered a security mechanism, since they eliminate the ability of a process to see other processes on the system (PID namespace). The network namespace can eliminate the ability to see other networks from your namespace. The IPC (inter-process communication) namespace can block containers from using other containers' IPC.

Docker now has the ability to loosen these restrictions. You can share the host's namespaces with the container:

--pid=host Lets the container share the host's PID namespace
--net=host Lets the container share the host's network namespace
--ipc=host Lets the container share the host's IPC namespace

Note that sharing the PID or IPC namespace with the host requires us to disable SELinux separation in order for them to work.

docker run -ti --pid=host --net=host --ipc=host rhel7 /bin/sh
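One way to see whether a container really shares a namespace with the host is to compare the namespace links the kernel exposes under /proc/<pid>/ns. Two processes are in the same namespace when the corresponding links point at the same target. The sketch below assumes a Linux /proc. same_ns and the PROC_ROOT variable are inventions of this example, not Docker features.

```shell
# PROC_ROOT is an assumption of this sketch: it defaults to /proc and exists
# only so the function can be pointed at another directory when testing.
PROC_ROOT=${PROC_ROOT:-/proc}

# same_ns PID1 PID2 TYPE
# Succeeds if both processes are in the same namespace of the given TYPE
# (pid, net, ipc, ...). Returns 2 if either link cannot be read, which
# usually means the process is gone or you need to be root.
same_ns() {
    a=$(readlink "$PROC_ROOT/$1/ns/$3") || return 2
    b=$(readlink "$PROC_ROOT/$2/ns/$3") || return 2
    [ "$a" = "$b" ]
}

# Usage on the host, where PID 1 is the host's init and CPID is the
# container's main process as seen from the host:
#   same_ns 1 "$CPID" pid && echo "container shares the host PID namespace"
```

For a container started with --pid=host, the pid check should succeed. For a container started without it, the check should fail.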

You might want to read additional information on this in the article Super Privileged Containers.

Daniel Walsh has worked in the computer security field for almost 30 years. Dan joined Red Hat in August 2001.

2 Comments

From my experience with OpenVZ you cannot run an NTP process inside a container, the NTP daemon has to run on the hardware node.

Is that different with LXC and Docker?

This is an awesome article and finally shows in practice how to extend on the default Docker SELinux policy.

I used this today to build an image for Chrony / NTP, and I found out that the mentioned "--cap-add SYS_TIME" is unfortunately not enough when SELinux is enabled, as I could clearly see an AVC message telling me that this capability is denied.

However, together with this guide and this one here: http://developerblog.redhat.com/2015/04/21/introducing-the-atomic-comma… I was able to package an image which gets its SELinux policy module installed during an "atomic install" run.

In general I think that with Docker and these relatively easy enhancements to SELinux modules, it has become easier to use and customize SELinux. It is really not as evil as people normally say it is :)

This work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.