It has been a while since I wrote the first two articles in my series on Docker security. This article will give an update on what has been added to Docker since then and cover new functionality that is going through the merge process with upstream Docker.
Adjusting Capabilities
In the previous articles, I covered container separation based on Linux Capabilities.
Linux Capabilities allow you to break apart the power of root into smaller groups of privileges. Currently, Docker containers get only the following capabilities by default:
CHOWN, DAC_OVERRIDE, FSETID, FOWNER, MKNOD, NET_RAW, SETGID, SETUID, SETFCAP, SETPCAP, NET_BIND_SERVICE, SYS_CHROOT, KILL, AUDIT_WRITE
In some cases you might want to adjust this list. For example, suppose you were building a container to run ntpd or chronyd, which need to be able to modify the host system time. By default the container would not run, because it requires CAP_SYS_TIME. In older versions of Docker, the container would have to run in --privileged mode, which turns off all security.
The --cap-add and --cap-drop options were added in docker-1.3. Now, in order to run an ntpd container, you could just run:
docker run -d --cap-add SYS_TIME ntpd
This adds only the SYS_TIME capability to your container.
As another example, if your container did not change the UID/GID of any processes, you could drop these capabilities from your container, making it more secure.
docker run --cap-drop SETUID --cap-drop SETGID --cap-drop FOWNER fedora /bin/sh
# pscap | grep 2912
5417 2912 root sh chown, dac_override, fsetid, kill, setpcap, net_bind_service, net_raw, sys_chroot, mknod, audit_write, setfcap
Or you could drop all capabilities and add one back.
docker run --cap-drop ALL --cap-add SYS_TIME ntpd /bin/sh
# pscap | grep 2382
5417 2382 root sh sys_time
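If pscap is not available, the same information can be read from the CapEff line of /proc/PID/status, which the kernel prints as a hexadecimal bitmask. The following is a minimal sketch of testing one bit of that mask. The helper name has_cap is made up for illustration, and the bit numbers are the ones defined in linux/capability.h (SETUID is bit 7, SYS_TIME is bit 25).

```shell
# has_cap MASK BIT
# Succeeds (exit status 0) if bit number BIT is set in the hexadecimal
# capability mask MASK, as printed on the CapEff line of /proc/PID/status.
# has_cap is a hypothetical helper name, not part of Docker or libcap.
has_cap() {
    [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

# Bit numbers from <linux/capability.h>
CAP_SETUID=7
CAP_SYS_TIME=25
```

The 14 default capabilities listed earlier correspond to the mask 00000000a80425fb, so has_cap 00000000a80425fb "$CAP_SYS_TIME" fails while has_cap 00000000a80425fb "$CAP_SETUID" succeeds. Inside a container, the current mask can be obtained with awk '/^CapEff:/ { print $2 }' /proc/self/status.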
Adjusting SELinux labels
Similar to capabilities, we have added the ability to adjust the SELinux labels on the fly.
If you have seen the SELinux coloring book, you know that we can separate processes by types and by MCS/MLS levels. We use types to protect the host from the container. But we could also adjust the types to control which network ports are allowed into and out of the container. Currently, we run all containers with the svirt_lxc_net_t type. This type is allowed to listen on all network ports and to connect out on all network ports. We could tighten the security on the container by adjusting the SELinux type label.
With regular SELinux and Apache httpd, by default we allow the Apache process to listen only on the Apache ports (http_port_t).
# sudo sepolicy network -t http_port_t
http_port_t: tcp: 80,81,443,488,8008,8009,8443,9000
We also block all outgoing port connections. This helps us lock down the Apache process: even if a hacker were to subvert an application through a security vulnerability like ShellShock, we could stop the application from becoming a spam bot or initiating attacks on other systems. It is like Hotel California, "You can check in any time you want, but you can never leave."
With containers, however, if you were running an Apache server application within a container, and the application were subverted, the Apache process would be able to connect to any network ports and become a spam bot, or attack other hosts/containers via the network.
It is fairly simple to create a new policy type to run with your containers using SELinux. First, you could create an SELinux TE (Type Enforcement) file.
# cat > docker_apache.te << _EOF
policy_module(docker_apache,1.0)
# This template interface creates the docker_apache_t type as a
# type which can be run as a docker container. The template
# gives the domain the least privileges required to run.
virt_sandbox_domain_template(docker_apache)
# I know that the apache daemon within the container will require
# some capabilities to run. Luckily I already have policy for
# Apache and I can query SELinux for the capabilities.
# sesearch -AC -s httpd_t -c capability
allow docker_apache_t self: capability { chown dac_override kill setgid setuid net_bind_service sys_chroot sys_nice sys_tty_config } ;
# These are the rules required to allow the container to listen
# to Apache ports on the network.
allow docker_apache_t self:tcp_socket create_stream_socket_perms;
allow docker_apache_t self:udp_socket create_socket_perms;
corenet_tcp_bind_all_nodes(docker_apache_t)
corenet_tcp_bind_http_port(docker_apache_t)
corenet_udp_bind_all_nodes(docker_apache_t)
corenet_udp_bind_http_port(docker_apache_t)
# Apache needs to resolve names against a DNS server
sysnet_dns_name_resolve(docker_apache_t)
# Permissive domains allow processes to not be blocked by SELinux
# While developing and testing your policy you probably want to
# run the container in permissive mode.
# You want to remove this rule, when you are confident in the
# policy.
permissive docker_apache_t;
_EOF
# make -f /usr/share/selinux/devel/Makefile docker_apache.pp
# semodule -i docker_apache.pp
Now run the container with the new type:
# docker run -d --security-opt label:type:docker_apache_t httpd
This container would now run with much tighter SELinux security than a normal container. Note that you would probably need to watch the audit logs to see whether your app needs additional SELinux allow rules.
You could add these rules by using the audit2allow command to append them to the existing .te file, then recompiling and reinstalling the module.
# grep docker_apache_t /var/log/audit/audit.log | audit2allow >> docker_apache.te
# make -f /usr/share/selinux/devel/Makefile docker_apache.pp
# semodule -i docker_apache.pp
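Since this cycle tends to be repeated several times while a policy is being developed, the three commands above can be wrapped in a small shell function. This is only a sketch: the name update_policy and its optional second argument (an alternate audit log path) are assumptions made for illustration, and the rules audit2allow appends should still be reviewed, and possibly cleaned up by hand, before they are trusted.

```shell
# update_policy MODULE [AUDIT_LOG]
# Hypothetical helper: appends the rules suggested by audit2allow for the
# MODULE_t domain to MODULE.te, then recompiles and reinstalls the module.
# AUDIT_LOG defaults to /var/log/audit/audit.log.
update_policy() {
    module=$1
    log=${2:-/var/log/audit/audit.log}

    # Collect denials for this domain and append the suggested allow rules.
    grep "${module}_t" "$log" | audit2allow >> "${module}.te" || return 1

    # Rebuild the policy package and load it into the kernel.
    make -f /usr/share/selinux/devel/Makefile "${module}.pp" || return 1
    semodule -i "${module}.pp"
}
```

Run as root from the directory that holds the .te file, for example: update_policy docker_apache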
Multi Level Security mode
Currently, we use MCS separation to make sure our containers are not allowed to interfere or interact with other containers, except through the network. Certain government systems require a different type of policy: MLS (Multi Level Security). With MLS, you label processes based on the level of the data they will be seeing. MLS says that if your container is going to be processing TopSecret data, then it should run at TopSecret. We have added options to Docker that allow admins to set up containers to run at a specific level, which should satisfy the needs of MLS systems.
docker run -d --security-opt label:level:TopSecret --security-opt label:type:docker_apache_t httpd
This would enable the Docker container to run with both the alternate type and the alternate level, and would prevent the container from using data that was not at the same level. This has not gone through accreditation at this point, but we would be willing to help third parties build solutions for MLS users.
Adjusting namespaces
In the other security talks, I have discussed how namespaces could be considered a security mechanism, since they eliminate the ability of a process to see other processes on the system (PID namespace). The network namespace can eliminate the ability to see other networks from your namespace. The IPC (inter-process communication) namespace can block containers from using another container's IPC.
Docker now has the ability to loosen these restrictions. You can share the host's namespaces with the container:
--pid=host Lets the container share the host's PID namespace
--net=host Lets the container share the host's network namespace
--ipc=host Lets the container share the host's IPC namespace
Note that sharing the PID or IPC namespace with the host requires us to disable SELinux separation for the container in order for it to work.
docker run -ti --pid=host --net=host --ipc=host rhel7 /bin/sh
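One way to check whether a container really shares a namespace with the host is to compare the namespace links the kernel exposes under /proc/PID/ns. The sketch below assumes a kernel that provides those links; the helper names ns_id and same_ns are made up for illustration.

```shell
# ns_id PID KIND
# Prints the namespace identifier (for example "pid:[4026531836]") of
# namespace KIND (pid, net, ipc, ...) for process PID.
# ns_id and same_ns are hypothetical helper names.
ns_id() {
    readlink "/proc/$1/ns/$2"
}

# same_ns PID_A PID_B KIND
# Succeeds if both processes are in the same namespace of the given kind.
# Returns 2 if either namespace link cannot be read.
same_ns() {
    a=$(ns_id "$1" "$3") || return 2
    b=$(ns_id "$2" "$3") || return 2
    [ "$a" = "$b" ]
}
```

On the host, as root, the PID of a container's main process can be obtained with docker inspect --format '{{.State.Pid}}' CONTAINER. Calling same_ns 1 PID pid should then succeed for a container started with --pid=host and fail for a container with its own PID namespace.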
You might want to read additional information on this in the article Super Privileged Containers.