Update security docs for seccomp/apparmor
| 1 | 1 |
deleted file mode 100644 |
| ... | ... |
@@ -1,284 +0,0 @@ |
| 1 |
-<!--[metadata]> |
|
| 2 |
-+++ |
|
| 3 |
-title = "Docker security" |
|
| 4 |
-description = "Review of the Docker Daemon attack surface" |
|
| 5 |
-keywords = ["Docker, Docker documentation, security"] |
|
| 6 |
-[menu.main] |
|
| 7 |
-parent = "smn_administrate" |
|
| 8 |
-weight = 2 |
|
| 9 |
-+++ |
|
| 10 |
-<![end-metadata]--> |
|
| 11 |
- |
|
| 12 |
-# Docker security |
|
| 13 |
- |
|
| 14 |
-There are four major areas to consider when reviewing Docker security: |
|
| 15 |
- |
|
| 16 |
- - the intrinsic security of the kernel and its support for |
|
| 17 |
- namespaces and cgroups; |
|
| 18 |
- - the attack surface of the Docker daemon itself; |
|
| 19 |
- - loopholes in the container configuration profile, either by default, |
|
| 20 |
- or when customized by users. |
|
| 21 |
- - the "hardening" security features of the kernel and how they |
|
| 22 |
- interact with containers. |
|
| 23 |
- |
|
| 24 |
-## Kernel namespaces |
|
| 25 |
- |
|
| 26 |
-Docker containers are very similar to LXC containers, and they have |
|
| 27 |
-similar security features. When you start a container with |
|
| 28 |
-`docker run`, behind the scenes Docker creates a set of namespaces and control |
|
| 29 |
-groups for the container. |
|
| 30 |
- |
|
| 31 |
-**Namespaces provide the first and most straightforward form of |
|
| 32 |
-isolation**: processes running within a container cannot see, and even |
|
| 33 |
-less affect, processes running in another container, or in the host |
|
| 34 |
-system. |
|
| 35 |
- |
|
| 36 |
-**Each container also gets its own network stack**, meaning that a |
|
| 37 |
-container doesn't get privileged access to the sockets or interfaces |
|
| 38 |
-of another container. Of course, if the host system is set up |
|
| 39 |
-accordingly, containers can interact with each other through their |
|
| 40 |
-respective network interfaces — just like they can interact with |
|
| 41 |
-external hosts. When you specify public ports for your containers or use |
|
| 42 |
-[*links*](../userguide/networking/default_network/dockerlinks.md) |
|
| 43 |
-then IP traffic is allowed between containers. They can ping each other, |
|
| 44 |
-send/receive UDP packets, and establish TCP connections, but that can be |
|
| 45 |
-restricted if necessary. From a network architecture point of view, all |
|
| 46 |
-containers on a given Docker host are sitting on bridge interfaces. This |
|
| 47 |
-means that they are just like physical machines connected through a |
|
| 48 |
-common Ethernet switch; no more, no less. |
|
| 49 |
- |
|
| 50 |
-How mature is the code providing kernel namespaces and private |
|
| 51 |
-networking? Kernel namespaces were introduced [between kernel version |
|
| 52 |
-2.6.15 and |
|
| 53 |
-2.6.26](http://lxc.sourceforge.net/index.php/about/kernel-namespaces/). |
|
| 54 |
-This means that since July 2008 (date of the 2.6.26 release, now 7 years |
|
| 55 |
-ago), namespace code has been exercised and scrutinized on a large |
|
| 56 |
-number of production systems. And there is more: the design and |
|
| 57 |
-inspiration for the namespaces code are even older. Namespaces are |
|
| 58 |
-actually an effort to reimplement the features of [OpenVZ]( |
|
| 59 |
-http://en.wikipedia.org/wiki/OpenVZ) in such a way that they could be |
|
| 60 |
-merged within the mainstream kernel. And OpenVZ was initially released |
|
| 61 |
-in 2005, so both the design and the implementation are pretty mature. |
|
| 62 |
- |
|
| 63 |
-## Control groups |
|
| 64 |
- |
|
| 65 |
-Control Groups are another key component of Linux Containers. They |
|
| 66 |
-implement resource accounting and limiting. They provide many |
|
| 67 |
-useful metrics, but they also help ensure that each container gets |
|
| 68 |
-its fair share of memory, CPU, disk I/O; and, more importantly, that a |
|
| 69 |
-single container cannot bring the system down by exhausting one of those |
|
| 70 |
-resources. |
|
| 71 |
- |
|
| 72 |
-So while they do not play a role in preventing one container from |
|
| 73 |
-accessing or affecting the data and processes of another container, they |
|
| 74 |
-are essential to fend off some denial-of-service attacks. They are |
|
| 75 |
-particularly important on multi-tenant platforms, like public and |
|
| 76 |
-private PaaS, to guarantee a consistent uptime (and performance) even |
|
| 77 |
-when some applications start to misbehave. |
|
| 78 |
- |
|
| 79 |
-Control Groups have been around for a while as well: the code was |
|
| 80 |
-started in 2006, and initially merged in kernel 2.6.24. |
|
| 81 |
- |
|
| 82 |
-## Docker daemon attack surface |
|
| 83 |
- |
|
| 84 |
-Running containers (and applications) with Docker implies running the |
|
| 85 |
-Docker daemon. This daemon currently requires `root` privileges, and you |
|
| 86 |
-should therefore be aware of some important details. |
|
| 87 |
- |
|
| 88 |
-First of all, **only trusted users should be allowed to control your |
|
| 89 |
-Docker daemon**. This is a direct consequence of some powerful Docker |
|
| 90 |
-features. Specifically, Docker allows you to share a directory between |
|
| 91 |
-the Docker host and a guest container; and it allows you to do so |
|
| 92 |
-without limiting the access rights of the container. This means that you |
|
| 93 |
-can start a container where the `/host` directory will be the `/` directory |
|
| 94 |
-on your host; and the container will be able to alter your host filesystem |
|
| 95 |
-without any restriction. This is similar to how virtualization systems |
|
| 96 |
-allow filesystem resource sharing. Nothing prevents you from sharing your |
|
| 97 |
-root filesystem (or even your root block device) with a virtual machine. |
|
| 98 |
- |
|
| 99 |
-This has a strong security implication: for example, if you instrument Docker |
|
| 100 |
-from a web server to provision containers through an API, you should be |
|
| 101 |
-even more careful than usual with parameter checking, to make sure that |
|
| 102 |
-a malicious user cannot pass crafted parameters causing Docker to create |
|
| 103 |
-arbitrary containers. |
|
| 104 |
- |
|
| 105 |
-For this reason, the REST API endpoint (used by the Docker CLI to |
|
| 106 |
-communicate with the Docker daemon) changed in Docker 0.5.2, and now |
|
| 107 |
-uses a UNIX socket instead of a TCP socket bound on 127.0.0.1 (the |
|
| 108 |
-latter being prone to cross-site request forgery attacks if you happen to run |
|
| 109 |
-Docker directly on your local machine, outside of a VM). You can then |
|
| 110 |
-use traditional UNIX permission checks to limit access to the control |
|
| 111 |
-socket. |
|
| 112 |
- |
|
| 113 |
-You can also expose the REST API over HTTP if you explicitly decide to do so. |
|
| 114 |
-However, if you do that, being aware of the above mentioned security |
|
| 115 |
-implication, you should ensure that it will be reachable only from a |
|
| 116 |
-trusted network or VPN; or protected with e.g., `stunnel` and client SSL |
|
| 117 |
-certificates. You can also secure it with [HTTPS and |
|
| 118 |
-certificates](../articles/https/). |
|
| 119 |
- |
|
| 120 |
-The daemon is also potentially vulnerable to other inputs, such as image |
|
| 121 |
-loading from either disk with 'docker load', or from the network with |
|
| 122 |
-'docker pull'. This has been a focus of improvement in the community, |
|
| 123 |
-especially for 'pull' security. While these overlap, it should be noted |
|
| 124 |
-that 'docker load' is a mechanism for backup and restore and is not |
|
| 125 |
-currently considered a secure mechanism for loading images. As of |
|
| 126 |
-Docker 1.3.2, images are now extracted in a chrooted subprocess on |
|
| 127 |
-Linux/Unix platforms, as the first step in a wider effort toward |
|
| 128 |
-privilege separation. |
|
| 129 |
- |
|
| 130 |
-Eventually, it is expected that the Docker daemon will run with restricted |
|
| 131 |
-privileges, delegating operations to well-audited sub-processes, |
|
| 132 |
-each with its own (very limited) scope of Linux capabilities, |
|
| 133 |
-virtual network setup, filesystem management, etc. That is, most likely, |
|
| 134 |
-pieces of the Docker engine itself will run inside of containers. |
|
| 135 |
- |
|
| 136 |
-Finally, if you run Docker on a server, it is recommended to run |
|
| 137 |
-exclusively Docker in the server, and move all other services within |
|
| 138 |
-containers controlled by Docker. Of course, it is fine to keep your |
|
| 139 |
-favorite admin tools (probably at least an SSH server), as well as |
|
| 140 |
-existing monitoring/supervision processes (e.g., NRPE, collectd, etc). |
|
| 141 |
- |
|
| 142 |
-## Linux kernel capabilities |
|
| 143 |
- |
|
| 144 |
-By default, Docker starts containers with a restricted set of |
|
| 145 |
-capabilities. What does that mean? |
|
| 146 |
- |
|
| 147 |
-Capabilities turn the binary "root/non-root" dichotomy into a |
|
| 148 |
-fine-grained access control system. Processes (like web servers) that |
|
| 149 |
-just need to bind on a port below 1024 do not have to run as root: they |
|
| 150 |
-can just be granted the `net_bind_service` capability instead. And there |
|
| 151 |
-are many other capabilities, for almost all the specific areas where root |
|
| 152 |
-privileges are usually needed. |
|
| 153 |
- |
|
| 154 |
-This means a lot for container security; let's see why! |
|
| 155 |
- |
|
| 156 |
-Your average server (bare metal or virtual machine) needs to run a bunch |
|
| 157 |
-of processes as root. Those typically include SSH, cron, syslogd; |
|
| 158 |
-hardware management tools (e.g., load modules), network configuration |
|
| 159 |
-tools (e.g., to handle DHCP, WPA, or VPNs), and much more. A container is |
|
| 160 |
-very different, because almost all of those tasks are handled by the |
|
| 161 |
-infrastructure around the container: |
|
| 162 |
- |
|
| 163 |
- - SSH access will typically be managed by a single server running on |
|
| 164 |
- the Docker host; |
|
| 165 |
- - `cron`, when necessary, should run as a user |
|
| 166 |
- process, dedicated and tailored for the app that needs its |
|
| 167 |
- scheduling service, rather than as a platform-wide facility; |
|
| 168 |
- - log management will also typically be handled by Docker, or by |
|
| 169 |
- third-party services like Loggly or Splunk; |
|
| 170 |
- - hardware management is irrelevant, meaning that you never need to |
|
| 171 |
- run `udevd` or equivalent daemons within |
|
| 172 |
- containers; |
|
| 173 |
- - network management happens outside of the containers, enforcing |
|
| 174 |
- separation of concerns as much as possible, meaning that a container |
|
| 175 |
- should never need to perform `ifconfig`, |
|
| 176 |
- `route`, or `ip` commands (except when a container |
|
| 177 |
- is specifically engineered to behave like a router or firewall, of |
|
| 178 |
- course). |
|
| 179 |
- |
|
| 180 |
-This means that in most cases, containers will not need "real" root |
|
| 181 |
-privileges *at all*. And therefore, containers can run with a reduced |
|
| 182 |
-capability set; meaning that "root" within a container has far fewer |
|
| 183 |
-privileges than the real "root". For instance, it is possible to: |
|
| 184 |
- |
|
| 185 |
- - deny all "mount" operations; |
|
| 186 |
- - deny access to raw sockets (to prevent packet spoofing); |
|
| 187 |
- - deny access to some filesystem operations, like creating new device |
|
| 188 |
- nodes, changing the owner of files, or altering attributes (including |
|
| 189 |
- the immutable flag); |
|
| 190 |
- - deny module loading; |
|
| 191 |
- - and many others. |
|
| 192 |
- |
|
| 193 |
-This means that even if an intruder manages to escalate to root within a |
|
| 194 |
-container, it will be much harder to do serious damage, or to escalate |
|
| 195 |
-to the host. |
|
| 196 |
- |
|
| 197 |
-This won't affect regular web apps; but malicious users will find that |
|
| 198 |
-the arsenal at their disposal has shrunk considerably! By default Docker |
|
| 199 |
-drops all capabilities except [those |
|
| 200 |
-needed](https://github.com/docker/docker/blob/87de5fdd5972343a11847922e0f41d9898b5cff7/daemon/execdriver/native/template/default_template_linux.go#L16-L29), |
|
| 201 |
-a whitelist instead of a blacklist approach. You can see a full list of |
|
| 202 |
-available capabilities in [Linux |
|
| 203 |
-manpages](http://man7.org/linux/man-pages/man7/capabilities.7.html). |
|
| 204 |
- |
|
| 205 |
-One primary risk with running Docker containers is that the default set |
|
| 206 |
-of capabilities and mounts given to a container may provide incomplete |
|
| 207 |
-isolation, either independently, or when used in combination with |
|
| 208 |
-kernel vulnerabilities. |
|
| 209 |
- |
|
| 210 |
-Docker supports the addition and removal of capabilities, allowing use |
|
| 211 |
-of a non-default profile. This may make Docker more secure through |
|
| 212 |
-capability removal, or less secure through the addition of capabilities. |
|
| 213 |
-The best practice for users would be to remove all capabilities except |
|
| 214 |
-those explicitly required for their processes. |
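
The best practice described above (drop everything, then add back only what the process needs) can be sketched as a small helper that assembles the `docker run` arguments. This is a minimal illustration, not part of the original document: the helper name `docker_run_args` and the image name `my-web-app` are hypothetical, and the sketch only builds the argument list rather than invoking Docker. It relies on the `--cap-drop` and `--cap-add` options of `docker run`.

```python
import shlex


def docker_run_args(image, needed_caps=()):
    """Build a `docker run` argument list that drops all capabilities
    and adds back only the ones listed in needed_caps.

    `image` and the capability names are supplied by the caller; nothing
    here is validated against the kernel's capability list.
    """
    args = ["docker", "run", "--rm", "--cap-drop=ALL"]
    # Normalize to upper case and de-duplicate so the output is stable.
    for cap in sorted({c.upper() for c in needed_caps}):
        args.append("--cap-add=" + cap)
    args.append(image)
    return args


if __name__ == "__main__":
    # Hypothetical web server image that only needs to bind a port below 1024.
    cmd = docker_run_args("my-web-app", ["net_bind_service"])
    print(" ".join(shlex.quote(a) for a in cmd))
```

Starting from an empty set and adding capabilities one at a time keeps the whitelist explicit, so a new requirement shows up as a visible change to the argument list.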
|
| 215 |
- |
|
| 216 |
-## Other kernel security features |
|
| 217 |
- |
|
| 218 |
-Capabilities are just one of the many security features provided by |
|
| 219 |
-modern Linux kernels. It is also possible to leverage existing, |
|
| 220 |
-well-known systems like TOMOYO, AppArmor, SELinux, GRSEC, etc. with |
|
| 221 |
-Docker. |
|
| 222 |
- |
|
| 223 |
-While Docker currently only enables capabilities, it doesn't interfere |
|
| 224 |
-with the other systems. This means that there are many different ways to |
|
| 225 |
-harden a Docker host. Here are a few examples. |
|
| 226 |
- |
|
| 227 |
- - You can run a kernel with GRSEC and PAX. This will add many safety |
|
| 228 |
- checks, both at compile-time and run-time; it will also defeat many |
|
| 229 |
- exploits, thanks to techniques like address randomization. It doesn't |
|
| 230 |
- require Docker-specific configuration, since those security features |
|
| 231 |
- apply system-wide, independent of containers. |
|
| 232 |
- - If your distribution comes with security model templates for |
|
| 233 |
- Docker containers, you can use them out of the box. For instance, we |
|
| 234 |
- ship a template that works with AppArmor and Red Hat comes with SELinux |
|
| 235 |
- policies for Docker. These templates provide an extra safety net (even |
|
| 236 |
- though it overlaps greatly with capabilities). |
|
| 237 |
- - You can define your own policies using your favorite access control |
|
| 238 |
- mechanism. |
|
| 239 |
- |
|
| 240 |
-Just like there are many third-party tools to augment Docker containers |
|
| 241 |
-with e.g., special network topologies or shared filesystems, you can |
|
| 242 |
-expect to see tools to harden existing Docker containers without |
|
| 243 |
-affecting Docker's core. |
|
| 244 |
- |
|
| 245 |
-Recent improvements in Linux namespaces will soon make it possible to run |
|
| 246 |
-full-featured containers without root privileges, thanks to the new user |
|
| 247 |
-namespace. This is covered in detail [here]( |
|
| 248 |
-http://s3hh.wordpress.com/2013/07/19/creating-and-using-containers-without-privilege/). |
|
| 249 |
-Moreover, this will solve the problem caused by sharing filesystems |
|
| 250 |
-between host and guest, since the user namespace allows users within |
|
| 251 |
-containers (including the root user) to be mapped to other users in the |
|
| 252 |
-host system. |
|
| 253 |
- |
|
| 254 |
-Today, Docker does not directly support user namespaces, but they |
|
| 255 |
-may still be utilized by Docker containers on supported kernels, |
|
| 256 |
-by directly using the clone syscall, or utilizing the 'unshare' |
|
| 257 |
-utility. Using this, some users may find it possible to drop |
|
| 258 |
-more capabilities from their process as user namespaces provide |
|
| 259 |
-an artificial capabilities set. Likewise, however, this artificial |
|
| 260 |
-capabilities set may require use of 'capsh' to restrict the |
|
| 261 |
-user-namespace capabilities set when using 'unshare'. |
|
| 262 |
- |
|
| 263 |
-Eventually, it is expected that Docker will have direct, native support |
|
| 264 |
-for user-namespaces, simplifying the process of hardening containers. |
|
| 265 |
- |
|
| 266 |
-## Conclusions |
|
| 267 |
- |
|
| 268 |
-Docker containers are, by default, quite secure; especially if you take |
|
| 269 |
-care of running your processes inside the containers as non-privileged |
|
| 270 |
-users (i.e., non-`root`). |
|
| 271 |
- |
|
| 272 |
-You can add an extra layer of safety by enabling AppArmor, SELinux, |
|
| 273 |
-GRSEC, or your favorite hardening solution. |
|
| 274 |
- |
|
| 275 |
-Last but not least, if you see interesting security features in other |
|
| 276 |
-containerization systems, these are simply kernel features that may |
|
| 277 |
-be implemented in Docker as well. We welcome users to submit issues, |
|
| 278 |
-pull requests, and communicate via the mailing list. |
|
| 279 |
- |
|
| 280 |
-References: |
|
| 281 |
- |
|
| 282 |
-* [Docker Containers: How Secure Are They? (2013)]( |
|
| 283 |
-http://blog.docker.com/2013/08/containers-docker-how-secure-are-they/). |
|
| 284 |
-* [On the Security of Containers (2014)](https://medium.com/@ewindisch/on-the-security-of-containers-2c60ffe25a9e). |
| ... | ... |
@@ -186,7 +186,7 @@ need to add `sudo` to all the client commands. |
| 186 | 186 |
|
| 187 | 187 |
> **Warning**: |
| 188 | 188 |
> The *docker* group (or the group specified with `-G`) is root-equivalent; |
| 189 |
-> see [*Docker Daemon Attack Surface*](../articles/security.md#docker-daemon-attack-surface) details. |
|
| 189 |
+> see [*Docker Daemon Attack Surface*](../security/security.md#docker-daemon-attack-surface) for details. |
|
| 190 | 190 |
|
| 191 | 191 |
## Upgrades |
| 192 | 192 |
|
| ... | ... |
@@ -134,7 +134,7 @@ makes the ownership of the Unix socket read/writable by the `docker` group. |
| 134 | 134 |
|
| 135 | 135 |
>**Warning**: The `docker` group is equivalent to the `root` user; For details |
| 136 | 136 |
>on how this impacts security in your system, see [*Docker Daemon Attack |
| 137 |
->Surface*](../articles/security.md#docker-daemon-attack-surface) for details. |
|
| 137 |
+>Surface*](../security/security.md#docker-daemon-attack-surface). |
|
| 138 | 138 |
|
| 139 | 139 |
To create the `docker` group and add your user: |
| 140 | 140 |
|
| ... | ... |
@@ -133,7 +133,7 @@ use the `-G` flag to specify an alternative group. |
| 133 | 133 |
|
| 134 | 134 |
> **Warning**: |
| 135 | 135 |
> The `docker` group (or the group specified with the `-G` flag) is |
| 136 |
-> `root`-equivalent; see [*Docker Daemon Attack Surface*](../articles/security.md#docker-daemon-attack-surface) details. |
|
| 136 |
+> `root`-equivalent; see [*Docker Daemon Attack Surface*](../security/security.md#docker-daemon-attack-surface) for details. |
|
| 137 | 137 |
|
| 138 | 138 |
**Example:** |
| 139 | 139 |
|
| ... | ... |
@@ -128,7 +128,7 @@ makes the ownership of the Unix socket read/writable by the `docker` group. |
| 128 | 128 |
|
| 129 | 129 |
>**Warning**: The `docker` group is equivalent to the `root` user; For details |
| 130 | 130 |
>on how this impacts security in your system, see [*Docker Daemon Attack |
| 131 |
->Surface*](../articles/security.md#docker-daemon-attack-surface) for details. |
|
| 131 |
+>Surface*](../security/security.md#docker-daemon-attack-surface). |
|
| 132 | 132 |
|
| 133 | 133 |
To create the `docker` group and add your user: |
| 134 | 134 |
|
| ... | ... |
@@ -99,7 +99,7 @@ makes the ownership of the Unix socket read/writable by the `docker` group. |
| 99 | 99 |
|
| 100 | 100 |
>**Warning**: The `docker` group is equivalent to the `root` user; For details |
| 101 | 101 |
>on how this impacts security in your system, see [*Docker Daemon Attack |
| 102 |
->Surface*](../articles/security.md#docker-daemon-attack-surface) for details. |
|
| 102 |
+>Surface*](../security/security.md#docker-daemon-attack-surface). |
|
| 103 | 103 |
|
| 104 | 104 |
To create the `docker` group and add your user: |
| 105 | 105 |
|
| ... | ... |
@@ -126,7 +126,7 @@ makes the ownership of the Unix socket read/writable by the `docker` group. |
| 126 | 126 |
|
| 127 | 127 |
>**Warning**: The `docker` group is equivalent to the `root` user; For details |
| 128 | 128 |
>on how this impacts security in your system, see [*Docker Daemon Attack |
| 129 |
->Surface*](../articles/security.md#docker-daemon-attack-surface) for details. |
|
| 129 |
+>Surface*](../security/security.md#docker-daemon-attack-surface). |
|
| 130 | 130 |
|
| 131 | 131 |
To create the `docker` group and add your user: |
| 132 | 132 |
|
| ... | ... |
@@ -225,7 +225,7 @@ makes the ownership of the Unix socket read/writable by the `docker` group. |
| 225 | 225 |
|
| 226 | 226 |
>**Warning**: The `docker` group is equivalent to the `root` user; For details |
| 227 | 227 |
>on how this impacts security in your system, see [*Docker Daemon Attack |
| 228 |
->Surface*](../articles/security.md#docker-daemon-attack-surface) for details. |
|
| 228 |
+>Surface*](../security/security.md#docker-daemon-attack-surface). |
|
| 229 | 229 |
|
| 230 | 230 |
To create the `docker` group and add your user: |
| 231 | 231 |
|
| ... | ... |
@@ -30,7 +30,7 @@ adding the server name. |
| 30 | 30 |
`docker login` requires the user to use `sudo` or be `root`, except when: |
| 31 | 31 |
|
| 32 | 32 |
1. connecting to a remote daemon, such as a `docker-machine` provisioned `docker engine`. |
| 33 |
-2. user is added to the `docker` group. This will impact the security of your system; the `docker` group is `root` equivalent. See [Docker Daemon Attack Surface](https://docs.docker.com/articles/security/#docker-daemon-attack-surface) for details. |
|
| 33 |
+2. user is added to the `docker` group. This will impact the security of your system; the `docker` group is `root` equivalent. See [Docker Daemon Attack Surface](https://docs.docker.com/security/security/#docker-daemon-attack-surface) for details. |
|
| 34 | 34 |
|
| 35 | 35 |
You can log into any public or private repository for which you have |
| 36 | 36 |
credentials. When you log in, the command stores encoded credentials in |
| ... | ... |
@@ -1,47 +1,74 @@ |
| 1 | 1 |
<!-- [metadata]> |
| 2 | 2 |
+++ |
| 3 |
-draft = true |
|
| 3 |
+title = "AppArmor security profiles for Docker" |
|
| 4 |
+description = "Enabling AppArmor in Docker" |
|
| 5 |
+keywords = ["AppArmor, security, docker, documentation"] |
|
| 6 |
+[menu.main] |
|
| 7 |
+parent= "smn_secure_docker" |
|
| 4 | 8 |
+++ |
| 5 | 9 |
<![end-metadata]--> |
| 6 | 10 |
|
| 7 |
-AppArmor security profiles for Docker |
|
| 11 |
+# AppArmor security profiles for Docker |
|
| 8 | 12 |
|
| 9 |
-AppArmor (Application Armor) is a security module that allows a system |
|
| 10 |
-administrator to associate a security profile with each program. Docker |
|
| 13 |
+AppArmor (Application Armor) is a Linux security module that protects an |
|
| 14 |
+operating system and its applications from security threats. To use it, a system |
|
| 15 |
+administrator associates an AppArmor security profile with each program. Docker |
|
| 11 | 16 |
expects to find an AppArmor policy loaded and enforced. |
| 12 | 17 |
|
| 13 |
-Container profiles are loaded automatically by Docker. A profile |
|
| 14 |
-for the Docker Engine itself also exists and is installed |
|
| 15 |
-with the official *.deb* packages. Advanced users and package |
|
| 16 |
-managers may find the profile for */usr/bin/docker* underneath |
|
| 17 |
-[contrib/apparmor](https://github.com/docker/docker/tree/master/contrib/apparmor) |
|
| 18 |
-in the Docker Engine source repository. |
|
| 18 |
+Docker automatically loads container profiles. A profile for the Docker Engine |
|
| 19 |
+itself also exists and is installed with the official *.deb* packages in |
|
| 20 |
+the `/etc/apparmor.d/docker` file. |
|
| 21 |
+ |
|
| 19 | 22 |
|
| 23 |
+## Understand the policies |
|
| 24 |
+ |
|
| 25 |
+The `docker-default` profile is the default for running containers. It is |
|
| 26 |
+moderately protective while providing wide application compatibility. The |
|
| 27 |
+profile is the following: |
|
| 28 |
+ |
|
| 29 |
+``` |
|
| 30 |
+#include <tunables/global> |
|
| 20 | 31 |
|
| 21 |
-Understand the policies |
|
| 22 | 32 |
|
| 23 |
-The `docker-default` profile the default for running |
|
| 24 |
-containers. It is moderately protective while |
|
| 25 |
-providing wide application compatibility. |
|
| 33 |
+profile docker-default flags=(attach_disconnected,mediate_deleted) {
|
|
| 26 | 34 |
|
| 27 |
-The system's standard `unconfined` profile inherits all |
|
| 28 |
-system-wide policies, applying path-based policies |
|
| 29 |
-intended for the host system inside of containers. |
|
| 30 |
-This was the default for privileged containers |
|
| 31 |
-prior to Docker 1.8. |
|
| 35 |
+ #include <abstractions/base> |
|
| 32 | 36 |
|
| 33 | 37 |
|
| 34 |
-Overriding the profile for a container |
|
| 38 |
+ network, |
|
| 39 |
+ capability, |
|
| 40 |
+ file, |
|
| 41 |
+ umount, |
|
| 35 | 42 |
|
| 36 |
-Users may override the AppArmor profile using the |
|
| 37 |
-`security-opt` option (per-container). |
|
| 43 |
+ deny @{PROC}/{*,**^[0-9*],sys/kernel/shm*} wkx,
|
|
| 44 |
+ deny @{PROC}/sysrq-trigger rwklx,
|
|
| 45 |
+ deny @{PROC}/mem rwklx,
|
|
| 46 |
+ deny @{PROC}/kmem rwklx,
|
|
| 47 |
+ deny @{PROC}/kcore rwklx,
|
|
| 38 | 48 |
|
| 39 |
-For example, the following explicitly specifies the default policy: |
|
| 49 |
+ deny mount, |
|
| 40 | 50 |
|
| 51 |
+ deny /sys/[^f]*/** wklx, |
|
| 52 |
+ deny /sys/f[^s]*/** wklx, |
|
| 53 |
+ deny /sys/fs/[^c]*/** wklx, |
|
| 54 |
+ deny /sys/fs/c[^g]*/** wklx, |
|
| 55 |
+ deny /sys/fs/cg[^r]*/** wklx, |
|
| 56 |
+ deny /sys/firmware/efi/efivars/** rwklx, |
|
| 57 |
+ deny /sys/kernel/security/** rwklx, |
|
| 58 |
+} |
|
| 41 | 59 |
``` |
| 60 |
+ |
|
| 61 |
+When you run a container, it uses the `docker-default` policy unless you |
|
| 62 |
+override it with the `security-opt` option. For example, the following |
|
| 63 |
+explicitly specifies the default policy: |
|
| 64 |
+ |
|
| 65 |
+```bash |
|
| 42 | 66 |
$ docker run --rm -it --security-opt apparmor:docker-default hello-world |
| 43 | 67 |
``` |
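
To confirm which AppArmor profile actually confines a process, one option is to read the label the kernel exposes under `/proc`. The sketch below assumes an AppArmor-enabled Linux kernel, where `/proc/self/attr/current` holds a label such as `docker-default (enforce)` or `unconfined`; on hosts using a different security module the file may hold something else or be unreadable. The function names are hypothetical and not part of Docker.

```python
def parse_apparmor_label(text):
    """Split an AppArmor label like 'docker-default (enforce)' into
    (profile, mode). Labels without a mode, such as 'unconfined',
    yield (label, None)."""
    label = text.strip("\x00 \t\r\n")
    if label.endswith(")") and " (" in label:
        profile, mode = label[:-1].rsplit(" (", 1)
        return profile, mode
    return label, None


def current_profile(path="/proc/self/attr/current"):
    """Return (profile, mode) for the calling process, or None when the
    label cannot be read (for example, no security module is active)."""
    try:
        with open(path) as f:
            return parse_apparmor_label(f.read())
    except OSError:
        return None


if __name__ == "__main__":
    print(current_profile())
```

Run inside a container started with the default policy, this would be expected to report the `docker-default` profile, assuming the host has AppArmor enabled.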
| 44 | 68 |
|
| 69 |
+## Contributing to AppArmor code in Docker |
|
| 70 |
+ |
|
| 71 |
+Advanced users and package managers can find a profile for `/usr/bin/docker` |
|
| 72 |
+underneath |
|
| 73 |
+[contrib/apparmor](https://github.com/docker/docker/tree/master/contrib/apparmor) |
|
| 74 |
+in the Docker Engine source repository. |
| 45 | 75 |
new file mode 100644 |
| ... | ... |
@@ -0,0 +1,20 @@ |
| 0 |
+<!-- [metadata]> |
|
| 1 |
+title = "Work with Docker security" |
|
| 2 |
+description = "Sec" |
|
| 3 |
+keywords = ["seccomp, security, docker, documentation"] |
|
| 4 |
+[menu.main] |
|
| 5 |
+identifier="smn_secure_docker" |
|
| 6 |
+parent= "mn_use_docker" |
|
| 7 |
+<![end-metadata]--> |
|
| 8 |
+ |
|
| 9 |
+# Work with Docker security |
|
| 10 |
+ |
|
| 11 |
+This section discusses the security features you can configure and use within your Docker Engine installation. |
|
| 12 |
+ |
|
| 13 |
+* You can configure Docker's trust features so that your users can push and pull trusted images. To learn how to do this, see [Use trusted images](trust/index.md) in this section. |
|
| 14 |
+ |
|
| 15 |
+* You can configure secure computing mode (Seccomp) policies to secure system calls in a container. For more information, see [Seccomp security profiles for Docker](seccomp.md). |
|
| 16 |
+ |
|
| 17 |
+* An AppArmor profile for Docker is installed with the official *.deb* packages. For information about this profile and overriding it, see [AppArmor security profiles for Docker](apparmor.md). |
| ... | ... |
@@ -3,27 +3,26 @@ |
| 3 | 3 |
title = "Seccomp security profiles for Docker" |
| 4 | 4 |
description = "Enabling seccomp in Docker" |
| 5 | 5 |
keywords = ["seccomp, security, docker, documentation"] |
| 6 |
+[menu.main] |
|
| 7 |
+parent= "smn_secure_docker" |
|
| 6 | 8 |
+++ |
| 7 | 9 |
<![end-metadata]--> |
| 8 | 10 |
|
| 9 |
-Seccomp security profiles for Docker |
|
| 11 |
+# Seccomp security profiles for Docker |
|
| 10 | 12 |
|
| 11 |
-The seccomp() system call operates on the Secure Computing (seccomp) |
|
| 12 |
-state of the calling process. |
|
| 13 |
+Secure computing mode (Seccomp) is a Linux kernel feature. You can use it to |
|
| 14 |
+restrict the actions available within the container. The `seccomp()` system |
|
| 15 |
+call operates on the seccomp state of the calling process. You can use this |
|
| 16 |
+feature to restrict your application's access. |
|
| 13 | 17 |
|
| 14 |
-This operation is available only if the kernel is configured |
|
| 15 |
-with `CONFIG_SECCOMP` enabled. |
|
| 18 |
+This feature is available only if the kernel is configured with `CONFIG_SECCOMP` |
|
| 19 |
+enabled. |
|
| 16 | 20 |
|
| 17 |
-This allows for allowing or denying of certain syscalls in a container. |
|
| 21 |
+## Passing a profile for a container |
|
| 18 | 22 |
|
| 19 |
-Passing a profile for a container |
|
| 20 |
- |
|
| 21 |
-Users may pass a seccomp profile using the `security-opt` option |
|
| 22 |
-(per-container). |
|
| 23 |
- |
|
| 24 |
-The profile has layout in the following form: |
|
| 23 |
+The default seccomp profile provides a sane default for running containers with |
|
| 24 |
+seccomp. It is moderately protective while providing wide application |
|
| 25 |
+compatibility. The default Docker profile has a layout of the following form: |
|
| 25 | 26 |
|
| 26 | 27 |
``` |
| 27 | 28 |
{
|
| ... | ... |
@@ -57,30 +56,14 @@ The profile has layout in the following form: |
| 57 | 57 |
} |
| 58 | 58 |
``` |
| 59 | 59 |
|
| 60 |
-Then you can run with: |
|
| 60 |
+When you run a container, it uses the default profile unless you override |
|
| 61 |
+it with the `security-opt` option. For example, the following explicitly |
|
| 62 |
+specifies a policy: |
|
| 61 | 63 |
|
| 62 | 64 |
``` |
| 63 | 65 |
$ docker run --rm -it --security-opt seccomp:/path/to/seccomp/profile.json hello-world |
| 64 | 66 |
``` |
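
As a rough illustration of the whitelist model, the following sketch generates a small profile and checks whether a given syscall would be permitted. The JSON field names (`defaultAction`, `syscalls`, `name`, `action`, `args`) are an assumption about the profile layout, which is elided in this diff; `SCMP_ACT_ERRNO` and `SCMP_ACT_ALLOW` are libseccomp action names. The helper names are hypothetical, and a real profile needs a far longer allow list than the three syscalls shown.

```python
import json


def build_whitelist_profile(allowed_syscalls):
    """Return a profile dict that denies every syscall by default and
    allows only the named ones (assumed field layout, see lead-in)."""
    return {
        "defaultAction": "SCMP_ACT_ERRNO",
        "syscalls": [
            {"name": name, "action": "SCMP_ACT_ALLOW", "args": []}
            for name in sorted(set(allowed_syscalls))
        ],
    }


def is_allowed(profile, syscall):
    """Check a syscall against the profile: an explicit rule wins,
    otherwise the default action applies."""
    for rule in profile["syscalls"]:
        if rule["name"] == syscall:
            return rule["action"] == "SCMP_ACT_ALLOW"
    return profile["defaultAction"] == "SCMP_ACT_ALLOW"


if __name__ == "__main__":
    profile = build_whitelist_profile(["read", "write", "exit_group"])
    # The JSON text could be saved to a file and passed via --security-opt.
    print(json.dumps(profile, indent=2))
    print(is_allowed(profile, "kexec_load"))
```

Because the default action is a denial, any syscall missing from the list is blocked without needing its own rule, which is what makes this a whitelist.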
| 65 | 67 |
|
| 66 |
-Default Profile |
|
| 67 |
- |
|
| 68 |
-The default seccomp profile provides a sane default for running |
|
| 69 |
-containers with seccomp. It is moderately protective while |
|
| 70 |
-providing wide application compatibility. |
|
| 71 |
- |
|
| 72 |
- |
|
| 73 |
-### Overriding the default profile for a container |
|
| 74 |
- |
|
| 75 |
-You can pass `unconfined` to run a container without the default seccomp |
|
| 76 |
-profile. |
|
| 77 |
- |
|
| 78 |
-``` |
|
| 79 |
-$ docker run --rm -it --security-opt seccomp:unconfined debian:jessie \ |
|
| 80 |
- unshare --map-root-user --user sh -c whoami |
|
| 81 |
-``` |
|
| 82 |
- |
|
| 83 | 68 |
### Syscalls blocked by the default profile |
| 84 | 69 |
|
| 85 | 70 |
Docker's default seccomp profile is a whitelist which specifies the calls that |
| ... | ... |
@@ -91,55 +74,65 @@ the reason each syscall is blocked rather than white-listed. |
| 91 | 91 |
| Syscall | Description | |
| 92 | 92 |
|---------------------|---------------------------------------------------------------------------------------------------------------------------------------| |
| 93 | 93 |
| `acct` | Accounting syscall which could let containers disable their own resource limits or process accounting. Also gated by `CAP_SYS_PACCT`. | |
| 94 |
-| `add_key` | Prevent containers from using the kernel keyring, which is not namespaced. | |
|
| 95 |
-| `adjtimex` | Similar to `clock_settime` and `settimeofday`, time/date is not namespaced. | |
|
| 96 |
-| `bpf` | Deny loading potentially persistent bpf programs into kernel, already gated by `CAP_SYS_ADMIN`. | |
|
| 97 |
-| `clock_adjtime` | Time/date is not namespaced. | |
|
| 98 |
-| `clock_settime` | Time/date is not namespaced. | |
|
| 99 |
-| `clone` | Deny cloning new namespaces. Also gated by `CAP_SYS_ADMIN` for CLONE_* flags, except `CLONE_USERNS`. | |
|
| 100 |
-| `create_module` | Deny manipulation and functions on kernel modules. | |
|
| 101 |
-| `delete_module` | Deny manipulation and functions on kernel modules. Also gated by `CAP_SYS_MODULE`. | |
|
| 102 |
-| `finit_module` | Deny manipulation and functions on kernel modules. Also gated by `CAP_SYS_MODULE`. | |
|
| 103 |
-| `get_kernel_syms` | Deny retrieval of exported kernel and module symbols. | |
|
| 104 |
-| `get_mempolicy` | Syscall that modifies kernel memory and NUMA settings. Already gated by `CAP_SYS_NICE`. | |
|
| 105 |
-| `init_module` | Deny manipulation and functions on kernel modules. Also gated by `CAP_SYS_MODULE`. | |
|
| 106 |
-| `ioperm` | Prevent containers from modifying kernel I/O privilege levels. Already gated by `CAP_SYS_RAWIO`. | |
|
| 107 |
-| `iopl` | Prevent containers from modifying kernel I/O privilege levels. Already gated by `CAP_SYS_RAWIO`. | |
|
| 108 |
-| `kcmp` | Restrict process inspection capabilities, already blocked by dropping `CAP_PTRACE`. | |
|
| 109 |
-| `kexec_file_load` | Sister syscall of `kexec_load` that does the same thing, slightly different arguments. | |
|
| 110 |
-| `kexec_load` | Deny loading a new kernel for later execution. | |
|
| 111 |
-| `keyctl` | Prevent containers from using the kernel keyring, which is not namespaced. | |
|
| 112 |
-| `lookup_dcookie` | Tracing/profiling syscall, which could leak a lot of information on the host. | |
|
| 113 |
-| `mbind` | Syscall that modifies kernel memory and NUMA settings. Already gated by `CAP_SYS_NICE`. | |
|
| 114 |
-| `modify_ldt` | Old syscall only used in 16-bit code and a potential information leak. | |
|
| 115 |
-| `mount` | Deny mounting, already gated by `CAP_SYS_ADMIN`. | |
|
| 116 |
-| `move_pages` | Syscall that modifies kernel memory and NUMA settings. | |
|
| 117 |
-| `name_to_handle_at` | Sister syscall to `open_by_handle_at`. Already gated by `CAP_SYS_NICE`. | |
|
| 118 |
-| `nfsservctl` | Deny interaction with the kernel nfs daemon. | |
|
| 119 |
-| `open_by_handle_at` | Cause of an old container breakout. Also gated by `CAP_DAC_READ_SEARCH`. | |
|
| 120 |
-| `perf_event_open` | Tracing/profiling syscall, which could leak a lot of information on the host. | |
|
| 121 |
-| `personality` | Prevent container from enabling BSD emulation. Not inherently dangerous, but poorly tested, potential for a lot of kernel vulns. | |
|
| 122 |
-| `pivot_root` | Deny `pivot_root`, should be privileged operation. | |
|
| 123 |
-| `process_vm_readv` | Restrict process inspection capabilities, already blocked by dropping `CAP_PTRACE`. | |
|
| 124 |
-| `process_vm_writev` | Restrict process inspection capabilities, already blocked by dropping `CAP_PTRACE`. | |
|
| 125 |
-| `ptrace` | Tracing/profiling syscall, which could leak a lot of information on the host. Already blocked by dropping `CAP_PTRACE`. | |
|
| 126 |
-| `query_module` | Deny manipulation and functions on kernel modules. | |
|
| 127 |
-| `quotactl` | Quota syscall which could let containers disable their own resource limits or process accounting. Also gated by `CAP_SYS_ADMIN`. | |
|
| 128 |
-| `reboot` | Don't let containers reboot the host. Also gated by `CAP_SYS_BOOT`. | |
|
| 94 |
+| `add_key` | Prevent containers from using the kernel keyring, which is not namespaced. | |
|
| 95 |
+| `adjtimex` | Similar to `clock_settime` and `settimeofday`, time/date is not namespaced. | |
|
| 96 |
+| `bpf` | Deny loading potentially persistent bpf programs into kernel, already gated by `CAP_SYS_ADMIN`. | |
|
| 97 |
+| `clock_adjtime` | Time/date is not namespaced. | |
|
| 98 |
+| `clock_settime` | Time/date is not namespaced. | |
|
| 99 |
+| `clone` | Deny cloning new namespaces. Also gated by `CAP_SYS_ADMIN` for CLONE_* flags, except `CLONE_USERNS`. | |
|
| 100 |
+| `create_module` | Deny manipulation and functions on kernel modules. | |
|
| 101 |
+| `delete_module` | Deny manipulation and functions on kernel modules. Also gated by `CAP_SYS_MODULE`. | |
|
| 102 |
+| `finit_module` | Deny manipulation and functions on kernel modules. Also gated by `CAP_SYS_MODULE`. | |
|
| 103 |
+| `get_kernel_syms` | Deny retrieval of exported kernel and module symbols. | |
|
| 104 |
+| `get_mempolicy` | Syscall that modifies kernel memory and NUMA settings. Already gated by `CAP_SYS_NICE`. | |
|
| 105 |
+| `init_module` | Deny manipulation and functions on kernel modules. Also gated by `CAP_SYS_MODULE`. | |
|
| 106 |
+| `ioperm` | Prevent containers from modifying kernel I/O privilege levels. Already gated by `CAP_SYS_RAWIO`. | |
|
| 107 |
+| `iopl` | Prevent containers from modifying kernel I/O privilege levels. Already gated by `CAP_SYS_RAWIO`. | |
|
| 108 |
+| `kcmp`              | Restrict process inspection capabilities, already blocked by dropping `CAP_SYS_PTRACE`.                                               | |
|
| 109 |
+| `kexec_file_load` | Sister syscall of `kexec_load` that does the same thing, slightly different arguments. | |
|
| 110 |
+| `kexec_load` | Deny loading a new kernel for later execution. | |
|
| 111 |
+| `keyctl` | Prevent containers from using the kernel keyring, which is not namespaced. | |
|
| 112 |
+| `lookup_dcookie` | Tracing/profiling syscall, which could leak a lot of information on the host. | |
|
| 113 |
+| `mbind` | Syscall that modifies kernel memory and NUMA settings. Already gated by `CAP_SYS_NICE`. | |
|
| 114 |
+| `modify_ldt` | Old syscall only used in 16-bit code and a potential information leak. | |
|
| 115 |
+| `mount` | Deny mounting, already gated by `CAP_SYS_ADMIN`. | |
|
| 116 |
+| `move_pages` | Syscall that modifies kernel memory and NUMA settings. | |
|
| 117 |
+| `name_to_handle_at` | Sister syscall to `open_by_handle_at`. Already gated by `CAP_DAC_READ_SEARCH`.                                                        | |
|
| 118 |
+| `nfsservctl` | Deny interaction with the kernel nfs daemon. | |
|
| 119 |
+| `open_by_handle_at` | Cause of an old container breakout. Also gated by `CAP_DAC_READ_SEARCH`. | |
|
| 120 |
+| `perf_event_open` | Tracing/profiling syscall, which could leak a lot of information on the host. | |
|
| 121 |
+| `personality` | Prevent container from enabling BSD emulation. Not inherently dangerous, but poorly tested, potential for a lot of kernel vulns. | |
|
| 122 |
+| `pivot_root` | Deny `pivot_root`, should be privileged operation. | |
|
| 123 |
+| `process_vm_readv`  | Restrict process inspection capabilities, already blocked by dropping `CAP_SYS_PTRACE`.                                               | |
|
| 124 |
+| `process_vm_writev` | Restrict process inspection capabilities, already blocked by dropping `CAP_SYS_PTRACE`.                                               | |
|
| 125 |
+| `ptrace`            | Tracing/profiling syscall, which could leak a lot of information on the host. Already blocked by dropping `CAP_SYS_PTRACE`.           | |
|
| 126 |
+| `query_module` | Deny manipulation and functions on kernel modules. | |
|
| 127 |
+| `quotactl` | Quota syscall which could let containers disable their own resource limits or process accounting. Also gated by `CAP_SYS_ADMIN`. | |
|
| 128 |
+| `reboot` | Don't let containers reboot the host. Also gated by `CAP_SYS_BOOT`. | |
|
| 129 | 129 |
| `restart_syscall` | Don't allow containers to restart a syscall. Possible seccomp bypass see: https://code.google.com/p/chromium/issues/detail?id=408827. | |
| 130 |
-| `request_key` | Prevent containers from using the kernel keyring, which is not namespaced. | |
|
| 131 |
-| `set_mempolicy` | Syscall that modifies kernel memory and NUMA settings. Already gated by `CAP_SYS_NICE`. | |
|
| 132 |
-| `setns` | Deny associating a thread with a namespace. Also gated by `CAP_SYS_ADMIN`. | |
|
| 133 |
-| `settimeofday` | Time/date is not namespaced. Also gated by `CAP_SYS_TIME`. | |
|
| 134 |
-| `stime` | Time/date is not namespaced. Also gated by `CAP_SYS_TIME`. | |
|
| 135 |
-| `swapon` | Deny start/stop swapping to file/device. Also gated by `CAP_SYS_ADMIN`. | |
|
| 136 |
-| `swapoff` | Deny start/stop swapping to file/device. Also gated by `CAP_SYS_ADMIN`. | |
|
| 137 |
-| `sysfs` | Obsolete syscall. | |
|
| 138 |
-| `_sysctl` | Obsolete, replaced by /proc/sys. | |
|
| 139 |
-| `umount` | Should be a privileged operation. Also gated by `CAP_SYS_ADMIN`. | |
|
| 140 |
-| `umount2` | Should be a privileged operation. | |
|
| 141 |
-| `unshare` | Deny cloning new namespaces for processes. Also gated by `CAP_SYS_ADMIN`, with the exception of `unshare --user`. | |
|
| 142 |
-| `uselib` | Older syscall related to shared libraries, unused for a long time. | |
|
| 143 |
-| `ustat` | Obsolete syscall. | |
|
| 144 |
-| `vm86` | In kernel x86 real mode virtual machine. Also gated by `CAP_SYS_ADMIN`. | |
|
| 145 |
-| `vm86old` | In kernel x86 real mode virtual machine. Also gated by `CAP_SYS_ADMIN`. | |
|
| 130 |
+| `request_key` | Prevent containers from using the kernel keyring, which is not namespaced. | |
|
| 131 |
+| `set_mempolicy` | Syscall that modifies kernel memory and NUMA settings. Already gated by `CAP_SYS_NICE`. | |
|
| 132 |
+| `setns` | Deny associating a thread with a namespace. Also gated by `CAP_SYS_ADMIN`. | |
|
| 133 |
+| `settimeofday` | Time/date is not namespaced. Also gated by `CAP_SYS_TIME`. | |
|
| 134 |
+| `stime` | Time/date is not namespaced. Also gated by `CAP_SYS_TIME`. | |
|
| 135 |
+| `swapon` | Deny start/stop swapping to file/device. Also gated by `CAP_SYS_ADMIN`. | |
|
| 136 |
+| `swapoff` | Deny start/stop swapping to file/device. Also gated by `CAP_SYS_ADMIN`. | |
|
| 137 |
+| `sysfs` | Obsolete syscall. | |
|
| 138 |
+| `_sysctl` | Obsolete, replaced by /proc/sys. | |
|
| 139 |
+| `umount` | Should be a privileged operation. Also gated by `CAP_SYS_ADMIN`. | |
|
| 140 |
+| `umount2` | Should be a privileged operation. | |
|
| 141 |
+| `unshare` | Deny cloning new namespaces for processes. Also gated by `CAP_SYS_ADMIN`, with the exception of `unshare --user`. | |
|
| 142 |
+| `uselib` | Older syscall related to shared libraries, unused for a long time. | |
|
| 143 |
+| `ustat` | Obsolete syscall. | |
|
| 144 |
+| `vm86` | In kernel x86 real mode virtual machine. Also gated by `CAP_SYS_ADMIN`. | |
|
| 145 |
+| `vm86old` | In kernel x86 real mode virtual machine. Also gated by `CAP_SYS_ADMIN`. | |
|
| 146 |
+ |
|
| 147 |
+## Run without the default seccomp profile |
|
| 148 |
+ |
|
| 149 |
+You can pass `unconfined` to run a container without the default seccomp |
|
| 150 |
+profile. |
|
| 151 |
+ |
|
| 152 |
+``` |
|
| 153 |
+$ docker run --rm -it --security-opt seccomp:unconfined debian:jessie \ |
|
| 154 |
+ unshare --map-root-user --user sh -c whoami |
|
| 155 |
+``` |
| 146 | 156 |
new file mode 100644 |
| ... | ... |
@@ -0,0 +1,286 @@ |
| 0 |
+<!--[metadata]> |
|
| 1 |
+aliases = ["/engine/articles/security/"] |
|
| 2 |
+title = "Docker security" |
|
| 3 |
+description = "Review of the Docker Daemon attack surface" |
|
| 4 |
+keywords = ["Docker, Docker documentation, security"] |
|
| 5 |
+[menu.main] |
|
| 6 |
+parent = "smn_secure_docker" |
|
| 7 |
+weight =-99 |
|
| 8 |
+<![end-metadata]--> |
|
| 9 |
+ |
|
| 10 |
+# Docker security |
|
| 11 |
+ |
|
| 12 |
+There are four major areas to consider when reviewing Docker security: |
|
| 13 |
+ |
|
| 14 |
+ - the intrinsic security of the kernel and its support for |
|
| 15 |
+ namespaces and cgroups; |
|
| 16 |
+ - the attack surface of the Docker daemon itself; |
|
| 17 |
+ - loopholes in the container configuration profile, either by default, |
|
| 18 |
+ or when customized by users; |
|
| 19 |
+ - the "hardening" security features of the kernel and how they |
|
| 20 |
+ interact with containers. |
|
| 21 |
+ |
|
| 22 |
+## Kernel namespaces |
|
| 23 |
+ |
|
| 24 |
+Docker containers are very similar to LXC containers, and they have |
|
| 25 |
+similar security features. When you start a container with |
|
| 26 |
+`docker run`, behind the scenes Docker creates a set of namespaces and control |
|
| 27 |
+groups for the container. |
|
| 28 |
+ |
|
| 29 |
+**Namespaces provide the first and most straightforward form of |
|
| 30 |
+isolation**: processes running within a container cannot see, and even |
|
| 31 |
+less affect, processes running in another container, or in the host |
|
| 32 |
+system. |
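On Linux, one way to see this isolation is to compare the namespace links under `/proc/<pid>/ns/`. The sketch below assumes those links read like `pid:[4026531836]` (a namespace kind followed by an inode number); the helper names are hypothetical, and reading another process's links may require sufficient privileges.

```python
import os
import re

# Expected link text, e.g. "net:[4026531956]".
NS_LINK = re.compile(r"^(?P<kind>[a-z_]+):\[(?P<inode>\d+)\]$")


def parse_ns_link(link):
    """Split a namespace link such as 'net:[4026531956]' into (kind, inode)."""
    match = NS_LINK.match(link)
    if match is None:
        raise ValueError("unexpected namespace link: %r" % link)
    return match.group("kind"), int(match.group("inode"))


def same_namespace(pid_a, pid_b, kind="pid"):
    """Return True if both processes share the namespace of the given kind."""
    link_a = os.readlink("/proc/%d/ns/%s" % (pid_a, kind))
    link_b = os.readlink("/proc/%d/ns/%s" % (pid_b, kind))
    return parse_ns_link(link_a) == parse_ns_link(link_b)
```

Run from the host, `same_namespace(os.getpid(), container_pid, "pid")` would be expected to return `False` for a process inside a container started with default settings, where `container_pid` is the host-side PID of that process.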
|
| 33 |
+ |
|
| 34 |
+**Each container also gets its own network stack**, meaning that a |
|
| 35 |
+container doesn't get privileged access to the sockets or interfaces |
|
| 36 |
+of another container. Of course, if the host system is set up |
|
| 37 |
+accordingly, containers can interact with each other through their |
|
| 38 |
+respective network interfaces — just like they can interact with |
|
| 39 |
+external hosts. When you specify public ports for your containers or use |
|
| 40 |
+[*links*](../userguide/networking/default_network/dockerlinks.md) |
|
| 41 |
+then IP traffic is allowed between containers. They can ping each other, |
|
| 42 |
+send/receive UDP packets, and establish TCP connections, but that can be |
|
| 43 |
+restricted if necessary. From a network architecture point of view, all |
|
| 44 |
+containers on a given Docker host are sitting on bridge interfaces. This |
|
| 45 |
+means that they are just like physical machines connected through a |
|
| 46 |
+common Ethernet switch; no more, no less. |
|
| 47 |
+ |
|
| 48 |
+How mature is the code providing kernel namespaces and private |
|
| 49 |
+networking? Kernel namespaces were introduced [between kernel version |
|
| 50 |
+2.6.15 and |
|
| 51 |
+2.6.26](http://lxc.sourceforge.net/index.php/about/kernel-namespaces/). |
|
| 52 |
+This means that since July 2008 (date of the 2.6.26 release, now 7 years |
|
| 53 |
+ago), namespace code has been exercised and scrutinized on a large |
|
| 54 |
+number of production systems. And there is more: the design and |
|
| 55 |
+inspiration for the namespaces code are even older. Namespaces are |
|
| 56 |
+actually an effort to reimplement the features of [OpenVZ]( |
|
| 57 |
+http://en.wikipedia.org/wiki/OpenVZ) in such a way that they could be |
|
| 58 |
+merged within the mainstream kernel. And OpenVZ was initially released |
|
| 59 |
+in 2005, so both the design and the implementation are pretty mature. |
|
| 60 |
+ |
|
| 61 |
+## Control groups |
|
| 62 |
+ |
|
| 63 |
+Control Groups are another key component of Linux Containers. They |
|
| 64 |
+implement resource accounting and limiting. They provide many |
|
| 65 |
+useful metrics, but they also help ensure that each container gets |
|
| 66 |
+its fair share of memory, CPU, disk I/O; and, more importantly, that a |
|
| 67 |
+single container cannot bring the system down by exhausting one of those |
|
| 68 |
+resources. |
|
| 69 |
+ |
|
| 70 |
+So while they do not play a role in preventing one container from |
|
| 71 |
+accessing or affecting the data and processes of another container, they |
|
| 72 |
+are essential to fend off some denial-of-service attacks. They are |
|
| 73 |
+particularly important on multi-tenant platforms, like public and |
|
| 74 |
+private PaaS, to guarantee a consistent uptime (and performance) even |
|
| 75 |
+when some applications start to misbehave. |
|
| 76 |
+ |
|
| 77 |
+Control Groups have been around for a while as well: the code was |
|
| 78 |
+started in 2006, and initially merged in kernel 2.6.24. |
|
| 79 |
+ |
|
| 80 |
+## Docker daemon attack surface |
|
| 81 |
+ |
|
| 82 |
+Running containers (and applications) with Docker implies running the |
|
| 83 |
+Docker daemon. This daemon currently requires `root` privileges, and you |
|
| 84 |
+should therefore be aware of some important details. |
|
| 85 |
+ |
|
| 86 |
+First of all, **only trusted users should be allowed to control your |
|
| 87 |
+Docker daemon**. This is a direct consequence of some powerful Docker |
|
| 88 |
+features. Specifically, Docker allows you to share a directory between |
|
| 89 |
+the Docker host and a guest container; and it allows you to do so |
|
| 90 |
+without limiting the access rights of the container. This means that you |
|
| 91 |
+can start a container where the `/host` directory will be the `/` directory |
|
| 92 |
+on your host; and the container will be able to alter your host filesystem |
|
| 93 |
+without any restriction. This is similar to how virtualization systems |
|
| 94 |
+allow filesystem resource sharing. Nothing prevents you from sharing your |
|
| 95 |
+root filesystem (or even your root block device) with a virtual machine. |
|
| 96 |
+ |
|
| 97 |
+This has a strong security implication: for example, if you instrument Docker |
|
| 98 |
+from a web server to provision containers through an API, you should be |
|
| 99 |
+even more careful than usual with parameter checking, to make sure that |
|
| 100 |
+a malicious user cannot pass crafted parameters causing Docker to create |
|
| 101 |
+arbitrary containers. |
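As an illustration of that kind of parameter checking, the sketch below validates a requested host path against an allow-list before it is ever handed to Docker as a bind mount. The function name and the allow-list roots are hypothetical, and the check is purely lexical: it does not resolve symlinks, which a real service would also need to consider.

```python
import posixpath


def is_allowed_bind(host_path, allowed_roots):
    """Return True only if host_path is absolute and lies under an allowed root.

    The path is normalized first so that '..' segments cannot escape the root.
    """
    if not host_path.startswith("/"):
        return False
    normalized = posixpath.normpath(host_path)
    for root in allowed_roots:
        root = posixpath.normpath(root)
        # Match the root itself or a path strictly below it.
        if normalized == root or normalized.startswith(root.rstrip("/") + "/"):
            return True
    return False
```

For example, with `allowed_roots = ["/srv/data"]`, the path `/srv/data/app1` is accepted, while `/srv/data/../../etc` and `/srv/database` are rejected.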
|
| 102 |
+ |
|
| 103 |
+For this reason, the REST API endpoint (used by the Docker CLI to |
|
| 104 |
+communicate with the Docker daemon) changed in Docker 0.5.2, and now |
|
| 105 |
+uses a UNIX socket instead of a TCP socket bound on 127.0.0.1 (the |
|
| 106 |
+latter being prone to cross-site request forgery attacks if you happen to run |
|
| 107 |
+Docker directly on your local machine, outside of a VM). You can then |
|
| 108 |
+use traditional UNIX permission checks to limit access to the control |
|
| 109 |
+socket. |
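A quick way to audit those permissions is to inspect the socket's owner and mode bits. The sketch below assumes the control socket lives at `/var/run/docker.sock` (a common default, but an assumption here) and uses a hypothetical helper name.

```python
import os
import stat

# Assumed default location of the control socket; adjust for your host.
DOCKER_SOCKET = "/var/run/docker.sock"


def socket_access_summary(path):
    """Summarize who may reach the file at `path` according to UNIX mode bits."""
    info = os.stat(path)
    mode = stat.S_IMODE(info.st_mode)
    return {
        "owner_uid": info.st_uid,
        "group_gid": info.st_gid,
        "mode": oct(mode),
        # Any read or write bit for "other" means every local user has access.
        "world_accessible": bool(mode & (stat.S_IROTH | stat.S_IWOTH)),
    }


if __name__ == "__main__" and os.path.exists(DOCKER_SOCKET):
    print(socket_access_summary(DOCKER_SOCKET))
```

If `world_accessible` is true, any local user can drive the daemon, which amounts to root access on the host.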
|
| 110 |
+ |
|
| 111 |
+You can also expose the REST API over HTTP if you explicitly decide to do so. |
|
| 112 |
+However, if you do that, being aware of the above-mentioned security |
|
| 113 |
+implication, you should ensure that it will be reachable only from a |
|
| 114 |
+trusted network or VPN; or protected with e.g., `stunnel` and client SSL |
|
| 115 |
+certificates. You can also secure it with [HTTPS and |
|
| 116 |
+certificates](../articles/https/). |
|
| 117 |
+ |
|
| 118 |
+The daemon is also potentially vulnerable to other inputs, such as image |
|
| 119 |
+loading from either disk with `docker load`, or from the network with |
|
| 120 |
+`docker pull`. This has been a focus of improvement in the community, |
|
| 121 |
+especially for `pull` security. While these overlap, it should be noted |
|
| 122 |
+that `docker load` is a mechanism for backup and restore and is not |
|
| 123 |
+currently considered a secure mechanism for loading images. As of |
|
| 124 |
+Docker 1.3.2, images are now extracted in a chrooted subprocess on |
|
| 125 |
+Linux/Unix platforms, as the first step in a wider effort toward |
|
| 126 |
+privilege separation. |
|
| 127 |
+ |
|
| 128 |
+Eventually, it is expected that the Docker daemon will run with restricted |
|
| 129 |
+privileges, delegating operations to well-audited sub-processes, |
|
| 130 |
+each with its own (very limited) scope of Linux capabilities, |
|
| 131 |
+virtual network setup, filesystem management, etc. That is, most likely, |
|
| 132 |
+pieces of the Docker engine itself will run inside of containers. |
|
| 133 |
+ |
|
| 134 |
+Finally, if you run Docker on a server, it is recommended to run |
|
| 135 |
+exclusively Docker on the server, and move all other services within |
|
| 136 |
+containers controlled by Docker. Of course, it is fine to keep your |
|
| 137 |
+favorite admin tools (probably at least an SSH server), as well as |
|
| 138 |
+existing monitoring/supervision processes (e.g., NRPE, collectd). |
|
| 139 |
+ |
|
| 140 |
+## Linux kernel capabilities |
|
| 141 |
+ |
|
| 142 |
+By default, Docker starts containers with a restricted set of |
|
| 143 |
+capabilities. What does that mean? |
|
| 144 |
+ |
|
| 145 |
+Capabilities turn the binary "root/non-root" dichotomy into a |
|
| 146 |
+fine-grained access control system. Processes (like web servers) that |
|
| 147 |
+just need to bind on a port below 1024 do not have to run as root: they |
|
| 148 |
+can just be granted the `net_bind_service` capability instead. And there |
|
| 149 |
+are many other capabilities, for almost all the specific areas where root |
|
| 150 |
+privileges are usually needed. |
|
| 151 |
+ |
|
| 152 |
+This means a lot for container security; let's see why! |
|
| 153 |
+ |
|
| 154 |
+Your average server (bare metal or virtual machine) needs to run a bunch |
|
| 155 |
+of processes as root. Those typically include SSH, cron, syslogd; |
|
| 156 |
+hardware management tools (e.g., load modules), network configuration |
|
| 157 |
+tools (e.g., to handle DHCP, WPA, or VPNs), and much more. A container is |
|
| 158 |
+very different, because almost all of those tasks are handled by the |
|
| 159 |
+infrastructure around the container: |
|
| 160 |
+ |
|
| 161 |
+ - SSH access will typically be managed by a single server running on |
|
| 162 |
+ the Docker host; |
|
| 163 |
+ - `cron`, when necessary, should run as a user |
|
| 164 |
+ process, dedicated and tailored for the app that needs its |
|
| 165 |
+ scheduling service, rather than as a platform-wide facility; |
|
| 166 |
+ - log management will also typically be handled by Docker, or by |
|
| 167 |
+ third-party services like Loggly or Splunk; |
|
| 168 |
+ - hardware management is irrelevant, meaning that you never need to |
|
| 169 |
+ run `udevd` or equivalent daemons within |
|
| 170 |
+ containers; |
|
| 171 |
+ - network management happens outside of the containers, enforcing |
|
| 172 |
+ separation of concerns as much as possible, meaning that a container |
|
| 173 |
+ should never need to perform `ifconfig`, |
|
| 174 |
+ `route`, or `ip` commands (except when a container |
|
| 175 |
+ is specifically engineered to behave like a router or firewall, of |
|
| 176 |
+ course). |
|
| 177 |
+ |
|
| 178 |
+This means that in most cases, containers will not need "real" root |
|
| 179 |
+privileges *at all*. And therefore, containers can run with a reduced |
|
| 180 |
+capability set; meaning that "root" within a container has far fewer |
|
| 181 |
+privileges than the real "root". For instance, it is possible to: |
|
| 182 |
+ |
|
| 183 |
+ - deny all "mount" operations; |
|
| 184 |
+ - deny access to raw sockets (to prevent packet spoofing); |
|
| 185 |
+ - deny access to some filesystem operations, like creating new device |
|
| 186 |
+ nodes, changing the owner of files, or altering attributes (including |
|
| 187 |
+ the immutable flag); |
|
| 188 |
+ - deny module loading; |
|
| 189 |
+ - and many others. |
|
| 190 |
+ |
|
| 191 |
+This means that even if an intruder manages to escalate to root within a |
|
| 192 |
+container, it will be much harder to do serious damage, or to escalate |
|
| 193 |
+to the host. |
|
| 194 |
+ |
|
| 195 |
+This won't affect regular web apps; but malicious users will find that |
|
| 196 |
+the arsenal at their disposal has shrunk considerably! By default Docker |
|
| 197 |
+drops all capabilities except [those |
|
| 198 |
+needed](https://github.com/docker/docker/blob/87de5fdd5972343a11847922e0f41d9898b5cff7/daemon/execdriver/native/template/default_template_linux.go#L16-L29), |
|
| 199 |
+a whitelist instead of a blacklist approach. You can see a full list of |
|
| 200 |
+available capabilities in [Linux |
|
| 201 |
+manpages](http://man7.org/linux/man-pages/man7/capabilities.7.html). |
|
| 202 |
+ |
|
| 203 |
+One primary risk with running Docker containers is that the default set |
|
| 204 |
+of capabilities and mounts given to a container may provide incomplete |
|
| 205 |
+isolation, either independently, or when used in combination with |
|
| 206 |
+kernel vulnerabilities. |
|
| 207 |
+ |
|
| 208 |
+Docker supports the addition and removal of capabilities, allowing use |
|
| 209 |
+of a non-default profile. This may make Docker more secure through |
|
| 210 |
+capability removal, or less secure through the addition of capabilities. |
|
| 211 |
+The best practice for users would be to remove all capabilities except |
|
| 212 |
+those explicitly required for their processes. |
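To check which capabilities a process actually holds, you can decode the `CapEff` bitmask from `/proc/<pid>/status`. The sketch below is one way to do that; the helper names are hypothetical, the bit positions follow the ordering in the Linux capability headers, and the sample mask in the usage note is illustrative.

```python
# Capability names indexed by bit position (bit 0 is CAP_CHOWN).
CAPABILITY_NAMES = [
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER",
    "CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID", "CAP_SETPCAP",
    "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE", "CAP_NET_BROADCAST",
    "CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK", "CAP_IPC_OWNER",
    "CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT", "CAP_SYS_PTRACE",
    "CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT", "CAP_SYS_NICE",
    "CAP_SYS_RESOURCE", "CAP_SYS_TIME", "CAP_SYS_TTY_CONFIG", "CAP_MKNOD",
    "CAP_LEASE", "CAP_AUDIT_WRITE", "CAP_AUDIT_CONTROL", "CAP_SETFCAP",
    "CAP_MAC_OVERRIDE", "CAP_MAC_ADMIN", "CAP_SYSLOG", "CAP_WAKE_ALARM",
    "CAP_BLOCK_SUSPEND", "CAP_AUDIT_READ",
]


def decode_capabilities(mask):
    """Return the names of all capabilities whose bit is set in `mask`."""
    return [name for bit, name in enumerate(CAPABILITY_NAMES) if mask & (1 << bit)]


def read_effective_mask(status_path="/proc/self/status"):
    """Read the effective capability mask (CapEff) of a process as an integer."""
    with open(status_path) as status:
        for line in status:
            if line.startswith("CapEff:"):
                return int(line.split()[1], 16)
    raise RuntimeError("CapEff not found in %s" % status_path)
```

Running `decode_capabilities(read_effective_mask())` inside a container shows what "root" there can really do. For instance, a mask of `0xa80425fb` decodes to fourteen capabilities and does not include `CAP_SYS_ADMIN` or `CAP_SYS_MODULE`. Capabilities can then be trimmed further with the `--cap-drop` and `--cap-add` options of `docker run`.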
|
| 213 |
+ |
|
| 214 |
+## Other kernel security features |
|
| 215 |
+ |
|
| 216 |
+Capabilities are just one of the many security features provided by |
|
| 217 |
+modern Linux kernels. It is also possible to leverage existing, |
|
| 218 |
+well-known systems like TOMOYO, AppArmor, SELinux, GRSEC, etc. with |
|
| 219 |
+Docker. |
|
| 220 |
+ |
|
| 221 |
+While Docker currently only enables capabilities, it doesn't interfere |
|
| 222 |
+with the other systems. This means that there are many different ways to |
|
| 223 |
+harden a Docker host. Here are a few examples. |
|
| 224 |
+ |
|
| 225 |
+ - You can run a kernel with GRSEC and PAX. This will add many safety |
|
| 226 |
+ checks, both at compile-time and run-time; it will also defeat many |
|
| 227 |
+ exploits, thanks to techniques like address randomization. It doesn't |
|
| 228 |
+ require Docker-specific configuration, since those security features |
|
| 229 |
+ apply system-wide, independent of containers. |
|
| 230 |
+ - If your distribution comes with security model templates for |
|
| 231 |
+ Docker containers, you can use them out of the box. For instance, we |
|
| 232 |
+ ship a template that works with AppArmor and Red Hat comes with SELinux |
|
| 233 |
+ policies for Docker. These templates provide an extra safety net (even |
|
| 234 |
+ though it overlaps greatly with capabilities). |
|
| 235 |
+ - You can define your own policies using your favorite access control |
|
| 236 |
+ mechanism. |
|
| 237 |
+ |
|
| 238 |
+Just like there are many third-party tools to augment Docker containers |
|
| 239 |
+with e.g., special network topologies or shared filesystems, you can |
|
| 240 |
+expect to see tools to harden existing Docker containers without |
|
| 241 |
+affecting Docker's core. |
|
| 242 |
+ |
|
| 243 |
+Recent improvements in Linux namespaces will soon make it possible to run |
|
| 244 |
+full-featured containers without root privileges, thanks to the new user |
|
| 245 |
+namespace. This is covered in detail [here]( |
|
| 246 |
+http://s3hh.wordpress.com/2013/07/19/creating-and-using-containers-without-privilege/). |
|
| 247 |
+Moreover, this will solve the problem caused by sharing filesystems |
|
| 248 |
+between host and guest, since the user namespace allows users within |
|
| 249 |
+containers (including the root user) to be mapped to other users in the |
|
| 250 |
+host system. |
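The mapping itself is exposed by the kernel in `/proc/<pid>/uid_map`, where each line holds three numbers: the first ID inside the namespace, the first ID outside it, and the length of the range. The sketch below shows how such a map translates a container user to a host user; the helper names and the sample range are hypothetical.

```python
def parse_uid_map(text):
    """Parse uid_map content into a list of (inside, outside, count) tuples."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        inside, outside, count = (int(field) for field in line.split())
        entries.append((inside, outside, count))
    return entries


def to_host_uid(container_uid, entries):
    """Translate a UID inside the namespace to the host UID, or None if unmapped."""
    for inside, outside, count in entries:
        if inside <= container_uid < inside + count:
            return outside + (container_uid - inside)
    return None


# Illustrative map: container UIDs 0..65535 map to host UIDs 100000..165535.
sample_map = parse_uid_map("0 100000 65536\n")
```

With this sample map, root inside the container (UID 0) is the unprivileged UID 100000 on the host, so files shared from the host are accessed with that user's rights and not with those of the real root.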
|
| 251 |
+ |
|
| 252 |
+Today, Docker does not directly support user namespaces, but they |
|
| 253 |
+may still be utilized by Docker containers on supported kernels, |
|
| 254 |
+by directly using the `clone` syscall, or utilizing the `unshare` |
|
| 255 |
+utility. Using this, some users may find it possible to drop |
|
| 256 |
+more capabilities from their process as user namespaces provide |
|
| 257 |
+an artificial capabilities set. Likewise, however, this artificial |
|
| 258 |
+capabilities set may require use of `capsh` to restrict the |
|
| 259 |
+user-namespace capabilities set when using `unshare`. |
|
| 260 |
+ |
|
| 261 |
+Eventually, it is expected that Docker will have direct, native support |
|
| 262 |
+for user-namespaces, simplifying the process of hardening containers. |
|
| 263 |
+ |
|
| 264 |
+## Conclusions |
|
| 265 |
+ |
|
| 266 |
+Docker containers are, by default, quite secure; especially if you take |
|
| 267 |
+care of running your processes inside the containers as non-privileged |
|
| 268 |
+users (i.e., non-`root`). |
|
| 269 |
+ |
|
| 270 |
+You can add an extra layer of safety by enabling AppArmor, SELinux, |
|
| 271 |
+GRSEC, or your favorite hardening solution. |
|
| 272 |
+ |
|
| 273 |
+Last but not least, if you see interesting security features in other |
|
| 274 |
+containerization systems, these are simply kernel features that may |
|
| 275 |
+be implemented in Docker as well. We welcome users to submit issues, |
|
| 276 |
+pull requests, and communicate via the mailing list. |
|
| 277 |
+ |
|
| 278 |
+## Related Information |
|
| 279 |
+ |
|
| 280 |
+* [Use trusted images](../security/trust/index.md) |
|
| 281 |
+* [Seccomp security profiles for Docker](../security/seccomp.md) |
|
| 282 |
+* [AppArmor security profiles for Docker](../security/apparmor.md) |
|
| 283 |
+* [On the Security of Containers (2014)](https://medium.com/@ewindisch/on-the-security-of-containers-2c60ffe25a9e) |