Merge pull request #18452 from moxiegirl/carrry-doc-17989

Update security docs for seccomp/apparmor

Sebastiaan van Stijn authored on 2016/01/15 07:42:37
Showing 13 changed files
1 1
deleted file mode 100644
... ...
@@ -1,284 +0,0 @@
1
-<!--[metadata]>
2
-+++
3
-title = "Docker security"
4
-description = "Review of the Docker Daemon attack surface"
5
-keywords = ["Docker, Docker documentation,  security"]
6
-[menu.main]
7
-parent = "smn_administrate"
8
-weight = 2
9
-+++
10
-<![end-metadata]-->
11
-
12
-# Docker security
13
-
14
-There are four major areas to consider when reviewing Docker security:
15
-
16
- - the intrinsic security of the kernel and its support for
17
-   namespaces and cgroups;
18
- - the attack surface of the Docker daemon itself;
19
- - loopholes in the container configuration profile, either by default,
20
-   or when customized by users.
21
- - the "hardening" security features of the kernel and how they
22
-   interact with containers.
23
-
24
-## Kernel namespaces
25
-
26
-Docker containers are very similar to LXC containers, and they have
27
-similar security features. When you start a container with
28
-`docker run`, behind the scenes Docker creates a set of namespaces and control
29
-groups for the container.
30
-
31
-**Namespaces provide the first and most straightforward form of
32
-isolation**: processes running within a container cannot see, and even
33
-less affect, processes running in another container, or in the host
34
-system.
35
-
36
-**Each container also gets its own network stack**, meaning that a
37
-container doesn't get privileged access to the sockets or interfaces
38
-of another container. Of course, if the host system is set up
39
-accordingly, containers can interact with each other through their
40
-respective network interfaces — just like they can interact with
41
-external hosts. When you specify public ports for your containers or use
42
-[*links*](../userguide/networking/default_network/dockerlinks.md)
43
-then IP traffic is allowed between containers. They can ping each other,
44
-send/receive UDP packets, and establish TCP connections, but that can be
45
-restricted if necessary. From a network architecture point of view, all
46
-containers on a given Docker host are sitting on bridge interfaces. This
47
-means that they are just like physical machines connected through a
48
-common Ethernet switch; no more, no less.
49
-
50
-How mature is the code providing kernel namespaces and private
51
-networking? Kernel namespaces were introduced [between kernel version
52
-2.6.15 and
53
-2.6.26](http://lxc.sourceforge.net/index.php/about/kernel-namespaces/).
54
-This means that since July 2008 (date of the 2.6.26 release, now 7 years
55
-ago), namespace code has been exercised and scrutinized on a large
56
-number of production systems. And there is more: the design and
57
-inspiration for the namespaces code are even older. Namespaces are
58
-actually an effort to reimplement the features of [OpenVZ](
59
-http://en.wikipedia.org/wiki/OpenVZ) in such a way that they could be
60
-merged within the mainstream kernel. And OpenVZ was initially released
61
-in 2005, so both the design and the implementation are pretty mature.
62
-
63
-## Control groups
64
-
65
-Control Groups are another key component of Linux Containers. They
66
-implement resource accounting and limiting. They provide many
67
-useful metrics, but they also help ensure that each container gets
68
-its fair share of memory, CPU, disk I/O; and, more importantly, that a
69
-single container cannot bring the system down by exhausting one of those
70
-resources.
71
-
72
-So while they do not play a role in preventing one container from
73
-accessing or affecting the data and processes of another container, they
74
-are essential to fend off some denial-of-service attacks. They are
75
-particularly important on multi-tenant platforms, like public and
76
-private PaaS, to guarantee a consistent uptime (and performance) even
77
-when some applications start to misbehave.
78
-
79
-Control Groups have been around for a while as well: the code was
80
-started in 2006, and initially merged in kernel 2.6.24.
81
-
82
-## Docker daemon attack surface
83
-
84
-Running containers (and applications) with Docker implies running the
85
-Docker daemon. This daemon currently requires `root` privileges, and you
86
-should therefore be aware of some important details.
87
-
88
-First of all, **only trusted users should be allowed to control your
89
-Docker daemon**. This is a direct consequence of some powerful Docker
90
-features. Specifically, Docker allows you to share a directory between
91
-the Docker host and a guest container; and it allows you to do so
92
-without limiting the access rights of the container. This means that you
93
-can start a container where the `/host` directory will be the `/` directory
94
-on your host; and the container will be able to alter your host filesystem
95
-without any restriction. This is similar to how virtualization systems
96
-allow filesystem resource sharing. Nothing prevents you from sharing your
97
-root filesystem (or even your root block device) with a virtual machine.
98
-
99
-This has a strong security implication: for example, if you instrument Docker
100
-from a web server to provision containers through an API, you should be
101
-even more careful than usual with parameter checking, to make sure that
102
-a malicious user cannot pass crafted parameters causing Docker to create
103
-arbitrary containers.
104
-
105
-For this reason, the REST API endpoint (used by the Docker CLI to
106
-communicate with the Docker daemon) changed in Docker 0.5.2, and now
107
-uses a UNIX socket instead of a TCP socket bound on 127.0.0.1 (the
108
-latter being prone to cross-site request forgery attacks if you happen to run
109
-Docker directly on your local machine, outside of a VM). You can then
110
-use traditional UNIX permission checks to limit access to the control
111
-socket.
112
-
113
-You can also expose the REST API over HTTP if you explicitly decide to do so.
114
-However, if you do that, being aware of the above mentioned security
115
-implication, you should ensure that it will be reachable only from a
116
-trusted network or VPN; or protected with e.g., `stunnel` and client SSL
117
-certificates. You can also secure them with [HTTPS and
118
-certificates](../articles/https/).
119
-
120
-The daemon is also potentially vulnerable to other inputs, such as image
121
-loading from either disk with 'docker load', or from the network with
122
-'docker pull'. This has been a focus of improvement in the community,
123
-especially for 'pull' security. While these overlap, it should be noted
124
-that 'docker load' is a mechanism for backup and restore and is not
125
-currently considered a secure mechanism for loading images. As of
126
-Docker 1.3.2, images are now extracted in a chrooted subprocess on
127
-Linux/Unix platforms, being the first-step in a wider effort toward
128
-privilege separation.
129
-
130
-Eventually, it is expected that the Docker daemon will run with restricted
131
-privileges, delegating operations to well-audited sub-processes,
132
-each with its own (very limited) scope of Linux capabilities,
133
-virtual network setup, filesystem management, etc. That is, most likely,
134
-pieces of the Docker engine itself will run inside of containers.
135
-
136
-Finally, if you run Docker on a server, it is recommended to run
137
-exclusively Docker on the server, and move all other services within
138
-containers controlled by Docker. Of course, it is fine to keep your
139
-favorite admin tools (probably at least an SSH server), as well as
140
-existing monitoring/supervision processes (e.g., NRPE, collectd, etc).
141
-
142
-## Linux kernel capabilities
143
-
144
-By default, Docker starts containers with a restricted set of
145
-capabilities. What does that mean?
146
-
147
-Capabilities turn the binary "root/non-root" dichotomy into a
148
-fine-grained access control system. Processes (like web servers) that
149
-just need to bind on a port below 1024 do not have to run as root: they
150
-can just be granted the `net_bind_service` capability instead. And there
151
-are many other capabilities, for almost all the specific areas where root
152
-privileges are usually needed.
153
-
154
-This means a lot for container security; let's see why!
155
-
156
-Your average server (bare metal or virtual machine) needs to run a bunch
157
-of processes as root. Those typically include SSH, cron, syslogd;
158
-hardware management tools (e.g., load modules), network configuration
159
-tools (e.g., to handle DHCP, WPA, or VPNs), and much more. A container is
160
-very different, because almost all of those tasks are handled by the
161
-infrastructure around the container:
162
-
163
- - SSH access will typically be managed by a single server running on
164
-   the Docker host;
165
- - `cron`, when necessary, should run as a user
166
-   process, dedicated and tailored for the app that needs its
167
-   scheduling service, rather than as a platform-wide facility;
168
- - log management will also typically be handed to Docker, or to
169
-   third-party services like Loggly or Splunk;
170
- - hardware management is irrelevant, meaning that you never need to
171
-   run `udevd` or equivalent daemons within
172
-   containers;
173
- - network management happens outside of the containers, enforcing
174
-   separation of concerns as much as possible, meaning that a container
175
-   should never need to perform `ifconfig`,
176
-   `route`, or ip commands (except when a container
177
-   is specifically engineered to behave like a router or firewall, of
178
-   course).
179
-
180
-This means that in most cases, containers will not need "real" root
181
-privileges *at all*. And therefore, containers can run with a reduced
182
-capability set; meaning that "root" within a container has far fewer
183
-privileges than the real "root". For instance, it is possible to:
184
-
185
- - deny all "mount" operations;
186
- - deny access to raw sockets (to prevent packet spoofing);
187
- - deny access to some filesystem operations, like creating new device
188
-   nodes, changing the owner of files, or altering attributes (including
189
-   the immutable flag);
190
- - deny module loading;
191
- - and many others.
192
-
193
-This means that even if an intruder manages to escalate to root within a
194
-container, it will be much harder to do serious damage, or to escalate
195
-to the host.
196
-
197
-This won't affect regular web apps; but malicious users will find that
198
-the arsenal at their disposal has shrunk considerably! By default Docker
199
-drops all capabilities except [those
200
-needed](https://github.com/docker/docker/blob/87de5fdd5972343a11847922e0f41d9898b5cff7/daemon/execdriver/native/template/default_template_linux.go#L16-L29),
201
-a whitelist instead of a blacklist approach. You can see a full list of
202
-available capabilities in [Linux
203
-manpages](http://man7.org/linux/man-pages/man7/capabilities.7.html).
204
-
205
-One primary risk with running Docker containers is that the default set
206
-of capabilities and mounts given to a container may provide incomplete
207
-isolation, either independently, or when used in combination with
208
-kernel vulnerabilities.
209
-
210
-Docker supports the addition and removal of capabilities, allowing use
211
-of a non-default profile. This may make Docker more secure through
212
-capability removal, or less secure through the addition of capabilities.
213
-The best practice for users would be to remove all capabilities except
214
-those explicitly required for their processes.
215
-
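The capability-removal practice described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a recommended production setup: the helper name `build_run_cmd` and the image name `example/webapp` are hypothetical, and the script only prints the command rather than running it. `--cap-drop` and `--cap-add` are existing `docker run` flags.

```bash
#!/usr/bin/env bash
# Sketch: drop every capability, then add back only what the process needs.
# build_run_cmd and example/webapp are hypothetical names for illustration.
set -eu

build_run_cmd() {
  local image="$1"
  shift
  local cmd="docker run --rm --cap-drop=ALL"
  local cap
  # Each remaining argument is a capability to add back, e.g. NET_BIND_SERVICE.
  for cap in "$@"; do
    cmd="$cmd --cap-add=$cap"
  done
  printf '%s %s\n' "$cmd" "$image"
}

# A web server that only needs to bind to a port below 1024:
build_run_cmd example/webapp NET_BIND_SERVICE
```

Starting from `--cap-drop=ALL` keeps the whitelist approach: anything not named explicitly stays unavailable to the container.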
216
-## Other kernel security features
217
-
218
-Capabilities are just one of the many security features provided by
219
-modern Linux kernels. It is also possible to leverage existing,
220
-well-known systems like TOMOYO, AppArmor, SELinux, GRSEC, etc. with
221
-Docker.
222
-
223
-While Docker currently only enables capabilities, it doesn't interfere
224
-with the other systems. This means that there are many different ways to
225
-harden a Docker host. Here are a few examples.
226
-
227
- - You can run a kernel with GRSEC and PAX. This will add many safety
228
-   checks, both at compile-time and run-time; it will also defeat many
229
-   exploits, thanks to techniques like address randomization. It doesn't
230
-   require Docker-specific configuration, since those security features
231
-   apply system-wide, independent of containers.
232
- - If your distribution comes with security model templates for
233
-   Docker containers, you can use them out of the box. For instance, we
234
-   ship a template that works with AppArmor and Red Hat comes with SELinux
235
-   policies for Docker. These templates provide an extra safety net (even
236
-   though it overlaps greatly with capabilities).
237
- - You can define your own policies using your favorite access control
238
-   mechanism.
239
-
240
-Just like there are many third-party tools to augment Docker containers
241
-with e.g., special network topologies or shared filesystems, you can
242
-expect to see tools to harden existing Docker containers without
243
-affecting Docker's core.
244
-
245
-Recent improvements in Linux namespaces will soon allow you to run
246
-full-featured containers without root privileges, thanks to the new user
247
-namespace. This is covered in detail [here](
248
-http://s3hh.wordpress.com/2013/07/19/creating-and-using-containers-without-privilege/).
249
-Moreover, this will solve the problem caused by sharing filesystems
250
-between host and guest, since the user namespace allows users within
251
-containers (including the root user) to be mapped to other users in the
252
-host system.
253
-
254
-Today, Docker does not directly support user namespaces, but they
255
-may still be utilized by Docker containers on supported kernels,
256
-by directly using the clone syscall, or utilizing the 'unshare'
257
-utility. Using this, some users may find it possible to drop
258
-more capabilities from their process as user namespaces provide
259
-an artificial capabilities set. Likewise, however, this artificial
260
-capabilities set may require use of 'capsh' to restrict the
261
-user-namespace capabilities set when using 'unshare'.
262
-
263
-Eventually, it is expected that Docker will have direct, native support
264
-for user-namespaces, simplifying the process of hardening containers.
265
-
266
-## Conclusions
267
-
268
-Docker containers are, by default, quite secure; especially if you take
269
-care of running your processes inside the containers as non-privileged
270
-users (i.e., non-`root`).
271
-
272
-You can add an extra layer of safety by enabling AppArmor, SELinux,
273
-GRSEC, or your favorite hardening solution.
274
-
275
-Last but not least, if you see interesting security features in other
276
-containerization systems, these are simply kernel features that may
277
-be implemented in Docker as well. We welcome users to submit issues,
278
-pull requests, and communicate via the mailing list.
279
-
280
-References:
281
-
282
-* [Docker Containers: How Secure Are They? (2013)](
283
-http://blog.docker.com/2013/08/containers-docker-how-secure-are-they/).
284
-* [On the Security of Containers (2014)](https://medium.com/@ewindisch/on-the-security-of-containers-2c60ffe25a9e).
... ...
@@ -186,7 +186,7 @@ need to add `sudo` to all the client commands.
186 186
 
187 187
 > **Warning**: 
188 188
 > The *docker* group (or the group specified with `-G`) is root-equivalent;
189
-> see [*Docker Daemon Attack Surface*](../articles/security.md#docker-daemon-attack-surface) details.
189
+> see [*Docker Daemon Attack Surface*](../security/security.md#docker-daemon-attack-surface) for details.
190 190
 
191 191
 ## Upgrades
192 192
 
... ...
@@ -134,7 +134,7 @@ makes the ownership of the Unix socket read/writable by the `docker` group.
134 134
 
135 135
 >**Warning**: The `docker` group is equivalent to the `root` user; For details
136 136
 >on how this impacts security in your system, see [*Docker Daemon Attack
137
->Surface*](../articles/security.md#docker-daemon-attack-surface) for details.
137
+>Surface*](../security/security.md#docker-daemon-attack-surface).
138 138
 
139 139
 To create the `docker` group and add your user:
140 140
 
... ...
@@ -133,7 +133,7 @@ use the `-G` flag to specify an alternative group.
133 133
 
134 134
 > **Warning**:
135 135
 > The `docker` group (or the group specified with the `-G` flag) is
136
-> `root`-equivalent; see [*Docker Daemon Attack Surface*](../articles/security.md#docker-daemon-attack-surface) details.
136
+> `root`-equivalent; see [*Docker Daemon Attack Surface*](../security/security.md#docker-daemon-attack-surface) for details.
137 137
 
138 138
 **Example:**
139 139
 
... ...
@@ -128,7 +128,7 @@ makes the ownership of the Unix socket read/writable by the `docker` group.
128 128
 
129 129
 >**Warning**: The `docker` group is equivalent to the `root` user; For details
130 130
 >on how this impacts security in your system, see [*Docker Daemon Attack
131
->Surface*](../articles/security.md#docker-daemon-attack-surface) for details.
131
+>Surface*](../security/security.md#docker-daemon-attack-surface).
132 132
 
133 133
 To create the `docker` group and add your user:
134 134
 
... ...
@@ -99,7 +99,7 @@ makes the ownership of the Unix socket read/writable by the `docker` group.
99 99
 
100 100
 >**Warning**: The `docker` group is equivalent to the `root` user; For details
101 101
 >on how this impacts security in your system, see [*Docker Daemon Attack
102
->Surface*](../articles/security.md#docker-daemon-attack-surface) for details.
102
+>Surface*](../security/security.md#docker-daemon-attack-surface).
103 103
 
104 104
 To create the `docker` group and add your user:
105 105
 
... ...
@@ -126,7 +126,7 @@ makes the ownership of the Unix socket read/writable by the `docker` group.
126 126
 
127 127
 >**Warning**: The `docker` group is equivalent to the `root` user; For details
128 128
 >on how this impacts security in your system, see [*Docker Daemon Attack
129
->Surface*](../articles/security.md#docker-daemon-attack-surface) for details.
129
+>Surface*](../security/security.md#docker-daemon-attack-surface).
130 130
 
131 131
 To create the `docker` group and add your user:
132 132
 
... ...
@@ -225,7 +225,7 @@ makes the ownership of the Unix socket read/writable by the `docker` group.
225 225
 
226 226
 >**Warning**: The `docker` group is equivalent to the `root` user; For details
227 227
 >on how this impacts security in your system, see [*Docker Daemon Attack
228
->Surface*](../articles/security.md#docker-daemon-attack-surface) for details.
228
+>Surface*](../security/security.md#docker-daemon-attack-surface).
229 229
 
230 230
 To create the `docker` group and add your user:
231 231
 
... ...
@@ -30,7 +30,7 @@ adding the server name.
30 30
 `docker login` requires the user to use `sudo` or be `root`, except when: 
31 31
 
32 32
 1.  connecting to a remote daemon, such as a `docker-machine` provisioned `docker engine`.
33
-2.  user is added to the `docker` group.  This will impact the security of your system; the `docker` group is `root` equivalent.  See [Docker Daemon Attack Surface](https://docs.docker.com/articles/security/#docker-daemon-attack-surface) for details. 
33
+2.  user is added to the `docker` group.  This will impact the security of your system; the `docker` group is `root` equivalent.  See [Docker Daemon Attack Surface](https://docs.docker.com/security/security/#docker-daemon-attack-surface) for details. 
34 34
 
35 35
 You can log into any public or private repository for which you have
36 36
 credentials.  When you log in, the command stores encoded credentials in
... ...
@@ -1,47 +1,74 @@
1 1
 <!-- [metadata]>
2 2
 +++
3
-draft = true
3
+title = "AppArmor security profiles for Docker"
4
+description = "Enabling AppArmor in Docker"
5
+keywords = ["AppArmor, security, docker, documentation"]
6
+[menu.main]
7
+parent= "smn_secure_docker"
4 8
 +++
5 9
 <![end-metadata]-->
6 10
 
7
-AppArmor security profiles for Docker
11
+# AppArmor security profiles for Docker
8 12
 
9
-AppArmor (Application Armor) is a security module that allows a system
10
-administrator to associate a security profile with each program. Docker
13
+AppArmor (Application Armor) is a Linux security module that protects an
14
+operating system and its applications from security threats. To use it, a system
15
+administrator associates an AppArmor security profile with each program. Docker
11 16
 expects to find an AppArmor policy loaded and enforced.
12 17
 
13
-Container profiles are loaded automatically by Docker. A profile
14
-for the Docker Engine itself also exists and is installed
15
-with the official *.deb* packages. Advanced users and package
16
-managers may find the profile for */usr/bin/docker* underneath
17
-[contrib/apparmor](https://github.com/docker/docker/tree/master/contrib/apparmor)
18
-in the Docker Engine source repository.
18
+Docker automatically loads container profiles. A profile for the Docker Engine
19
+itself also exists and is installed with the official *.deb* packages in the
20
+`/etc/apparmor.d/docker` file.
21
+
19 22
 
23
+## Understand the policies
24
+
25
+The `docker-default` profile is the default for running containers. It is
26
+moderately protective while providing wide application compatibility. The
27
+profile is the following:
28
+
29
+```
30
+#include <tunables/global>
20 31
 
21
-Understand the policies
22 32
 
23
-The `docker-default` profile the default for running
24
-containers. It is moderately protective while
25
-providing wide application compatibility.
33
+profile docker-default flags=(attach_disconnected,mediate_deleted) {
26 34
 
27
-The system's standard `unconfined` profile inherits all
28
-system-wide policies, applying path-based policies
29
-intended for the host system inside of containers.
30
-This was the default for privileged containers
31
-prior to Docker 1.8.
35
+  #include <abstractions/base>
32 36
 
33 37
 
34
-Overriding the profile for a container
38
+  network,
39
+  capability,
40
+  file,
41
+  umount,
35 42
 
36
-Users may override the AppArmor profile using the
37
-`security-opt` option (per-container).
43
+  deny @{PROC}/{*,**^[0-9*],sys/kernel/shm*} wkx,
44
+  deny @{PROC}/sysrq-trigger rwklx,
45
+  deny @{PROC}/mem rwklx,
46
+  deny @{PROC}/kmem rwklx,
47
+  deny @{PROC}/kcore rwklx,
38 48
 
39
-For example, the following explicitly specifies the default policy:
49
+  deny mount,
40 50
 
51
+  deny /sys/[^f]*/** wklx,
52
+  deny /sys/f[^s]*/** wklx,
53
+  deny /sys/fs/[^c]*/** wklx,
54
+  deny /sys/fs/c[^g]*/** wklx,
55
+  deny /sys/fs/cg[^r]*/** wklx,
56
+  deny /sys/firmware/efi/efivars/** rwklx,
57
+  deny /sys/kernel/security/** rwklx,
58
+}
41 59
 ```
60
+
61
+When you run a container, it uses the `docker-default` policy unless you
62
+override it with the `security-opt` option. For example, the following
63
+explicitly specifies the default policy:
64
+
65
+```bash
42 66
 $ docker run --rm -it --security-opt apparmor:docker-default hello-world
43 67
 ```
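
Overriding `docker-default` with a profile of your own could look roughly like the sketch below. It is an illustration under stated assumptions: the profile name `docker-example` and the temporary file path are hypothetical, the rules are a trimmed subset of the `docker-default` profile shown above, and the load and run commands are printed rather than executed because loading a profile needs root and a host with AppArmor enabled.

```bash
#!/usr/bin/env bash
# Sketch: write a trimmed-down custom profile and show how it could be used.
# The profile name docker-example and the temp path are hypothetical.
set -eu

profile_name="docker-example"
profile_file="$(mktemp)"

# Quoted heredoc delimiter: the profile text is written verbatim.
cat > "$profile_file" <<'EOF'
#include <tunables/global>

profile docker-example flags=(attach_disconnected,mediate_deleted) {
  #include <abstractions/base>

  network,
  capability,
  file,
  umount,

  deny mount,
  deny /sys/kernel/security/** rwklx,
}
EOF

# Printed rather than executed: loading a profile needs root and AppArmor.
echo "sudo apparmor_parser -r -W $profile_file"
echo "docker run --rm -it --security-opt apparmor:$profile_name hello-world"
```

The `--security-opt apparmor:` form follows the command shown above; check the syntax accepted by your Docker version before relying on it.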
44 68
 
69
+## Contributing to AppArmor code in Docker
70
+
71
+Advanced users and package managers can find a profile for `/usr/bin/docker`
72
+underneath
73
+[contrib/apparmor](https://github.com/docker/docker/tree/master/contrib/apparmor)
74
+in the Docker Engine source repository.
45 75
new file mode 100644
... ...
@@ -0,0 +1,20 @@
0
+<!-- [metadata]>
1
+title = "Work with Docker security"
2
+description = "Sec"
3
+keywords = ["seccomp, security, docker, documentation"]
4
+[menu.main]
5
+identifier="smn_secure_docker"
6
+parent= "mn_use_docker"
7
+<![end-metadata]-->
8
+
9
+# Work with Docker security
10
+
11
+This section discusses the security features you can configure and use within your Docker Engine installation.
12
+
13
+* You can configure Docker's trust features so that your users can push and pull trusted images. To learn how to do this, see [Use trusted images](trust/index.md) in this section.
14
+
15
+* You can configure secure computing mode (Seccomp) policies to secure system calls in a container. For more information, see [Seccomp security profiles for Docker](seccomp.md).
16
+
17
+* An AppArmor profile for Docker is installed with the official *.deb* packages. For information about this profile and overriding it, see [AppArmor security profiles for Docker](apparmor.md).
... ...
@@ -3,27 +3,26 @@
3 3
 title = "Seccomp security profiles for Docker"
4 4
 description = "Enabling seccomp in Docker"
5 5
 keywords = ["seccomp, security, docker, documentation"]
6
+[menu.main]
7
+parent= "smn_secure_docker"
6 8
 +++
7 9
 <![end-metadata]-->
8 10
 
9
-Seccomp security profiles for Docker
11
+# Seccomp security profiles for Docker
10 12
 
11
-The seccomp() system call operates on the Secure Computing (seccomp)
12
-state of the calling process.
13
+Secure computing mode (Seccomp) is a Linux kernel feature. You can use it to
14
+restrict the actions available within the container. The `seccomp()` system
15
+call operates on the seccomp state of the calling process. You can use this
16
+feature to restrict your application's access.
13 17
 
14
-This operation is available only if the kernel is configured
15
-with `CONFIG_SECCOMP` enabled.
18
+This feature is available only if the kernel is configured with `CONFIG_SECCOMP`
19
+enabled.
16 20
 
17
-This allows for allowing or denying of certain syscalls in a container.
21
+## Passing a profile for a container
18 22
 
19
-Passing a profile for a container
20
-
21
-Users may pass a seccomp profile using the `security-opt` option
22
-(per-container).
23
-
24
-The profile has layout in the following form:
23
+The default seccomp profile provides a sane default for running containers with
24
+seccomp. It is moderately protective while providing wide application
25
+compatibility. The default Docker profile has the following layout:
25 26
 
26 27
 ```
27 28
 {
... ...
@@ -57,30 +56,14 @@ The profile has layout in the following form:
57 57
 }
58 58
 ```
59 59
 
60
-Then you can run with:
60
+When you run a container, it uses the default profile unless you override
61
+it with the `security-opt` option. For example, the following explicitly
62
+specifies a profile by path:
61 63
 
62 64
 ```
63 65
 $ docker run --rm -it --security-opt seccomp:/path/to/seccomp/profile.json hello-world
64 66
 ```
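
A custom profile file for the command above could be produced along these lines. This is a minimal sketch, not Docker's default profile: the temporary path and the choice of `chmod` are hypothetical, the field names (`defaultAction`, `syscalls`, `name`, `action`, `args`) are an assumption about the profile schema, which has changed between Docker versions, and the example allows everything except one syscall for brevity, whereas the default profile is a whitelist. The run command is printed rather than executed so the sketch works without a Docker daemon.

```bash
#!/usr/bin/env bash
# Sketch: write a small custom seccomp profile and show how to pass it.
# The temp path and the choice of chmod are hypothetical; the field names
# are an assumption about the profile schema, which varies by Docker version.
set -eu

profile_file="$(mktemp)"

# Allow all syscalls by default, but make chmod fail with an error.
cat > "$profile_file" <<'EOF'
{
    "defaultAction": "SCMP_ACT_ALLOW",
    "syscalls": [
        {
            "name": "chmod",
            "action": "SCMP_ACT_ERRNO",
            "args": []
        }
    ]
}
EOF

# Printed rather than executed, so the sketch runs without a Docker daemon.
echo "docker run --rm -it --security-opt seccomp:$profile_file hello-world"
```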
65 67
 
66
-Default Profile
67
-
68
-The default seccomp profile provides a sane default for running
69
-containers with seccomp. It is moderately protective while
70
-providing wide application compatibility.
71
-
72
-
73
-### Overriding the default profile for a container
74
-
75
-You can pass `unconfined` to run a container without the default seccomp
76
-profile.
77
-
78
-```
79
-$ docker run --rm -it --security-opt seccomp:unconfined debian:jessie \
80
-    unshare --map-root-user --user sh -c whoami
81
-```
82
-
83 68
 ### Syscalls blocked by the default profile
84 69
 
85 70
 Docker's default seccomp profile is a whitelist which specifies the calls that
... ...
@@ -91,55 +74,65 @@ the reason each syscall is blocked rather than white-listed.
91 91
 | Syscall             | Description                                                                                                                           |
92 92
 |---------------------|---------------------------------------------------------------------------------------------------------------------------------------|
93 93
 | `acct`              | Accounting syscall which could let containers disable their own resource limits or process accounting. Also gated by `CAP_SYS_PACCT`. |
94
-| `add_key`           | Prevent containers from using the kernel keyring, which is not namespaced.                                                            |
95
-| `adjtimex`          | Similar to `clock_settime` and `settimeofday`, time/date is not namespaced.                                                           |
96
-| `bpf`               | Deny loading potentially persistent bpf programs into kernel, already gated by `CAP_SYS_ADMIN`.                                       |
97
-| `clock_adjtime`     | Time/date is not namespaced.                                                                                                          |
98
-| `clock_settime`     | Time/date is not namespaced.                                                                                                          |
99
-| `clone`             | Deny cloning new namespaces. Also gated by `CAP_SYS_ADMIN` for CLONE_* flags, except `CLONE_USERNS`.                                  |
100
-| `create_module`     | Deny manipulation and functions on kernel modules.                                                                                    |
101
-| `delete_module`     | Deny manipulation and functions on kernel modules. Also gated by `CAP_SYS_MODULE`.                                                    |
102
-| `finit_module`      | Deny manipulation and functions on kernel modules. Also gated by `CAP_SYS_MODULE`.                                                    |
103
-| `get_kernel_syms`   | Deny retrieval of exported kernel and module symbols.                                                                                 |
104
-| `get_mempolicy`     | Syscall that modifies kernel memory and NUMA settings. Already gated by `CAP_SYS_NICE`.                                               |
105
-| `init_module`       | Deny manipulation and functions on kernel modules. Also gated by `CAP_SYS_MODULE`.                                                    |
106
-| `ioperm`            | Prevent containers from modifying kernel I/O privilege levels. Already gated by `CAP_SYS_RAWIO`.                                      |
107
-| `iopl`              | Prevent containers from modifying kernel I/O privilege levels. Already gated by `CAP_SYS_RAWIO`.                                      |
108
-| `kcmp`              | Restrict process inspection capabilities, already blocked by dropping `CAP_PTRACE`.                                                   |
109
-| `kexec_file_load`   | Sister syscall of `kexec_load` that does the same thing, slightly different arguments.                                                |
110
-| `kexec_load`        | Deny loading a new kernel for later execution.                                                                                        |
111
-| `keyctl`            | Prevent containers from using the kernel keyring, which is not namespaced.                                                            |
112
-| `lookup_dcookie`    | Tracing/profiling syscall, which could leak a lot of information on the host.                                                         |
113
-| `mbind`             | Syscall that modifies kernel memory and NUMA settings. Already gated by `CAP_SYS_NICE`.                                               |
114
-| `modify_ldt`        | Old syscall only used in 16-bit code and a potential information leak.                                                                |
115
-| `mount`             | Deny mounting, already gated by `CAP_SYS_ADMIN`.                                                                                      |
116
-| `move_pages`        | Syscall that modifies kernel memory and NUMA settings.                                                                                |
117
-| `name_to_handle_at` | Sister syscall to `open_by_handle_at`. Already gated by `CAP_SYS_NICE`.                                                               |
118
-| `nfsservctl`        | Deny interaction with the kernel nfs daemon.                                                                                          |
119
-| `open_by_handle_at` | Cause of an old container breakout. Also gated by `CAP_DAC_READ_SEARCH`.                                                              |
120
-| `perf_event_open`   | Tracing/profiling syscall, which could leak a lot of information on the host.                                                         |
121
-| `personality`       | Prevent container from enabling BSD emulation. Not inherently dangerous, but poorly tested, potential for a lot of kernel vulns.      |
122
-| `pivot_root`        | Deny `pivot_root`, should be privileged operation.                                                                                    |
123
-| `process_vm_readv`  | Restrict process inspection capabilities, already blocked by dropping `CAP_PTRACE`.                                                   |
124
-| `process_vm_writev` | Restrict process inspection capabilities, already blocked by dropping `CAP_PTRACE`.                                                   |
125
-| `ptrace`            | Tracing/profiling syscall, which could leak a lot of information on the host. Already blocked by dropping `CAP_PTRACE`.               |
126
-| `query_module`      | Deny manipulation and functions on kernel modules.                                                                                    |
127
-| `quotactl`          | Quota syscall which could let containers disable their own resource limits or process accounting. Also gated by `CAP_SYS_ADMIN`.      |
128
-| `reboot`            | Don't let containers reboot the host. Also gated by `CAP_SYS_BOOT`.                                                                   |
94
+| `add_key`           | Prevent containers from using the kernel keyring, which is not namespaced.                                   |
95
+| `adjtimex`          | Similar to `clock_settime` and `settimeofday`, time/date is not namespaced.                                  |
96
+| `bpf`               | Deny loading potentially persistent bpf programs into kernel, already gated by `CAP_SYS_ADMIN`.              |
97
+| `clock_adjtime`     | Time/date is not namespaced.                                                                                 |
98
+| `clock_settime`     | Time/date is not namespaced.                                                                                 |
99
+| `clone`             | Deny cloning new namespaces. Also gated by `CAP_SYS_ADMIN` for CLONE_* flags, except `CLONE_NEWUSER`.        |
100
+| `create_module`     | Deny manipulation and functions on kernel modules.                                                           |
101
+| `delete_module`     | Deny manipulation and functions on kernel modules. Also gated by `CAP_SYS_MODULE`.                           |
102
+| `finit_module`      | Deny manipulation and functions on kernel modules. Also gated by `CAP_SYS_MODULE`.                           |
103
+| `get_kernel_syms`   | Deny retrieval of exported kernel and module symbols.                                                        |
104
+| `get_mempolicy`     | Syscall that modifies kernel memory and NUMA settings. Already gated by `CAP_SYS_NICE`.                      |
105
+| `init_module`       | Deny manipulation and functions on kernel modules. Also gated by `CAP_SYS_MODULE`.                           |
106
+| `ioperm`            | Prevent containers from modifying kernel I/O privilege levels. Already gated by `CAP_SYS_RAWIO`.             |
107
+| `iopl`              | Prevent containers from modifying kernel I/O privilege levels. Already gated by `CAP_SYS_RAWIO`.             |
108
+| `kcmp`              | Restrict process inspection capabilities, already blocked by dropping `CAP_SYS_PTRACE`.                      |
109
+| `kexec_file_load`   | Sister syscall of `kexec_load` that does the same thing, slightly different arguments.                       |
110
+| `kexec_load`        | Deny loading a new kernel for later execution.                                                               |
111
+| `keyctl`            | Prevent containers from using the kernel keyring, which is not namespaced.                                   |
112
+| `lookup_dcookie`    | Tracing/profiling syscall, which could leak a lot of information on the host.                                |
113
+| `mbind`             | Syscall that modifies kernel memory and NUMA settings. Already gated by `CAP_SYS_NICE`.                      |
114
+| `modify_ldt`        | Old syscall only used in 16-bit code and a potential information leak.                                       |
115
+| `mount`             | Deny mounting, already gated by `CAP_SYS_ADMIN`.                                                             |
116
+| `move_pages`        | Syscall that modifies kernel memory and NUMA settings.                                                       |
117
+| `name_to_handle_at` | Sister syscall to `open_by_handle_at`. Already gated by `CAP_DAC_READ_SEARCH`.                               |
118
+| `nfsservctl`        | Deny interaction with the kernel nfs daemon.                                                                 |
119
+| `open_by_handle_at` | Cause of an old container breakout. Also gated by `CAP_DAC_READ_SEARCH`.                                     |
120
+| `perf_event_open`   | Tracing/profiling syscall, which could leak a lot of information on the host.                                |
121
+| `personality`       | Prevent container from enabling BSD emulation. Not inherently dangerous, but poorly tested, potential for a lot of kernel vulns. |
122
+| `pivot_root`        | Deny `pivot_root`, should be privileged operation.                                                           |
123
+| `process_vm_readv`  | Restrict process inspection capabilities, already blocked by dropping `CAP_SYS_PTRACE`.                      |
124
+| `process_vm_writev` | Restrict process inspection capabilities, already blocked by dropping `CAP_SYS_PTRACE`.                      |
125
+| `ptrace`            | Tracing/profiling syscall, which could leak a lot of information on the host. Already blocked by dropping `CAP_SYS_PTRACE`. |
126
+| `query_module`      | Deny manipulation and functions on kernel modules.                                                            |
127
+| `quotactl`          | Quota syscall which could let containers disable their own resource limits or process accounting. Also gated by `CAP_SYS_ADMIN`. |
128
+| `reboot`            | Don't let containers reboot the host. Also gated by `CAP_SYS_BOOT`.                                           |
129 129
 | `restart_syscall`   | Don't allow containers to restart a syscall. Possible seccomp bypass see: https://code.google.com/p/chromium/issues/detail?id=408827. |
130
-| `request_key`       | Prevent containers from using the kernel keyring, which is not namespaced.                                                            |
131
-| `set_mempolicy`     | Syscall that modifies kernel memory and NUMA settings. Already gated by `CAP_SYS_NICE`.                                               |
132
-| `setns`             | Deny associating a thread with a namespace. Also gated by `CAP_SYS_ADMIN`.                                                            |
133
-| `settimeofday`      | Time/date is not namespaced. Also gated by `CAP_SYS_TIME`.                                                                            |
134
-| `stime`             | Time/date is not namespaced. Also gated by `CAP_SYS_TIME`.                                                                            |
135
-| `swapon`            | Deny start/stop swapping to file/device. Also gated by `CAP_SYS_ADMIN`.                                                               |
136
-| `swapoff`           | Deny start/stop swapping to file/device. Also gated by `CAP_SYS_ADMIN`.                                                               |
137
-| `sysfs`             | Obsolete syscall.                                                                                                                     |
138
-| `_sysctl`           | Obsolete, replaced by /proc/sys.                                                                                                      |
139
-| `umount`            | Should be a privileged operation. Also gated by `CAP_SYS_ADMIN`.                                                                      |
140
-| `umount2`           | Should be a privileged operation.                                                                                                     |
141
-| `unshare`           | Deny cloning new namespaces for processes. Also gated by `CAP_SYS_ADMIN`, with the exception of `unshare --user`.                     |
142
-| `uselib`            | Older syscall related to shared libraries, unused for a long time.                                                                    |
143
-| `ustat`             | Obsolete syscall.                                                                                                                     |
144
-| `vm86`              | In kernel x86 real mode virtual machine. Also gated by `CAP_SYS_ADMIN`.                                                               |
145
-| `vm86old`           | In kernel x86 real mode virtual machine. Also gated by `CAP_SYS_ADMIN`.                                                               |
130
+| `request_key`       | Prevent containers from using the kernel keyring, which is not namespaced.                                    |
131
+| `set_mempolicy`     | Syscall that modifies kernel memory and NUMA settings. Already gated by `CAP_SYS_NICE`.                       |
132
+| `setns`             | Deny associating a thread with a namespace. Also gated by `CAP_SYS_ADMIN`.                                    |
133
+| `settimeofday`      | Time/date is not namespaced. Also gated by `CAP_SYS_TIME`.                                                    |
134
+| `stime`             | Time/date is not namespaced. Also gated by `CAP_SYS_TIME`.                                                    |
135
+| `swapon`            | Deny start/stop swapping to file/device. Also gated by `CAP_SYS_ADMIN`.                                       |
136
+| `swapoff`           | Deny start/stop swapping to file/device. Also gated by `CAP_SYS_ADMIN`.                                       |
137
+| `sysfs`             | Obsolete syscall.                                                                                             |
138
+| `_sysctl`           | Obsolete, replaced by /proc/sys.                                                                              |
139
+| `umount`            | Should be a privileged operation. Also gated by `CAP_SYS_ADMIN`.                                              |
140
+| `umount2`           | Should be a privileged operation.                                                                             |
141
+| `unshare`           | Deny cloning new namespaces for processes. Also gated by `CAP_SYS_ADMIN`, with the exception of `unshare --user`. |
142
+| `uselib`            | Older syscall related to shared libraries, unused for a long time.                                            |
143
+| `ustat`             | Obsolete syscall.                                                                                             |
144
+| `vm86`              | In kernel x86 real mode virtual machine. Also gated by `CAP_SYS_ADMIN`.                                       |
145
+| `vm86old`           | In kernel x86 real mode virtual machine. Also gated by `CAP_SYS_ADMIN`.                                       |
146
+
147
+## Run without the default seccomp profile
148
+
149
+You can pass `unconfined` to run a container without the default seccomp
150
+profile.
151
+
152
+```
153
+$ docker run --rm -it --security-opt seccomp:unconfined debian:jessie \
154
+    unshare --map-root-user --user sh -c whoami
155
+```
146 156
new file mode 100644
... ...
@@ -0,0 +1,286 @@
0
+<!--[metadata]>
1
+aliases = ["/engine/articles/security/"]
2
+title = "Docker security"
3
+description = "Review of the Docker Daemon attack surface"
4
+keywords = ["Docker, Docker documentation,  security"]
5
+[menu.main]
6
+parent = "smn_secure_docker"
7
+weight =-99
8
+<![end-metadata]-->
9
+
10
+# Docker security
11
+
12
+There are four major areas to consider when reviewing Docker security:
13
+
14
+ - the intrinsic security of the kernel and its support for
15
+   namespaces and cgroups;
16
+ - the attack surface of the Docker daemon itself;
17
+ - loopholes in the container configuration profile, either by default,
18
+   or when customized by users.
19
+ - the "hardening" security features of the kernel and how they
20
+   interact with containers.
21
+
22
+## Kernel namespaces
23
+
24
+Docker containers are very similar to LXC containers, and they have
25
+similar security features. When you start a container with
26
+`docker run`, behind the scenes Docker creates a set of namespaces and control
27
+groups for the container.
28
+
29
+**Namespaces provide the first and most straightforward form of
30
+isolation**: processes running within a container cannot see, and even
31
+less affect, processes running in another container, or in the host
32
+system.
33
+
34
+**Each container also gets its own network stack**, meaning that a
35
+container doesn't get privileged access to the sockets or interfaces
36
+of another container. Of course, if the host system is set up
37
+accordingly, containers can interact with each other through their
38
+respective network interfaces — just like they can interact with
39
+external hosts. When you specify public ports for your containers or use
40
+[*links*](../userguide/networking/default_network/dockerlinks.md)
41
+then IP traffic is allowed between containers. They can ping each other,
42
+send/receive UDP packets, and establish TCP connections, but that can be
43
+restricted if necessary. From a network architecture point of view, all
44
+containers on a given Docker host are sitting on bridge interfaces. This
45
+means that they are just like physical machines connected through a
46
+common Ethernet switch; no more, no less.
47
+
48
+How mature is the code providing kernel namespaces and private
49
+networking? Kernel namespaces were introduced [between kernel version
50
+2.6.15 and
51
+2.6.26](http://lxc.sourceforge.net/index.php/about/kernel-namespaces/).
52
+This means that since July 2008 (date of the 2.6.26 release, now 7 years
53
+ago), namespace code has been exercised and scrutinized on a large
54
+number of production systems. And there is more: the design and
55
+inspiration for the namespaces code are even older. Namespaces are
56
+actually an effort to reimplement the features of [OpenVZ](
57
+http://en.wikipedia.org/wiki/OpenVZ) in such a way that they could be
58
+merged within the mainstream kernel. And OpenVZ was initially released
59
+in 2005, so both the design and the implementation are pretty mature.
60
+
61
+## Control groups
62
+
63
+Control Groups are another key component of Linux Containers. They
64
+implement resource accounting and limiting. They provide many
65
+useful metrics, but they also help ensure that each container gets
66
+its fair share of memory, CPU, disk I/O; and, more importantly, that a
67
+single container cannot bring the system down by exhausting one of those
68
+resources.
69
+
70
+So while they do not play a role in preventing one container from
71
+accessing or affecting the data and processes of another container, they
72
+are essential to fend off some denial-of-service attacks. They are
73
+particularly important on multi-tenant platforms, like public and
74
+private PaaS, to guarantee a consistent uptime (and performance) even
75
+when some applications start to misbehave.
76
+
77
+Control Groups have been around for a while as well: the code was
78
+started in 2006, and initially merged in kernel 2.6.24.
79
+
80
+## Docker daemon attack surface
81
+
82
+Running containers (and applications) with Docker implies running the
83
+Docker daemon. This daemon currently requires `root` privileges, and you
84
+should therefore be aware of some important details.
85
+
86
+First of all, **only trusted users should be allowed to control your
87
+Docker daemon**. This is a direct consequence of some powerful Docker
88
+features. Specifically, Docker allows you to share a directory between
89
+the Docker host and a guest container; and it allows you to do so
90
+without limiting the access rights of the container. This means that you
91
+can start a container where the `/host` directory will be the `/` directory
92
+on your host; and the container will be able to alter your host filesystem
93
+without any restriction. This is similar to how virtualization systems
94
+allow filesystem resource sharing. Nothing prevents you from sharing your
95
+root filesystem (or even your root block device) with a virtual machine.
96
+
97
+This has a strong security implication: for example, if you instrument Docker
98
+from a web server to provision containers through an API, you should be
99
+even more careful than usual with parameter checking, to make sure that
100
+a malicious user cannot pass crafted parameters causing Docker to create
101
+arbitrary containers.
102
+
103
+For this reason, the REST API endpoint (used by the Docker CLI to
104
+communicate with the Docker daemon) changed in Docker 0.5.2, and now
105
+uses a UNIX socket instead of a TCP socket bound on 127.0.0.1 (the
106
+latter being prone to cross-site request forgery attacks if you happen to run
107
+Docker directly on your local machine, outside of a VM). You can then
108
+use traditional UNIX permission checks to limit access to the control
109
+socket.
110
+
111
+You can also expose the REST API over HTTP if you explicitly decide to do so.
112
+However, if you do that, being aware of the above mentioned security
113
+implication, you should ensure that it will be reachable only from a
114
+trusted network or VPN; or protected with e.g., `stunnel` and client SSL
115
+certificates. You can also secure the API with [HTTPS and
116
+certificates](../articles/https/).
117
+
118
+The daemon is also potentially vulnerable to other inputs, such as image
119
+loading from either disk with `docker load`, or from the network with
120
+`docker pull`. This has been a focus of improvement in the community,
121
+especially for `pull` security. While these overlap, it should be noted
122
+that `docker load` is a mechanism for backup and restore and is not
123
+currently considered a secure mechanism for loading images. As of
124
+Docker 1.3.2, images are now extracted in a chrooted subprocess on
125
+Linux/Unix platforms, the first step in a wider effort toward
126
+privilege separation.
127
+
128
+Eventually, it is expected that the Docker daemon will run with restricted
129
+privileges, delegating operations to well-audited sub-processes,
130
+each with its own (very limited) scope of Linux capabilities,
131
+virtual network setup, filesystem management, etc. That is, most likely,
132
+pieces of the Docker engine itself will run inside of containers.
133
+
134
+Finally, if you run Docker on a server, it is recommended to run
135
+exclusively Docker on the server, and move all other services within
136
+containers controlled by Docker. Of course, it is fine to keep your
137
+favorite admin tools (probably at least an SSH server), as well as
138
+existing monitoring/supervision processes (e.g., NRPE, collectd, etc).
139
+
140
+## Linux kernel capabilities
141
+
142
+By default, Docker starts containers with a restricted set of
143
+capabilities. What does that mean?
144
+
145
+Capabilities turn the binary "root/non-root" dichotomy into a
146
+fine-grained access control system. Processes (like web servers) that
147
+just need to bind on a port below 1024 do not have to run as root: they
148
+can just be granted the `net_bind_service` capability instead. And there
149
+are many other capabilities, for almost all the specific areas where root
150
+privileges are usually needed.
151
+
152
+This means a lot for container security; let's see why!
153
+
154
+Your average server (bare metal or virtual machine) needs to run a bunch
155
+of processes as root. Those typically include SSH, cron, syslogd;
156
+hardware management tools (e.g., load modules), network configuration
157
+tools (e.g., to handle DHCP, WPA, or VPNs), and much more. A container is
158
+very different, because almost all of those tasks are handled by the
159
+infrastructure around the container:
160
+
161
+ - SSH access will typically be managed by a single server running on
162
+   the Docker host;
163
+ - `cron`, when necessary, should run as a user
164
+   process, dedicated and tailored for the app that needs its
165
+   scheduling service, rather than as a platform-wide facility;
166
+ - log management will also typically be handed to Docker, or to
167
+   third-party services like Loggly or Splunk;
168
+ - hardware management is irrelevant, meaning that you never need to
169
+   run `udevd` or equivalent daemons within
170
+   containers;
171
+ - network management happens outside of the containers, enforcing
172
+   separation of concerns as much as possible, meaning that a container
173
+   should never need to perform `ifconfig`,
174
+   `route`, or `ip` commands (except when a container
175
+   is specifically engineered to behave like a router or firewall, of
176
+   course).
177
+
178
+This means that in most cases, containers will not need "real" root
179
+privileges *at all*. And therefore, containers can run with a reduced
180
+capability set; meaning that "root" within a container has far fewer
181
+privileges than the real "root". For instance, it is possible to:
182
+
183
+ - deny all "mount" operations;
184
+ - deny access to raw sockets (to prevent packet spoofing);
185
+ - deny access to some filesystem operations, like creating new device
186
+   nodes, changing the owner of files, or altering attributes (including
187
+   the immutable flag);
188
+ - deny module loading;
189
+ - and many others.
190
+
191
+This means that even if an intruder manages to escalate to root within a
192
+container, it will be much harder to do serious damage, or to escalate
193
+to the host.
194
+
195
+This won't affect regular web apps; but malicious users will find that
196
+the arsenal at their disposal has shrunk considerably! By default Docker
197
+drops all capabilities except [those
198
+needed](https://github.com/docker/docker/blob/87de5fdd5972343a11847922e0f41d9898b5cff7/daemon/execdriver/native/template/default_template_linux.go#L16-L29),
199
+a whitelist instead of a blacklist approach. You can see a full list of
200
+available capabilities in [Linux
201
+manpages](http://man7.org/linux/man-pages/man7/capabilities.7.html).
202
+
203
+One primary risk with running Docker containers is that the default set
204
+of capabilities and mounts given to a container may provide incomplete
205
+isolation, either independently, or when used in combination with
206
+kernel vulnerabilities.
207
+
208
+Docker supports the addition and removal of capabilities, allowing use
209
+of a non-default profile. This may make Docker more secure through
210
+capability removal, or less secure through the addition of capabilities.
211
+The best practice for users would be to remove all capabilities except
212
+those explicitly required for their processes.
213
+
214
+## Other kernel security features
215
+
216
+Capabilities are just one of the many security features provided by
217
+modern Linux kernels. It is also possible to leverage existing,
218
+well-known systems like TOMOYO, AppArmor, SELinux, GRSEC, etc. with
219
+Docker.
220
+
221
+While Docker currently only enables capabilities, it doesn't interfere
222
+with the other systems. This means that there are many different ways to
223
+harden a Docker host. Here are a few examples.
224
+
225
+ - You can run a kernel with GRSEC and PAX. This will add many safety
226
+   checks, both at compile-time and run-time; it will also defeat many
227
+   exploits, thanks to techniques like address randomization. It doesn't
228
+   require Docker-specific configuration, since those security features
229
+   apply system-wide, independent of containers.
230
+ - If your distribution comes with security model templates for
231
+   Docker containers, you can use them out of the box. For instance, we
232
+   ship a template that works with AppArmor and Red Hat comes with SELinux
233
+   policies for Docker. These templates provide an extra safety net (even
234
+   though they overlap greatly with capabilities).
235
+ - You can define your own policies using your favorite access control
236
+   mechanism.
237
+
238
+Just like there are many third-party tools to augment Docker containers
239
+with e.g., special network topologies or shared filesystems, you can
240
+expect to see tools to harden existing Docker containers without
241
+affecting Docker's core.
242
+
243
+Recent improvements in Linux namespaces will soon make it possible to run
244
+full-featured containers without root privileges, thanks to the new user
245
+namespace. This is covered in detail [here](
246
+http://s3hh.wordpress.com/2013/07/19/creating-and-using-containers-without-privilege/).
247
+Moreover, this will solve the problem caused by sharing filesystems
248
+between host and guest, since the user namespace allows users within
249
+containers (including the root user) to be mapped to other users in the
250
+host system.
251
+
252
+Today, Docker does not directly support user namespaces, but they
253
+may still be utilized by Docker containers on supported kernels,
254
+by directly using the `clone` syscall, or utilizing the `unshare`
255
+utility. Using this, some users may find it possible to drop
256
+more capabilities from their process as user namespaces provide
257
+an artificial capabilities set. Likewise, however, this artificial
258
+capabilities set may require use of `capsh` to restrict the
259
+user-namespace capabilities set when using `unshare`.
260
+
261
+Eventually, it is expected that Docker will have direct, native support
262
+for user-namespaces, simplifying the process of hardening containers.
263
+
264
+## Conclusions
265
+
266
+Docker containers are, by default, quite secure; especially if you take
267
+care of running your processes inside the containers as non-privileged
268
+users (i.e., non-`root`).
269
+
270
+You can add an extra layer of safety by enabling AppArmor, SELinux,
271
+GRSEC, or your favorite hardening solution.
272
+
273
+Last but not least, if you see interesting security features in other
274
+containerization systems, these are simply kernel features that may
275
+be implemented in Docker as well. We welcome users to submit issues,
276
+pull requests, and communicate via the mailing list.
277
+
278
+## Related Information
279
+
280
+* [Use trusted images](../security/trust/index.md)
281
+* [Seccomp security profiles for Docker](../security/seccomp.md)
282
+* [AppArmor security profiles for Docker](../security/apparmor.md)
283
+* [On the Security of Containers (2014)](https://medium.com/@ewindisch/on-the-security-of-containers-2c60ffe25a9e)