
Add daemon documentation on user namespaces feature

Remove the experimental docs for user namespaces and add similar content
to the `docker daemon` command documentation.

Docker-DCO-1.1-Signed-off-by: Phil Estes <estesp@linux.vnet.ibm.com> (github: estesp)

Phil Estes authored on 2016/01/09 00:07:46
Showing 4 changed files
... ...
@@ -62,6 +62,7 @@ weight = -1
62 62
       --tlscert="~/.docker/cert.pem"         Path to TLS certificate file
63 63
       --tlskey="~/.docker/key.pem"           Path to TLS key file
64 64
       --tlsverify                            Use TLS and verify the remote
65
+      --userns-remap="default"               Enable user namespace remapping
65 66
       --userland-proxy=true                  Use userland proxy for loopback traffic
66 67
 
67 68
 Options with [] may be specified multiple times.
... ...
@@ -628,6 +629,133 @@ For information about how to create an authorization plugin, see [authorization
628 628
 plugin](../../extend/authorization.md) section in the Docker extend section of this documentation.
629 629
 
630 630
 
631
+## Daemon user namespace options
632
+
633
+The Linux kernel [user namespace support](http://man7.org/linux/man-pages/man7/user_namespaces.7.html) provides additional security by enabling
634
+a process, and therefore a container, to have a unique range of user and
635
+group IDs which are outside the traditional user and group range utilized by
636
+the host system. Potentially the most important security improvement is that,
637
+by default, container processes running as the `root` user will have expected
638
+administrative privilege (with some restrictions) inside the container but will
639
+effectively be mapped to an unprivileged `uid` on the host.
640
+
641
+When user namespace support is enabled, Docker creates a single daemon-wide mapping
642
+for all containers running on the same engine instance. The mappings will
643
+utilize the existing subordinate user and group ID feature available on all modern
644
+Linux distributions.
645
+The [`/etc/subuid`](http://man7.org/linux/man-pages/man5/subuid.5.html) and
646
+[`/etc/subgid`](http://man7.org/linux/man-pages/man5/subgid.5.html) files will be
647
+read for the user, and optional group, specified to the `--userns-remap`
648
+parameter.  If you do not wish to specify your own user and/or group, you can
649
+provide `default` as the value to this flag, and a user will be created on your behalf
650
+and provided subordinate uid and gid ranges. This default user will be named
651
+`dockremap`, and entries will be created for it in `/etc/passwd` and
652
+`/etc/group` using your distro's standard user and group creation tools.
653
+
654
+> **Note**: The single mapping per-daemon restriction is in place for now
655
+> because Docker shares image layers from its local cache across all
656
+> containers running on the engine instance.  Since file ownership must be
657
+> the same for all containers sharing the same layer content, the decision
658
+> was made to map the file ownership on `docker pull` to the daemon's user and
659
+> group mappings so that there is no delay for running containers once the
660
+> content is downloaded. This design preserves the same performance for `docker
661
+> pull`, `docker push`, and container startup as users expect with
662
+> user namespaces disabled.
663
+
664
+### Starting the daemon with user namespaces enabled
665
+
666
+To enable user namespace support, start the daemon with the
667
+`--userns-remap` flag, which accepts values in the following formats:
668
+
669
+ - uid
670
+ - uid:gid
671
+ - username
672
+ - username:groupname
673
+
674
+If numeric IDs are provided, translation back to valid user or group names
675
+will occur so that the subordinate uid and gid information can be read, given
676
+these resources are name-based, not id-based.  If the numeric ID information
677
+provided does not exist as entries in `/etc/passwd` or `/etc/group`, daemon
678
+startup will fail with an error message.
679
+
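To make the four accepted formats concrete, here is a minimal Python sketch of how a `--userns-remap` value could be split into its user and group parts. `parse_userns_remap` is a hypothetical helper written for illustration, not the daemon's parser. It only classifies each part as numeric or name-based and does not perform the lookup in `/etc/passwd` or `/etc/group` described above. The special value `default` is treated here as an ordinary name.

```python
def parse_userns_remap(spec):
    """Split a --userns-remap value into (user, group) parts.

    Each part is returned as ("id", int) or ("name", str). When no group
    is given, the group part is None, meaning the user name would also be
    used as the group name.
    """
    if not spec:
        raise ValueError("empty --userns-remap value")
    parts = spec.split(":")
    if len(parts) > 2 or any(part == "" for part in parts):
        raise ValueError(f"invalid --userns-remap value: {spec!r}")

    def classify(part):
        # Purely numeric parts are IDs; everything else is a name.
        if part.isascii() and part.isdigit():
            return ("id", int(part))
        return ("name", part)

    user = classify(parts[0])
    group = classify(parts[1]) if len(parts) == 2 else None
    return user, group


for value in ("1000", "1000:1001", "dockremap", "user1:group1"):
    print(value, "->", parse_userns_remap(value))
```

A numeric part would still need to be translated back to a name before the subordinate ID files can be read, since those files are keyed by name.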
680
+*Example: starting with default Docker user management:*
681
+
682
+```
683
+     $ docker daemon --userns-remap=default
684
+```
685
+When `default` is provided, Docker creates the user and group named `dockremap`,
686
+or reuses them if they already exist. If the user is created, and the Linux distribution has
687
+appropriate support, the `/etc/subuid` and `/etc/subgid` files will be populated
688
+with a contiguous range of 65536 subordinate user and group IDs, starting
689
+at an offset based on prior entries in those files.  For example, Ubuntu will
690
+create the following range when an existing user named `user1` already owns
691
+the first range of 65536 IDs:
692
+
693
+```
694
+     $ cat /etc/subuid
695
+     user1:100000:65536
696
+     dockremap:165536:65536
697
+```
698
+
699
+> **Note:** On a fresh Fedora install, we had to `touch` the
700
+> `/etc/subuid` and `/etc/subgid` files to have ranges assigned when users
701
+> were created.  Once these files existed, range assignment on user creation
702
+> worked properly.
703
+
704
+If you have a preferred/self-managed user with subordinate ID mappings already
705
+configured, you can provide that username or uid to the `--userns-remap` flag.
706
+If you have a group that doesn't match the username, you may provide the `gid`
707
+or group name as well; otherwise the username will be used as the group name
708
+when querying the system for the subordinate group ID range.
709
+
710
+### Detailed information on `subuid`/`subgid` ranges
711
+
712
+Given potential advanced use of the subordinate ID ranges by power users, the 
713
+following paragraphs define how the Docker daemon currently uses the range entries
714
+found within the subordinate range files.
715
+
716
+The simplest case is that only one contiguous range is defined for the
717
+provided user or group. In this case, Docker will use that entire contiguous
718
+range for the mapping of host uids and gids to the container process.  This
719
+means that the first ID in the range will be the remapped root user (container ID 0), and the
720
+IDs above that initial ID will map container IDs 1 through the end of the range.
721
+
722
+From the example `/etc/subuid` content shown above, the remapped root
723
+user would be uid 165536.
724
+
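The single-range case can be checked with a short Python sketch. It assumes the `name:start:length` line format shown in the `/etc/subuid` example above; `read_ranges` and `host_id` are hypothetical helper names, and the file content is embedded as a string rather than read from disk.

```python
SUBUID_EXAMPLE = """\
user1:100000:65536
dockremap:165536:65536
"""


def read_ranges(content, name):
    """Return (start, length) tuples owned by `name` in subuid-style text."""
    ranges = []
    for line in content.splitlines():
        line = line.strip()
        if not line:
            continue
        owner, start, length = line.split(":")
        if owner == name:
            ranges.append((int(start), int(length)))
    return ranges


def host_id(container_id, start, length):
    """Translate a container ID to a host ID within one contiguous range."""
    if not 0 <= container_id < length:
        raise ValueError(f"container ID {container_id} is outside the range")
    return start + container_id


start, length = read_ranges(SUBUID_EXAMPLE, "dockremap")[0]
print(host_id(0, start, length))     # container root -> 165536
print(host_id(1000, start, length))  # container uid 1000 -> 166536
```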
725
+If the system administrator has set up multiple ranges for a single user or
726
+group, the Docker daemon will read all the available ranges and use the
727
+following algorithm to create the mapping ranges:
728
+
729
+1. The range segments found for the particular user will be sorted by *start ID* ascending.
730
+2. Map segments will be created from each range in increasing value with a length matching the length of each segment. Therefore the range segment with the lowest numeric starting value will begin at the remapped root (container ID 0) and continue up through the container uid/gid equal to the segment length minus one. As an example, if the lowest segment starts at ID 1000 and has a length of 100, then a map of 1000 -> 0 (the remapped root) up through 1099 -> 99 will be created from this segment. If the next segment starts at ID 10000, then the next map will start with mapping 10000 -> 100 and run for the length of this second segment. This will continue until no more segments are found in the subordinate files for this user.
731
+3. If more than five range segments exist for a single user, only the first five will be utilized, matching the kernel's limitation of only five entries in `/proc/self/uid_map` and `/proc/self/gid_map`.
732
+
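The three steps above can be sketched in Python as follows. This is an illustration of the described algorithm under the stated five-segment limit, not the daemon's implementation; `build_id_map` and `MAX_SEGMENTS` are hypothetical names. Each output entry follows the `uid_map` column order of container start ID, host start ID, and length.

```python
MAX_SEGMENTS = 5  # limit on map entries stated in step 3


def build_id_map(ranges):
    """Build ID map entries from (host_start, length) subordinate ranges.

    Returns a list of (container_start, host_start, length) tuples.
    """
    # Step 1: sort segments by start ID, ascending.
    # Step 3: keep only the first five segments.
    segments = sorted(ranges, key=lambda r: r[0])[:MAX_SEGMENTS]

    # Step 2: each segment continues where the previous one ended,
    # so the lowest segment starts at container ID 0 (remapped root).
    id_map = []
    container_start = 0
    for host_start, length in segments:
        id_map.append((container_start, host_start, length))
        container_start += length
    return id_map


print(build_id_map([(10000, 50), (1000, 100)]))
# -> [(0, 1000, 100), (100, 10000, 50)]
```

With these two segments, host ID 1000 becomes the remapped root and host ID 10000 becomes container ID 100, matching the example in step 2.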
733
+### User namespace known restrictions
734
+
735
+The following standard Docker features are currently incompatible when
736
+running a Docker daemon with user namespaces enabled:
737
+
738
+ - sharing PID or NET namespaces with the host (`--pid=host` or `--net=host`)
739
+ - sharing a network namespace with an existing container (`--net=container:*other*`)
740
+ - sharing an IPC namespace with an existing container (`--ipc=container:*other*`)
741
+ - a `--read-only` container filesystem (this is a Linux kernel restriction against remounting with modified flags of a currently mounted filesystem when inside a user namespace)
742
+ - external (volume or graph) drivers which are unaware/incapable of using daemon user mappings
743
+ - using the `--privileged` flag on `docker run`
744
+
745
+In general, user namespaces are an advanced feature and will require
746
+coordination with other capabilities. For example, if volumes are mounted from
747
+the host, file ownership will have to be pre-arranged if the user or
748
+administrator wishes the containers to have expected access to the volume
749
+contents.
750
+
751
+Finally, while the `root` user inside a user namespaced container process has
752
+many of the expected admin privileges that go along with being the superuser, the
753
+Linux kernel has restrictions based on internal knowledge that this is a user namespaced
754
+process. The most notable restriction that we are aware of at this time is the
755
+inability to use `mknod`. Permission will be denied for device creation even as
756
+container `root` inside a user namespace.
757
+
631 758
 ## Miscellaneous options
632 759
 
633 760
 IP masquerading uses address translation to allow containers without a public
... ...
@@ -72,7 +72,7 @@ to build a Docker binary with the experimental features enabled:
72 72
 ## Current experimental features
73 73
 
74 74
  * [External graphdriver plugins](plugins_graphdriver.md)
75
- * [User namespaces](userns.md)
75
+ * The user namespaces feature has graduated from experimental.
76 76
 
77 77
 ## How to comment on an experimental feature
78 78
 
79 79
deleted file mode 100644
... ...
@@ -1,119 +0,0 @@
1
-# Experimental: User namespace support
2
-
3
-Linux kernel [user namespace support](http://man7.org/linux/man-pages/man7/user_namespaces.7.html) provides additional security by enabling
4
-a process--and therefore a container--to have a unique range of user and
5
-group IDs which are outside the traditional user and group range utilized by
6
-the host system. Potentially the most important security improvement is that,
7
-by default, container processes running as the `root` user will have expected
8
-administrative privilege (with some restrictions) inside the container but will
9
-effectively be mapped to an unprivileged `uid` on the host.
10
-
11
-In this experimental phase, the Docker daemon creates a single daemon-wide mapping
12
-for all containers running on the same engine instance. The mappings will
13
-utilize the existing subordinate user and group ID feature available on all modern
14
-Linux distributions.
15
-The [`/etc/subuid`](http://man7.org/linux/man-pages/man5/subuid.5.html) and
16
-[`/etc/subgid`](http://man7.org/linux/man-pages/man5/subgid.5.html) files will be
17
-read for the user, and optional group, specified to the `--userns-remap`
18
-parameter.  If you do not wish to specify your own user and/or group, you can
19
-provide `default` as the value to this flag, and a user will be created on your behalf
20
-and provided subordinate uid and gid ranges. This default user will be named
21
-`dockremap`, and entries will be created for it in `/etc/passwd` and
22
-`/etc/group` using your distro's standard user and group creation tools.
23
-
24
-> **Note**: The single mapping per-daemon restriction exists for this experimental
25
-> phase because Docker shares image layers from its local cache across all
26
-> containers running on the engine instance.  Since file ownership must be
27
-> the same for all containers sharing the same layer content, the decision
28
-> was made to map the file ownership on `docker pull` to the daemon's user and
29
-> group mappings so that there is no delay for running containers once the
30
-> content is downloaded--exactly the same performance characteristics as with
31
-> user namespaces disabled.
32
-
33
-## Starting the daemon with user namespaces enabled
34
-To enable this experimental user namespace support for a Docker daemon instance,
35
-start the daemon with the aforementioned `--userns-remap` flag, which accepts
36
-values in the following formats:
37
-
38
- - uid
39
- - uid:gid
40
- - username
41
- - username:groupname
42
-
43
-If numeric IDs are provided, translation back to valid user or group names
44
-will occur so that the subordinate uid and gid information can be read, given
45
-these resources are name-based, not id-based.  If the numeric ID information
46
-provided does not exist as entries in `/etc/passwd` or `/etc/group`, daemon
47
-startup will fail with an error message.
48
-
49
-*An example: starting with default Docker user management:*
50
-
51
-```
52
-     $ docker daemon --userns-remap=default
53
-```    
54
-In this case, Docker will create--or find the existing--user and group
55
-named `dockremap`. If the user is created, and the Linux distribution has
56
-appropriate support, the `/etc/subuid` and `/etc/subgid` files will be populated
57
-with a contiguous 65536 length range of subordinate user and group IDs, starting
58
-at an offset based on prior entries in those files.  For example, Ubuntu will
59
-create the following range, based on an existing user already having the first
60
-65536 range:
61
-
62
-```
63
-     $ cat /etc/subuid
64
-     user1:100000:65536
65
-     dockremap:165536:65536
66
-```
67
-
68
-> **Note:** On a fresh Fedora install, we found that we had to `touch` the
69
-> `/etc/subuid` and `/etc/subgid` files to have ranges assigned when users
70
-> were created.  Once these files existed, range assignment on user creation
71
-> worked properly.
72
-
73
-If you have a preferred/self-managed user with subordinate ID mappings already
74
-configured, you can provide that username or uid to the `--userns-remap` flag.
75
-If you have a group that doesn't match the username, you may provide the `gid`
76
-or group name as well; otherwise the username will be used as the group name
77
-when querying the system for the subordinate group ID range.
78
-
79
-## Detailed information on `subuid`/`subgid` ranges
80
-
81
-Given there may be advanced use of the subordinate ID ranges by power users, we will
82
-describe how the Docker daemon uses the range entries within these files under the
83
-current experimental user namespace support.
84
-
85
-The simplest case exists where only one contiguous range is defined for the
86
-provided user or group. In this case, Docker will use that entire contiguous
87
-range for the mapping of host uids and gids to the container process.  This
88
-means that the first ID in the range will be the remapped root user, and the
89
-IDs above that initial ID will map host ID 1 through the end of the range.
90
-
91
-From the example `/etc/subid` content shown above, that means the remapped root
92
-user would be uid 165536.
93
-
94
-If the system administrator has set up multiple ranges for a single user or
95
-group, the Docker daemon will read all the available ranges and use the
96
-following algorithm to create the mapping ranges:
97
-
98
-1. The ranges will be sorted by *start ID* ascending
99
-2. Maps will be created from each range with where the host ID will increment starting at 0 for the first range, 0+*range1* length for the second, and so on.  This means that the lowest range start ID will be the remapped root, and all further ranges will map IDs from 1 through the uid or gid that equals the sum of all range lengths.
100
-3. Ranges segments above five will be ignored as the kernel ignores any ID maps after five (in `/proc/self/{u,g}id_map`)
101
-
102
-## User namespace known restrictions
103
-
104
-The following standard Docker features are currently incompatible when
105
-running a Docker daemon with experimental user namespaces enabled:
106
-
107
- - sharing namespaces with the host (--pid=host, --net=host, etc.)
108
- - sharing namespaces with other containers (--net=container:*other*)
109
- - A `--readonly` container filesystem (a Linux kernel restriction on remount with new flags of a currently mounted filesystem when inside a user namespace)
110
- - external (volume/graph) drivers which are unaware/incapable of using daemon user mappings
111
- - Using `--privileged` mode containers
112
- - volume use without pre-arranging proper file ownership in mounted volumes
113
-
114
-Additionally, while the `root` user inside a user namespaced container
115
-process has many of the privileges of the administrative root user, the
116
-following operations will fail:
117
-
118
- - Use of `mknod` - permission is denied for device creation by the container root
119
- - others will be listed here when fully tested
... ...
@@ -53,6 +53,7 @@ docker-daemon - Enable daemon mode
53 53
 [**--tlskey**[=*~/.docker/key.pem*]]
54 54
 [**--tlsverify**]
55 55
 [**--userland-proxy**[=*true*]]
56
+[**--userns-remap**[=*default*]]
56 57
 
57 58
 # DESCRIPTION
58 59
 **docker** has two distinct functions. It is used for starting the Docker
... ...
@@ -223,6 +224,9 @@ unix://[/path/to/socket] to use.
223 223
 **--userland-proxy**=*true*|*false*
224 224
     Rely on a userland proxy implementation for inter-container and outside-to-container loopback communications. Default is true.
225 225
 
226
+**--userns-remap**=*default*|*uid:gid*|*user:group*|*user*|*uid*
227
+    Enable user namespaces for containers on the daemon. Specifying "default" will cause a new user and group to be created to handle UID and GID range remapping for the user namespace mappings used for contained processes. Specifying a user (or uid) and optionally a group (or gid) will cause the daemon to look up the user and group's subordinate ID ranges for use as the user namespace mappings for contained processes.
228
+
226 229
 # STORAGE DRIVER OPTIONS
227 230
 
228 231
 Docker uses storage backends (known as "graphdrivers" in the Docker