
Merge pull request #3758 from metalivedev/2720-running

Fixes 2720

Andy Rothfusz authored on 2014/01/31 09:09:19
Showing 6 changed files
@@ -12,3 +12,4 @@ Articles
 
    security
    baseimages
+   runmetrics
new file mode 100644
@@ -0,0 +1,463 @@
+:title: Runtime Metrics
+:description: Measure the behavior of running containers
+:keywords: docker, metrics, CPU, memory, disk, IO, run, runtime
+
+.. _run_metrics:
+
+
+Runtime Metrics
+===============
+
+Linux Containers rely on `control groups
+<https://www.kernel.org/doc/Documentation/cgroups/cgroups.txt>`_ which
+not only track groups of processes, but also expose metrics about CPU,
+memory, and block I/O usage. You can access those metrics and obtain
+network usage metrics as well. This is relevant for "pure" LXC
+containers, as well as for Docker containers.
+
+Control Groups
+--------------
+
+Control groups are exposed through a pseudo-filesystem. In recent
+distros, you should find this filesystem under
+``/sys/fs/cgroup``. Under that directory, you will see multiple
+sub-directories, called ``devices``, ``freezer``, ``blkio``, etc.;
+each sub-directory corresponds to a different cgroup hierarchy.
+
+On older systems, the control groups might be mounted on ``/cgroup``,
+without distinct hierarchies. In that case, instead of seeing the
+sub-directories, you will see a bunch of files in that directory, and
+possibly some directories corresponding to existing containers.
+
+To figure out where your control groups are mounted, you can run:
+
+::
+
+  grep cgroup /proc/mounts
+
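As a sketch of what that reveals, here is a small ``awk`` filter over sample ``/proc/mounts`` lines (the mount entries below are illustrative, not taken from any particular host); the second field of each matching line is the hierarchy's mountpoint:

```shell
# Sample /proc/mounts lines (illustrative). Field 3 is the filesystem
# type; field 2 is where that cgroup hierarchy is mounted.
cat > /tmp/sample_mounts <<'EOF'
cgroup /sys/fs/cgroup/cpuset cgroup rw,relatime,cpuset 0 0
cgroup /sys/fs/cgroup/memory cgroup rw,relatime,memory 0 0
cgroup /sys/fs/cgroup/devices cgroup rw,relatime,devices 0 0
EOF

# Print only the mountpoints of cgroup filesystems
awk '$3 == "cgroup" { print $2 }' /tmp/sample_mounts
```

On a real host you would feed ``/proc/mounts`` itself to the same filter.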
+.. _run_findpid:
+
+Enumerating Cgroups
+-------------------
+
+You can look into ``/proc/cgroups`` to see the different control group
+subsystems known to the system, the hierarchy they belong to, and how
+many groups they contain.
+
+You can also look at ``/proc/<pid>/cgroup`` to see which control
+groups a process belongs to. The control group will be shown as a path
+relative to the root of the hierarchy mountpoint; e.g. ``/`` means
+“this process has not been assigned into a particular group”, while
+``/lxc/pumpkin`` means that the process is likely to be a member of a
+container named ``pumpkin``.
+
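To sketch how that file can be parsed (the sample contents below are illustrative), each line is ``<hierarchy-id>:<subsystem>:<path>``, so pulling out, say, the memory cgroup path is a one-liner:

```shell
# Sample /proc/<pid>/cgroup contents (illustrative); each line is
# <hierarchy-id>:<subsystem>:<path-relative-to-hierarchy-root>
cat > /tmp/sample_cgroup <<'EOF'
4:blkio:/lxc/pumpkin
3:cpuacct:/lxc/pumpkin
2:memory:/lxc/pumpkin
1:devices:/
EOF

# Extract the path of this process's memory cgroup
awk -F: '$2 == "memory" { print $3 }' /tmp/sample_cgroup
```

This prints ``/lxc/pumpkin``; substitute ``/proc/<pid>/cgroup`` for the sample file on a real system.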
+Finding the Cgroup for a Given Container
+----------------------------------------
+
+For each container, one cgroup will be created in each hierarchy. On
+older systems with older versions of the LXC userland tools, the name
+of the cgroup will be the name of the container. With more recent
+versions of the LXC tools, the cgroup will be ``lxc/<container_name>``.
+
+For Docker containers using cgroups, the container name will be the
+full ID or long ID of the container. If a container shows up as
+ae836c95b4c3 in ``docker ps``, its long ID might be something like
+``ae836c95b4c3c9e9179e0e91015512da89fdec91612f63cebae57df9a5444c79``. You
+can look it up with ``docker inspect`` or ``docker ps -notrunc``.
+
+Putting everything together to look at the memory metrics for a Docker
+container, take a look at ``/sys/fs/cgroup/memory/lxc/<longid>/``.
+
+Metrics from Cgroups: Memory, CPU, Block IO
+-------------------------------------------
+
+For each subsystem (memory, CPU, and block I/O), you will find one or
+more pseudo-files containing statistics.
+
+Memory Metrics: ``memory.stat``
+...............................
+
+Memory metrics are found in the "memory" cgroup. Note that the memory
+control group adds a little overhead, because it does very
+fine-grained accounting of the memory usage on your host. Therefore,
+many distros chose not to enable it by default. Generally, to enable
+it, all you have to do is add some kernel command-line parameters:
+``cgroup_enable=memory swapaccount=1``.
+
+The metrics are in the pseudo-file ``memory.stat``. Here is what it
+will look like:
+
+::
+
+  cache 11492564992
+  rss 1930993664
+  mapped_file 306728960
+  pgpgin 406632648
+  pgpgout 403355412
+  swap 0
+  pgfault 728281223
+  pgmajfault 1724
+  inactive_anon 46608384
+  active_anon 1884520448
+  inactive_file 7003344896
+  active_file 4489052160
+  unevictable 32768
+  hierarchical_memory_limit 9223372036854775807
+  hierarchical_memsw_limit 9223372036854775807
+  total_cache 11492564992
+  total_rss 1930993664
+  total_mapped_file 306728960
+  total_pgpgin 406632648
+  total_pgpgout 403355412
+  total_swap 0
+  total_pgfault 728281223
+  total_pgmajfault 1724
+  total_inactive_anon 46608384
+  total_active_anon 1884520448
+  total_inactive_file 7003344896
+  total_active_file 4489052160
+  total_unevictable 32768
+
+The first half (without the ``total_`` prefix) contains statistics
+relevant to the processes within the cgroup, excluding
+sub-cgroups. The second half (with the ``total_`` prefix) includes
+sub-cgroups as well.
+
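As a sketch of how these values can be consumed (using the ``rss`` figure from the listing above; all values are in bytes):

```shell
# Sample memory.stat fragment (values copied from the listing above)
cat > /tmp/sample_memory.stat <<'EOF'
cache 11492564992
rss 1930993664
swap 0
EOF

# Values are in bytes; convert rss to MiB (awk's %d truncates)
awk '$1 == "rss" { printf "rss_mib %d\n", $2 / 1048576 }' /tmp/sample_memory.stat
```

This prints ``rss_mib 1841``; pointing it at the real pseudo-file under ``/sys/fs/cgroup/memory/.../memory.stat`` works the same way.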
+Some metrics are "gauges", i.e. values that can increase or decrease
+(e.g. swap, the amount of swap space used by the members of the
+cgroup). Some others are "counters", i.e. values that can only go up,
+because they represent occurrences of a specific event (e.g. pgfault,
+which indicates the number of page faults that happened since the
+creation of the cgroup; this number can never decrease).
+
+cache
+  the amount of memory used by the processes of this control group
+  that can be associated precisely with a block on a block
+  device. When you read from and write to files on disk, this amount
+  will increase. This will be the case if you use "conventional" I/O
+  (``open``, ``read``, ``write`` syscalls) as well as mapped files
+  (with ``mmap``). It also accounts for the memory used by ``tmpfs``
+  mounts, though the reasons are unclear.
+
+rss
+  the amount of memory that *doesn't* correspond to anything on
+  disk: stacks, heaps, and anonymous memory maps.
+
+mapped_file
+  indicates the amount of memory mapped by the processes in the
+  control group. It doesn't give you information about *how much*
+  memory is used; it rather tells you *how* it is used.
+
+pgfault and pgmajfault
+  indicate the number of times that a process of the cgroup triggered
+  a "page fault" and a "major fault", respectively. A page fault
+  happens when a process accesses a part of its virtual memory space
+  which is nonexistent or protected. The former can happen if the
+  process is buggy and tries to access an invalid address (it will
+  then be sent a ``SIGSEGV`` signal, typically killing it with the
+  famous ``Segmentation fault`` message). The latter can happen when
+  the process reads from a memory zone which has been swapped out, or
+  which corresponds to a mapped file: in that case, the kernel will
+  load the page from disk, and let the CPU complete the memory
+  access. It can also happen when the process writes to a
+  copy-on-write memory zone: likewise, the kernel will preempt the
+  process, duplicate the memory page, and resume the write operation
+  on the process' own copy of the page. "Major" faults happen when the
+  kernel actually has to read the data from disk. When it just has to
+  duplicate an existing page, or allocate an empty page, it's a
+  regular (or "minor") fault.
+
+swap
+  the amount of swap currently used by the processes in this cgroup.
+
+active_anon and inactive_anon
+  the amount of *anonymous* memory that has been identified as
+  respectively *active* and *inactive* by the kernel. "Anonymous"
+  memory is the memory that is *not* linked to disk pages. In other
+  words, that's the equivalent of the rss counter described above. In
+  fact, the very definition of the rss counter is **active_anon** +
+  **inactive_anon** - **tmpfs** (where tmpfs is the amount of memory
+  used up by ``tmpfs`` filesystems mounted by this control
+  group). Now, what's the difference between "active" and "inactive"?
+  Pages are initially "active"; at regular intervals, the kernel
+  sweeps over the memory, and tags some pages as "inactive". Whenever
+  they are accessed again, they are immediately retagged
+  "active". When the kernel is almost out of memory, and time comes to
+  swap out to disk, the kernel will swap "inactive" pages.
+
+active_file and inactive_file
+  cache memory, with *active* and *inactive* similar to the *anon*
+  memory above. The exact formula is cache = **active_file** +
+  **inactive_file** + **tmpfs**. The exact rules used by the kernel to
+  move memory pages between active and inactive sets are different
+  from the ones used for anonymous memory, but the general principle
+  is the same. Note that when the kernel needs to reclaim memory, it
+  is cheaper to reclaim a clean (=non modified) page from this pool,
+  since it can be reclaimed immediately (while anonymous pages and
+  dirty/modified pages have to be written to disk first).
+
+unevictable
+  the amount of memory that cannot be reclaimed; generally, it
+  accounts for memory that has been "locked" with ``mlock``. It is
+  often used by crypto frameworks to make sure that secret keys and
+  other sensitive material never get swapped out to disk.
+
+memory and memsw limits
+  These are not really metrics, but a reminder of the limits applied
+  to this cgroup. The first one indicates the maximum amount of
+  physical memory that can be used by the processes of this control
+  group; the second one indicates the maximum amount of RAM+swap.
+
+Accounting for memory in the page cache is very complex. If two
+processes in different control groups both read the same file
+(ultimately relying on the same blocks on disk), the corresponding
+memory charge will be split between the control groups. It's nice, but
+it also means that when a cgroup is terminated, it could increase the
+memory usage of another cgroup, because they are not splitting the
+cost anymore for those memory pages.
+
+CPU metrics: ``cpuacct.stat``
+.............................
+
+Now that we've covered memory metrics, everything else will look very
+simple in comparison. CPU metrics will be found in the ``cpuacct``
+controller.
+
+For each container, you will find a pseudo-file ``cpuacct.stat``,
+containing the CPU usage accumulated by the processes of the
+container, broken down between ``user`` and ``system`` time. If you're
+not familiar with the distinction, ``user`` is the time during which
+the processes were in direct control of the CPU (i.e. executing
+process code), and ``system`` is the time during which the CPU was
+executing system calls on behalf of those processes.
+
+Those times are expressed in ticks of 1/100th of a second. Actually,
+they are expressed in "user jiffies". There are ``USER_HZ``
+*"jiffies"* per second, and on x86 systems, ``USER_HZ`` is 100. This
+used to map exactly to the number of scheduler "ticks" per second; but
+with the advent of higher frequency scheduling, as well as `tickless
+kernels <http://lwn.net/Articles/549580/>`_, the number of kernel
+ticks wasn't relevant anymore. It stuck around anyway, mainly for
+legacy and compatibility reasons.
+
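A sketch of the conversion, assuming ``USER_HZ`` is 100 and using made-up tick counts for the sample file:

```shell
# Sample cpuacct.stat contents (illustrative); values are in USER_HZ
# ticks, i.e. hundredths of a second on x86
cat > /tmp/sample_cpuacct.stat <<'EOF'
user 46011
system 22222
EOF

# Divide by USER_HZ (100) to get seconds
awk '{ printf "%s_seconds %.2f\n", $1, $2 / 100 }' /tmp/sample_cpuacct.stat
```

This prints ``user_seconds 460.11`` and ``system_seconds 222.22``. On a system where ``USER_HZ`` differs, use ``getconf CLK_TCK`` instead of hard-coding 100.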
+Block I/O metrics
+.................
+
+Block I/O is accounted in the ``blkio`` controller. Different metrics
+are scattered across different files. While you can find in-depth
+details in the `blkio-controller
+<https://www.kernel.org/doc/Documentation/cgroups/blkio-controller.txt>`_
+file in the kernel documentation, here is a short list of the most
+relevant ones:
+
+blkio.sectors
+  contains the number of 512-byte sectors read and written by the
+  processes that are members of the cgroup, device by device. Reads
+  and writes are merged in a single counter.
+
+blkio.io_service_bytes
+  indicates the number of bytes read and written by the cgroup. It has
+  4 counters per device, because for each device, it differentiates
+  between synchronous vs. asynchronous I/O, and reads vs. writes.
+
+blkio.io_serviced
+  the number of I/O operations performed, regardless of their size. It
+  also has 4 counters per device.
+
+blkio.io_queued
+  indicates the number of I/O operations currently queued for this
+  cgroup. In other words, if the cgroup isn't doing any I/O, this will
+  be zero. Note that the opposite is not true. In other words, if
+  there is no I/O queued, it does not mean that the cgroup is idle
+  (I/O-wise). It could be doing purely synchronous reads on an
+  otherwise quiescent device, which is therefore able to handle them
+  immediately, without queuing. Also, while it is helpful to figure
+  out which cgroup is putting stress on the I/O subsystem, keep in
+  mind that it is a relative quantity. Even if a process group does
+  not perform more I/O, its queue size can increase just because the
+  device load increases because of other devices.
+
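A sketch of summing those counters, assuming the usual ``<major>:<minor> <operation> <bytes>`` line format for ``blkio.io_service_bytes`` (the sample values are made up):

```shell
# Sample blkio.io_service_bytes contents (illustrative); each line is
# <major>:<minor> <operation> <bytes>, plus a grand Total line
cat > /tmp/sample_io_service_bytes <<'EOF'
8:0 Read 812345344
8:0 Write 406172672
8:0 Sync 1015430144
8:0 Async 203087872
8:0 Total 1218518016
Total 1218518016
EOF

# Sum reads and writes across all devices
awk '$2 == "Read"  { r += $3 }
     $2 == "Write" { w += $3 }
     END { print "read_bytes " r; print "write_bytes " w }' /tmp/sample_io_service_bytes
```

With a single device the sums simply echo the per-device counters; with several devices, the same filter aggregates them.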
+Network Metrics
+---------------
+
+Network metrics are not exposed directly by control groups. There is a
+good explanation for that: network interfaces exist within the context
+of *network namespaces*. The kernel could probably accumulate metrics
+about packets and bytes sent and received by a group of processes, but
+those metrics wouldn't be very useful. You want per-interface metrics
+(because traffic happening on the local ``lo`` interface doesn't
+really count). But since processes in a single cgroup can belong to
+multiple network namespaces, those metrics would be harder to
+interpret: multiple network namespaces means multiple ``lo``
+interfaces, potentially multiple ``eth0`` interfaces, etc.; this is
+why there is no easy way to gather network metrics with control
+groups.
+
+Instead, we can gather network metrics from other sources:
+
+IPtables
+........
+
+IPtables (or rather, the netfilter framework for which iptables is
+just an interface) can do some serious accounting.
+
+For instance, you can set up a rule to account for the outbound HTTP
+traffic on a web server:
+
+::
+
+  iptables -I OUTPUT -p tcp --sport 80
+
+There is no ``-j`` or ``-g`` flag, so the rule will just count matched
+packets and go to the following rule.
+
+Later, you can check the values of the counters, with:
+
+::
+
+   iptables -nxvL OUTPUT
+
+Technically, ``-n`` is not required, but it will prevent iptables from
+doing DNS reverse lookups, which are probably useless in this
+scenario.
+
+Counters include packets and bytes. If you want to set up metrics for
+container traffic like this, you could execute a ``for`` loop to add
+two ``iptables`` rules per container IP address (one in each
+direction) in the ``FORWARD`` chain. This will only meter traffic
+going through the NAT layer; you will also have to add traffic going
+through the userland proxy.
+
+Then, you will need to check those counters on a regular basis. If you
+happen to use ``collectd``, there is a nice plugin to automate
+iptables counter collection.
+
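A sketch of extracting those counters from the ``-nxvL`` output (the sample output and numbers below are illustrative; with ``-x``, the packet and byte counters are printed unabbreviated in the first two columns):

```shell
# Sample `iptables -nxvL OUTPUT` output (illustrative numbers)
cat > /tmp/sample_iptables <<'EOF'
Chain OUTPUT (policy ACCEPT 12345 packets, 678901 bytes)
    pkts      bytes target     prot opt in     out     source               destination
    1982   126463            tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp spt:80
EOF

# Pull the packet and byte counters for the HTTP accounting rule
awk '/spt:80/ { print "http_packets " $1; print "http_bytes " $2 }' /tmp/sample_iptables
```

Piping the real ``iptables -nxvL OUTPUT`` output through the same filter gives you the live counters.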
+Interface-level counters
+........................
+
+Since each container has a virtual Ethernet interface, you might want
+to check the TX and RX counters of this interface directly. You will
+notice that each container is associated with a virtual Ethernet
+interface on your host, with a name like ``vethKk8Zqi``. Figuring out
+which interface corresponds to which container is, unfortunately,
+difficult.
+
+But for now, the best way is to check the metrics *from within the
+containers*. To accomplish this, you can run an executable from the
+host environment within the network namespace of a container using
+**ip-netns magic**.
+
+The ``ip-netns exec`` command will let you execute any program
+(present in the host system) within any network namespace visible to
+the current process. This means that your host will be able to enter
+the network namespace of your containers, but your containers won't be
+able to access the host, nor their sibling containers. Containers will
+be able to “see” and affect their sub-containers, though.
+
+The exact format of the command is::
+
+  ip netns exec <nsname> <command...>
+
+For example::
+
+  ip netns exec mycontainer netstat -i
+
+``ip netns`` finds the "mycontainer" container by using namespace
+pseudo-files. Each process belongs to one network namespace, one PID
+namespace, one ``mnt`` namespace, etc., and those namespaces are
+materialized under ``/proc/<pid>/ns/``. For example, the network
+namespace of PID 42 is materialized by the pseudo-file
+``/proc/42/ns/net``.
+
+When you run ``ip netns exec mycontainer ...``, it expects
+``/var/run/netns/mycontainer`` to be one of those
+pseudo-files. (Symlinks are accepted.)
+
+In other words, to execute a command within the network namespace of a
+container, we need to:
+
+* Find out the PID of any process within the container that we want to
+  investigate;
+* Create a symlink from ``/var/run/netns/<somename>`` to
+  ``/proc/<thepid>/ns/net``
+* Execute ``ip netns exec <somename> ....``
+
+Please review :ref:`run_findpid` to learn how to find the cgroup of a
+process running in the container whose network usage you want to
+measure. From there, you can examine the pseudo-file named ``tasks``,
+which contains the PIDs that are in the control group (i.e. in the
+container). Pick any one of them.
+
+Putting everything together, if the "short ID" of a container is held
+in the environment variable ``$CID``, then you can do this::
+
+  TASKS=/sys/fs/cgroup/devices/$CID*/tasks
+  PID=$(head -n 1 $TASKS)
+  mkdir -p /var/run/netns
+  ln -sf /proc/$PID/ns/net /var/run/netns/$CID
+  ip netns exec $CID netstat -i
+
+
+Tips for high-performance metric collection
+-------------------------------------------
+
+Note that running a new process each time you want to update metrics
+is (relatively) expensive. If you want to collect metrics at high
+resolutions, and/or over a large number of containers (think 1000
+containers on a single host), you do not want to fork a new process
+each time.
+
+Here is how to collect metrics from a single process. You will have to
+write your metric collector in C (or any language that lets you do
+low-level system calls). You need to use a special system call,
+``setns()``, which lets the current process enter any arbitrary
+namespace. It requires, however, an open file descriptor to the
+namespace pseudo-file (remember: that’s the pseudo-file in
+``/proc/<pid>/ns/net``).
+
+However, there is a catch: you must not keep this file descriptor
+open. If you do, when the last process of the control group exits, the
+namespace will not be destroyed, and its network resources (like the
+virtual interface of the container) will stay around forever (or
+until you close that file descriptor).
+
+The right approach is to keep track of the first PID of each
+container, and re-open the namespace pseudo-file each time.
+
+Collecting metrics when a container exits
+-----------------------------------------
+
+Sometimes, you do not care about real-time metric collection, but when
+a container exits, you want to know how much CPU, memory, etc. it has
+used.
+
+Docker makes this difficult because it relies on ``lxc-start``, which
+carefully cleans up after itself, but it is still possible. It is
+usually easier to collect metrics at regular intervals (e.g. every
+minute, with the collectd LXC plugin) and rely on that instead.
+
+But, if you'd still like to gather the stats when a container stops,
+here is how:
+
+For each container, start a collection process, and move it to the
+control groups that you want to monitor by writing its PID to the
+tasks file of the cgroup. The collection process should periodically
+re-read the tasks file to check if it's the last process of the
+control group. (If you also want to collect network statistics as
+explained in the previous section, you should also move the process to
+the appropriate network namespace.)
+
+When the container exits, ``lxc-start`` will try to delete the control
+groups. It will fail, since the control group is still in use; but
+that’s fine. Your process should now detect that it is the only one
+remaining in the group. Now is the right time to collect all the
+metrics you need!
+
+Finally, your process should move itself back to the root control
+group, and remove the container control group. To remove a control
+group, just ``rmdir`` its directory. It's counter-intuitive to
+``rmdir`` a directory while it still contains files; but remember that
+this is a pseudo-filesystem, so usual rules don't apply. After the
+cleanup is done, the collection process can exit safely.
+
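The collection loop described above can be sketched in shell. Here a scratch directory stands in for the real cgroup, since writing to an actual ``tasks`` file requires root and a live cgroup; the loop exits as soon as the watcher is the only PID left in the file:

```shell
# Sketch of the watcher loop (a /tmp directory simulates the cgroup).
CGROUP_DIR=/tmp/demo-cgroup            # stand-in for /sys/fs/cgroup/.../<container>
mkdir -p "$CGROUP_DIR"
echo $$ > "$CGROUP_DIR/tasks"          # "move" ourselves into the group

# Poll until we are the last member of the group
while [ "$(wc -l < "$CGROUP_DIR/tasks")" -gt 1 ]; do
    sleep 1
done
echo "last member: collect metrics, then rmdir the cgroup"
```

In the simulation, the watcher is the only PID from the start, so the loop exits immediately; against a real cgroup it would spin until the container's processes are gone.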
@@ -1,12 +1,12 @@
-:title: Build Images (Dockerfile Reference)
+:title: Dockerfile Reference
 :description: Dockerfiles use a simple DSL which allows you to automate the steps you would normally manually take to create an image.
 :keywords: builder, docker, Dockerfile, automation, image creation
 
 .. _dockerbuilder:
 
-===================================
-Build Images (Dockerfile Reference)
-===================================
+====================
+Dockerfile Reference
+====================
 
 **Docker can act as a builder** and read instructions from a text
 ``Dockerfile`` to automate the steps you would otherwise take manually
@@ -18,6 +18,45 @@ To list available commands, either run ``docker`` with no parameters or execute
 
     ...
 
+.. _cli_options:
+
+Types of Options
+----------------
+
+Boolean
+~~~~~~~
+
+Boolean options look like ``-d=false``. The value you see is the
+default value, which gets set if you do **not** use the boolean
+flag. If you do call ``run -d``, that sets the opposite boolean value,
+so in this case ``true``, and so ``docker run -d`` **will** run in
+"detached" mode, in the background. Other boolean options are similar
+-- specifying them will set the value to the opposite of the default
+value.
+
+Multi
+~~~~~
+
+Options like ``-a=[]`` indicate they can be specified multiple times::
+
+  docker run -a stdin -a stdout -a stderr -i -t ubuntu /bin/bash
+
+Sometimes this can use a more complex value string, as for ``-v``::
+
+  docker run -v /host:/container example/mysql
+
+Strings and Integers
+~~~~~~~~~~~~~~~~~~~~
+
+Options like ``-name=""`` expect a string, and they can only be
+specified once. Options like ``-c=0`` expect an integer, and they can
+only be specified once.
+
+----
+
+Commands
+--------
+
 .. _cli_daemon:
 
 ``daemon``
@@ -14,4 +14,5 @@ Contents:
 
    commandline/index
    builder
+   run
    api/index
new file mode 100644
@@ -0,0 +1,419 @@
+:title: Docker Run Reference
+:description: Configure containers at runtime
+:keywords: docker, run, configure, runtime
+
+.. _run_docker:
+
+====================
+Docker Run Reference
+====================
+
+**Docker runs processes in isolated containers**. When an operator
+executes ``docker run``, she starts a process with its own file
+system, its own networking, and its own isolated process tree. The
+:ref:`image_def` which starts the process may define defaults related
+to the binary to run, the networking to expose, and more, but ``docker
+run`` gives final control to the operator who starts the container
+from the image. That's the main reason :ref:`cli_run` has more options
+than any other ``docker`` command.
+
+Every one of the :ref:`example_list` shows running containers, and so
+here we try to give more in-depth guidance.
+
+.. contents:: Table of Contents
+   :depth: 2
+
+.. _run_running:
+
+General Form
+============
+
+As you've seen in the :ref:`example_list`, the basic ``run`` command
+takes this form::
+
+  docker run [OPTIONS] IMAGE[:TAG] [COMMAND] [ARG...]
+
+To learn how to interpret the types of ``[OPTIONS]``, see
+:ref:`cli_options`.
+
+The list of ``[OPTIONS]`` breaks down into two groups:
+
+1. Settings exclusive to operators, including:
+
+   * Detached or Foreground running,
+   * Container Identification,
+   * Network settings,
+   * Runtime Constraints on CPU and Memory, and
+   * Privileges and LXC Configuration
+
+2. Settings shared between operators and developers, where operators
+   can override defaults developers set in images at build time.
+
+Together, the ``docker run [OPTIONS]`` give complete control over
+runtime behavior to the operator, allowing them to override all
+defaults set by the developer during ``docker build`` and nearly all
+the defaults set by the Docker runtime itself.
+
+Operator Exclusive Options
+==========================
+
+Only the operator (the person executing ``docker run``) can set the
+following options.
+
+.. contents::
+   :local:
+
+Detached vs Foreground
+----------------------
+
+When starting a Docker container, you must first decide if you want to
+run the container in the background in "detached" mode or in the
+default foreground mode::
+
+   -d=false: Detached mode: Run container in the background, print new container id
+
+Detached (-d)
+.............
+
+In detached mode (``-d=true`` or just ``-d``), all I/O should be done
+through network connections or shared volumes because the container is
+no longer listening to the commandline where you executed ``docker
+run``. You can reattach to a detached container with ``docker``
+:ref:`cli_attach`. If you choose to run a container in detached
+mode, then you cannot use the ``-rm`` option.
+
+Foreground
+..........
+
+In foreground mode (the default when ``-d`` is not specified),
+``docker run`` can start the process in the container and attach the
+console to the process's standard input, output, and standard
+error. It can even pretend to be a TTY (this is what most commandline
+executables expect) and pass along signals. All of that is
+configurable::
+
+   -a=[]          : Attach to ``stdin``, ``stdout`` and/or ``stderr``
+   -t=false       : Allocate a pseudo-tty
+   -sig-proxy=true: Proxify all received signal to the process (even in non-tty mode)
+   -i=false       : Keep STDIN open even if not attached
+
+If you do not specify ``-a`` then Docker will `attach everything
+(stdin,stdout,stderr)
+<https://github.com/dotcloud/docker/blob/75a7f4d90cde0295bcfb7213004abce8d4779b75/commands.go#L1797>`_. You
+can specify to which of the three standard streams (``stdin``, ``stdout``,
+``stderr``) you'd like to connect instead, as in::
+
+   docker run -a stdin -a stdout -i -t ubuntu /bin/bash
+
+For interactive processes (like a shell) you will typically want a tty
+as well as persistent standard input (``stdin``), so you'll use ``-i
+-t`` together in most interactive cases.
+
+Container Identification
+------------------------
+
+Name (-name)
+............
+
+The operator can identify a container in three ways:
+
+* UUID long identifier ("f78375b1c487e03c9438c729345e54db9d20cfa2ac1fc3494b6eb60872e74778")
+* UUID short identifier ("f78375b1c487")
+* Name ("evil_ptolemy")
+
+The UUID identifiers come from the Docker daemon, and if you do not
+assign a name to the container with ``-name`` then the daemon will
+generate a random string name for it. The name can become a handy way
+to add meaning to a container, since you can use it when defining
+:ref:`links <working_with_links_names>` (or any other place you need
+to identify a container). This works for both background and
+foreground Docker containers.
+
+PID Equivalent
+..............
+
+Finally, to help with automation, you can have Docker write the
+container ID out to a file of your choosing. This is similar to how
+some programs might write out their process ID to a file (you've seen
+them as PID files)::
+
+      -cidfile="": Write the container ID to the file
+
+Network Settings
+----------------
+
+::
+
+   -n=true   : Enable networking for this container
+   -dns=[]   : Set custom dns servers for the container
+
+By default, all containers have networking enabled and they can make
+any outgoing connections. The operator can completely disable
+networking with ``docker run -n`` which disables all incoming and outgoing
+networking. In cases like this, you would perform I/O through files or
+STDIN/STDOUT only.
+
+Your container will use the same DNS servers as the host by default,
+but you can override this with ``-dns``.
+
+Clean Up (-rm)
+--------------
+
+By default a container's file system persists even after the container
+exits. This makes debugging a lot easier (since you can inspect the
+final state) and you retain all your data by default. But if you are
+running short-term **foreground** processes, these container file
+systems can really pile up. If instead you'd like Docker to
+**automatically clean up the container and remove the file system when
+the container exits**, you can add the ``-rm`` flag::
+
+   -rm=false: Automatically remove the container when it exits (incompatible with -d)
+
+
+Runtime Constraints on CPU and Memory
+-------------------------------------
+
+The operator can also adjust the performance parameters of the container::
+
+   -m="": Memory limit (format: <number><optional unit>, where unit = b, k, m or g)
+   -c=0 : CPU shares (relative weight)
+
+The operator can constrain the memory available to a container easily
+with ``docker run -m``. If the host supports swap memory, then the
+``-m`` memory setting can be larger than physical RAM.
+
+Similarly the operator can increase the priority of this container
+with the ``-c`` option. By default, all containers run at the same
+priority and get the same proportion of CPU cycles, but you can tell
+the kernel to give more shares of CPU time to one or more containers
+when you start them via Docker.
+
189
+Runtime Privilege and LXC Configuration
190
+---------------------------------------
191
+
192
+::
193
+
194
+   -privileged=false: Give extended privileges to this container
195
+   -lxc-conf=[]: Add custom lxc options -lxc-conf="lxc.cgroup.cpuset.cpus = 0,1"
196
+
197
+By default, Docker containers are "unprivileged" and cannot, for
198
+example, run a Docker daemon inside a Docker container. This is
199
+because by default a container is not allowed to access any devices,
200
+but a "privileged" container is given access to all devices (see
201
+lxc-template.go_ and documentation on `cgroups devices
202
+<https://www.kernel.org/doc/Documentation/cgroups/devices.txt>`_).
203
+
204
+When the operator executes ``docker run -privileged``, Docker will
205
+enable access to all devices on the host as well as set some
206
+configuration in AppArmor to allow the container nearly all the same
207
+access to the host as processes running outside containers on the
208
+host. Additional information about running with ``-privileged`` is
209
+available on the `Docker Blog
210
+<http://blog.docker.io/2013/09/docker-can-now-run-within-docker/>`_.
211
+
212
+An operator can also specify LXC options using one or more
213
+``-lxc-conf`` parameters. These can be new parameters or override
214
+existing parameters from the lxc-template.go_. Note that in the
215
+future, a given host's Docker daemon may not use LXC, so this is an
216
+implementation-specific configuration meant for operators already
217
+familiar with using LXC directly.
218
+
219
+.. _lxc-template.go: https://github.com/dotcloud/docker/blob/master/execdriver/lxc/lxc_template.go
220
+
221
+
222
+Overriding ``Dockerfile`` Image Defaults
223
+========================================
224
+
225
+When a developer builds an image from a :ref:`Dockerfile
226
+<dockerbuilder>` or when she commits it, the developer can set a
227
+number of default parameters that take effect when the image starts up
228
+as a container.
229
+
230
+Four of the ``Dockerfile`` commands cannot be overridden at runtime:
231
+``FROM``, ``MAINTAINER``, ``RUN``, and ``ADD``. Everything else has a
232
+corresponding override in ``docker run``. We'll go through what the
233
+developer might have set in each ``Dockerfile`` instruction and how the
234
+operator can override that setting.
235
+
236
+.. contents::
237
+   :local:
238
+
239
+CMD (Default Command or Options)
240
+--------------------------------
241
+
242
+Recall the optional ``COMMAND`` in the Docker commandline::
243
+
244
+  docker run [OPTIONS] IMAGE[:TAG] [COMMAND] [ARG...]
245
+
246
+This command is optional because the person who created the ``IMAGE``
247
+may have already provided a default ``COMMAND`` using the ``Dockerfile``
248
+``CMD``. As the operator (the person running a container from the
249
+image), you can override that ``CMD`` just by specifying a new
250
+``COMMAND``.
251
+
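+For example, even if the ``ubuntu`` image ships with a default
+``CMD``, you can still run a one-off command instead::
+
+  docker run ubuntu /bin/echo "hello world"
+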
252
+If the image also specifies an ``ENTRYPOINT`` then the ``CMD`` or
253
+``COMMAND`` get appended as arguments to the ``ENTRYPOINT``.
254
+
255
+
256
+ENTRYPOINT (Default Command to Execute at Runtime)
258
+--------------------------------------------------
258
+
259
+::
260
+
261
+   -entrypoint="": Overwrite the default entrypoint set by the image
262
+
263
+The ENTRYPOINT of an image is similar to a ``COMMAND`` because it
264
+specifies what executable to run when the container starts, but it is
265
+(purposely) more difficult to override. The ``ENTRYPOINT`` gives a
266
+container its default nature or behavior, so that when you set an
267
+``ENTRYPOINT`` you can run the container *as if it were that binary*,
268
+complete with default options, and you can pass in more options via
269
+the ``COMMAND``. But, sometimes an operator may want to run something else
270
+inside the container, so you can override the default ``ENTRYPOINT`` at
271
+runtime by using a string to specify the new ``ENTRYPOINT``. Here is an
272
+example of how to run a shell in a container that has been set up to
273
+automatically run something else (like ``/usr/bin/redis-server``)::
274
+
275
+  docker run -i -t -entrypoint /bin/bash example/redis
276
+
277
+or two examples of how to pass more parameters to that ENTRYPOINT::
278
+
279
+  docker run -i -t -entrypoint /bin/bash example/redis -c ls -l
280
+  docker run -i -t -entrypoint /usr/bin/redis-cli example/redis --help
281
+
282
+
283
+EXPOSE (Incoming Ports)
284
+-----------------------
285
+
286
+The ``Dockerfile`` doesn't give much control over networking, only
287
+providing the ``EXPOSE`` instruction to give a hint to the operator
288
+about what incoming ports might provide services. The following
289
+options work with or override the ``Dockerfile``'s exposed defaults::
290
+
291
+   -expose=[]: Expose a port from the container 
292
+               without publishing it to your host
293
+   -P=false  : Publish all exposed ports to the host interfaces
294
+   -p=[]     : Publish a container's port to the host (format: 
295
+               ip:hostPort:containerPort | ip::containerPort | 
296
+               hostPort:containerPort) 
297
+               (use 'docker port' to see the actual mapping)
298
+   -link=""  : Add link to another container (name:alias)
299
+
300
+As mentioned previously, ``EXPOSE`` (and ``-expose``) make a port
301
+available **in** a container for incoming connections. The port number
302
+on the inside of the container (where the service listens) does not
303
+need to be the same number as the port exposed on the outside of the
304
+container (where clients connect), so inside the container you might
305
+have an HTTP service listening on port 80 (and so you ``EXPOSE 80`` in
306
+the ``Dockerfile``), but outside the container the port might be 42800.
307
+
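+For example, assuming a hypothetical ``example/webapp`` image whose
+``Dockerfile`` contains ``EXPOSE 80``, the operator could publish it
+on host port 42800 and confirm the mapping::
+
+  docker run -d -p 42800:80 example/webapp
+  docker port <container_id> 80
+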
308
+To help a new client container reach the server container's internal
309
+port ``-expose``'d by the operator or ``EXPOSE``'d by the
310
+developer, the operator has three choices: start the server container
311
+with ``-P`` or ``-p``, or start the client container with ``-link``.
312
+
313
+If the operator uses ``-P`` or ``-p`` then Docker will make the
314
+exposed port accessible on the host and the ports will be available to
315
+any client that can reach the host. To find the mapping between the
316
+host ports and the exposed ports, use ``docker port``.
317
+
318
+If the operator uses ``-link`` when starting the new client container,
319
+then the client container can access the exposed port via a private
320
+networking interface. Docker will set some environment variables in
321
+the client container to help indicate which interface and port to use.
322
+
323
+ENV (Environment Variables)
324
+---------------------------
325
+
326
+The operator can **set any environment variable** in the container by
327
+using one or more ``-e`` flags, even overriding those already defined by the
328
+developer with a ``Dockerfile`` ``ENV``::
329
+
330
+   $ docker run -e "deep=purple" -rm ubuntu /bin/bash -c export
331
+   declare -x HOME="/"
332
+   declare -x HOSTNAME="85bc26a0e200"
333
+   declare -x OLDPWD
334
+   declare -x PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
335
+   declare -x PWD="/"
336
+   declare -x SHLVL="1"
337
+   declare -x container="lxc"
338
+   declare -x deep="purple"
339
+
340
+Similarly the operator can set the **hostname** with ``-h``.
341
+
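+For example (the hostname below is arbitrary)::
+
+  docker run -h myhost.example.com ubuntu hostname
+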
342
+``-link name:alias`` also sets environment variables, using the
343
+*alias* string to define environment variables within the container
344
+that give the IP and PORT information for connecting to the service
345
+container. Let's imagine we have a container running Redis::
346
+
347
+   # Start the service container, named redis-name
348
+   $ docker run -d -name redis-name dockerfiles/redis
349
+   4241164edf6f5aca5b0e9e4c9eccd899b0b8080c64c0cd26efe02166c73208f3
350
+
351
+   # The redis-name container exposed port 6379
352
+   $ docker ps  
353
+   CONTAINER ID        IMAGE                      COMMAND                CREATED             STATUS              PORTS               NAMES
354
+   4241164edf6f        dockerfiles/redis:latest   /redis-stable/src/re   5 seconds ago       Up 4 seconds        6379/tcp            redis-name  
355
+
356
+   # Note that there are no public ports exposed since we didn't use -p or -P
357
+   $ docker port 4241164edf6f 6379
358
+   2014/01/25 00:55:38 Error: No public port '6379' published for 4241164edf6f
359
+
360
+
361
+Yet we can get information about the Redis container's exposed ports
362
+with ``-link``. Choose an alias that will form a valid environment
363
+variable!
364
+
365
+::
366
+
367
+   $ docker run -rm -link redis-name:redis_alias -entrypoint /bin/bash dockerfiles/redis -c export
368
+   declare -x HOME="/"
369
+   declare -x HOSTNAME="acda7f7b1cdc"
370
+   declare -x OLDPWD
371
+   declare -x PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
372
+   declare -x PWD="/"
373
+   declare -x REDIS_ALIAS_NAME="/distracted_wright/redis"
374
+   declare -x REDIS_ALIAS_PORT="tcp://172.17.0.32:6379"
375
+   declare -x REDIS_ALIAS_PORT_6379_TCP="tcp://172.17.0.32:6379"
376
+   declare -x REDIS_ALIAS_PORT_6379_TCP_ADDR="172.17.0.32"
377
+   declare -x REDIS_ALIAS_PORT_6379_TCP_PORT="6379"
378
+   declare -x REDIS_ALIAS_PORT_6379_TCP_PROTO="tcp"
379
+   declare -x SHLVL="1"
380
+   declare -x container="lxc"
381
+
382
+And we can use that information to connect from another container as a client::
383
+
384
+   $ docker run -i -t -rm -link redis-name:redis_alias -entrypoint /bin/bash dockerfiles/redis -c '/redis-stable/src/redis-cli -h $REDIS_ALIAS_PORT_6379_TCP_ADDR -p $REDIS_ALIAS_PORT_6379_TCP_PORT'
385
+   172.17.0.32:6379>
386
+
387
+VOLUME (Shared Filesystems)
388
+---------------------------
389
+
390
+::
391
+
392
+   -v=[]: Create a bind mount with: [host-dir]:[container-dir]:[rw|ro]. 
393
+          If "container-dir" is missing, then docker creates a new volume.
394
+   -volumes-from="": Mount all volumes from the given container(s)
395
+
396
+The volumes commands are complex enough to have their own
397
+documentation in section :ref:`volume_def`. A developer can define one
398
+or more ``VOLUME``\s associated with an image, but only the operator can
399
+give access from one container to another (or from a container to a
400
+volume mounted on the host).
401
+
402
+USER
403
+----
404
+
405
+The default user within a container is ``root`` (id = 0), but if the
406
+developer created additional users, those are accessible too. The
407
+developer can set a default user to run the first process with the
408
+``Dockerfile USER`` command, but the operator can override it::
409
+
410
+   -u="": Username or UID
411
+
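+For example, the standard ``ubuntu`` image includes a ``daemon``
+user, so the first process can run as that user instead of root::
+
+  docker run -u daemon ubuntu whoami
+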
412
+WORKDIR
413
+-------
414
+
415
+The default working directory for running binaries within a container is the root directory (``/``), but the developer can set a different default with the ``Dockerfile WORKDIR`` command. The operator can override this with::
416
+
417
+   -w="": Working directory inside the container
418
+
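+For example, this starts the command in ``/tmp`` instead of ``/``::
+
+  docker run -w /tmp ubuntu pwd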