Shutting down or killing a container
------------------------------------

From the host, the inject utility can be used to run an appropriate command
within the container to start a graceful shutdown. For example

  inject PID /bin/halt

To immediately kill a container and all its processes, it is sufficient to
send the container's init process a SIGKILL from the host using

  pkill -KILL -P PID

where PID is the process ID of a running container supervisor. It is very
important not to SIGKILL the container supervisor itself or the container
will be orphaned, continuing to run unsupervised as a child of the host
init.
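
As a minimal sketch of the kill procedure described above, the pkill
invocation can be wrapped in a small shell function that refuses an empty,
non-numeric or implausible PID before signalling anything. The name
kill_container is hypothetical and not part of these tools.

```shell
# kill_container SUPERVISOR_PID
# Hypothetical helper: SIGKILL the children of a container supervisor
# (the container init), never the supervisor itself.
kill_container() {
  case "$1" in
    ''|*[!0-9]*)
      echo "usage: kill_container SUPERVISOR_PID" >&2
      return 2
      ;;
  esac
  # PID 0 is invalid and PID 1 is the host init, whose children must
  # not be killed wholesale.
  if [ "$1" -le 1 ]; then
    echo "refusing to target the children of PID $1" >&2
    return 2
  fi
  pkill -KILL -P "$1"
}
```

The guard matters because an unset shell variable passed as the PID would
otherwise hand pkill an empty parent list.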


Using cgroups to limit memory and CPU-share available to a container
--------------------------------------------------------------------

If cgroup support including memcg and memcg-swap is compiled into the kernel
and the cgroup filesystem is mounted with the cpu and memory controllers
enabled, it is straightforward to apply memory and CPU-share limits to a
container as it is started. For example, the shell script

  #!/bin/sh -e
  mkdir /sys/fs/cgroup/mycontainer
  echo $$ >/sys/fs/cgroup/mycontainer/tasks
  echo 2G >/sys/fs/cgroup/mycontainer/memory.limit_in_bytes
  echo 2G >/sys/fs/cgroup/mycontainer/memory.memsw.limit_in_bytes
  echo 1000 >/sys/fs/cgroup/mycontainer/cpu.shares
  exec contain [...]

applies a limit of 2GB virtual memory and a CPU-share of 1000 before
starting the container. It might also be useful to apply a
memory.kmem.limit_in_bytes setting to prevent a container from using
excessive amounts of kernel memory.

Note that to set the virtual memory limit in memory.memsw.limit_in_bytes, it
is first necessary to set a smaller or equal physical memory limit in
memory.limit_in_bytes.
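
The ordering rule above might be captured in a helper along these lines.
This is only a sketch: set_memory_limits is a hypothetical name, it assumes
a freshly created cgroup whose limits are still at their defaults, and it
takes both limits as plain byte counts so that they can be compared in the
shell.

```shell
# set_memory_limits CGROUP_DIR PHYS_BYTES VIRT_BYTES
# Hypothetical helper: write the physical limit first, then the virtual
# (memory plus swap) limit, which must not be smaller.
set_memory_limits() {
  dir=$1
  phys=$2
  virt=$3
  if [ "$phys" -gt "$virt" ]; then
    echo "physical limit $phys exceeds virtual limit $virt" >&2
    return 1
  fi
  # Order matters: memory.limit_in_bytes before memory.memsw.limit_in_bytes.
  echo "$phys" >"$dir/memory.limit_in_bytes" &&
    echo "$virt" >"$dir/memory.memsw.limit_in_bytes"
}
```

For the 2GB example above, this would be called as

  set_memory_limits /sys/fs/cgroup/mycontainer 2147483648 2147483648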

When a container lives inside a memory cgroup, memory.memsw.usage_in_bytes
gives a measure of the total virtual memory in use by the container, and
memory.usage_in_bytes measures its physical memory footprint. The accounting
policy is explained in linux/kernel/Documentation/cgroups/memory.txt.
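
A minimal sketch for reading both figures from the host is shown below. The
function name container_memory_usage is hypothetical, and the cgroup
directory is passed in as an argument rather than assumed.

```shell
# container_memory_usage CGROUP_DIR
# Print the physical and virtual memory use, in bytes, of a memory cgroup.
container_memory_usage() {
  phys=$(cat "$1/memory.usage_in_bytes") || return 1
  virt=$(cat "$1/memory.memsw.usage_in_bytes") || return 1
  echo "physical=$phys virtual=$virt"
}
```

For the cgroup created earlier, this would be called as

  container_memory_usage /sys/fs/cgroup/mycontainer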


Troubleshooting
---------------

The contain/pseudo error message 'Failed to unshare user namespace: Invalid
argument' typically means that your kernel is not compiled with support for
user namespaces, i.e. CONFIG_USER_NS is not set. The contain tool will also
die with a similar message referring to one of the other required namespaces
if support for that is not available in the kernel.

To run these tools you need to be running Linux 3.8 or later with

  CONFIG_UTS_NS=y
  CONFIG_IPC_NS=y
  CONFIG_USER_NS=y
  CONFIG_PID_NS=y
  CONFIG_NET_NS=y

set in the kernel build config. Note that before Linux 3.12, CONFIG_XFS_FS
conflicted with CONFIG_USER_NS, so these tools could not be used where XFS
support was compiled either into the kernel or as a module.
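
One way to check the options listed above is to scan a copy of the kernel
build config. The sketch below assumes such a file is available; where it
lives varies between systems (for example /proc/config.gz, which exists
only if the kernel was built with CONFIG_IKCONFIG_PROC). The function name
missing_namespace_options is hypothetical.

```shell
# missing_namespace_options CONFIG_FILE
# Print each required namespace option that is not set to y in the given
# kernel config file. Succeeds only if none are missing.
missing_namespace_options() {
  status=0
  for opt in CONFIG_UTS_NS CONFIG_IPC_NS CONFIG_USER_NS \
             CONFIG_PID_NS CONFIG_NET_NS; do
    # -x matches whole lines only, so "# CONFIG_USER_NS is not set"
    # does not count as enabled.
    if ! grep -qx "$opt=y" "$1"; then
      echo "$opt"
      status=1
    fi
  done
  return "$status"
}
```

On a kernel that exposes its config, this could be used as

  zcat /proc/config.gz >/tmp/kconfig
  missing_namespace_options /tmp/kconfig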

The contain tool will fail to mount /dev/pts unless

  CONFIG_DEVPTS_MULTIPLE_INSTANCES=y

is set in the kernel build config. Both container and host /dev/pts must be
mounted with -o newinstance, with /dev/ptmx symlinked to pts/ptmx.
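
The symlink half of this requirement can be verified with a short check such
as the sketch below. The function name ptmx_is_symlinked is hypothetical,
and it relies on the readlink utility being installed. It does not verify
the newinstance mount option.

```shell
# ptmx_is_symlinked ROOT
# Succeed if ROOT/dev/ptmx is a symlink pointing at pts/ptmx.
ptmx_is_symlinked() {
  [ -L "$1/dev/ptmx" ] && [ "$(readlink "$1/dev/ptmx")" = pts/ptmx ]
}
```

Passing an empty ROOT checks the host /dev/ptmx, and passing the path of a
container root filesystem checks that container instead.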

Linux 3.12 introduced tighter restrictions on mounting proc and sysfs, which
broke older versions of contain. To comply with these new rules, contain
now ensures that procfs and sysfs are mounted in the new mount namespace
before pivoting into the container and detaching the host root.

A bug in Linux 3.12 will prevent contain from mounting /proc in a container
if binfmt_misc is mounted on /proc/sys/fs/binfmt_misc in the host
filesystem. This was fixed in Linux 3.13.

Linux 3.19 introduced restrictions on writing a user namespace GID map as an
unprivileged user unless setgroups() has been permanently disabled, which
broke older versions of contain. When run non-setuid and unprivileged,
contain and pseudo must now disable setgroups() to create containers. If
they are installed setuid, they bypass this kernel restriction and leave
setgroups() enabled in the resulting containers.