Add docs about how to extend devicemapper thin pool

Signed-off-by: Chun Chen <ramichen@tencent.com>

Update to device mapper
Entering comments

Signed-off-by: Mary Anthony <mary@docker.com>

Chun Chen authored on 2016/04/05 16:35:24
Showing 1 changed files
... ...
@@ -16,12 +16,10 @@ leverages the thin provisioning and snapshotting capabilities of this framework
16 16
 for image and container management. This article refers to the Device Mapper
17 17
 storage driver as `devicemapper`, and the kernel framework as `Device Mapper`.
18 18
 
19
-
20 19
 >**Note**: The [Commercially Supported Docker Engine (CS-Engine) running on RHEL
21 20
 and CentOS Linux](https://www.docker.com/compatibility-maintenance) requires
22 21
 that you use the `devicemapper` storage driver.
23 22
 
24
-
25 23
 ## An alternative to AUFS
26 24
 
27 25
 Docker originally ran on Ubuntu and Debian Linux and used AUFS for its storage
... ...
@@ -61,20 +59,20 @@ With `devicemapper` the high level process for creating images is as follows:
61 61
 
62 62
 1. The `devicemapper` storage driver creates a thin pool.
63 63
 
64
-    The pool is created from block devices or loop mounted sparse files (more
65
-on this later).
64
+	The pool is created from block devices or loop mounted sparse files (more
65
+	on this later).
66 66
 
67 67
 2. Next it creates a *base device*.
68 68
 
69
-    A base device is a thin device with a filesystem. You can see which
70
-filesystem is in use by running the `docker info` command and checking the
71
-`Backing filesystem` value.
69
+	A base device is a thin device with a filesystem. You can see which
70
+	filesystem is in use by running the `docker info` command and checking the
71
+	`Backing filesystem` value.
72 72
 
73 73
 3. Each new image (and image layer) is a snapshot of this base device.
74 74
 
75
-    These are thin provisioned copy-on-write snapshots. This means that they
76
-are initially empty and only consume space from the pool when data is written
77
-to them.
75
+	These are thin provisioned copy-on-write snapshots. This means that they
76
+	are initially empty and only consume space from the pool when data is written
77
+	to them.
78 78
 
79 79
 With `devicemapper`, container layers are snapshots of the image they are
80 80
 created from. Just as with images, container snapshots are thin provisioned
... ...
@@ -109,9 +107,9 @@ block (`0x44f`) in an example container.
109 109
 
110 110
 1. An application makes a read request for block `0x44f` in the container.
111 111
 
112
-    Because the container is a thin snapshot of an image it does not have the
113
-data. Instead, it has a pointer (PTR) to where the data is stored in the image
114
-snapshot lower down in the image stack.
112
+	Because the container is a thin snapshot of an image it does not have the
113
+	data. Instead, it has a pointer (PTR) to where the data is stored in the image
114
+	snapshot lower down in the image stack.
115 115
 
116 116
 2. The storage driver follows the pointer to block `0xf33` in the snapshot
117 117
 relating to image layer `a005...`.
... ...
@@ -121,7 +119,7 @@ snapshot to memory in the container.
121 121
 
122 122
 4. The storage driver returns the data to the requesting application.
123 123
 
124
-### Write examples
124
+## Write examples
125 125
 
126 126
 With the `devicemapper` driver, writing new data to a container is accomplished
127 127
  by an *allocate-on-demand* operation. Updating existing data uses a
... ...
@@ -132,7 +130,7 @@ For example, when making a small change to a large file in a container, the
132 132
 `devicemapper` storage driver does not copy the entire file. It only copies the
133 133
  blocks to be modified. Each block is 64KB.
134 134
 
135
-#### Writing new data
135
+### Writing new data
136 136
 
137 137
 To write 56KB of new data to a container:
138 138
 
... ...
@@ -141,12 +139,12 @@ To write 56KB of new data to a container:
141 141
 2. The allocate-on-demand operation allocates a single new 64KB block to the
142 142
 container's snapshot.
143 143
 
144
-    If the write operation is larger than 64KB, multiple new blocks are
145
-allocated to the container's snapshot.
144
+	If the write operation is larger than 64KB, multiple new blocks are
145
+	allocated to the container's snapshot.
146 146
 
147 147
 3. The data is written to the newly allocated block.
148 148
 
149
-#### Overwriting existing data
149
+### Overwriting existing data
150 150
 
151 151
 To modify existing data for the first time:
152 152
 
... ...
@@ -163,7 +161,7 @@ The application in the container is unaware of any of these
163 163
 allocate-on-demand and copy-on-write operations. However, they may add latency
164 164
 to the application's read and write operations.
165 165
 
166
-## Configuring Docker with Device Mapper
166
+## Configure Docker with devicemapper
167 167
 
168 168
 The `devicemapper` is the default Docker storage driver on some Linux
169 169
 distributions. This includes RHEL and most of its forks. Currently, the
... ...
@@ -182,18 +180,20 @@ deployments should not run under `loop-lvm` mode.
182 182
 
183 183
 You can detect the mode by viewing the `docker info` command:
184 184
 
185
-    $ sudo docker info
186
-    Containers: 0
187
-    Images: 0
188
-    Storage Driver: devicemapper
189
-     Pool Name: docker-202:2-25220302-pool
190
-     Pool Blocksize: 65.54 kB
191
-     Backing Filesystem: xfs
192
-     ...
193
-     Data loop file: /var/lib/docker/devicemapper/devicemapper/data
194
-     Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
195
-     Library Version: 1.02.93-RHEL7 (2015-01-28)
196
-     ...
185
+```bash
186
+$ sudo docker info
187
+Containers: 0
188
+Images: 0
189
+Storage Driver: devicemapper
190
+ Pool Name: docker-202:2-25220302-pool
191
+ Pool Blocksize: 65.54 kB
192
+ Backing Filesystem: xfs
193
+ [...]
194
+ Data loop file: /var/lib/docker/devicemapper/devicemapper/data
195
+ Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
196
+ Library Version: 1.02.93-RHEL7 (2015-01-28)
197
+ [...]
198
+ ```
197 199
 
198 200
 The output above shows a Docker host running with the `devicemapper` storage
199 201
 driver operating in `loop-lvm` mode. This is indicated by the fact that the
... ...
@@ -203,175 +203,141 @@ files.
203 203
 
204 204
 ### Configure direct-lvm mode for production
205 205
 
206
-The preferred configuration for production deployments is `direct lvm`. This
206
+The preferred configuration for production deployments is `direct-lvm`. This
207 207
 mode uses block devices to create the thin pool. The following procedure shows
208 208
 you how to configure a Docker host to use the `devicemapper` storage driver in
209 209
 a `direct-lvm` configuration.
210 210
 
211
-> **Caution:** If you have already run the Engine daemon on your Docker host
211
+> **Caution:** If you have already run the Docker daemon on your Docker host
212 212
 > and have images you want to keep, `push` them to Docker Hub or your private
213 213
 > Docker Trusted Registry before attempting this procedure.
214 214
 
215 215
 The procedure below will create a 90GB data volume and 4GB metadata volume to
216 216
 use as backing for the storage pool. It assumes that you have a spare block
217
-device at `/dev/sdd` with enough free space to complete the task. The device
217
+device at `/dev/xvdf` with enough free space to complete the task. The device
218 218
 identifier and volume sizes may be different in your environment and you
219
-should substitute your own values throughout the procedure.
220
-
221
-The procedure also assumes that the Engine daemon is in the `stopped` state.
222
-Any existing images or data are lost by this process.
223
-
224
-1. Log in to the Docker host you want to configure.
225
-2. If it is running, stop the Engine daemon.
226
-3. Install the logical volume management version 2.
227
-
228
-    ```bash
229
-    $ yum install lvm2
230
-    ```
231
-4. Create a physical volume replacing `/dev/sdd` with your block device.
232
-
233
-    ```bash
234
-    $ pvcreate /dev/sdd
235
-  ```
236
-
237
-5. Create a 'docker' volume group.
238
-
239
-    ```bash
240
-    $ vgcreate docker /dev/sdd
241
-    ```
242
-
243
-6. Create a thin pool named `thinpool`.
244
-
245
-    In this example, the data logical is 95% of the 'docker' volume group size.
246
-    Leaving this free space allows for auto expanding of either the data or
247
-    metadata if space runs low as a temporary stopgap.
248
-
249
-    ```bash
250
-   $ lvcreate --wipesignatures y -n thinpool docker -l 95%VG
251
-   $ lvcreate --wipesignatures y -n thinpoolmeta docker -l 1%VG
252
-   ```
253
-
254
-7. Convert the pool to a thin pool.
255
-
256
-    ```bash
257
-    $ lvconvert -y --zero n -c 512K --thinpool docker/thinpool --poolmetadata docker/thinpoolmeta
258
-    ```
259
-
260
-8. Configure autoextension of thin pools via an `lvm` profile.
261
-
262
-    ```bash
263
-    $ vi /etc/lvm/profile/docker-thinpool.profile
264
-    ```
265
-
266
-9. Specify 'thin_pool_autoextend_threshold' value.
267
-
268
-    The value should be the percentage of space used before `lvm` attempts
269
-    to autoextend the available space (100 = disabled).
219
+should substitute your own values throughout the procedure. The procedure also
220
+assumes that the Docker daemon is in the `stopped` state.
270 221
 
271
-    ```
272
-    thin_pool_autoextend_threshold = 80
273
-    ```
222
+1. Log in to the Docker host you want to configure and stop the Docker daemon.
274 223
 
275
-10. Modify the `thin_pool_autoextend_percent` for when thin pool autoextension occurs.
224
+2. If it exists, delete your existing image store by removing the
225
+`/var/lib/docker` directory.
276 226
 
277
-    The value's setting is the perentage of space to increase the thin pool (100 =
278
-    disabled)
279
-
280
-    ```
281
-    thin_pool_autoextend_percent = 20
282
-    ```
283
-
284
-11. Check your work, your `docker-thinpool.profile` file should appear similar to the following:
285
-
286
-    An example `/etc/lvm/profile/docker-thinpool.profile` file:
227
+	```bash
228
+	$ sudo rm -rf /var/lib/docker
229
+	```
287 230
 
288
-    ```
289
-     activation {
290
-         thin_pool_autoextend_threshold=80
291
-         thin_pool_autoextend_percent=20
292
-     }
293
-     ```
231
+3. Create an LVM physical volume (PV) on your spare block device using the
232
+`pvcreate` command.
294 233
 
295
-12. Apply your new lvm profile
234
+	```bash
235
+	$ sudo pvcreate /dev/xvdf
236
+	Physical volume `/dev/xvdf` successfully created
237
+	```
296 238
 
297
-    ```bash
298
-    $ lvchange --metadataprofile docker-thinpool docker/thinpool
299
-  ```
239
+	The device identifier may be different on your system. Remember to substitute
240
+	your value in the command above. If your host is running on AWS EC2, you may
241
+	need to install `lvm2` and <a href="http://goo.gl/Q5pUwG"
242
+	target="_blank">attach an EBS device</a> to use this procedure.
300 243
 
301
-13. Verify the `lv` is monitored.
244
+4. Create a new volume group (VG) called `vg-docker` using the PV created in
245
+the previous step.
302 246
 
303
-    ```bash
304
-    $ lvs -o+seg_monitor
305
-    ```
247
+	```bash
248
+	$ sudo vgcreate vg-docker /dev/xvdf
249
+	Volume group `vg-docker` successfully created
250
+	```
306 251
 
307
-14. If Engine was previously started, clear your graph driver directory.
252
+5. Create a new 90GB logical volume (LV) called `data` from space in the
253
+`vg-docker` volume group.
308 254
 
309
-    Clearing your graph driver removes any images and containers in your Docker
310
-    installation.
255
+	```bash
256
+	$ sudo lvcreate -L 90G -n data vg-docker
257
+	Logical volume `data` created.
258
+	```
311 259
 
312
-    ```bash
313
-    $ rm -rf /var/lib/docker/*
314
-    ```
260
+	The command creates an LVM logical volume called `data` and an associated
261
+	block device file at `/dev/vg-docker/data`. In a later step, you instruct the
262
+	`devicemapper` storage driver to use this block device to store image and
263
+	container data.
315 264
 
316
-14. Configure the Engine daemon with specific devicemapper options.
265
+	If you receive a signature detection warning, make sure you are working on
266
+	the correct devices before continuing. Signature warnings indicate that the
267
+	device you're working on is currently in use by LVM or has been used by LVM in
268
+	the past.
317 269
 
318
-    There are two ways to do this. You can set options on the commmand line if you start the daemon there:
270
+6. Create a new logical volume (LV) called `metadata` from space in the
271
+`vg-docker` volume group.
319 272
 
320
-    ```bash
321
-    --storage-driver=devicemapper --storage-opt=dm.thinpooldev=/dev/mapper/docker-thinpool --storage-opt dm.use_deferred_removal=true
322
-    ```
273
+	```bash
274
+	$ sudo lvcreate -L 4G -n metadata vg-docker
275
+	Logical volume `metadata` created.
276
+	```
323 277
 
324
-    You can also set them for startup in the `daemon.json` configuration, for example:
278
+	This creates an LVM logical volume called `metadata` and an associated
279
+	block device file at `/dev/vg-docker/metadata`. In the next step you instruct
280
+	the `devicemapper` storage driver to use this block device to store image and
281
+	container metadata.
325 282
 
326
-    ```json
327
-     {
328
-             "storage-driver": "devicemapper",
329
-             "storage-opts": [
330
-                     "dm.thinpooldev=/dev/mapper/docker-thinpool",
331
-                     "dm.use_deferred_removal=true"
332
-             ]
333
-     }
334
-    ```
335
-15. Start the Engine daemon.
283
+7. Start the Docker daemon with the `devicemapper` storage driver and the
284
+`--storage-opt` flags.
336 285
 
337
-    ```bash
338
-    $ systemctl start docker
339
-    ```
286
+	The `data` and `metadata` devices that you pass to the `--storage-opt`
287
+	options were created in the previous steps.
340 288
 
341
-After you start the Engine daemon, ensure you monitor your thin pool and volume
342
-group free space. While the volume group will auto-extend, it can still fill
343
-up. To monitor logical volumes, use `lvs` without options or `lvs -a` to see tha
344
-data and metadata sizes. To monitor volume group free space, use the `vgs` command.
289
+	```bash
290
+	$ sudo docker daemon --storage-driver=devicemapper --storage-opt dm.datadev=/dev/vg-docker/data --storage-opt dm.metadatadev=/dev/vg-docker/metadata &
291
+	[1] 2163
292
+	[root@ip-10-0-0-75 centos]# INFO[0000] Listening for HTTP on unix (/var/run/docker.sock)
293
+	INFO[0027] Option DefaultDriver: bridge
294
+	INFO[0027] Option DefaultNetwork: bridge
295
+	<-- output truncated -->
296
+	INFO[0027] Daemon has completed initialization
297
+	INFO[0027] Docker daemon commit=1b09a95 graphdriver=aufs version=1.11.0-dev
298
+	```
345 299
 
346
-Logs can show the auto-extension of the thin pool when it hits the threshold, to
347
-view the logs use:
300
+	It is also possible to set the `--storage-driver` and `--storage-opt` flags
301
+	in the Docker config file and start the daemon normally using the `service` or
302
+	`systemd` commands.
348 303
 
349
-```bash
350
-journalctl -fu dm-event.service
351
-```
304
+8. Use the `docker info` command to verify that the daemon is using `data` and
305
+`metadata` devices you created.
352 306
 
353
-If you run into repeated problems with thin pool, you can use the
354
-`dm.min_free_space` option to tune the Engine behavior. This value ensures that
355
-operations fail with a warning when the free space is at or near the minimum.
356
-For information, see <a
357
-href="https://docs.docker.com/engine/reference/commandline/dockerd/#storage-driver-options"
358
-target="_blank">the storage driver options in the Engine daemon reference</a>.
307
+	```bash
308
+	$ sudo docker info
309
+	INFO[0180] GET /v1.20/info
310
+	Containers: 0
311
+	Images: 0
312
+	Storage Driver: devicemapper
313
+	 Pool Name: docker-202:1-1032-pool
314
+	 Pool Blocksize: 65.54 kB
315
+	 Backing Filesystem: xfs
316
+	 Data file: /dev/vg-docker/data
317
+	 Metadata file: /dev/vg-docker/metadata
318
+	[...]
319
+	```
359 320
 
321
+	The output of the command above shows the storage driver as `devicemapper`.
322
+	The last two lines also confirm that the correct devices are being used for
323
+	the `Data file` and the `Metadata file`.
360 324
 
361 325
 ### Examine devicemapper structures on the host
362 326
 
363 327
 You can use the `lsblk` command to see the device files created above and the
364 328
 `pool` that the `devicemapper` storage driver creates on top of them.
365 329
 
366
-    $ sudo lsblk
367
-    NAME                       MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
368
-    xvda                       202:0    0    8G  0 disk
369
-    └─xvda1                    202:1    0    8G  0 part /
370
-    xvdf                       202:80   0   10G  0 disk
371
-    ├─vg--docker-data          253:0    0   90G  0 lvm
372
-    │ └─docker-202:1-1032-pool 253:2    0   10G  0 dm
373
-    └─vg--docker-metadata      253:1    0    4G  0 lvm
374
-      └─docker-202:1-1032-pool 253:2    0   10G  0 dm
330
+```bash
331
+$ sudo lsblk
332
+NAME                       MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
333
+xvda                       202:0    0    8G  0 disk
334
+└─xvda1                    202:1    0    8G  0 part /
335
+xvdf                       202:80   0   10G  0 disk
336
+├─vg--docker-data          253:0    0   90G  0 lvm
337
+│ └─docker-202:1-1032-pool 253:2    0   10G  0 dm
338
+└─vg--docker-metadata      253:1    0    4G  0 lvm
339
+  └─docker-202:1-1032-pool 253:2    0   10G  0 dm
340
+```
375 341
 
376 342
 The diagram below shows the image from prior examples updated with the detail
377 343
 from the `lsblk` command above.
... ...
@@ -379,8 +345,8 @@ from the `lsblk` command above.
379 379
 ![](http://farm1.staticflickr.com/703/22116692899_0471e5e160_b.jpg)
380 380
 
381 381
 In the diagram, the pool is named `Docker-202:1-1032-pool` and spans the `data`
382
- and `metadata` devices created earlier. The `devicemapper` constructs the pool
383
- name as follows:
382
+and `metadata` devices created earlier. The `devicemapper` constructs the pool
383
+name as follows:
384 384
 
385 385
 ```
386 386
 Docker-MAJ:MIN-INO-pool
... ...
@@ -440,18 +406,18 @@ Logging Driver: json-file
440 440
 [...]
441 441
 ```
442 442
 
443
-The `Data Space` values show that the pool is 100GiB total. This example extends the pool to 200GiB.
443
+The `Data Space` values show that the pool is 100GB total. This example extends the pool to 200GB.
444 444
 
445 445
 1. List the sizes of the devices.
446 446
 
447 447
 	```bash
448 448
 	$ sudo ls -lh /var/lib/docker/devicemapper/devicemapper/
449
-	total 1.2G
450
-	-rw------- 1 root root 100G Apr 14 08:47 data
451
-	-rw------- 1 root root 2.0G Apr 19 13:27 metadata
449
+	total 1175492
450
+	-rw------- 1 root root 100G Mar 30 05:22 data
451
+	-rw------- 1 root root 2.0G Mar 31 11:17 metadata
452 452
 	```
453 453
 
454
-2. Truncate `data` file to 200GiB.
454
+2. Truncate the `data` file to 200GB (214748364800 bytes).
455 455
 
456 456
 	```bash
457 457
 	$ sudo truncate -s 214748364800 /var/lib/docker/devicemapper/devicemapper/data
... ...
@@ -460,10 +426,12 @@ The `Data Space` values show that the pool is 100GiB total. This example extends
460 460
 3. Verify the file size changed.
461 461
 
462 462
 	```bash
463
-	$ sudo ls -lh /var/lib/docker/devicemapper/devicemapper/
464
-	total 1.2G
465
-	-rw------- 1 root root 200G Apr 14 08:47 data
466
-	-rw------- 1 root root 2.0G Apr 19 13:27 metadata
463
+	$ sudo ls -al /var/lib/docker/devicemapper/devicemapper/
464
+	total 1175492
465
+	drwx------ 2 root root         4096 Mar 29 02:45 .
466
+	drwx------ 5 root root         4096 Mar 29 02:48 ..
467
+	-rw------- 1 root root 214748364800 Mar 31 11:20 data
468
+	-rw------- 1 root root   2147483648 Mar 31 11:17 metadata
467 469
 	```
468 470
 
469 471
 4. Reload data loop device
... ...
@@ -480,19 +448,19 @@ The `Data Space` values show that the pool is 100GiB total. This example extends
480 480
 
481 481
 	a. Get the pool name first.
482 482
 
483
-		$ sudo dmsetup status | grep pool
484
-		docker-8:1-123141-pool: 0 209715200 thin-pool 91 422/524288 18338/1638400 - rw discard_passdown queue_if_no_space -
483
+		$ sudo dmsetup status | grep pool
484
+		docker-8:1-123141-pool: 0 209715200 thin-pool 91 422/524288 18338/1638400 - rw discard_passdown queue_if_no_space -
485 485
 
486 486
 		The name is the string before the colon.
487 487
 
488
-	b. Dump the device mapper table first.
488
+ 	b. Dump the device mapper table first.
489 489
 
490 490
 		$ sudo dmsetup table docker-8:1-123141-pool
491 491
 		0 209715200 thin-pool 7:1 7:0 128 32768 1 skip_block_zeroing
492 492
 
493 493
 	c. Calculate the real total sectors of the thin pool now.
494 494
 
495
-		Change the second number of the table info (i.e. the number of sectors) to reflect the new number of 512 byte sectors in the disk. For example, as the new loop size is 200GiB, change the second number to 419430400.
495
+		Change the second number of the table info (i.e. the disk end sector) to reflect the new number of 512 byte sectors in the disk. For example, as the new loop size is 200GB, change the second number to 419430400.
496 496
 
497 497
 	d. Reload the thin pool with the new sector number
498 498
 
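Steps b through d of the loop-lvm procedure amount to rewriting the second field of the dumped table line. The snippet below is a minimal sketch of that rewrite, using the table line and the new sector count from the example above. The helper name `with_new_sectors` is hypothetical and is not part of `dmsetup` or Docker.

```bash
# Hypothetical helper: replace the second field (the length in 512-byte
# sectors) of a device mapper table line and print the result.
#   $1: existing table line, as printed by `dmsetup table <pool>`
#   $2: new length in sectors
with_new_sectors() {
    printf '%s\n' "$1" | awk -v sectors="$2" '{ $2 = sectors; print }'
}

# Table line from step b, resized to the 200GB loop file (419430400 sectors).
with_new_sectors '0 209715200 thin-pool 7:1 7:0 128 32768 1 skip_block_zeroing' 419430400
# prints: 0 419430400 thin-pool 7:1 7:0 128 32768 1 skip_block_zeroing
```

The printed line is what you would pass as the `--table` argument when reloading the pool.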
... ...
@@ -514,7 +482,7 @@ $ ./device_tool resize 200GB
514 514
 ### For a direct-lvm mode configuration
515 515
 
516 516
 In this example, you extend the capacity of a running device that uses the
517
-`direct-lvm` configuration.  This example assumes you are using the `/dev/sdh1`
517
+`direct-lvm` configuration. This example assumes you are using the `/dev/sdh1`
518 518
 disk partition.
519 519
 
520 520
 1. Extend the volume group (VG) `vg-docker`.
... ...
@@ -550,7 +518,7 @@ disk partition.
550 550
 
551 551
 	c. Calculate the real total sectors of the thin pool now. We can use `blockdev` to get the real size of the data `lv`.
552 552
 
553
-		Change the second number of the table info (i.e. the number of sectors) to
553
+		Change the second number of the table info (i.e. the disk end sector) to
554 554
 		reflect the new number of 512 byte sectors in the disk. For example, as the
555 555
 		new data `lv` size is `264132100096` bytes, change the second number to
556 556
 		`515883008`.
... ...
@@ -562,7 +530,6 @@ disk partition.
562 562
 
563 563
 		$ sudo dmsetup suspend docker-253:17-1835016-pool && sudo dmsetup reload docker-253:17-1835016-pool --table  '0 515883008 thin-pool 252:0 252:1 128 32768 1 skip_block_zeroing' && sudo dmsetup resume docker-253:17-1835016-pool
564 564
 
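The sector counts used in both resize procedures are the device size in bytes divided by 512. The following is a minimal sketch of that arithmetic, checked against the two sizes quoted above. The function name `bytes_to_sectors` is hypothetical. On a real host the byte count would come from a tool such as `blockdev`, which is not invoked here.

```bash
# Hypothetical helper: convert a size in bytes to 512-byte sectors.
#   $1: size in bytes (must be a whole number of sectors)
bytes_to_sectors() {
    if [ $(( $1 % 512 )) -ne 0 ]; then
        echo "size must be a multiple of 512 bytes" >&2
        return 1
    fi
    echo $(( $1 / 512 ))
}

bytes_to_sectors 214748364800   # 200GB loop file  -> 419430400
bytes_to_sectors 264132100096   # extended data lv -> 515883008
```

Rejecting sizes that are not a whole number of sectors avoids silently rounding down and reloading the pool with a shorter length than the device actually has.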
565
-
566 565
 ## Device Mapper and Docker performance
567 566
 
568 567
 It is important to understand the impact that allocate-on-demand and