
linux, secure: Switch I/O scheduler from 'cfq' to 'deadline' to fix performance issue

The CFQ I/O scheduler has known performance issues in certain OS
configurations. For example, I/O throughput drops by 10x to 30x when
the following command is run with the CFQ I/O scheduler:

dd if=/dev/zero of=/root/test.img bs=512 count=10000 oflag=dsync

Throughput with CFQ: 60 KB/s
Throughput with noop or deadline: 1.5 MB/s - 2 MB/s

This performance drop is caused by the undesirable interaction between
4 different components:

- blkio cgroup controller enabled
- ext4 with the jbd2 kthread running in the root blkio cgroup
- dd running on ext4, in any other blkio cgroup than that of jbd2
- CFQ I/O scheduler with defaults for slice_idle and group_idle

When docker is enabled, systemd creates a blkio cgroup called
system.slice to run system services (and docker) under it, and a
separate blkio cgroup called user.slice for user processes. So, when
dd is invoked, it runs under user.slice.

The dd command above includes the dsync flag, which performs an
fdatasync after every write to the output file. Since dd is writing to
a file on ext4, jbd2 will be active, committing transactions
corresponding to those fdatasync requests from dd. (In other words, dd
depends on jbd2 in order to make forward progress.) But jbd2, being a
kernel thread, runs in the root blkio cgroup, as opposed to dd, which
runs under user.slice.
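
To see which blkio cgroup a given task belongs to, one can read
/proc/<pid>/cgroup. The following is a minimal sketch, assuming the
cgroup v1 line format "hierarchy-ID:controller-list:path"; the helper
names and the sample strings are hypothetical and only illustrate the
dd vs jbd2 split described above.

```python
from pathlib import Path


def blkio_cgroup(cgroup_text):
    """Return the blkio cgroup path found in /proc/<pid>/cgroup text.

    Assumes cgroup v1 lines of the form "ID:controllers:path".
    Returns None if no blkio line is present.
    """
    for line in cgroup_text.splitlines():
        parts = line.split(":", 2)
        if len(parts) == 3 and "blkio" in parts[1].split(","):
            return parts[2]
    return None


def blkio_cgroup_of(pid):
    """Read /proc/<pid>/cgroup for a live process (Linux only)."""
    return blkio_cgroup(Path(f"/proc/{pid}/cgroup").read_text())


# Hypothetical samples: a user process such as dd, and a kernel thread
# such as jbd2 that stays in the root cgroup.
sample_dd = "4:blkio:/user.slice\n3:cpu,cpuacct:/user.slice\n"
sample_jbd2 = "4:blkio:/\n3:cpu,cpuacct:/\n"

print(blkio_cgroup(sample_dd))    # blkio path for the dd-like sample
print(blkio_cgroup(sample_jbd2))  # blkio path for the jbd2-like sample
```

If the two paths differ, as they do for these samples, CFQ treats the
two tasks as belonging to different groups.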

Now, if the I/O scheduler in use for the underlying block device is
CFQ, then its inter-queue/inter-group idling takes effect (via the
slice_idle and group_idle parameters, both of which default to 8ms).
Therefore, every time CFQ switches between processing requests from dd
and jbd2, this 8ms idle time is injected, which slows down the overall
throughput tremendously!
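
As a rough sanity check on the measured numbers, the sketch below
estimates throughput when each synchronous write has to wait through
idle windows. It is a back-of-the-envelope model, not a description of
CFQ internals: the function name, the assumption of one 8ms idle window
per 512-byte write, and the zero device service time are all
assumptions made for illustration.

```python
def estimated_throughput(block_bytes, idle_ms,
                         idle_windows_per_write=1, service_ms=0.0):
    """Estimate bytes/second when every write waits for idle windows.

    block_bytes: size of each synchronous write (bs=512 in the dd test).
    idle_ms: length of one idle window (slice_idle/group_idle default 8).
    idle_windows_per_write: assumed number of idle windows per write.
    service_ms: assumed device time per write, excluding idling.
    """
    seconds_per_write = (idle_windows_per_write * idle_ms + service_ms) / 1000.0
    if seconds_per_write <= 0:
        raise ValueError("time per write must be positive")
    return block_bytes / seconds_per_write


cfq_estimate = estimated_throughput(block_bytes=512, idle_ms=8)
print(f"{cfq_estimate / 1000:.0f} KB/s")
```

Under these assumptions the estimate comes out at about 64 KB/s, which
is in the same range as the 60 KB/s observed with CFQ.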

Unfortunately, the pre-conditions that cause this performance drop
correspond to most of the common configurations of Photon OS! Fixing
CFQ itself is challenging (and is still being discussed on the Linux
kernel mailing list [1]), so switch the default I/O scheduler to
'deadline' in the meantime.
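
To confirm which scheduler a block device is actually using on a
running system, one can read /sys/block/<dev>/queue/scheduler, where
the active scheduler is shown in square brackets. A minimal sketch
follows; the helper names are hypothetical, and the device name passed
to active_scheduler_of (for example "sda") depends on the system.

```python
import re
from pathlib import Path


def active_scheduler(sysfs_text):
    """Return the bracketed (active) scheduler name, or None.

    Expects the contents of /sys/block/<dev>/queue/scheduler,
    for example "noop [deadline] cfq".
    """
    match = re.search(r"\[([^\]]+)\]", sysfs_text)
    return match.group(1) if match else None


def active_scheduler_of(device):
    """Read the active scheduler for a block device (Linux only)."""
    path = Path(f"/sys/block/{device}/queue/scheduler")
    return active_scheduler(path.read_text())


print(active_scheduler("noop [deadline] cfq\n"))
```

With the new default in place, a device that has not been reconfigured
at runtime is expected to report deadline here.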

For more details on this problem, as well as the ongoing discussion
around its fix, refer to [1].

[1]. https://lore.kernel.org/lkml/8d72fcf7-bbb4-2965-1a06-e9fc177a8938@csail.mit.edu/

Change-Id: I257deacbfd15cfe35f99072440da4e09472e2ed3
Reviewed-on: http://photon-jenkins.eng.vmware.com:8082/7323
Tested-by: gerrit-photon <photon-checkins@vmware.com>
Reviewed-by: Srinidhi Rao <srinidhir@vmware.com>
Reviewed-by: Alexey Makhalov <amakhalov@vmware.com>
(cherry picked from commit 7974ca9f70e37dee132758ead578700f7369c1c9)
Reviewed-on: http://photon-jenkins.eng.vmware.com:8082/7341
Reviewed-by: Srivatsa S. Bhat <srivatsab@vmware.com>

Srivatsa S. Bhat (VMware) authored on 2019/05/29 08:52:32
Showing 4 changed files

@@ -1,6 +1,6 @@
 #
 # Automatically generated file; DO NOT EDIT.
-# Linux/x86 4.19.26 Kernel Configuration
+# Linux/x86 4.19.40 Kernel Configuration
 #
 
 #
@@ -869,10 +869,10 @@ CONFIG_IOSCHED_NOOP=y
 CONFIG_IOSCHED_DEADLINE=y
 CONFIG_IOSCHED_CFQ=y
 CONFIG_CFQ_GROUP_IOSCHED=y
-# CONFIG_DEFAULT_DEADLINE is not set
-CONFIG_DEFAULT_CFQ=y
+CONFIG_DEFAULT_DEADLINE=y
+# CONFIG_DEFAULT_CFQ is not set
 # CONFIG_DEFAULT_NOOP is not set
-CONFIG_DEFAULT_IOSCHED="cfq"
+CONFIG_DEFAULT_IOSCHED="deadline"
 CONFIG_MQ_IOSCHED_DEADLINE=y
 CONFIG_MQ_IOSCHED_KYBER=y
 # CONFIG_IOSCHED_BFQ is not set

@@ -1,6 +1,6 @@
 #
 # Automatically generated file; DO NOT EDIT.
-# Linux/x86 4.19.26 Kernel Configuration
+# Linux/x86 4.19.40 Kernel Configuration
 #
 
 #
@@ -835,10 +835,10 @@ CONFIG_IOSCHED_NOOP=y
 CONFIG_IOSCHED_DEADLINE=y
 CONFIG_IOSCHED_CFQ=y
 CONFIG_CFQ_GROUP_IOSCHED=y
-# CONFIG_DEFAULT_DEADLINE is not set
-CONFIG_DEFAULT_CFQ=y
+CONFIG_DEFAULT_DEADLINE=y
+# CONFIG_DEFAULT_CFQ is not set
 # CONFIG_DEFAULT_NOOP is not set
-CONFIG_DEFAULT_IOSCHED="cfq"
+CONFIG_DEFAULT_IOSCHED="deadline"
 # CONFIG_MQ_IOSCHED_DEADLINE is not set
 # CONFIG_MQ_IOSCHED_KYBER is not set
 # CONFIG_IOSCHED_BFQ is not set

@@ -2,7 +2,7 @@
 Summary:        Kernel
 Name:           linux-secure
 Version:        4.19.40
-Release:        2%{?kat_build:.%kat_build}%{?dist}
+Release:        3%{?kat_build:.%kat_build}%{?dist}
 License:        GPLv2
 URL:            http://www.kernel.org/
 Group:          System Environment/Kernel
@@ -239,6 +239,8 @@ ln -sf linux-%{uname_r}.cfg /boot/photon.cfg
 /usr/src/linux-headers-%{uname_r}
 
 %changelog
+*   Tue May 28 2019 Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu> 4.19.40-3
+-   Change default I/O scheduler to 'deadline' to fix performance issue.
 *   Tue May 14 2019 Keerthana K <keerthanak@vmware.com> 4.19.40-2
 -   Fix to parse through /boot folder and update symlink (/boot/photon.cfg) if
 -   mulitple kernels are installed and current linux kernel is removed.

@@ -2,7 +2,7 @@
 Summary:        Kernel
 Name:           linux
 Version:        4.19.40
-Release:        2%{?kat_build:.%kat_build}%{?dist}
+Release:        3%{?kat_build:.%kat_build}%{?dist}
 License:    	GPLv2
 URL:        	http://www.kernel.org/
 Group:        	System Environment/Kernel
@@ -442,6 +442,8 @@ ln -sf %{name}-%{uname_r}.cfg /boot/photon.cfg
 %endif
 
 %changelog
+*   Tue May 28 2019 Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu> 4.19.40-3
+-   Change default I/O scheduler to 'deadline' to fix performance issue.
 *   Tue May 14 2019 Keerthana K <keerthanak@vmware.com> 4.19.40-2
 -   Fix to parse through /boot folder and update symlink (/boot/photon.cfg) if
 -   mulitple kernels are installed and current linux kernel is removed.