linux: revert 'Beef up wake_wide' change

Change-Id: Ib1e852a5aefb8ee02f885a5c503ef2a0821aa178
Reviewed-on: http://photon-jenkins.eng.vmware.com/1025
Reviewed-by: Sharath George
Tested-by: Sharath George
(cherry picked from commit 56a89e73bc36f9130daa02c88425d5ff21bd8f73)
Reviewed-on: http://photon-jenkins.eng.vmware.com/1035

Alexey Makhalov authored on 2016/05/27 04:41:51
Showing 2 changed files

SPECS/linux/REVERT-sched-fair-Beef-up-wake_wide.patch index 0000000..6db445f
SPECS/linux/linux.spec index 050544a..4100c4d 100644

SPECS/linux/REVERT-sched-fair-Beef-up-wake_wide.patch

History View file @ f54cf22

                     new file mode 100644
@@ -0,0 +1,174 @@
                     +From 63b0e9edceec10fa41ec33393a1515a5ff444277 Mon Sep 17 00:00:00 2001
                     +From: Mike Galbraith <umgwanakikbuti@gmail.com>
                     +Date: Tue, 14 Jul 2015 17:39:50 +0200
                     +Subject: [PATCH] sched/fair: Beef up wake_wide()
                     +
                     +Josef Bacik reported that Facebook sees better performance with their
                     +1:N load (1 dispatch/node, N workers/node) when carrying an old patch
                     +to try very hard to wake to an idle CPU.  While looking at wake_wide(),
                     +I noticed that it doesn't pay attention to the wakeup of a many partner
                     +waker, returning 1 only when waking one of its many partners.
                     +
                     +Correct that, letting explicit domain flags override the heuristic.
                     +
                     +While at it, adjust task_struct bits, we don't need a 64-bit counter.
                     +
                     +Tested-by: Josef Bacik <jbacik@fb.com>
                     +Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
                     +[ Tidy things up. ]
                     +Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
                     +Cc: Linus Torvalds <torvalds@linux-foundation.org>
                     +Cc: Mike Galbraith <efault@gmx.de>
                     +Cc: Peter Zijlstra <peterz@infradead.org>
                     +Cc: Thomas Gleixner <tglx@linutronix.de>
                     +Cc: kernel-team<Kernel-team@fb.com>
                     +Cc: morten.rasmussen@arm.com
                     +Cc: riel@redhat.com
                     +Link: http://lkml.kernel.org/r/1436888390.7983.49.camel@gmail.com
                     +Signed-off-by: Ingo Molnar <mingo@kernel.org>
                     +---
                     + include/linux/sched.h |  4 +--
                     + kernel/sched/fair.c   | 67 ++++++++++++++++++++++++++-------------------------
                     + 2 files changed, 36 insertions(+), 35 deletions(-)
                     +
                     +diff --git b/include/linux/sched.h a/include/linux/sched.h
                     +index 65a8a86..7412070 100644
                     +--- b/include/linux/sched.h
                     +@@ -1359,9 +1359,9 @@ struct task_struct {
                     + #ifdef CONFIG_SMP
                     + 	struct llist_node wake_entry;
                     + 	int on_cpu;
                     +-	unsigned int wakee_flips;
                     +-	unsigned long wakee_flip_decay_ts;
                     + 	struct task_struct *last_wakee;
                     ++	unsigned long wakee_flips;
                     ++	unsigned long wakee_flip_decay_ts;
                     +
                     + 	int wake_cpu;
                     + #endif
                     +diff --git b/kernel/sched/fair.c a/kernel/sched/fair.c
                     +index ea23f9f..8b384b8d 100644
                     +--- b/kernel/sched/fair.c
                     +@@ -4726,29 +4726,26 @@ static long effective_load(struct task_group *tg, int cpu, long wl, long wg)
                     +
                     + #endif
                     +
                     +-/*
                     +- * Detect M:N waker/wakee relationships via a switching-frequency heuristic.
                     +- * A waker of many should wake a different task than the one last awakened
                     +- * at a frequency roughly N times higher than one of its wakees.  In order
                     +- * to determine whether we should let the load spread vs consolodating to
                     +- * shared cache, we look for a minimum 'flip' frequency of llc_size in one
                     +- * partner, and a factor of lls_size higher frequency in the other.  With
                     +- * both conditions met, we can be relatively sure that the relationship is
                     +- * non-monogamous, with partner count exceeding socket size.  Waker/wakee
                     +- * being client/server, worker/dispatcher, interrupt source or whatever is
                     +- * irrelevant, spread criteria is apparent partner count exceeds socket size.
                     +- */
                     + static int wake_wide(struct task_struct *p)
                     + {
                     +-	unsigned int master = current->wakee_flips;
                     +-	unsigned int slave = p->wakee_flips;
                     + 	int factor = this_cpu_read(sd_llc_size);
                     +
                     +-	if (master < slave)
                     +-		swap(master, slave);
                     +-	if (slave < factor || master < slave * factor)
                     +-		return 0;
                     +-	return 1;
                     ++	/*
                     ++	 * Yeah, it's the switching-frequency, could means many wakee or
                     ++	 * rapidly switch, use factor here will just help to automatically
                     ++	 * adjust the loose-degree, so bigger node will lead to more pull.
                     ++	 */
                     ++	if (p->wakee_flips > factor) {
                     ++		/*
                     ++		 * wakee is somewhat hot, it needs certain amount of cpu
                     ++		 * resource, so if waker is far more hot, prefer to leave
                     ++		 * it alone.
                     ++		 */
                     ++		if (current->wakee_flips > (factor * p->wakee_flips))
                     ++			return 1;
                     ++	}
                     ++
                     ++	return 0;
                     + }
                     +
                     + static int wake_affine(struct sched_domain *sd, struct task_struct *p, int sync)
                     +@@ -4760,6 +4757,13 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p, int sync)
                     + 	unsigned long weight;
                     + 	int balanced;
                     +
                     ++	/*
                     ++	 * If we wake multiple tasks be careful to not bounce
                     ++	 * ourselves around too much.
                     ++	 */
                     ++	if (wake_wide(p))
                     ++		return 0;
                     ++
                     + 	idx	  = sd->wake_idx;
                     + 	this_cpu  = smp_processor_id();
                     + 	prev_cpu  = task_cpu(p);
                     +@@ -5013,17 +5017,17 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_f
                     + {
                     + 	struct sched_domain *tmp, *affine_sd = NULL, *sd = NULL;
                     + 	int cpu = smp_processor_id();
                     +-	int new_cpu = prev_cpu;
                     ++	int new_cpu = cpu;
                     + 	int want_affine = 0;
                     + 	int sync = wake_flags & WF_SYNC;
                     +
                     + 	if (sd_flag & SD_BALANCE_WAKE)
                     +-		want_affine = !wake_wide(p) && cpumask_test_cpu(cpu, tsk_cpus_allowed(p));
                     ++		want_affine = cpumask_test_cpu(cpu, tsk_cpus_allowed(p));
                     +
                     + 	rcu_read_lock();
                     + 	for_each_domain(cpu, tmp) {
                     + 		if (!(tmp->flags & SD_LOAD_BALANCE))
                     +-			break;
                     ++			continue;
                     +
                     + 		/*
                     + 		 * If both cpu and prev_cpu are part of this domain,
                     +@@ -5037,21 +5041,17 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_f
                     +
                     + 		if (tmp->flags & sd_flag)
                     + 			sd = tmp;
                     +-		else if (!want_affine)
                     +-			break;
                     + 	}
                     +
                     +-	if (affine_sd) {
                     +-		sd = NULL; /* Prefer wake_affine over balance flags */
                     +-		if (cpu != prev_cpu && wake_affine(affine_sd, p, sync))
                     +-			new_cpu = cpu;
                     +-	}
                     ++	if (affine_sd && cpu != prev_cpu && wake_affine(affine_sd, p, sync))
                     ++		prev_cpu = cpu;
                     +
                     +-	if (!sd) {
                     +-		if (sd_flag & SD_BALANCE_WAKE) /* XXX always ? */
                     +-			new_cpu = select_idle_sibling(p, new_cpu);
                     ++	if (sd_flag & SD_BALANCE_WAKE) {
                     ++		new_cpu = select_idle_sibling(p, prev_cpu);
                     ++		goto unlock;
                     ++	}
                     +
                     +-	} else while (sd) {
                     ++	while (sd) {
                     + 		struct sched_group *group;
                     + 		int weight;
                     +
                     +@@ -5085,6 +5085,7 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_f
                     + 		}
                     + 		/* while loop will break here if sd == NULL */
                     + 	}
                     ++unlock:
                     + 	rcu_read_unlock();
                     +
                     + 	return new_cpu;
                     +--
                     +1.9.1
                     +
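
The difference between the two wake_wide() bodies in the patch above can be illustrated with a small user-space model. This is only a sketch: the function names wake_wide_beefed and wake_wide_restored are hypothetical, and the flip counters and LLC size are passed as plain parameters, whereas the kernel reads them from current->wakee_flips, p->wakee_flips and this_cpu_read(sd_llc_size). The removed variant declares its locals as unsigned int; unsigned long is used for both here to keep the comparison simple.

```c
/*
 * User-space model of the two wake_wide() heuristics in this patch.
 * Assumption: inputs are passed as parameters instead of being read
 * from task_struct and the per-CPU sd_llc_size. Names are hypothetical.
 */

/* Variant removed by the patch (the '-' lines of wake_wide()). */
static int wake_wide_beefed(unsigned long waker_flips,
                            unsigned long wakee_flips,
                            unsigned long factor)
{
    unsigned long master = waker_flips;
    unsigned long slave = wakee_flips;

    /* The check is symmetric: the busier side is always "master". */
    if (master < slave) {
        unsigned long tmp = master;
        master = slave;
        slave = tmp;
    }
    if (slave < factor || master < slave * factor)
        return 0;
    return 1;
}

/* Variant restored by the patch (the '+' lines of wake_wide()). */
static int wake_wide_restored(unsigned long waker_flips,
                              unsigned long wakee_flips,
                              unsigned long factor)
{
    /* Asymmetric: only fires when the waker is far busier than the wakee. */
    if (wakee_flips > factor) {
        if (waker_flips > factor * wakee_flips)
            return 1;
    }
    return 0;
}
```

With a factor of 4, a waker with 5 flips waking a wakee with 100 flips is reported as wide by the removed variant, because it swaps the pair before comparing, but not by the restored one. The restored variant also uses strict comparisons, so a pair such as 16 and 4 flips sits exactly on the threshold and is wide only under the removed variant.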

SPECS/linux/linux.spec

@@ -2,7 +2,7 @@
                      Summary:        Kernel
                      Name:           linux
                      Version:    	4.4.8
                     -Release:    	5%{?dist}
                     +Release:    	6%{?dist}
                      License:    	GPLv2
                      URL:        	http://www.kernel.org/
                      Group:        	System Environment/Kernel
@@ -22,6 +22,7 @@ Patch6:         net-Driver-Vmxnet3-set-CHECKSUM_UNNECESSARY-for-IPv6-packets.pat
                      Patch7:		netfilter-x_tables-deal-with-bogus-nextoffset-values.patch
                      #fixes CVE-2016-3135
                      Patch8:		netfilter-x_tables-check-for-size-overflow.patch
                     +Patch9:		REVERT-sched-fair-Beef-up-wake_wide.patch
                      BuildRequires:  bc
                      BuildRequires:  kbd
                      BuildRequires:  kmod
@@ -86,6 +87,7 @@ Kernel driver for oprofile, a statistical profiler for Linux systems
                      %patch6 -p1
                      %patch7 -p1
                      %patch8 -p1
                     +%patch9 -p1
                      %build
                      make mrproper
@@ -182,10 +184,12 @@ ln -s /usr/lib/debug/lib/modules/%{version}/vmlinux-%{version}.debug /boot/vmlin
                      /lib/modules/%{version}/kernel/arch/x86/oprofile/
                      %changelog
                     -*	Tue May 24 2016 Priyesh Padmavilasom <ppadmavilasom@vmware.com> 4.4.8-5
                     --	GA - Bump release of all rpms
                     -*	Mon May 23 2016 Harish Udaiya Kumar <hudaiyakumar@vmware.com> 4.4.8-4
                     --	Fixed generation of debug symbols for kernel modules & vmlinux.
                     +*   Thu May 26 2016 Alexey Makhalov <amakhalov@vmware.com> 4.4.8-6
                     +-   patch: REVERT-sched-fair-Beef-up-wake_wide.patch
                     +*   Tue May 24 2016 Priyesh Padmavilasom <ppadmavilasom@vmware.com> 4.4.8-5
                     +-   GA - Bump release of all rpms
                     +*   Mon May 23 2016 Harish Udaiya Kumar <hudaiyakumar@vmware.com> 4.4.8-4
                     +-   Fixed generation of debug symbols for kernel modules & vmlinux.
                      *   Mon May 23 2016 Divya Thaluru <dthaluru@vmware.com> 4.4.8-3
                      -   Added patches to fix CVE-2016-3134, CVE-2016-3135
                      *   Wed May 18 2016 Harish Udaiya Kumar <hudaiyakumar@vmware.com> 4.4.8-2