GitList

Raw Blame History
From 23fbe22ee9dc3c63a494eb11c1f16574e046fa3f Mon Sep 17 00:00:00 2001
From: Brennan Lamoreaux <blamoreaux@vmware.com>
Date: Wed, 26 Apr 2023 20:51:13 +0000
Subject: [PATCH] tuned: don't verify irq 0 on x86_64

IRQ0 on the x86 architecture is the HPET/PIT timer interrupt (arch/x86/kernel/time.c).
This IRQ is setup with IRQF_NOBALANCING, a "flag to exclude this
interrupt from irq balancing" (include/linux/interrupt.h). Without IRQ
balancing, IRQ affinity cannot and should not be set by the kernel or
userspace. For example, irq_can_set_affinity_usr() will check for the
balance flag when determining if the user should be allowed to set the
affinity of the specified interrupt. This also happens other places in
kernel code, including __irq_can_set_affinity()  and irq_setup_affinity().

The exception is that "irqaffinity" on the kernel command line will actually
set the affinity of IRQ0 to be the correct value (mostly). We can see this
in irq_affinity_setup(), where irq_default_affinity is set based on
"irqaffinity" value, and irq_default_affinity is the default affinity mask
for all IRQs (alloc_desc() will use this value when initializing the irq
descriptor structs on boot). However, the boot CPU is always included in
this value, so the affinity for IRQ0 will always include the boot CPU, normally
CPU 0.

IRQ 0 is also set to not be threaded with the IRQF_TIMER flag, which includes
the IRQF_NOTHREAD bit. As well as the flag, we can see from the following code flow:
hpet_time_init() -> setup_default_timer_irq() -> request_irq() ->
    request_threaded_irq( thread_fn=NULL ) -> __setup_irq()

Since thread_fn is NULL as passed from request_irq() to request_threaded_irq(), there
is no way to thread this interrupt and have it end up on different CPUs.

Without balancing/threading, the IRQ will be handled only on the CPU where it occur
in theory this should be CPU 0, and this seems to hold for all VMs I have observed.
Ref: https://github.com/torvalds/linux/commit/3c9aea47425885ec8b1f7b0df88c2ebc6f747c9d

If HPET/PIT is not present, the kernel will not register any interrupt handler for
it, although the interrupt line may be present (and the kernel will still
allocate some data structure for it if so). Although I can't find any
way it is reserved in the kernel, the irq0 line should be reserved
for the system timer by the hardware itself unless in some very rare circumstances.

It is not surprising that tuned fails to verify the affinity of IRQ 0, given
the above info. It seems unnecessary to check the affinity of IRQ 0 on x86 machines,
as it is explicitly defined in kernel code so that it cannot be changed/threaded/balanced,
and the interrupts should always occur on the same CPU. It is doubly silly to worry about
this on Photon, with a kernel that disables these global timer
interrupts after boot anyways.

Patch tuned to skip verification of the smp affinity of irq 0 on x86 machines,
as it is unnecessary and leads to false negatives.
---
 tuned/plugins/plugin_scheduler.py | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/tuned/plugins/plugin_scheduler.py b/tuned/plugins/plugin_scheduler.py
index df1561f..3aec69e 100644
--- a/tuned/plugins/plugin_scheduler.py
+++ b/tuned/plugins/plugin_scheduler.py
@@ -1329,7 +1329,13 @@ class SchedulerPlugin(base.Plugin):
 		irq_original = self._storage.get(self._irq_storage_key, None)
 		irqs = procfs.interrupts()
 		res = True
+		arch = os.uname().machine
 		for irq in irqs.keys():
+			# don't bother to check irq 0 on x86 (should be reserved for system timer)
+			# system timer typically only triggers interrupts on CPU 0
+			if irq == '0' and arch[0:3] == "x86":
+				continue
+
 			if irq in irq_original.unchangeable and ignore_missing:
 				description = "IRQ %s does not support changing SMP affinity" % irq
 				log.info(consts.STR_VERIFY_PROFILE_VALUE_MISSING % description)
--
2.39.0