kernel-hacking-2024-linux-s.../kernel/sched
Jan H. Schönherr dd9d384375 sched: Fix cpu_active_mask/cpu_online_mask race
There is a race condition in SMP bootup code, which may result
in

    WARNING: CPU: 0 PID: 1 at kernel/workqueue.c:4418
    workqueue_cpu_up_callback()
or
    kernel BUG at kernel/smpboot.c:135!

It can be triggered with a bit of luck in Linux guests running
on busy hosts.

	CPU0                        CPUn
	====                        ====

	_cpu_up()
	  __cpu_up()
				    start_secondary()
				      set_cpu_online()
					cpumask_set_cpu(cpu,
						   to_cpumask(cpu_online_bits));
	  cpu_notify(CPU_ONLINE)
	    <do stuff, see below>
					cpumask_set_cpu(cpu,
						   to_cpumask(cpu_active_bits));

During the various CPU_ONLINE callbacks CPUn is online but not
active. Several things can go wrong at that point, depending on
the scheduling of tasks on CPU0.

Variant 1:

  cpu_notify(CPU_ONLINE)
    workqueue_cpu_up_callback()
      rebind_workers()
        set_cpus_allowed_ptr()

  This call fails because it requires an active CPU; rebind_workers()
  ends with a warning:

    WARNING: CPU: 0 PID: 1 at kernel/workqueue.c:4418
    workqueue_cpu_up_callback()

Variant 2:

  cpu_notify(CPU_ONLINE)
    smpboot_thread_call()
      smpboot_unpark_threads()
       ..
        __kthread_unpark()
          __kthread_bind()
          wake_up_state()
           ..
            select_task_rq()
              select_fallback_rq()

  The ->wake_cpu of the unparked thread is not allowed, making a call
  to select_fallback_rq() necessary. Then, select_fallback_rq() cannot
  find an allowed, active CPU and promptly resets the allowed CPUs, so
  that the task in question ends up on CPU0.

  When those unparked tasks are eventually executed, they run
  immediately into a BUG:

    kernel BUG at kernel/smpboot.c:135!

Just changing the order in which the online/active bits are set
(and adding some memory barriers), would solve the two issues
above. However, it would change the order of operations back to
the one before commit 6acbfb9697 ("sched: Fix hotplug vs.
set_cpus_allowed_ptr()"), thus, reintroducing that particular
problem.

Going further back into history, we have at least the following
commits touching this topic:
- commit 2baab4e904 ("sched: Fix select_fallback_rq() vs cpu_active/cpu_online")
- commit 5fbd036b55 ("sched: Cleanup cpu_active madness")

Together, these give us the following non-working solutions:

  - secondary CPU sets active before online, because active is assumed to
    be a subset of online;

  - secondary CPU sets online before active, because the primary CPU
    assumes that an online CPU is also active;

  - secondary CPU sets online and waits for primary CPU to set active,
    because it might deadlock.

Commit 875ebe940d ("powerpc/smp: Wait until secondaries are
active & online") introduces an arch-specific solution to this
arch-independent problem.

Now, go for a more general solution without explicit waiting and
simply set active twice: once on the secondary CPU after online
was set and once on the primary CPU after online was seen.

set_cpus_allowed_ptr()")

Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: <stable@vger.kernel.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Wilson <msw@amazon.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 6acbfb9697 ("sched: Fix hotplug vs. set_cpus_allowed_ptr()")
Link: http://lkml.kernel.org/r/1439408156-18840-1-git-send-email-jschoenh@amazon.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-25 12:14:39 +02:00
..
auto_group.c sched, timer: Convert usages of ACCESS_ONCE() in the scheduler to READ_ONCE()/WRITE_ONCE() 2015-05-08 12:11:32 +02:00
auto_group.h sched, timer: Convert usages of ACCESS_ONCE() in the scheduler to READ_ONCE()/WRITE_ONCE() 2015-05-08 12:11:32 +02:00
clock.c kernel/sched/clock.c: add another clock for use with the soft lockup watchdog 2015-02-12 18:54:13 -08:00
completion.c sched/completion: Serialize completion_done() with complete() 2015-02-18 14:27:40 +01:00
core.c sched: Fix cpu_active_mask/cpu_online_mask race 2015-08-25 12:14:39 +02:00
cpuacct.c
cpuacct.h
cpudeadline.c sched/deadline: Remove cpu_active_mask from cpudl_find() 2015-02-04 07:52:29 +01:00
cpudeadline.h sched/deadline: Modify cpudl::free_cpus to reflect rd->online 2015-01-30 19:39:16 +01:00
cpupri.c
cpupri.h
cputime.c sched, timer: Convert usages of ACCESS_ONCE() in the scheduler to READ_ONCE()/WRITE_ONCE() 2015-05-08 12:11:32 +02:00
deadline.c Merge branch 'sched-hrtimers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-06-24 15:09:40 -07:00
debug.c sched/numa: Fix numa balancing stats in /proc/pid/sched 2015-07-04 10:04:33 +02:00
fair.c sched/fair: Test list head instead of list entry in throttle_cfs_rq() 2015-07-06 14:15:08 +02:00
features.h sched/rt: Use IPI to trigger RT task push migration instead of pulling 2015-03-23 10:55:22 +01:00
idle.c sched / idle: Call default_idle_call() from cpuidle_enter_state() 2015-05-14 21:37:47 +02:00
idle_task.c
loadavg.c sched: Move the loadavg code to a more obvious location 2015-05-08 12:04:12 +02:00
Makefile sched: Move the loadavg code to a more obvious location 2015-05-08 12:04:12 +02:00
rt.c Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-06-22 18:57:44 -07:00
sched.h Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-07-04 08:56:53 -07:00
stats.c sched: use %*pb[l] to print bitmaps including cpumasks and nodemasks 2015-02-13 21:21:37 -08:00
stats.h sched/stat: Simplify the sched_info accounting dependency 2015-07-04 10:04:30 +02:00
stop_task.c
wait.c Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-06-22 15:52:04 -07:00