Linux kernel modifications for the Kernel Hacking exam
Find a file
Tejun Heo f183464684 bdi: Move cgroup bdi_writeback to a dedicated low concurrency workqueue
From 0aa2e9b921d6db71150633ff290199554f0842a8 Mon Sep 17 00:00:00 2001
From: Tejun Heo <tj@kernel.org>
Date: Wed, 23 May 2018 10:29:00 -0700

cgwb_release() punts the actual release to cgwb_release_workfn() on
system_wq.  Depending on the number of cgroups or block devices, there
can be a lot of cgwb_release_workfn() in flight at the same time.

We're periodically seeing close to 256 kworkers getting stuck with the
following stack trace and overtime the entire system gets stuck.

  [<ffffffff810ee40c>] _synchronize_rcu_expedited.constprop.72+0x2fc/0x330
  [<ffffffff810ee634>] synchronize_rcu_expedited+0x24/0x30
  [<ffffffff811ccf23>] bdi_unregister+0x53/0x290
  [<ffffffff811cd1e9>] release_bdi+0x89/0xc0
  [<ffffffff811cd645>] wb_exit+0x85/0xa0
  [<ffffffff811cdc84>] cgwb_release_workfn+0x54/0xb0
  [<ffffffff810a68d0>] process_one_work+0x150/0x410
  [<ffffffff810a71fd>] worker_thread+0x6d/0x520
  [<ffffffff810ad3dc>] kthread+0x12c/0x160
  [<ffffffff81969019>] ret_from_fork+0x29/0x40
  [<ffffffffffffffff>] 0xffffffffffffffff

The events leading to the lockup are...

1. A lot of cgwb_release_workfn() is queued at the same time and all
   system_wq kworkers are assigned to execute them.

2. They all end up calling synchronize_rcu_expedited().  One of them
   wins and tries to perform the expedited synchronization.

3. However, that invovles queueing rcu_exp_work to system_wq and
   waiting for it.  Because #1 is holding all available kworkers on
   system_wq, rcu_exp_work can't be executed.  cgwb_release_workfn()
   is waiting for synchronize_rcu_expedited() which in turn is waiting
   for cgwb_release_workfn() to free up some of the kworkers.

We shouldn't be scheduling hundreds of cgwb_release_workfn() at the
same time.  There's nothing to be gained from that.  This patch
updates cgwb release path to use a dedicated percpu workqueue with
@max_active of 1.

While this resolves the problem at hand, it might be a good idea to
isolate rcu_exp_work to its own workqueue too as it can be used from
various paths and is prone to this sort of indirect A-A deadlocks.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-23 15:28:50 -06:00
arch Remove jsflash driver 2018-05-15 13:56:16 -06:00
block blkdev_report_zones_ioctl(): Use vmalloc() to allocate large buffers 2018-05-22 11:58:07 -06:00
certs
crypto crypto: drbg - set freed buffers to NULL 2018-04-21 00:57:00 +08:00
Documentation DeviceTree fixes for 4.17: 2018-05-07 05:33:29 -10:00
drivers nbd: set discard granularity properly 2018-05-23 15:27:26 -06:00
firmware
fs block: consistently use GFP_NOIO instead of __GFP_NORECLAIM 2018-05-14 08:55:18 -06:00
include block: Split out bio_list_copy_data() 2018-05-14 13:16:10 -06:00
init Fix typo in comment. 2018-05-07 05:41:46 -10:00
ipc ipc/shm: fix use-after-free of shm file via remap_file_pages() 2018-04-13 17:10:27 -07:00
kernel block: consistently use GFP_NOIO instead of __GFP_NORECLAIM 2018-05-14 08:55:18 -06:00
lib sbitmap: fix race in wait batch accounting 2018-05-14 12:17:31 -06:00
LICENSES
mm bdi: Move cgroup bdi_writeback to a dedicated low concurrency workqueue 2018-05-23 15:28:50 -06:00
net Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-05-03 18:57:03 -10:00
samples bpf: sockmap sample use clang flag, -target bpf 2018-04-23 23:42:21 +02:00
scripts DeviceTree fixes for 4.17: 2018-05-07 05:33:29 -10:00
security Merge branch 'userns-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2018-04-24 17:58:51 -07:00
sound ALSA: pcm: Check PCM state at xfern compat ioctl 2018-05-02 08:54:54 +02:00
tools ACPI fix for 4.17-rc4 2018-05-04 05:43:33 -10:00
usr kbuild: rename built-in.o to built-in.a 2018-03-26 02:01:19 +09:00
virt KVM/arm fixes for 4.17, take #2 2018-05-05 23:05:31 +02:00
.clang-format clang-format: add configuration file 2018-04-11 10:28:35 -07:00
.cocciconfig
.get_maintainer.ignore
.gitattributes
.gitignore Kbuild updates for v4.17 (2nd) 2018-04-15 17:21:30 -07:00
.mailmap Merge candidates for 4.17 merge window 2018-04-06 17:35:43 -07:00
COPYING
CREDITS
Kbuild
Kconfig
MAINTAINERS block: fix MAINTAINERS email for nbd 2018-05-16 12:40:06 -06:00
Makefile Linux 4.17-rc4 2018-05-06 16:57:38 -10:00
README

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.
See Documentation/00-INDEX for a list of what is contained in each file.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.