kernel-hacking-2024-linux-s.../mm
Kirill A. Shutemov 5f7377147c thp: fix deadlock in split_huge_pmd()
split_huge_pmd() tries to munlock the page with munlock_vma_page().  That
requires the page to be locked.

If the page is already locked by the caller, we would get a deadlock:

	Unable to find swap-space signature
	INFO: task trinity-c85:1907 blocked for more than 120 seconds.
	      Not tainted 4.4.0-00032-gf19d0bdced41-dirty #1606
	"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
	trinity-c85     D ffff88084d997608     0  1907    309 0x00000000
	Call Trace:
	  schedule+0x9f/0x1c0
	  schedule_timeout+0x48e/0x600
	  io_schedule_timeout+0x1c3/0x390
	  bit_wait_io+0x29/0xd0
	  __wait_on_bit_lock+0x94/0x140
	  __lock_page+0x1d4/0x280
	  __split_huge_pmd+0x5a8/0x10f0
	  split_huge_pmd_address+0x1d9/0x230
	  try_to_unmap_one+0x540/0xc70
	  rmap_walk_anon+0x284/0x810
	  rmap_walk_locked+0x11e/0x190
	  try_to_unmap+0x1b1/0x4b0
	  split_huge_page_to_list+0x49d/0x18a0
	  follow_page_mask+0xa36/0xea0
	  SyS_move_pages+0xaf3/0x1570
	  entry_SYSCALL_64_fastpath+0x12/0x6b
	2 locks held by trinity-c85/1907:
	 #0:  (&mm->mmap_sem){++++++}, at:  SyS_move_pages+0x933/0x1570
	 #1:  (&anon_vma->rwsem){++++..}, at:  split_huge_page_to_list+0x402/0x18a0

I don't think the deadlock is triggerable without the split_huge_page()
simplification patchset.

But munlock_vma_page() here is wrong: we want to munlock the page
unconditionally, so there is no need for the rmap lookup that
munlock_vma_page() does.

Let's use clear_page_mlock() instead.  It can be called under ptl.
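
The difference between the two approaches can be sketched in a small
userspace model.  This is not kernel code: struct model_page and every
model_* function are hypothetical stand-ins that only mimic the page-lock
and mlock flags, and the "would block" return value stands for the point
where the real task sleeps in __lock_page(), as in the trace above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page: only the two flags that
 * matter for this bug.  Not the kernel's definition. */
struct model_page {
	bool locked;	/* models the page lock (PG_locked) */
	bool mlocked;	/* models the mlock flag (PG_mlocked) */
};

/* Models taking a non-recursive page lock.  Returns false where the
 * real kernel would sleep forever because the lock is already held. */
static bool model_lock_page(struct model_page *page)
{
	if (page->locked)
		return false;	/* would block: lock already held */
	page->locked = true;
	return true;
}

static void model_unlock_page(struct model_page *page)
{
	page->locked = false;
}

/* Old approach (assumed shape): take the page lock around the munlock.
 * Returns false if the model detects the self-deadlock. */
static bool model_split_with_munlock_vma_page(struct model_page *page)
{
	if (!page->mlocked)
		return true;
	if (!model_lock_page(page))
		return false;	/* caller already holds the page lock */
	page->mlocked = false;
	model_unlock_page(page);
	return true;
}

/* New approach: clear the mlock flag unconditionally; the page lock
 * is never touched, so the caller's lock state does not matter. */
static bool model_split_with_clear_page_mlock(struct model_page *page)
{
	if (page->mlocked)
		page->mlocked = false;
	return true;
}
```

With a page that the caller has already locked and that is mlocked, the
first model function reports the deadlock and leaves the flag set, while
the second clears the flag and leaves the caller's lock untouched.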

Fixes: e90309c9f7 ("thp: allow mlocked THP again")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-17 15:09:34 -07:00
..
kasan mm: coalesce split strings 2016-03-17 15:09:34 -07:00
backing-dev.c mm: convert printk(KERN_<LEVEL> to pr_<level> 2016-03-17 15:09:34 -07:00
balloon_compaction.c
bootmem.c mm: convert printk(KERN_<LEVEL> to pr_<level> 2016-03-17 15:09:34 -07:00
cleancache.c cleancache: constify cleancache_ops structure 2016-01-27 09:09:57 -05:00
cma.c
cma.h
cma_debug.c
compaction.c mm, kswapd: replace kswapd compaction with waking up kcompactd 2016-03-17 15:09:34 -07:00
debug.c mm: introduce page reference manipulation functions 2016-03-17 15:09:34 -07:00
debug_page_ref.c mm/page_ref: add tracepoint to track down page reference manipulation 2016-03-17 15:09:34 -07:00
dmapool.c mm: convert printk(KERN_<LEVEL> to pr_<level> 2016-03-17 15:09:34 -07:00
early_ioremap.c
fadvise.c
failslab.c mm: fault-inject take over bootstrap kmem_cache check 2016-03-15 16:55:16 -07:00
filemap.c mm: remove unnecessary uses of lock_page_memcg() 2016-03-15 16:55:16 -07:00
frame_vector.c
frontswap.c
gup.c mm: retire GUP WARN_ON_ONCE that outlived its usefulness 2016-02-03 08:57:14 -08:00
highmem.c
huge_memory.c thp: fix deadlock in split_huge_pmd() 2016-03-17 15:09:34 -07:00
hugetlb.c mm: convert pr_warning to pr_warn 2016-03-17 15:09:34 -07:00
hugetlb_cgroup.c
hwpoison-inject.c
init-mm.c
internal.h mm: convert printk(KERN_<LEVEL> to pr_<level> 2016-03-17 15:09:34 -07:00
interval_tree.c
Kconfig mm: ZONE_DEVICE depends on SPARSEMEM_VMEMMAP 2016-03-17 15:09:34 -07:00
Kconfig.debug mm/page_ref: add tracepoint to track down page reference manipulation 2016-03-17 15:09:34 -07:00
kmemcheck.c mm: convert printk(KERN_<LEVEL> to pr_<level> 2016-03-17 15:09:34 -07:00
kmemleak-test.c mm: convert printk(KERN_<LEVEL> to pr_<level> 2016-03-17 15:09:34 -07:00
kmemleak.c mm: coalesce split strings 2016-03-17 15:09:34 -07:00
ksm.c mm/ksm.c: mark stable page dirty 2016-01-15 17:56:32 -08:00
list_lru.c mm: memcontrol: move kmem accounting code to CONFIG_MEMCG 2016-01-20 17:09:18 -08:00
maccess.c
madvise.c mm/madvise: update comment on sys_madvise() 2016-03-15 16:55:16 -07:00
Makefile mm/page_ref: add tracepoint to track down page reference manipulation 2016-03-17 15:09:34 -07:00
memblock.c mm: coalesce split strings 2016-03-17 15:09:34 -07:00
memcontrol.c mm: memcontrol: cleanup css_reset callback 2016-03-17 15:09:34 -07:00
memory-failure.c mm: convert printk(KERN_<LEVEL> to pr_<level> 2016-03-17 15:09:34 -07:00
memory.c mm: convert printk(KERN_<LEVEL> to pr_<level> 2016-03-17 15:09:34 -07:00
memory_hotplug.c mm: coalesce split strings 2016-03-17 15:09:34 -07:00
mempolicy.c mm: coalesce split strings 2016-03-17 15:09:34 -07:00
mempool.c mm, mempool: only set __GFP_NOMEMALLOC if there are free elements 2016-03-17 15:09:34 -07:00
memtest.c
migrate.c mm: make remove_migration_ptes() beyond mm/migration.c 2016-03-17 15:09:34 -07:00
mincore.c thp: change pmd_trans_huge_lock() interface to return ptl 2016-01-21 17:20:51 -08:00
mlock.c mm: fix mlock accouting 2016-01-21 17:20:51 -08:00
mm_init.c mm: convert printk(KERN_<LEVEL> to pr_<level> 2016-03-17 15:09:34 -07:00
mmap.c mm: coalesce split strings 2016-03-17 15:09:34 -07:00
mmu_context.c
mmu_notifier.c
mmzone.c
mprotect.c mm, dax: check for pmd_none() after split_huge_pmd() 2016-02-11 18:35:48 -08:00
mremap.c mm: cleanup *pte_alloc* interfaces 2016-03-17 15:09:34 -07:00
msync.c
nobootmem.c mm: convert printk(KERN_<LEVEL> to pr_<level> 2016-03-17 15:09:34 -07:00
nommu.c mm: deduplicate memory overcommitment code 2016-03-17 15:09:34 -07:00
oom_kill.c mm: coalesce split strings 2016-03-17 15:09:34 -07:00
page-writeback.c mm: remove unnecessary uses of lock_page_memcg() 2016-03-15 16:55:16 -07:00
page_alloc.c mm: convert printk(KERN_<LEVEL> to pr_<level> 2016-03-17 15:09:34 -07:00
page_counter.c
page_ext.c mm/page_poisoning.c: allow for zero poisoning 2016-03-15 16:55:16 -07:00
page_idle.c mm: add page_check_address_transhuge() helper 2016-01-15 17:56:32 -08:00
page_io.c mm: convert printk(KERN_<LEVEL> to pr_<level> 2016-03-17 15:09:34 -07:00
page_isolation.c mm/page_isolation: do some cleanup in "undo_isolate_page_range" 2016-01-15 17:56:32 -08:00
page_owner.c mm: coalesce split strings 2016-03-17 15:09:34 -07:00
page_poison.c mm/page_poisoning.c: allow for zero poisoning 2016-03-15 16:55:16 -07:00
pagewalk.c
percpu-km.c mm: percpu: use pr_fmt to prefix output 2016-03-17 15:09:34 -07:00
percpu-vm.c
percpu.c mm: percpu: use pr_fmt to prefix output 2016-03-17 15:09:34 -07:00
pgtable-generic.c mm/thp/migration: switch from flush_tlb_range to flush_pmd_tlb_range 2016-03-17 15:09:34 -07:00
process_vm_access.c ptrace: use fsuid, fsgid, effective creds for fs access checks 2016-01-20 17:09:18 -08:00
quicklist.c
readahead.c
rmap.c thp: rewrite freeze_page()/unfreeze_page() with generic rmap walkers 2016-03-17 15:09:34 -07:00
shmem.c mm: convert printk(KERN_<LEVEL> to pr_<level> 2016-03-17 15:09:34 -07:00
slab.c mm: convert printk(KERN_<LEVEL> to pr_<level> 2016-03-17 15:09:34 -07:00
slab.h mm: memcontrol: report slab usage in cgroup2 memory.stat 2016-03-17 15:09:34 -07:00
slab_common.c mm: convert printk(KERN_<LEVEL> to pr_<level> 2016-03-17 15:09:34 -07:00
slob.c mm: slab: free kmem_cache_node after destroy sysfs file 2016-02-18 16:23:24 -08:00
slub.c mm: coalesce split strings 2016-03-17 15:09:34 -07:00
sparse-vmemmap.c mm: convert printk(KERN_<LEVEL> to pr_<level> 2016-03-17 15:09:34 -07:00
sparse.c mm: convert printk(KERN_<LEVEL> to pr_<level> 2016-03-17 15:09:34 -07:00
swap.c mm, x86: get_user_pages() for dax mappings 2016-01-15 17:56:32 -08:00
swap_cgroup.c mm: convert printk(KERN_<LEVEL> to pr_<level> 2016-03-17 15:09:34 -07:00
swap_state.c mm: memcontrol: charge swap to cgroup2 2016-01-20 17:09:18 -08:00
swapfile.c mm: coalesce split strings 2016-03-17 15:09:34 -07:00
truncate.c mm: remove unnecessary uses of lock_page_memcg() 2016-03-15 16:55:16 -07:00
userfaultfd.c mm: cleanup *pte_alloc* interfaces 2016-03-17 15:09:34 -07:00
util.c mm: deduplicate memory overcommitment code 2016-03-17 15:09:34 -07:00
vmacache.c
vmalloc.c mm: coalesce split strings 2016-03-17 15:09:34 -07:00
vmpressure.c mm/vmpressure.c: fix subtree pressure detection 2016-02-03 08:28:43 -08:00
vmscan.c mm: introduce page reference manipulation functions 2016-03-17 15:09:34 -07:00
vmstat.c thp, vmstats: count deferred split events 2016-03-17 15:09:34 -07:00
workingset.c mm: workingset: make shadow node shrinker memcg aware 2016-03-17 15:09:34 -07:00
zbud.c
zpool.c
zsmalloc.c zsmalloc: fix migrate_zspage-zs_free race condition 2016-01-20 17:09:18 -08:00
zswap.c