kernel-hacking-2024-linux-s.../mm
David Chen 462a8e08e0 Fix page corruption caused by racy check in __free_pages
When we upgraded our kernel, we started seeing some page corruption like
the following consistently:

  BUG: Bad page state in process ganesha.nfsd  pfn:1304ca
  page:0000000022261c55 refcount:0 mapcount:-128 mapping:0000000000000000 index:0x0 pfn:0x1304ca
  flags: 0x17ffffc0000000()
  raw: 0017ffffc0000000 ffff8a513ffd4c98 ffffeee24b35ec08 0000000000000000
  raw: 0000000000000000 0000000000000001 00000000ffffff7f 0000000000000000
  page dumped because: nonzero mapcount
  CPU: 0 PID: 15567 Comm: ganesha.nfsd Kdump: loaded Tainted: P    B      O      5.10.158-1.nutanix.20221209.el7.x86_64 #1
  Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/05/2016
  Call Trace:
   dump_stack+0x74/0x96
   bad_page.cold+0x63/0x94
   check_new_page_bad+0x6d/0x80
   rmqueue+0x46e/0x970
   get_page_from_freelist+0xcb/0x3f0
   ? _cond_resched+0x19/0x40
   __alloc_pages_nodemask+0x164/0x300
   alloc_pages_current+0x87/0xf0
   skb_page_frag_refill+0x84/0x110
   ...

Sometimes, it would also show up as corruption in the free list pointer
and cause crashes.

After bisecting the issue, we found the issue started from commit
e320d3012d ("mm/page_alloc.c: fix freeing non-compound pages"):

	if (put_page_testzero(page))
		free_the_page(page, order);
	else if (!PageHead(page))
		while (order-- > 0)
			free_the_page(page + (1 << order), order);

So the problem is the check PageHead is racy because at this point we
already dropped our reference to the page.  So even if we came in with
compound page, the page can already be freed and PageHead can return
false and we will end up freeing all the tail pages causing double free.

Fixes: e320d3012d ("mm/page_alloc.c: fix freeing non-compound pages")
Link: https://lore.kernel.org/lkml/BYAPR02MB448855960A9656EEA81141FC94D99@BYAPR02MB4488.namprd02.prod.outlook.com/
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: stable@vger.kernel.org
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-02-12 10:30:05 -08:00
..
damon mm/damon: use kstrtobool() instead of strtobool() 2022-11-30 15:58:45 -08:00
kasan kasan: mark kasan_kunit_executing as static 2023-01-11 16:14:21 -08:00
kfence hardening updates for v6.2-rc1 2022-12-14 12:20:00 -08:00
kmsan kmsan: export kmsan_handle_urb 2022-12-21 14:31:52 -08:00
backing-dev.c mm: add /sys/class/bdi/<bdi>/min_ratio_fine knob 2022-11-30 15:59:06 -08:00
balloon_compaction.c
bootmem_info.c
cma.c
cma.h
cma_debug.c
cma_sysfs.c
compaction.c Revert "mm/compaction: fix set skip in fast_find_migrateblock" 2023-01-29 10:38:43 -08:00
debug.c mm,thp,rmap: simplify compound page mapcount handling 2022-11-30 15:58:46 -08:00
debug_page_ref.c
debug_vm_pgtable.c mm: remove unused savedwrite infrastructure 2022-11-30 15:58:49 -08:00
dmapool.c
early_ioremap.c
fadvise.c mm/fadvise: use LLONG_MAX instead of -1 for eof 2022-12-11 18:12:11 -08:00
failslab.c
filemap.c filemap: convert replace_page_cache_page() to replace_page_cache_folio() 2022-12-11 18:12:12 -08:00
folio-compat.c folio-compat: remove lru_cache_add() 2022-12-11 18:12:13 -08:00
frontswap.c
gup.c New Feature: 2022-12-17 14:06:53 -06:00
gup_test.c mm/gup_test: free memory allocated via kvcalloc() using kvfree() 2022-12-15 16:37:48 -08:00
gup_test.h
highmem.c
hmm.c mm: Remove pointless barrier() after pmdp_get_lockless() 2022-12-15 10:37:27 -08:00
huge_memory.c ARM64: 2022-12-15 11:12:21 -08:00
hugetlb.c mm: fix a few rare cases of using swapin error pte marker 2023-01-18 17:02:19 -08:00
hugetlb_cgroup.c mm/hugeltb_cgroup: convert hugetlb_cgroup_commit_charge*() to folios 2022-11-30 15:58:43 -08:00
hugetlb_vmemmap.c mm/hugetlb_vmemmap: remap head page to newly allocated page 2022-11-30 15:58:47 -08:00
hugetlb_vmemmap.h
hwpoison-inject.c
init-mm.c
internal.h
interval_tree.c
io-mapping.c
ioremap.c
Kconfig New Feature: 2022-12-17 14:06:53 -06:00
Kconfig.debug
khugepaged.c mm/MADV_COLLAPSE: catch !none !huge !bad pmd lookups 2023-01-31 16:44:09 -08:00
kmemleak.c mm: use stack_depot_early_init for kmemleak 2023-01-31 16:44:10 -08:00
ksm.c mm/ksm: convert break_ksm() to use walk_page_range_vma() 2022-12-11 18:12:09 -08:00
list_lru.c
maccess.c
madvise.c mm: update mmap_sem comments to refer to mmap_lock 2023-01-11 16:14:22 -08:00
Makefile
mapping_dirty_helpers.c mm: Rename pmd_read_atomic() 2022-12-15 10:37:27 -08:00
memblock.c Revert "mm: Always release pages to the buddy allocator in memblock_free_late()." 2023-02-07 13:07:37 +02:00
memcontrol.c Revert "mm: add nodes= arg to memory.reclaim" 2023-01-31 16:44:07 -08:00
memfd.c
memory-failure.c mm/memory-failure.c: cleanup in unpoison_memory 2022-11-30 15:59:08 -08:00
memory-tiers.c mm/demotion: fix NULL vs IS_ERR checking in memory_tier_init 2022-11-30 15:58:54 -08:00
memory.c mm: fix a few rare cases of using swapin error pte marker 2023-01-18 17:02:19 -08:00
memory_hotplug.c
mempolicy.c migrate: hugetlb: check for hugetlb shared PMD in node migration 2023-01-31 16:44:09 -08:00
mempool.c mempool: do not use ksize() for poisoning 2022-11-30 15:58:41 -08:00
memremap.c
memtest.c
migrate.c MM patches for 6.2-rc1. 2022-12-13 19:29:45 -08:00
migrate_device.c
mincore.c
mlock.c
mm_init.c
mm_slot.h
mmap.c mm: update mmap_sem comments to refer to mmap_lock 2023-01-11 16:14:22 -08:00
mmap_lock.c
mmu_gather.c mm: mmu_gather: allow more than one batch of delayed rmaps 2022-12-11 18:12:21 -08:00
mmu_notifier.c
mmzone.c
mprotect.c mm: fix a few rare cases of using swapin error pte marker 2023-01-18 17:02:19 -08:00
mremap.c mm, mremap: fix mremap() expanding for vma's with vm_ops->close() 2023-01-31 16:44:08 -08:00
msync.c
nommu.c nommu: fix split_vma() map_count error 2023-01-11 16:14:23 -08:00
oom_kill.c
page-writeback.c mm: add bdi_set_min_ratio_no_scale() function 2022-11-30 15:59:06 -08:00
page_alloc.c Fix page corruption caused by racy check in __free_pages 2023-02-12 10:30:05 -08:00
page_counter.c
page_ext.c Merge branch 'mm-hotfixes-stable' into mm-stable 2022-11-30 14:58:42 -08:00
page_idle.c
page_io.c
page_isolation.c
page_owner.c
page_poison.c
page_reporting.c mm/page_reporting: Add checks for page_reporting_order param 2022-11-28 16:48:19 +00:00
page_reporting.h
page_table_check.c mm: use kstrtobool() instead of strtobool() 2022-11-30 15:58:45 -08:00
page_vma_mapped.c
pagewalk.c mm/pagewalk: add walk_page_range_vma() 2022-12-11 18:12:08 -08:00
percpu-internal.h
percpu-km.c
percpu-stats.c
percpu-vm.c
percpu.c
pgalloc-track.h
pgtable-generic.c
process_vm_access.c
ptdump.c
readahead.c
rmap.c mm,thp,rmap: fix races between updates of subpages_mapcount 2022-12-11 18:12:20 -08:00
rodata_test.c
secretmem.c
shmem.c mm/shmem: restore SHMEM_HUGE_DENY precedence over MADV_COLLAPSE 2023-01-11 16:14:20 -08:00
shrinker_debug.c
shuffle.c
shuffle.h
slab.c mm, slab: periodically resched in drain_freelist() 2023-01-02 09:31:05 +01:00
slab.h Merge branch 'slub-tiny-v1r6' into slab/for-next 2022-12-01 00:14:00 +01:00
slab_common.c hardening updates for v6.2-rc1 2022-12-14 12:20:00 -08:00
slob.c
slub.c MM patches for 6.2-rc1. 2022-12-13 19:29:45 -08:00
sparse-vmemmap.c mm/sparse-vmemmap: generalise vmemmap_populate_hugepages() 2022-12-11 18:12:12 -08:00
sparse.c
swap.c mm: teach release_pages() to take an array of encoded page pointers too 2022-11-30 15:58:50 -08:00
swap.h
swap_cgroup.c
swap_slots.c
swap_state.c mm: mmu_gather: prepare to gather encoded page pointers with flags 2022-11-30 15:58:50 -08:00
swapfile.c mm/swapfile: add cond_resched() in get_swap_pages() 2023-01-31 16:44:10 -08:00
truncate.c folio-compat: remove lru_cache_add() 2022-12-11 18:12:13 -08:00
usercopy.c mm: use kstrtobool() instead of strtobool() 2022-11-30 15:58:45 -08:00
userfaultfd.c New Feature: 2022-12-17 14:06:53 -06:00
util.c mm,thp,rmap: simplify compound page mapcount handling 2022-11-30 15:58:46 -08:00
vmalloc.c
vmpressure.c
vmscan.c mm: multi-gen LRU: fix crash during cgroup migration 2023-01-31 16:44:08 -08:00
vmstat.c mm: vmscan: split khugepaged stats from direct reclaim stats 2022-11-30 15:58:41 -08:00
workingset.c folio-compat: remove lru_cache_add() 2022-12-11 18:12:13 -08:00
z3fold.c zpool: clean out dead code 2022-12-11 18:12:10 -08:00
zbud.c zpool: clean out dead code 2022-12-11 18:12:10 -08:00
zpool.c zpool: clean out dead code 2022-12-11 18:12:10 -08:00
zsmalloc.c zsmalloc: fix a race with deferred_handles storing 2023-01-31 16:44:07 -08:00
zswap.c zswap: fix writeback lock ordering for zsmalloc 2022-12-11 18:12:09 -08:00