kernel-hacking-2024-linux-s.../mm
Kirill A. Shutemov bd56086f10 thp: fix split_huge_page() after mremap() of THP
Sasha Levin has reported KASAN out-of-bounds bug[1].  It points to "if
(!is_swap_pte(pte[i]))" in unfreeze_page_vma() as a problematic access.

The cause is that split_huge_page() doesn't handle THP correctly if it's
not allingned to PMD boundary.  It can happen after mremap().

Test-case (not always triggers the bug):

	#define _GNU_SOURCE
	#include <stdio.h>
	#include <stdlib.h>
	#include <sys/mman.h>

	#define MB (1024UL*1024)
	#define SIZE (2*MB)
	#define BASE ((void *)0x400000000000)

	int main()
	{
		char *p;

		p = mmap(BASE, SIZE, PROT_READ | PROT_WRITE,
				MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE,
				-1, 0);
		if (p == MAP_FAILED)
			perror("mmap"), exit(1);
		p = mremap(BASE, SIZE, SIZE, MREMAP_FIXED | MREMAP_MAYMOVE,
				BASE + SIZE + 8192);
		if (p == MAP_FAILED)
			perror("mremap"), exit(1);
		system("echo 1 > /sys/kernel/debug/split_huge_pages");
		return 0;
	}

The patch fixes freeze and unfreeze paths to handle page table boundary
crossing.

It also makes mapcount vs count check in split_huge_page_to_list()
stricter:
 - after freeze we don't expect any subpage mapped as we remove them
   from rmap when setting up migration entries;
 - count must be 1, meaning only caller has reference to the page;

[1] https://gist.github.com/sashalevin/c67fbea55e7c0576972a

Signed-off-by: Kirill A.  Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-15 17:56:32 -08:00
..
kasan kasan: fix kmemleak false-positive in kasan_module_alloc() 2015-11-20 16:17:32 -08:00
backing-dev.c mm: memcontrol: export root_mem_cgroup 2016-01-14 16:00:49 -08:00
balloon_compaction.c
bootmem.c x86/mm: Introduce max_possible_pfn 2015-12-06 12:46:31 +01:00
cleancache.c
cma.c
cma.h
cma_debug.c
compaction.c mm/compaction.c: __compact_pgdat() code cleanuup 2016-01-14 16:00:49 -08:00
debug-pagealloc.c
debug.c mm: rework mapcount accounting to enable 4k mapping of THPs 2016-01-15 17:56:32 -08:00
dmapool.c mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd 2015-11-06 17:50:42 -08:00
early_ioremap.c
fadvise.c
failslab.c mm, page_alloc: rename __GFP_WAIT to __GFP_RECLAIM 2015-11-06 17:50:42 -08:00
filemap.c mm: differentiate page_mapped() from page_mapcount() for compound pages 2016-01-15 17:56:32 -08:00
frame_vector.c
frontswap.c
gup.c thp: allow mlocked THP again 2016-01-15 17:56:32 -08:00
highmem.c
huge_memory.c thp: fix split_huge_page() after mremap() of THP 2016-01-15 17:56:32 -08:00
hugetlb.c mm: rework mapcount accounting to enable 4k mapping of THPs 2016-01-15 17:56:32 -08:00
hugetlb_cgroup.c mm: make compound_head() robust 2015-11-06 17:50:42 -08:00
hwpoison-inject.c
init-mm.c
internal.h thp: reintroduce split_huge_page() 2016-01-15 17:56:32 -08:00
interval_tree.c
Kconfig mm: re-enable THP 2016-01-15 17:56:32 -08:00
Kconfig.debug
kmemcheck.c
kmemleak-test.c
kmemleak.c Revert "gfp: add __GFP_NOACCOUNT" 2016-01-14 16:00:49 -08:00
ksm.c mm/ksm.c: mark stable page dirty 2016-01-15 17:56:32 -08:00
list_lru.c
maccess.c
madvise.c mm/huge_memory.c: don't split THP page when MADV_FREE syscall is called 2016-01-15 17:56:32 -08:00
Makefile
memblock.c mm/memblock: introduce for_each_memblock_type() 2016-01-14 16:00:49 -08:00
memcontrol.c mm: rework mapcount accounting to enable 4k mapping of THPs 2016-01-15 17:56:32 -08:00
memory-failure.c mm: hwpoison: adjust for new thp refcounting 2016-01-15 17:56:32 -08:00
memory.c thp: allow mlocked THP again 2016-01-15 17:56:32 -08:00
memory_hotplug.c memory-hotplug: don't BUG() in register_memory_resource() 2016-01-14 16:00:49 -08:00
mempolicy.c migrate_pages: try to split pages on queuing 2016-01-15 17:56:32 -08:00
mempool.c mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd 2015-11-06 17:50:42 -08:00
memtest.c
migrate.c thp: introduce deferred_split_huge_page() 2016-01-15 17:56:32 -08:00
mincore.c mm, thp: remove infrastructure for handling splitting PMDs 2016-01-15 17:56:32 -08:00
mlock.c thp: allow mlocked THP again 2016-01-15 17:56:32 -08:00
mm_init.c
mmap.c mm: rework virtual memory accounting 2016-01-14 16:00:49 -08:00
mmu_context.c
mmu_notifier.c
mmzone.c mm/mmzone.c: memmap_valid_within() can be boolean 2016-01-14 16:00:49 -08:00
mprotect.c thp: rename split_huge_page_pmd() to split_huge_pmd() 2016-01-15 17:56:32 -08:00
mremap.c mm, thp: remove infrastructure for handling splitting PMDs 2016-01-15 17:56:32 -08:00
msync.c
nobootmem.c x86/mm: Introduce max_possible_pfn 2015-12-06 12:46:31 +01:00
nommu.c kmemcg: account certain kmem allocations to memcg 2016-01-14 16:00:49 -08:00
oom_kill.c mm, shmem: add internal shmem resident memory accounting 2016-01-14 16:00:49 -08:00
page-writeback.c mm: page_alloc: generalize the dirty balance reserve 2016-01-14 16:00:49 -08:00
page_alloc.c thp: introduce deferred_split_huge_page() 2016-01-15 17:56:32 -08:00
page_counter.c mm: page_counter: let page_counter_try_charge() return bool 2015-11-05 19:34:48 -08:00
page_ext.c
page_idle.c mm: add page_check_address_transhuge() helper 2016-01-15 17:56:32 -08:00
page_io.c
page_isolation.c mm/page_isolation: use macro to judge the alignment 2016-01-14 16:00:49 -08:00
page_owner.c
pagewalk.c thp: rename split_huge_page_pmd() to split_huge_pmd() 2016-01-15 17:56:32 -08:00
percpu-km.c
percpu-vm.c
percpu.c
pgtable-generic.c mm, thp: remove infrastructure for handling splitting PMDs 2016-01-15 17:56:32 -08:00
process_vm_access.c
quicklist.c
readahead.c mm: move lru_to_page to mm_inline.h 2016-01-14 16:00:49 -08:00
rmap.c mm: support madvise(MADV_FREE) 2016-01-15 17:56:32 -08:00
shmem.c memcg: adjust to support new THP refcounting 2016-01-15 17:56:32 -08:00
slab.c mm/slab.c: add a helper function get_first_slab 2016-01-14 16:00:49 -08:00
slab.h slab: add SLAB_ACCOUNT flag 2016-01-14 16:00:49 -08:00
slab_common.c slab: add SLAB_ACCOUNT flag 2016-01-14 16:00:49 -08:00
slob.c slab/slub: adjust kmem_cache_alloc_bulk API 2015-11-22 11:58:44 -08:00
slub.c page-flags: define PG_locked behavior on compound pages 2016-01-15 17:56:32 -08:00
sparse-vmemmap.c
sparse.c
swap.c mm: move lazily freed pages to inactive list 2016-01-15 17:56:32 -08:00
swap_cgroup.c
swap_state.c mm: support madvise(MADV_FREE) 2016-01-15 17:56:32 -08:00
swapfile.c mm, thp: adjust conditions when we can reuse the page on WP fault 2016-01-15 17:56:32 -08:00
truncate.c
userfaultfd.c memcg: adjust to support new THP refcounting 2016-01-15 17:56:32 -08:00
util.c mm: prepare page_referenced() and page_idle to new THP refcounting 2016-01-15 17:56:32 -08:00
vmacache.c
vmalloc.c mm, vmalloc: remove VM_VPAGES 2016-01-14 16:00:49 -08:00
vmpressure.c memcg: avoid vmpressure oops when memcg disabled 2016-01-14 16:00:49 -08:00
vmscan.c mm: support madvise(MADV_FREE) 2016-01-15 17:56:32 -08:00
vmstat.c mm: support madvise(MADV_FREE) 2016-01-15 17:56:32 -08:00
workingset.c
zbud.c mm/zbud.c: use list_last_entry() instead of list_tail_entry() 2016-01-15 11:40:52 -08:00
zpool.c mm: zsmalloc: constify struct zs_pool name 2015-11-06 17:50:42 -08:00
zsmalloc.c zsmalloc: reorganize struct size_class to pack 4 bytes hole 2016-01-15 11:40:52 -08:00
zswap.c mm/zswap: change incorrect strncmp use to strcmp 2015-12-18 14:25:40 -08:00