kernel-hacking-2024-linux-s.../mm
Jeremy Fitzhardinge 1ea0704e0d mm: add a ptep_modify_prot transaction abstraction
This patch adds an API for doing read-modify-write updates to a pte's
protection bits which may race against hardware updates to the pte.
After reading the pte, the hardware may asynchonously set the accessed
or dirty bits on a pte, which would be lost when writing back the
modified pte value.

The existing technique to handle this race is to use
ptep_get_and_clear() atomically fetch the old pte value and clear it
in memory.  This has the effect of marking the pte as non-present,
which will prevent the hardware from updating its state.  When the new
value is written back, the pte will be present again, and the hardware
can resume updating the access/dirty flags.

When running in a virtualized environment, pagetable updates are
relatively expensive, since they generally involve some trap into the
hypervisor.  To mitigate the cost of these updates, we tend to batch
them.

However, because of the atomic nature of ptep_get_and_clear(), it is
inherently non-batchable.  This new interface allows batching by
giving the underlying implementation enough information to open a
transaction between the read and write phases:

ptep_modify_prot_start() returns the current pte value, and puts the
  pte entry into a state where either the hardware will not update the
  pte, or if it does, the updates will be preserved on commit.

ptep_modify_prot_commit() writes back the updated pte, makes sure that
  any hardware updates made since ptep_modify_prot_start() are
  preserved.

ptep_modify_prot_start() and _commit() must be exactly paired, and
used while holding the appropriate pte lock.  They do not protect
against other software updates of the pte in any way.

The current implementations of ptep_modify_prot_start and _commit are
functionally unchanged from before: _start() uses ptep_get_and_clear()
fetch the pte and zero the entry, preventing any hardware updates.
_commit() simply writes the new pte value back knowing that the
hardware has not updated the pte in the meantime.

The only current user of this interface is mprotect

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-25 15:15:53 +02:00
..
allocpercpu.c
backing-dev.c mm: bdi: fix race in bdi_class device creation 2008-05-20 13:31:53 -07:00
bootmem.c Add return value to reserve_bootmem_node() 2008-06-21 11:25:10 -07:00
bounce.c
dmapool.c
fadvise.c
filemap.c mm: fix infinite loop in filemap_fault 2008-05-14 19:11:13 -07:00
filemap_xip.c
fremap.c
highmem.c
hugetlb.c hugetlb: fix lockdep error 2008-06-06 11:29:09 -07:00
internal.h
Kconfig
maccess.c
madvise.c
Makefile
memcontrol.c memcg: simple stats for memory resource controller 2008-05-01 08:04:02 -07:00
memory.c mm: fix race in COW logic 2008-06-23 11:28:32 -07:00
memory_hotplug.c memory_hotplug: always initialize pageblock bitmap 2008-05-14 19:11:15 -07:00
mempolicy.c
mempool.c
migrate.c Reinstate ZERO_PAGE optimization in 'get_user_pages()' and fix XIP 2008-06-20 11:18:25 -07:00
mincore.c
mlock.c
mmap.c brk: make sys_brk() honor COMPAT_BRK when computing lower bound 2008-06-06 11:29:09 -07:00
mmzone.c
mprotect.c mm: add a ptep_modify_prot transaction abstraction 2008-06-25 15:15:53 +02:00
mremap.c
msync.c
nommu.c nommu: Correct kobjsize() page validity checks. 2008-06-12 07:56:17 -07:00
oom_kill.c
page-writeback.c mm: Add NR_WRITEBACK_TEMP counter 2008-04-30 08:29:50 -07:00
page_alloc.c mm: Minor clean-up of page flags in mm/page_alloc.c 2008-06-09 10:22:24 -07:00
page_io.c
page_isolation.c
pagewalk.c pagemap: pass mm into pagewalkers 2008-06-12 18:05:41 -07:00
pdflush.c mm/pdflush.c: merge the same code in two path 2008-05-13 08:02:24 -07:00
prio_tree.c
quicklist.c
readahead.c mm: bdi: export BDI attributes in sysfs 2008-04-30 08:29:49 -07:00
rmap.c
shmem.c mm: bdi: add separate writeback accounting capability 2008-04-30 08:29:50 -07:00
shmem_acl.c
slab.c Slab: Fix memory leak in fallback_alloc() 2008-06-21 16:51:02 -07:00
slob.c slob: Fix to return wrong pointer 2008-05-19 20:55:25 +03:00
slub.c slub: ksize() abuse checks 2008-05-22 19:52:18 +03:00
sparse-vmemmap.c
sparse.c revert "memory hotplug: allocate usemap on the section with pgdat" 2008-04-30 08:29:55 -07:00
swap.c mm: fix atomic_t overflow in vm 2008-05-24 09:56:09 -07:00
swap_state.c mm: bdi: add separate writeback accounting capability 2008-04-30 08:29:50 -07:00
swapfile.c mm: use non-racy method for /proc/swaps creation 2008-04-29 08:06:20 -07:00
thrash.c
tiny-shmem.c
truncate.c
util.c
vmalloc.c docbook: fix vmalloc missing parameter notation 2008-05-01 08:03:59 -07:00
vmscan.c mm: fix incorrect variable type in do_try_to_free_pages() 2008-06-12 18:05:39 -07:00
vmstat.c make vmstat cpu-unplug safe 2008-05-13 08:02:23 -07:00