kernel-hacking-2024-linux-s.../mm
Tejun Heo 48ddbe1946 cgroup: make css->refcnt clearing on cgroup removal optional
Currently, cgroup removal tries to drain all css references.  If there
are active css references, the removal logic waits and retries
->pre_destroy() until either all refs drop to zero or removal is
cancelled.

This semantics is unusual and adds non-trivial complexity to cgroup
core and IMHO is fundamentally misguided in that it couples internal
implementation details (references to internal data structure) with
externally visible operation (rmdir).  To userland, this is a behavior
peculiarity which is unnecessary and difficult to anticipate (css refs
are otherwise invisible from userland), and, to policy implementations,
this is an unnecessary restriction (e.g. blkcg wants to hold css refs
for caching purposes but can't as that becomes visible as rmdir hang).

Unfortunately, memcg currently depends on ->pre_destroy() retrials and
cgroup removal vetoing and can't be immediately switched to the new
behavior.  This patch introduces the new behavior of not waiting for
css refs to drain and maintains the old behavior for subsystems which
have __DEPRECATED_clear_css_refs set.
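
The opt-in rule described above can be pictured with a small userspace
model.  This is only a sketch under stated assumptions, not the kernel
implementation: toy_subsys, toy_css, extra_refs and
toy_rmdir_can_proceed() are hypothetical names invented for
illustration, and only the deprecated_clear_css_refs field is meant to
mirror the __DEPRECATED_clear_css_refs flag named in this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model; none of these types exist in the kernel. */
struct toy_subsys {
	const char *name;
	/* Stands in for the __DEPRECATED_clear_css_refs flag. */
	bool deprecated_clear_css_refs;
};

struct toy_css {
	const struct toy_subsys *ss;
	int extra_refs;	/* references held beyond the base reference */
};

/*
 * Returns true if removal may proceed immediately, false if it would
 * have to wait and retry because a flagged subsystem still holds refs.
 * Refs held by unflagged subsystems never block removal in this model.
 */
static bool toy_rmdir_can_proceed(const struct toy_css *css, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (css[i].ss->deprecated_clear_css_refs &&
		    css[i].extra_refs > 0)
			return false;
	}
	return true;
}

/* Small self-check exercising both the new and the old behavior. */
static void toy_selfcheck(void)
{
	const struct toy_subsys blk = { "blkcg-like", false };
	const struct toy_subsys mem = { "memcg-like", true };
	struct toy_css only_blk[] = { { &blk, 3 } };
	struct toy_css with_mem[] = { { &blk, 3 }, { &mem, 1 } };

	/* New behavior: cached refs do not stall removal. */
	assert(toy_rmdir_can_proceed(only_blk, 1));
	/* Old behavior: a flagged subsystem with live refs blocks it. */
	assert(!toy_rmdir_can_proceed(with_mem, 2));
	with_mem[1].extra_refs = 0;
	assert(toy_rmdir_can_proceed(with_mem, 2));
}
```

In this model a subsystem that caches references, as blkcg wants to, no
longer makes rmdir hang, while a subsystem that sets the flag keeps the
wait-and-retry semantics until it is converted.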

Once memcg is updated, we can drop the code paths for the old
behavior as proposed in the following patch.  Note that the following
patch is incorrect in that dput work item is in cgroup and may lose
some of dputs when multiple css's are released back-to-back, and
__css_put() triggers check_for_release() when refcnt reaches 0 instead
of 1; however, it shows what part can be removed.

  http://thread.gmane.org/gmane.linux.kernel.containers/22559/focus=75251

Note that, in the not-too-distant future, cgroup core will start
emitting warning messages for subsystems which require the old
behavior, so please get moving.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
2012-04-01 12:09:56 -07:00
..
backing-dev.c
bootmem.c bootmem/sparsemem: remove limit constraint in alloc_bootmem_section 2012-03-21 17:54:58 -07:00
bounce.c mm: remove the second argument of k[un]map_atomic() 2012-03-20 21:48:27 +08:00
cleancache.c
compaction.c mm: compaction: make compact_control order signed 2012-03-21 17:54:56 -07:00
debug-pagealloc.c
dmapool.c
fadvise.c
failslab.c
filemap.c radix-tree: use iterators in find_get_pages* functions 2012-03-28 17:14:37 -07:00
filemap_xip.c
fremap.c
highmem.c
huge_memory.c thp: optimize away unnecessary page table locking 2012-03-21 17:54:57 -07:00
hugetlb.c mm: hugetlb: cleanup duplicated code in unmapping vm range 2012-03-23 16:58:31 -07:00
hwpoison-inject.c
init-mm.c
internal.h
Kconfig
Kconfig.debug
kmemcheck.c
kmemleak-test.c
kmemleak.c
ksm.c ksm: cleanup: introduce find_mergeable_vma() 2012-03-21 17:54:59 -07:00
maccess.c
madvise.c coredump: add VM_NODUMP, MADV_NODUMP, MADV_CLEAR_NODUMP 2012-03-23 16:58:42 -07:00
Makefile
memblock.c
memcontrol.c cgroup: make css->refcnt clearing on cgroup removal optional 2012-04-01 12:09:56 -07:00
memory-failure.c Merge branch 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2012-03-22 09:42:04 -07:00
memory.c coredump: remove VM_ALWAYSDUMP flag 2012-03-23 16:58:42 -07:00
memory_hotplug.c
mempolicy.c cpuset: mm: reduce large amounts of memory barrier related damage v3 2012-03-21 17:54:59 -07:00
mempool.c
migrate.c mm: fix move/migrate_pages() race on task struct 2012-03-21 17:54:58 -07:00
mincore.c mm: thp: fix pmd_bad() triggering in code paths holding mmap_sem read mode 2012-03-21 17:54:54 -07:00
mlock.c vm: avoid using find_vma_prev() unnecessarily 2012-03-06 18:23:36 -08:00
mm_init.c
mmap.c Merge branch 'akpm' (Andrew's patch-bomb) 2012-03-22 09:04:48 -07:00
mmu_context.c mm, counters: remove task argument to sync_mm_rss() and __sync_task_rss_stat() 2012-03-21 17:54:59 -07:00
mmu_notifier.c
mmzone.c
mprotect.c Merge branch 'akpm' (Andrew's patch-bomb) 2012-03-22 09:04:48 -07:00
mremap.c
msync.c
nobootmem.c
nommu.c
oom_kill.c signal: oom_kill_task: use SEND_SIG_FORCED instead of force_sig() 2012-03-23 16:58:41 -07:00
page-writeback.c Ext4 commits for 3.3 merge window; mostly cleanups and bug fixes 2012-03-28 10:02:55 -07:00
page_alloc.c mm: only IPI CPUs to drain local pages if they exist 2012-03-28 17:14:35 -07:00
page_cgroup.c page_cgroup: fix horrid swap accounting regression 2012-03-06 08:18:23 -08:00
page_io.c
page_isolation.c
pagewalk.c mm: thp: fix pmd_bad() triggering in code paths holding mmap_sem read mode 2012-03-21 17:54:54 -07:00
percpu-km.c
percpu-vm.c
percpu.c
pgtable-generic.c thp: add HPAGE_PMD_* definitions for !CONFIG_TRANSPARENT_HUGEPAGE 2012-03-21 17:55:02 -07:00
prio_tree.c
process_vm_access.c
quicklist.c
readahead.c
rmap.c memcg: use new logic for page stat accounting 2012-03-21 17:55:01 -07:00
shmem.c Merge branch 'akpm' (Andrew's patch-bomb) 2012-03-22 09:04:48 -07:00
slab.c Merge branch 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux 2012-03-28 15:04:26 -07:00
slob.c
slub.c Merge branch 'akpm' (Andrew's patch-bomb) 2012-03-28 17:19:28 -07:00
sparse-vmemmap.c
sparse.c bootmem/sparsemem: remove limit constraint in alloc_bootmem_section 2012-03-21 17:54:58 -07:00
swap.c mm: drain percpu lru add/rotate page-vectors on cpu hot-unplug 2012-03-21 17:54:58 -07:00
swap_state.c mm: make swapin readahead skip over holes 2012-03-21 17:54:56 -07:00
swapfile.c swapon: check validity of swap_flags 2012-03-28 17:14:35 -07:00
thrash.c
truncate.c mm for fs: add truncate_pagecache_range() 2012-03-28 17:14:35 -07:00
util.c procfs: mark thread stack correctly in proc/<pid>/maps 2012-03-21 17:54:58 -07:00
vmalloc.c mm: remove the second argument of k[un]map_atomic() 2012-03-20 21:48:27 +08:00
vmscan.c Fix potential endless loop in kswapd when compaction is not enabled 2012-03-24 12:18:32 -07:00
vmstat.c