kernel-hacking-2024-linux-s.../kernel
Tejun Heo 5db9a4d99b cgroup: fix cgroup hierarchy umount race
Commit 48ddbe1946 "cgroup: make css->refcnt clearing on cgroup removal
optional" allowed a css to linger after the associated cgroup is
removed.  Because a css holds a reference on the cgroup's dentry,
cgroup dentries may now linger for a while as well.

Destroying a superblock which has dentries with positive refcnts is a
critical bug and triggers BUG() in vfs code.  As each cgroup dentry
holds an s_active reference, any lingering cgroup keeps both its
dentry and the superblock pinned, which prevents premature release of
the superblock.

Unfortunately, after 48ddbe1946, there's a small window while
releasing a cgroup which is directly under the root of the hierarchy.
When a cgroup directory is released, the vfs layer first deletes the
corresponding dentry and then invokes dput() on the parent, which may
recurse further.  So when a cgroup directly below the root cgroup is
released, the cgroup is first destroyed - which releases the s_active
it was holding - and only then is the dentry for the root cgroup
dput().

This creates a window where the root dentry's refcnt isn't zero but
the superblock's s_active is.  If umount happens before or during this
window, the vfs will see the root dentry with a non-zero refcnt and
trigger BUG().

Before 48ddbe1946, this problem didn't exist because the last dentry
reference was guaranteed to be put synchronously from the rmdir(2)
invocation, which holds s_active around the whole process.

Fix it by holding an extra superblock->s_active reference across
dput() from css release, which is the dput() path added by 48ddbe1946
and the only one which doesn't hold an extra s_active ref across the
final cgroup dput().
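
The ordering problem and the effect of the extra reference can be
illustrated with a small user-space model.  This is only a sketch, not
the kernel code: the toy_* names, the struct and the single-cgroup
scenario are all made up for illustration, and the real refcounting in
the vfs is considerably more involved.  The scenario assumes umount has
already dropped the mount's own s_active reference, so the lingering
cgroup holds the last one.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a superblock; not the kernel's struct. */
struct toy_sb {
	int s_active;     /* active references pinning the superblock */
	int root_refcnt;  /* refs on the root dentry held by child dentries */
	bool bug;         /* set where the kernel would hit BUG() */
};

/* Dropping the last s_active ref starts teardown; a busy root is fatal. */
static void toy_deactivate_super(struct toy_sb *sb)
{
	assert(sb->s_active > 0);
	if (--sb->s_active == 0 && sb->root_refcnt != 0)
		sb->bug = true;
}

/*
 * Final dput() of a cgroup dentry directly below the root:
 * the cgroup is destroyed first (dropping its s_active ref),
 * and only afterwards is the parent (root) dentry put.
 */
static void toy_final_dput(struct toy_sb *sb)
{
	toy_deactivate_super(sb);
	assert(sb->root_refcnt > 0);
	sb->root_refcnt--;
}

/* css release path without the fix: nothing pins the superblock. */
static void toy_css_release_unfixed(struct toy_sb *sb)
{
	toy_final_dput(sb);
}

/* css release path with the fix: hold an extra s_active across dput(). */
static void toy_css_release_fixed(struct toy_sb *sb)
{
	sb->s_active++;
	toy_final_dput(sb);
	toy_deactivate_super(sb);
}

/* Returns true if the modelled umount race would trigger BUG(). */
static bool toy_umount_race(bool fixed)
{
	/* One lingering cgroup under root; the mount's ref is already gone. */
	struct toy_sb sb = { .s_active = 1, .root_refcnt = 1, .bug = false };

	if (fixed)
		toy_css_release_fixed(&sb);
	else
		toy_css_release_unfixed(&sb);
	return sb.bug;
}
```

In this model toy_umount_race(false) reports the BUG() condition,
because s_active reaches zero while the root dentry is still
referenced, whereas toy_umount_race(true) does not, because the extra
reference delays teardown until after the root dentry has been put.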

Signed-off-by: Tejun Heo <tj@kernel.org>
LKML-Reference: <4FEEA5CB.8070809@huawei.com>
Reported-by: shyju pv <shyju.pv@huawei.com>
Tested-by: shyju pv <shyju.pv@huawei.com>
Cc: Sasha Levin <levinsasha928@gmail.com>
Acked-by: Li Zefan <lizefan@huawei.com>
2012-07-07 16:08:18 -07:00
debug
events perf: Use css_tryget() to avoid propping up css refcount 2012-06-18 11:45:57 +02:00
gcov
irq Merge branches 'irq-urgent-for-linus' and 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2012-06-04 11:36:51 -07:00
power PM / Hibernate: Use get_gendisk to verify partition if resume_file is integer format 2012-05-18 20:44:59 +02:00
sched Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2012-06-08 14:59:29 -07:00
time Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2012-06-15 16:52:35 -07:00
trace Merge branch 'for-linus' of git://git.kernel.dk/linux-block 2012-07-03 15:45:10 -07:00
.gitignore
acct.c
async.c
audit.c
audit.h
audit_tree.c
audit_watch.c
auditfilter.c
auditsc.c
backtracetest.c
bounds.c
capability.c userns: Teach inode_capable to understand inodes whose uids map to other namespaces. 2012-05-15 14:59:24 -07:00
cgroup.c cgroup: fix cgroup hierarchy umount race 2012-07-07 16:08:18 -07:00
cgroup_freezer.c
compat.c new helper: sigsuspend() 2012-05-21 23:52:30 -04:00
configs.c
cpu.c kernel/cpu.c: document clear_tasks_mm_cpumask() 2012-05-31 17:49:30 -07:00
cpu_pm.c kernel/cpu_pm.c: fix various typos 2012-05-31 17:49:27 -07:00
cpuset.c Merge branch 'for-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup 2012-05-22 17:40:19 -07:00
crash_dump.c
cred.c keys: kill task_struct->replacement_session_keyring 2012-05-23 22:11:41 -04:00
delayacct.c
dma.c
elfcore.c
exec_domain.c
exit.c pidns: find_new_reaper() can no longer switch to init_pid_ns.child_reaper 2012-06-20 14:39:36 -07:00
extable.c
fork.c Revert "mm: correctly synchronize rss-counters at exit/exec" 2012-06-07 17:54:07 -07:00
freezer.c
futex.c
futex_compat.c
groups.c
hrtimer.c
hung_task.c
irq_work.c
itimer.c
jump_label.c
kallsyms.c vsprintf: fix %ps on non symbols when using kallsyms 2012-05-29 16:22:32 -07:00
kcmp.c syscalls, x86: add __NR_kcmp syscall 2012-05-31 17:49:32 -07:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
kexec.c
kfifo.c
kmod.c kmod.c: fix kernel-doc warning 2012-05-31 17:49:28 -07:00
kprobes.c
ksysfs.c
kthread.c
latencytop.c
lglock.c brlocks/lglocks: turn into functions 2012-05-29 23:28:41 -04:00
lockdep.c
lockdep_internals.h
lockdep_proc.c
lockdep_states.h
Makefile Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-06-01 10:34:35 -07:00
module.c Guard check in module loader against integer overflow 2012-05-23 22:28:53 +09:30
mutex-debug.c
mutex-debug.h
mutex.c
mutex.h
notifier.c
nsproxy.c
padata.c
panic.c kdump: Execute kmsg_dump(KMSG_DUMP_PANIC) after smp_send_stop() 2012-05-18 14:02:10 +02:00
params.c
pid.c mm: add a low limit to alloc_large_system_hash 2012-05-24 00:28:21 -04:00
pid_namespace.c pidns: guarantee that the pidns init will be the last pidns process reaped 2012-06-20 14:39:36 -07:00
posix-cpu-timers.c
posix-timers.c
printk.c printk.c: fix kernel-doc warnings 2012-06-30 15:56:40 -07:00
profile.c
ptrace.c
range.c
rcu.h
rcupdate.c
rcutiny.c
rcutiny_plugin.h
rcutorture.c
rcutree.c rcu: Stop rcu_do_batch() from multiplexing the "count" variable 2012-06-25 12:35:25 -07:00
rcutree.h rcu: Move RCU_FAST_NO_HZ per-CPU variables to rcu_dynticks structure 2012-06-06 20:43:28 -07:00
rcutree_plugin.h rcu: Precompute RCU_FAST_NO_HZ timer offsets 2012-06-06 20:43:28 -07:00
rcutree_trace.c
relay.c splice: fix racy pipe->buffers uses 2012-06-13 21:16:42 +02:00
res_counter.c rescounters: add res_counter_uncharge_until() 2012-05-29 16:22:27 -07:00
resource.c kernel/resource.c: correct the comment of allocate_resource() 2012-05-31 17:49:26 -07:00
rtmutex-debug.c
rtmutex-debug.h
rtmutex-tester.c
rtmutex.c
rtmutex.h
rtmutex_common.h
rwsem.c
seccomp.c
semaphore.c
signal.c new helper: signal_delivered() 2012-06-01 12:58:52 -04:00
smp.c
smpboot.c smpboot, idle: Fix comment mismatch over idle_threads_init() 2012-05-24 22:58:08 +02:00
smpboot.h
softirq.c
spinlock.c
srcu.c
stacktrace.c
stop_machine.c
sys.c c/r: prctl: Move PR_GET_TID_ADDRESS to a proper place 2012-06-20 14:39:36 -07:00
sys_ni.c syscalls, x86: add __NR_kcmp syscall 2012-05-31 17:49:32 -07:00
sysctl.c
sysctl_binary.c
task_work.c task_work_add: generic process-context callbacks 2012-05-23 22:09:21 -04:00
taskstats.c
test_kprobes.c
time.c
timeconst.pl
timer.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2012-05-23 17:42:39 -07:00
tracepoint.c
tsacct.c
uid16.c
up.c
user-return-notifier.c
user.c userns: Silence silly gcc warning. 2012-05-19 15:44:40 -06:00
user_namespace.c
utsname.c
utsname_sysctl.c
wait.c
watchdog.c watchdog: Quiet down the boot messages 2012-06-14 12:20:50 +02:00
workqueue.c lockdep: fix oops in processing workqueue 2012-05-15 08:08:31 -07:00
workqueue_sched.h