kernel-hacking-2024-linux-s.../kernel
Steven Rostedt c32e827b25 tracing: add raw trace point recording infrastructure
Impact: lower overhead tracing

The current event tracer can automatically pick up trace points
that are registered with the TRACE_FORMAT macro. But it required
a printf format string and parsing. Although, this adds the ability
to get guaranteed information like task names and such, it took
a hit in overhead processing. This processing can add about 500-1000
nanoseconds overhead, but in some cases that too is considered
too much and we want to shave off as much from this overhead as
possible.

Tom Zanussi recently posted tracing patches to lkml that are based
on a nice idea about capturing the data via C structs using
STRUCT_ENTER, STRUCT_EXIT type of macros.

I liked that method very much, but did not like the implementation
that required a developer to add data/code in several disjoint
locations.

This patch extends the event_tracer macros to do a similar "raw C"
approach that Tom Zanussi did. But instead of having the developers
needing to tweak a bunch of code all over the place, they can do it
all in one macro - preferably placed near the code that it is
tracing. That makes it much more likely that tracepoints will be
maintained on an ongoing basis by the code they modify.

The new macro TRACE_EVENT_FORMAT is created for this approach. (Note,
a developer may still utilize the more low level DECLARE_TRACE macros
if they don't care about getting their traces automatically in the event
tracer.)

They can also use the existing TRACE_FORMAT if they don't need to code
the tracepoint in C, but just want to use the convenience of printf.

So if the developer wants to "hardwire" a tracepoint in the fastest
possible way, and wants to acquire their data via a user space utility
in a raw binary format, or wants to see it in the trace output but not
sacrifice any performance, then they can implement the faster but
more complex TRACE_EVENT_FORMAT macro.

Here's what usage looks like:

  TRACE_EVENT_FORMAT(name,
	TPPROTO(proto),
	TPARGS(args),
	TPFMT(fmt, fmt_args),
	TRACE_STUCT(
		TRACE_FIELD(type1, item1, assign1)
		TRACE_FIELD(type2, item2, assign2)
			[...]
	),
	TPRAWFMT(raw_fmt)
	);

Note name, proto, args, and fmt, are all identical to what TRACE_FORMAT
uses.

 name: is the unique identifier of the trace point
 proto: The proto type that the trace point uses
 args: the args in the proto type
 fmt: printf format to use with the event printf tracer
 fmt_args: the printf argments to match fmt

 TRACE_STRUCT starts the ability to create a structure.
 Each item in the structure is defined with a TRACE_FIELD

  TRACE_FIELD(type, item, assign)

 type: the C type of item.
 item: the name of the item in the stucture
 assign: what to assign the item in the trace point callback

 raw_fmt is a way to pretty print the struct. It must match
  the order of the items are added in TRACE_STUCT

 An example of this would be:

 TRACE_EVENT_FORMAT(sched_wakeup,
	TPPROTO(struct rq *rq, struct task_struct *p, int success),
	TPARGS(rq, p, success),
	TPFMT("task %s:%d %s",
	      p->comm, p->pid, success?"succeeded":"failed"),
	TRACE_STRUCT(
		TRACE_FIELD(pid_t, pid, p->pid)
		TRACE_FIELD(int, success, success)
	),
	TPRAWFMT("task %d success=%d")
	);

 This creates us a unique struct of:

 struct {
	pid_t		pid;
	int		success;
 };

 And the way the call back would assign these values would be:

	entry->pid = p->pid;
	entry->success = success;

The nice part about this is that the creation of the assignent is done
via macro magic in the event tracer.  Once the TRACE_EVENT_FORMAT is
created, the developer will then have a faster method to record
into the ring buffer. They do not need to worry about the tracer itself.

The developer would only need to touch the files in include/trace/*.h

Again, I would like to give special thanks to Tom Zanussi for this
nice idea.

Idea-from: Tom Zanussi <tzanussi@gmail.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-02-28 03:09:32 -05:00
..
irq tracing, genirq: add irq enter and exit trace events 2009-02-26 18:43:50 +01:00
power PM: Split up sysdev_[suspend|resume] from device_power_[down|up] 2009-02-22 10:33:44 -08:00
time hrtimers: allow the hot-unplugging of all cpus 2009-01-30 22:35:29 +01:00
trace tracing: add raw trace point recording infrastructure 2009-02-28 03:09:32 -05:00
.gitignore
acct.c [CVE-2009-0029] System call wrappers part 04 2009-01-14 14:15:19 +01:00
async.c async: use list_move_tail 2009-02-08 10:00:26 -08:00
audit.c [PATCH] fix broken timestamps in AVC generated by kernel threads 2008-12-09 02:27:41 -05:00
audit.h fixing audit rule ordering mess, part 1 2009-01-04 15:14:41 -05:00
audit_tree.c audit: validate comparison operations, store them in sane form 2009-01-04 15:14:42 -05:00
auditfilter.c audit: validate comparison operations, store them in sane form 2009-01-04 15:14:42 -05:00
auditsc.c make sure that filterkey of task,always rules is reported 2009-01-04 15:14:42 -05:00
backtracetest.c
bounds.c
capability.c [CVE-2009-0029] System call wrappers part 04 2009-01-14 14:15:19 +01:00
cgroup.c cgroups: fix possible use after free 2009-02-18 15:37:54 -08:00
cgroup_debug.c cgroups: fix probable race with put_css_set[_taskexit] and find_css_set 2008-10-20 08:52:38 -07:00
cgroup_freezer.c freezer_cg: disable writing freezer.state of root cgroup 2008-11-12 17:17:16 -08:00
compat.c Allow times and time system calls to return small negative values 2009-01-06 15:59:13 -08:00
configs.c kernel/configs.c: remove useless comments 2008-10-20 08:52:34 -07:00
cpu.c stop_machine/cpu hotplug: fix disable_nonboot_cpus 2009-01-07 11:36:14 -08:00
cpuset.c cpuset: fix possible deadlock in async_rebuild_sched_domains 2009-01-19 02:44:00 +01:00
cred-internals.h CRED: Inaugurate COW credentials 2008-11-14 10:39:23 +11:00
cred.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6 2009-01-09 13:59:25 -08:00
delayacct.c schedstat: consolidate per-task cpu runtime stats 2008-12-18 13:54:01 +01:00
dma-coherent.c dma-coherent: Restore dma_alloc_from_coherent() large alloc fall back policy. 2009-01-21 18:51:53 +09:00
dma.c kernel/dma.c: remove a CVS keyword 2008-10-16 11:21:30 -07:00
exec_domain.c [CVE-2009-0029] System call wrappers part 04 2009-01-14 14:15:19 +01:00
exit.c signal: re-add dead task accumulation stats. 2009-02-05 13:04:33 +01:00
extable.c tracing/function-graph-tracer: drop the kernel_text_address check 2009-02-09 10:51:38 +01:00
fork.c Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-02-11 08:24:32 -08:00
freezer.c freezer_cg: use thaw_process() in unfreeze_cgroup() 2008-10-30 11:38:45 -07:00
futex.c futex: fix reference leak 2009-02-11 18:24:08 +01:00
futex_compat.c CRED: Use RCU to access another task's creds and to release a task's own creds 2008-11-14 10:39:19 +11:00
hrtimer.c hrtimer: prevent negative expiry value after clock_was_set() 2009-01-30 22:35:34 +01:00
itimer.c timers: split process wide cpu clocks/timers 2009-02-05 13:04:33 +01:00
kallsyms.c Revert "kbuild: strip generated symbols from *.ko" 2009-01-14 21:38:20 +01:00
Kconfig.freezer container freezer: implement freezer cgroup subsystem 2008-10-20 08:52:34 -07:00
Kconfig.hz
Kconfig.preempt rcu: provide RCU options on non-preempt architectures too 2008-12-25 09:31:28 +01:00
kexec.c PM: Split up sysdev_[suspend|resume] from device_power_[down|up] 2009-02-22 10:33:44 -08:00
kfifo.c
kgdb.c kgdb: call touch_softlockup_watchdog on resume 2008-10-06 13:50:59 -05:00
kmod.c kmod: fix varargs kernel-doc 2009-01-06 15:59:27 -08:00
kprobes.c kprobes: check CONFIG_FREEZER instead of CONFIG_PM 2009-01-16 14:32:17 -05:00
ksysfs.c kernel/ksysfs.c:fix dependence on CONFIG_NET 2009-01-06 10:44:31 -08:00
kthread.c tracepoints: add DECLARE_TRACE() and DEFINE_TRACE() 2008-11-16 09:01:36 +01:00
latencytop.c KSYM_SYMBOL_LEN fixes 2008-12-10 08:01:54 -08:00
lockdep.c Merge branch 'core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2008-12-30 16:10:19 -08:00
lockdep_internals.h
lockdep_proc.c lockstat: contend with points 2008-10-20 15:43:10 +02:00
Makefile PM: fix build for CONFIG_PM unset 2009-02-21 14:17:17 -08:00
marker.c markers/tracpoints: fix non-modular build 2008-11-16 09:52:03 +01:00
module.c tracing/function-graph-tracer: drop the kernel_text_address check 2009-02-09 10:51:38 +01:00
mutex-debug.c
mutex-debug.h
mutex.c mutex: __used is needed for function referenced only from inline asm 2008-11-24 10:00:28 +01:00
mutex.h
notifier.c Merge commit 'v2.6.28-rc6' into core/debug 2008-11-26 08:22:50 +01:00
ns_cgroup.c ns_cgroup: remove unused spinlock 2009-01-08 08:31:02 -08:00
nsproxy.c User namespaces: set of cleanups (v2) 2008-11-24 18:57:41 -05:00
panic.c oops: increment the oops UUID every time we oops 2009-01-06 15:59:12 -08:00
params.c Fix compile warning in kernel/params.c 2008-10-23 12:09:00 -07:00
pid.c pid: generalize task_active_pid_ns 2009-01-08 08:31:12 -08:00
pid_namespace.c
pm_qos_params.c
posix-cpu-timers.c timers: more consistently use clock vs timer 2009-02-13 13:04:05 +01:00
posix-timers.c [CVE-2009-0029] System call wrappers part 05 2009-01-14 14:15:20 +01:00
printk.c PM: Fix suspend_console and resume_console to use only one semaphore 2009-02-21 14:17:18 -08:00
profile.c profiling: fix broken profiling regression 2009-02-10 00:50:37 +01:00
ptrace.c [CVE-2009-0029] System call wrappers part 27 2009-01-14 14:15:29 +01:00
rcuclassic.c rcu: add __cpuinit to rcu_init_percpu_data() 2009-01-14 11:26:40 +01:00
rcupdate.c rcu: eliminate synchronize_rcu_xxx macro 2009-01-05 10:18:08 +01:00
rcupreempt.c rcu: eliminate synchronize_rcu_xxx macro 2009-01-05 10:18:08 +01:00
rcupreempt_trace.c "Tree RCU": scalable classic RCU implementation 2008-12-18 21:56:04 +01:00
rcutorture.c rcu: fix bug in rcutorture system-shutdown code 2009-01-07 23:36:25 +01:00
rcutree.c rcu: add __cpuinit to rcu_init_percpu_data() 2009-01-14 11:26:40 +01:00
rcutree_trace.c "Tree RCU": scalable classic RCU implementation 2008-12-18 21:56:04 +01:00
relay.c Merge branches 'tracing/ftrace', 'tracing/kmemtrace' and 'linus' into tracing/core 2009-02-03 06:25:38 +01:00
res_counter.c memcg: memory cgroup resource counters for hierarchy 2009-01-08 08:31:05 -08:00
resource.c resources: fix parameter name and kernel-doc 2009-01-15 16:39:38 -08:00
rtmutex-debug.c
rtmutex-debug.h
rtmutex-tester.c
rtmutex.c
rtmutex.h
rtmutex_common.h
rwsem.c
sched.c Merge branch 'linus' into tracing/blktrace 2009-02-19 09:00:35 +01:00
sched_clock.c Merge branch 'sched/clock' into tracing/ftrace 2009-02-26 21:21:59 +01:00
sched_cpupri.c sched: fix section mismatch 2009-01-06 11:07:15 +01:00
sched_cpupri.h sched: convert struct cpupri_vec cpumask_var_t. 2008-11-24 17:52:22 +01:00
sched_debug.c sched: partly revert "sched debug: remove NULL checking in print_cfs_rt_rq()" 2009-01-11 02:40:32 +01:00
sched_fair.c sched: revert recent sync wakeup changes 2009-02-11 14:43:35 +01:00
sched_features.h sched: backward looking buddy 2008-11-05 10:30:14 +01:00
sched_idletask.c sched: add CONFIG_SMP consistency 2008-10-22 10:01:52 +02:00
sched_rt.c sched_rt: don't use first_cpu on cpumask created with cpumask_and 2009-02-01 10:49:52 +01:00
sched_stats.h timers: split process wide cpu clocks/timers 2009-02-05 13:04:33 +01:00
seccomp.c
semaphore.c
signal.c signal: re-add dead task accumulation stats. 2009-02-05 13:04:33 +01:00
smp.c generic-ipi: use per cpu data for single cpu ipi calls 2009-01-30 18:31:08 +01:00
softirq.c trace, lockdep: manual preempt count adding for local_bh_disable 2009-01-23 11:10:57 +01:00
softlockup.c softlock: fix false panic which can occur if softlockup_thresh is reduced 2009-01-14 11:48:07 +01:00
spinlock.c
srcu.c
stacktrace.c stacktrace: provide save_stack_trace_tsk() weak alias 2008-12-25 11:44:43 +01:00
stop_machine.c stop_machine: introduce stop_machine_create/destroy. 2009-01-05 08:40:14 +10:30
sys.c revert "rlimit: permit setting RLIMIT_NOFILE to RLIM_INFINITY" 2009-02-05 12:56:47 -08:00
sys_ni.c [CVE-2009-0029] Make sys_syslog a conditional system call 2009-01-14 14:15:16 +01:00
sysctl.c mm: fix dirty_bytes/dirty_background_bytes sysctls on 64bit arches 2009-02-11 14:25:35 -08:00
sysctl_check.c [XFS] remove restricted chown parameter from xfs linux 2008-10-30 18:30:09 +11:00
taskstats.c cpumask: convert rest of files in kernel/ 2009-01-01 10:12:28 +10:30
test_kprobes.c kprobes: add tests for register_kprobes 2009-01-06 15:59:20 -08:00
time.c [CVE-2009-0029] System call wrappers part 01 2009-01-14 14:15:18 +01:00
timeconst.pl
timer.c [CVE-2009-0029] System call wrappers part 27 2009-01-14 14:15:29 +01:00
tracepoint.c markers/tracpoints: fix non-modular build 2008-11-16 09:52:03 +01:00
tsacct.c mm: introduce get_mm_hiwater_xxx(), fix taskstats->hiwater_xxx accounting 2009-01-06 15:59:09 -08:00
uid16.c [CVE-2009-0029] System call wrappers part 19 2009-01-14 14:15:26 +01:00
up.c smp_call_function_single(): be slightly less stupid, fix #2 2009-01-12 16:04:37 +01:00
user.c User namespaces: Only put the userns when we unhash the uid 2009-02-13 08:07:40 -08:00
user_namespace.c User namespaces: set of cleanups (v2) 2008-11-24 18:57:41 -05:00
utsname.c
utsname_sysctl.c sysctl: simplify ->strategy 2008-10-16 11:21:47 -07:00
wait.c wait: prevent exclusive waiter starvation 2009-02-05 12:56:48 -08:00
workqueue.c Merge branches 'tracing/ftrace', 'tracing/kmemtrace' and 'linus' into tracing/core 2009-02-03 06:25:38 +01:00