aboutsummaryrefslogtreecommitdiff
path: root/kernel/trace
AgeCommit message (Collapse)Author
2009-06-02hw-breakpoints: ftrace plugin for kernel symbol tracing using HW Breakpoint ↵K.Prasad
interfaces This patch adds an ftrace plugin to detect and profile memory access over kernel variables. It uses HW Breakpoint interfaces to 'watch memory addresses. Signed-off-by: K.Prasad <prasad@linux.vnet.ibm.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-06-02tracing/stat: remove unappropriate safe walk on listFrederic Weisbecker
register_stat_tracer() uses list_for_each_entry_safe to check whether a tracer is already present in the list. But we don't delete anything from the list here, so we don't need the safe version [ Impact: cleanup list use is stat tracing ] Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-06-02tracing/stat: do some cleanupsLi Zefan
- remove duplicate code in stat_seq_init() - update comments to reflect the change from stat list to stat rbtree [ Impact: clean up ] Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-06-02tracing/stat: remember to free root nodeLi Zefan
When closing a trace_stat file, we destroy the rbtree constructed during file open, but there is memory leak that the root node is not freed. [ Impact: fix memory leak when closing a trace_stat file ] Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-06-02tracing/stat: change dummpy_cmp() to return -1Li Zefan
Currently the output of trace_stat/workqueues is totally reversed: # cat /debug/tracing/trace_stat/workqueues ... 1 17 17 210 37 `-blk_unplug_work+0x0/0x57 1 3779 3779 181 11 |-cfq_kick_queue+0x0/0x2f 1 3796 3796 kblockd/1:120 ... The correct output should be: 1 3796 3796 kblockd/1:120 1 3779 3779 181 11 |-cfq_kick_queue+0x0/0x2f 1 17 17 210 37 `-blk_unplug_work+0x0/0x57 It's caused by "tracing/stat: replace linked list by an rbtree for sorting" (53059c9b67a62a3dc8c80204d3da42b9267ea5a0). dummpy_cmp() should return -1, so rb_node will always be inserted as right-most node in the rbtree, thus we sort the output in ascending order. [ Impact: fix the output of trace_stat/workqueues ] Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-06-02tracing/stat: replace linked list by an rbtree for sortingFrederic Weisbecker
When the stat tracing framework prepares the entries from a tracer to output them to the user, it starts by computing a linear sort through a linked list to give the entries ordered by relevance to the user. This is quite ugly and causes a small latency when we begin to read the file. This patch changes that by turning the linked list into a red-black tree. Athough the whole iteration using the start and next tracer callbacks while opening the file remain the same, it is now much more fast and scalable. The rbtree guarantees O(log(n)) insertions whereas a linked list with linear sorting brought us a O(n) despair. Now the (visible) latency has disapeared. [ Impact: kill the latency while starting to read a stat tracer file ] Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-06-02tracing/stat: replace trace_stat_session by stat_sessionFrederic Weisbecker
The "trace" prefix in struct trace_stat_session type is annoying while reading the trace_stat.c file. It makes the lines longer, and is not that much useful to explain the sense of this type. Just keep "struct stat_session" for this type. [ Impact: make the code a bit more readable ] Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-06-02trace_workqueue: remove blank line between each cpuZhaolei
The blankline between each cpu's workqueue stat is not necessary, because the cpu number is enough to part them by eye. Old style also caused a blankline below headline, and made code complex by using lock, disableirq and get cpu var. Old style: # CPU INSERTED EXECUTED NAME # | | | | 0 8644 8644 events/0 0 0 0 cpuset ... 0 1 1 kdmflush 1 35365 35365 events/1 ... New style: # CPU INSERTED EXECUTED NAME # | | | | 0 8644 8644 events/0 0 0 0 cpuset ... 0 1 1 kdmflush 1 35365 35365 events/1 ... [ Impact: provide more readable code ] Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Tom Zanussi <tzanussi@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-06-02trace_workqueue: remove cpu_workqueue_stats->first_entryZhaolei
cpu_workqueue_stats->first_entry is useless because we can retrieve the header of a cpu workqueue using: if (&cpu_workqueue_stats->list == workqueue_cpu_stat(cpu)->list.next) [ Impact: cleanup ] Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Tom Zanussi <tzanussi@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-06-02trace_workqueue: use list_for_each_entry() instead of list_for_each_entry_safe()Zhaolei
No need to use list_for_each_entry_safe() in iteration without deleting any node, we can use list_for_each_entry() instead. [ Impact: cleanup ] Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Tom Zanussi <tzanussi@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-06-02ftrace, workqueuetrace: make workqueue tracepoints use TRACE_EVENT macroZhaolei
v3: zhaolei@cn.fujitsu.com: Change TRACE_EVENT definition to new format introduced by Steven Rostedt: consolidate trace and trace_event headers v2: kosaki@jp.fujitsu.com: print the function names instead of addr, and zap the work addr v1: zhaolei@cn.fujitsu.com: Make workqueue tracepoints use TRACE_EVENT macro TRACE_EVENT is a more generic way to define tracepoints. Doing so adds these new capabilities to the tracepoints: - zero-copy and per-cpu splice() tracing - binary tracing without printf overhead - structured logging records exposed under /debug/tracing/events - trace events embedded in function tracer output and other plugins - user-defined, per tracepoint filter expressions Then, this patch converts DEFINE_TRACE to TRACE_EVENT in workqueue related tracepoints. [ Impact: expand workqueue tracer to events tracing ] Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Tom Zanussi <tzanussi@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-05-28trace: disable preemption before taking raw spinlocksHeiko Carstens
s390 code uses smp_processor_id() in __raw_spin_lock() code which reveals that a (raw) spinlock is taken without preemption disabled. This can potentially deadlock. To fix this explicitly disable and enable preemption. BUG: using smp_processor_id() in preemptible [00000000] code: cat/2278 caller is trace_find_cmdline+0x40/0xfc CPU: 0 Not tainted 2.6.30-rc7-dirty #39 Process cat (pid: 2278, task: 000000003faedb68, ksp: 000000003b33b988) 000000003b33b988 000000003b33bae0 0000000000000002 0000000000000000 000000003b33bb80 000000003b33baf8 000000003b33baf8 00000000000175d6 0000000000000001 000000003b33b988 000000003f9b0000 000000000000000b 000000000000000c 000000003b33bb40 000000003b33bae0 0000000000000000 0000000000000000 00000000000175d6 000000003b33bae0 000000003b33bb28 Call Trace: ([<00000000000174b2>] show_trace+0x112/0x170) [<0000000000017582>] show_stack+0x72/0x100 [<0000000000441538>] dump_stack+0xc8/0xd8 [<000000000025c350>] debug_smp_processor_id+0x114/0x130 [<00000000000bf0e4>] trace_find_cmdline+0x40/0xfc [<00000000000c35d4>] trace_print_context+0x58/0xac [<00000000000bb676>] print_trace_line+0x416/0x470 [<00000000000bc8fe>] s_show+0x4e/0x428 [<000000000013834e>] seq_read+0x36a/0x5d4 [<0000000000112a78>] vfs_read+0xc8/0x174 [<0000000000112c58>] SyS_read+0x74/0xc4 [<000000000002c7ae>] sysc_noemu+0x10/0x16 [<000002000012436c>] 0x2000012436c 1 lock held by cat/2278: #0: (&p->lock){+.+.+.}, at: [<0000000000138056>] seq_read+0x72/0x5d4 [ Impact: fix preempt-unsafe raw spinlock ] Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-05-26tracing: add __print_symbolic to trace eventsSteven Rostedt
This patch adds __print_symbolic which is similar to __print_flags but works for an enumeration type instead. That is, there is only a one to one mapping between the values and the symbols. When a match is made, then it is printed, otherwise the hex value is outputed. [ Impact: add interface for showing symbol names in events ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-05-26tracing: add __print_flags for eventsSteven Rostedt
Developers have been asking for the ability in the ftrace event tracer to display names of bits in a flags variable. Instead of printing out c2, it would be easier to read FOO|BAR|GOO, assuming that FOO is bit 1, BAR is bit 6 and GOO is bit 7. Some examples where this would be useful are the state flags in a context switch, kmalloc flags, and even permision flags in accessing files. [ v2 changes include: Frederic Weisbecker's idea of using a mask instead of bits, thus we can output GFP_KERNEL instead of GPF_WAIT|GFP_IO|GFP_FS. Li Zefan's idea of allowing the caller of __print_flags to add their own delimiter (or no delimiter) where we can get for file permissions rwx instead of r|w|x. ] [ v3 changes: Christoph Hellwig's idea of using an array instead of va_args. ] [ Impact: better displaying of flags in trace output ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-05-26ftrace: clean up of using ftrace_event_enable_disable()Zhaolei
Always use ftrace_event_enable_disable() to enable/disable an event so that we can factorize out the event toggling code. [ Impact: factorize and cleanup event tracing code ] Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Tom Zanussi <tzanussi@gmail.com> LKML-Reference: <4A14FDFE.2080402@cn.fujitsu.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-05-26ftrace: Add task_comm support for trace_eventZhaolei
If we enable a trace event alone without any tracer running (such as function tracer, sched switch tracer, etc...) it can't output enough task command information. We need to use the tracing_{start/stop}_cmdline_record() helpers which are designed to keep track of cmdlines for any tasks that were scheduled during the tracing. Before this patch: # echo 1 > debugfs/tracing/events/sched/sched_switch/enable # cat debugfs/tracing/trace # tracer: nop # # TASK-PID CPU# TIMESTAMP FUNCTION # | | | | | <...>-2289 [000] 526276.724790: sched_switch: task bash:2289 [120] ==> sshd:2287 [120] <...>-2287 [000] 526276.725231: sched_switch: task sshd:2287 [120] ==> bash:2289 [120] <...>-2289 [000] 526276.725452: sched_switch: task bash:2289 [120] ==> sshd:2287 [120] <...>-2287 [000] 526276.727181: sched_switch: task sshd:2287 [120] ==> swapper:0 [140] <idle>-0 [000] 526277.032734: sched_switch: task swapper:0 [140] ==> events/0:5 [115] <...>-5 [000] 526277.032782: sched_switch: task events/0:5 [115] ==> swapper:0 [140] ... After this patch: # tracer: nop # # TASK-PID CPU# TIMESTAMP FUNCTION # | | | | | bash-2269 [000] 527347.989229: sched_switch: task bash:2269 [120] ==> sshd:2267 [120] sshd-2267 [000] 527347.990960: sched_switch: task sshd:2267 [120] ==> bash:2269 [120] bash-2269 [000] 527347.991143: sched_switch: task bash:2269 [120] ==> sshd:2267 [120] sshd-2267 [000] 527347.992959: sched_switch: task sshd:2267 [120] ==> swapper:0 [140] <idle>-0 [000] 527348.531989: sched_switch: task swapper:0 [140] ==> events/0:5 [115] events/0-5 [000] 527348.532115: sched_switch: task events/0:5 [115] ==> swapper:0 [140] ... Changelog: v1->v2: Update Kconfig to select CONTEXT_SWITCH_TRACER in ENABLE_EVENT_TRACING v2->v3: v2 can solve problem that was caused by config EVENT_TRACING alone, but when CONFIG_FTRACE is off and CONFIG_TRACING is selected by other config, compile fail happened again. This version solves it. [ Impact: fix incomplete output of event tracing ] Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Cc: Tom Zanussi <tzanussi@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <4A14FDFE.2080402@cn.fujitsu.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-05-25tracing: add trace_event_read_lock()Lai Jiangshan
I found that there is nothing to protect event_hash in ftrace_find_event(). Rcu protects the event hashlist but not the event itself while we use it after its extraction through ftrace_find_event(). This lack of a proper locking in this spot opens a race window between any event dereferencing and module removal. Eg: --Task A-- print_trace_line(trace) { event = find_ftrace_event(trace) --Task B-- trace_module_remove_events(mod) { list_trace_events_module(ev, mod) { unregister_ftrace_event(ev->event) { hlist_del(ev->event->node) list_del(....) } } } |--> module removed, the event has been dropped --Task A-- event->print(trace); // Dereferencing freed memory If the event retrieved belongs to a module and this module is concurrently removed, we may end up dereferencing a data from a freed module. RCU could solve this, but it would add latency to the kernel and forbid tracers output callbacks to call any sleepable code. So this fix converts 'trace_event_mutex' to a read/write semaphore, and adds trace_event_read_lock() to protect ftrace_find_event(). [ Impact: fix possible freed memory dereference in ftrace ] Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <4A114806.7090302@cn.fujitsu.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-05-20ftrace: fix check for return value of register_module_notifier in ↵Ming Lei
event_trace_init register_module_notifier() returns zero in the success case. So fix the inverted fail case check in trace events modules handler. [ Impact: fix spurious warning on ftrace initialization] Reported-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Ming Lei <tom.leiming@gmail.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-05-19blktrace: remove debugfs entries on bad pathStefan Raspl
debugfs directory entries for devices are not removed on some of the failure pathes in do_blk_trace_setup(). One way to reproduce is to start blktrace on multiple devices with insufficient Vmalloc space: Devices will fail with a message like this: BLKTRACESETUP(2) /dev/sdu failed: 5/Input/output error If so, the respective entries in debugfs (e.g. /sys/kernel/debug/block/sdu) will remain and subsequent attempts to start blktrace on the respective devices will not succeed due to existing directories. [ Impact: fix /debug/tracing file cleanup corner case ] Signed-off-by: Stefan Raspl <stefan.raspl@linux.vnet.ibm.com> Acked-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: schwidefsky@de.ibm.com Cc: heiko.carstens@de.ibm.com LKML-Reference: <4A1266CC.5040801@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-18tracing: fix check for return value of register_module_notifierMing Lei
return zero should be correct, so fix it. [ Impact: eliminate incorrect syslog message ] Signed-off-by: Ming Lei <tom.leiming@gmail.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Li Zefan <lizf@cn.fujitsu.com> Cc: rostedt@goodmis.org LKML-Reference: <1242545498-7285-1-git-send-email-tom.leiming@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-14tracing/filters: fix off-by-one bugLi Zefan
We should leave the last slot for the ending '\0'. [ Impact: fix possible crash when the length of an operand is 128 ] Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> LKML-Reference: <4A0CDC8C.30602@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-05-14tracing/filters: add missing unlock in a failure pathLi Zefan
[ Impact: fix deadlock in a rare case we fail to allocate memory ] Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> LKML-Reference: <4A0CDC6F.7070200@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-05-14tracing: stop stack trace on first empty entrySteven Rostedt
The stack tracer stores eight entries in the ring buffer when an event traces the stack. The output outputs all eight entries regardless of how many entries were recorded. This patch breaks out of the loop when a null entry is discovered. [ Impact: only print the stack that is recorded ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-05-11ring-buffer: move code around to remove some branchesSteven Rostedt
This is a bit of micro-optimizations. But since the ring buffer is used in tracing every function call, it is an extreme hot path. Every nanosecond counts. This change shows over 5% improvement in the ring-buffer-benchmark. [ Impact: more efficient code ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-05-11ring-buffer: use internal time stamp functionSteven Rostedt
The ring_buffer_time_stamp that is exported adds a little more overhead than is needed for using it internally. This patch adds an internal timestamp function that can be inlined (a single line function) and used internally for the ring buffer. [ Impact: a little less overhead to the ring buffer ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-05-11ring-buffer: small optimizationsSteven Rostedt
Doing some small changes in the fast path of the ring buffer recording saves over 3% in the ring-buffer-benchmark test. [ Impact: a little faster ring buffer recording ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-05-11ring-buffer: move calculation of event lengthSteven Rostedt
The event length is calculated and passed in to rb_reserve_next_event in two different locations. Having rb_reserve_next_event do the calculations directly makes only one location to do the change and causes the calculation to be inlined by gcc. Before: text data bss dec hex filename 16538 24 12 16574 40be kernel/trace/ring_buffer.o After: text data bss dec hex filename 16490 24 12 16526 408e kernel/trace/ring_buffer.o [ Impact: smaller more efficient code ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-05-11ring-buffer: remove type parameter from rb_reserve_next_eventSteven Rostedt
The rb_reserve_next_event is only called for the data type (type = 0). There is no reason to pass in the type to the function. Before: text data bss dec hex filename 16554 24 12 16590 40ce kernel/trace/ring_buffer.o After: text data bss dec hex filename 16538 24 12 16574 40be kernel/trace/ring_buffer.o [ Impact: cleaner, smaller and slightly more efficient code ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-05-11ring-buffer: check for divide by zero in ring-buffer-benchmarkSteven Rostedt
Although we check if "missed" is not zero, we divide by hit + missed, and the addition can possible overflow and become a divide by zero. This patch checks for this case, and will report it when it happens then modify "hit" to make the calculation be non zero. [ Impact: prevent possible divide by zero in ring-buffer-benchmark ] Reported-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-05-11ring-buffer: replace constants with time macros in ring-buffer-benchmarkSteven Rostedt
The use of numeric constants is discouraged. It is cleaner and more descriptive to use macros for constant time conversions. This patch also removes an extra new line. [ Impact: more descriptive time conversions ] Reported-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-05-11blktrace: pdu_buf of pc events should be unsignedLi Zefan
I got this: 8,0 1 305.417782332 2037 I R 32 (ffffff9e 10 00 ...) [bash] It should be: 8,0 1 305.417782332 2037 I R 32 (9e 10 00 ...) [bash] [ Impact: fix output of pc events ] Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <4A07C6B3.9080802@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-08tracing: add trace_set_clr_event to export event enabling functionSteven Rostedt
Other parts of the kernel may need to be able to enable or disable specific events. Especially parts that create trace events. [ Impact: allow enabling of trace events by those that create the event ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-05-08tracing: initialize return value for __ftrace_set_clr_eventSteven Rostedt
Commit 8f31bfe538ebafac187d2d4465a92e1d9ee6d8c2 tracing/events: clean up for ftrace_set_clr_event() Moved out the code for ftrace_set_clr_event into a helper funciton but did not initialize the return value. As a result, we do not warn about a typo in the echoing of events in set_event. This patch restores the old warning: # echo foobar > set_event -bash: echo: write error: Invalid argument [ Impact: restore warning of invalid entries to set_event ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-05-08tracing/events: simplify system_enable_read()Li Zefan
A smarter way to figure out the output of an enable file. [ Impact: clean up ] Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <4A0399A5.2080603@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-08tracing/events: clean up for ftrace_set_clr_event()Li Zefan
Add a helper function __ftrace_set_clr_event(), and replace some ftrace_set_clr_event() calls with this helper, thus we don't need any kstrdup() or kmalloc(). As a side effect, this patch fixes an issue in self tests code, which is similar to the one fixed in commit d6bf81ef0f7474434c2a049e8bf3c9146a14dd96 ("tracing: append ":*" to internal setting of system events") It's a small issue and won't cause any bug in fact, but we should do things right anyway. [ Impact: prevent spurious event-enabling in tracing self-tests ] Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <4A03998E.3020503@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-07ring-buffer: change WARN_ON from checking preempt_count to preemptibleSteven Rostedt
There's a WARN_ON in the ring buffer code that makes sure preemption is disabled. It checks "!preempt_count()". But when CONFIG_PREEMPT is not enabled, preempt_count() is always zero, and this will trigger the warning. [ Impact: prevent false warning on non preemptible kernels ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-05-07ring-buffer: add total count in ring-buffer-benchmarkSteven Rostedt
It is nice to see the overhead of the benchmark test when tracing is disabled. That is, we turn off the ring buffer just to see what the cost of running the loop that calls into the ring buffer is. Currently, if no entries wer made, we get 0. This is not informative. This patch changes it to check if we had any "missed" (non recorded) events. If so, a total count is also reported. [ Impact: evaluate the over head of the ring buffer benchmark test ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-05-07ring-buffer: only periodically call cond_resched to ring-buffer-benchmarkSteven Rostedt
Calling cond_resched at every iteration of the loop adds a bit of overhead to the benchmark. This patch does two things. 1) only calls cond-resched when CONFIG_PREEMPT is not enabled 2) only calls cond-resched after so many traces has been performed. [ Impact: less overhead to the ring-buffer-benchmark ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-05-07tracing: have menu default enabled when kernel debug is configuredSteven Rostedt
Tracing can be very helpful to debug the kernel. When DEBUG_KERNEL is enabled it is nice to enable the trace menu as well. This patch only make the tracing menu enabled by default, it does not make any of the tracers enabled. And the menu is only enabled by default if DEBUG_KERNEL is enabled. [ Impact: show tracing options to those debugging the kernel ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-05-07tracing: append ":*" to internal setting of system eventsSteven Rostedt
The system enabling of events uses the same code as the set_event file. It passes in the name of the system to the parser and that will enable all the events that has that system as a name. The problem is that it will also enable events with the same name as the system. If you have system name foo, and system name bar, but within the system bar, there exists an event called foo. By setting the system name foo, you will also be enabling the event foo in the system bar. This is not an expected result. The solution is to pass in "foo:*", which will only enable the system foo and not events called foo. [ Impact: prevent accidental enabling of events with same name as a system ] Reported-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-05-07ring-buffer: remove complex calculations in ring-buffer-testSteven Rostedt
Ingo Molnar thought that the code to calculate the time in cond_resched is a bit too ugly and is not needed. This patch removes it and replaces it with a simple call to cond_resched. I kept the comment that explains the reason for the cond_resched. [ Impact: remove ugly code ] Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-05-07Merge branch 'tracing/hw-branch-tracing' into tracing/coreIngo Molnar
Merge reason: this topic is ready for upstream now. It passed Oleg's review and Andrew had no further mm/* objections/observations either. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-07Merge branch 'linus' into tracing/coreIngo Molnar
Merge reason: tracing/core was on a .30-rc1 base and was missing out on on a handful of tracing fixes present in .30-rc5-almost. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-07tracing/events: fix concurrent access to ftrace_events list, fixLi Zefan
In filter_add_subsystem_pred() we should release event_mutex before calling filter_free_subsystem_preds(), since both functions hold event_mutex. [ Impact: fix deadlock when writing invalid pred into subsystem filter ] Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: tzanussi@gmail.com Cc: a.p.zijlstra@chello.nl Cc: fweisbec@gmail.com Cc: rostedt@goodmis.org LKML-Reference: <4A028993.7020509@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-07tracing/filters: support for operator reserved characters in stringsFrederic Weisbecker
When we set a filter for an event, such as: echo "name == my_lock_name" > \ /debug/tracing/events/lockdep/lock_acquired/filter then the following order of token type is parsed: - space - operator - parentheses - operand Because the operators and parentheses have a higher precedence than the operand characters, which is normal, then we can't use any string containing such special characters: ()=<>!&| To get this support and also avoid ambiguous intepretation from the parser or the human, we can do it using double quotes so that we keep the usual languages habits. Then after this patch you can still declare string condition like before: echo name == myname But if you want to compare against a string containing an operator character, you can use double quotes: echo 'name == "&myname"' Don't forget to include the whole expression into single quotes or the double ones will be eaten by echo. [ Impact: support strings with special characters for tracing filters ] Cc: Tom Zanussi <tzanussi@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Zhaolei <zhaolei@cn.fujitsu.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-05-07tracing/filters: support for filters of dynamic sized arraysFrederic Weisbecker
Currently the filtering infrastructure supports well the numeric types and fixed sized array types. But the recently added __string() field uses a specific indirect offset mechanism which requires a specific predicate. Until now it wasn't supported. This patch adds this support and implies very few changes, only a new predicate is needed, the management of this specific field can be done through the usual string helpers in the filtering infrastructure. [ Impact: support all kinds of strings in the tracing filters ] Cc: Tom Zanussi <tzanussi@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Zhaolei <zhaolei@cn.fujitsu.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-05-06tracing: add hierarchical enabling of eventsSteven Rostedt
With the current event directory, you can only enable individual events. The file debugfs/tracing/set_event is used to be able to enable or disable several events at once. But that can still be awkward. This patch adds hierarchical enabling of events. That is, each directory in debugfs/tracing/events has an "enable" file. This file can enable or disable all events within the directory and below. # echo 1 > /debugfs/tracing/events/enable will enable all events. # echo 1 > /debugfs/tracing/events/sched/enable will enable all events in the sched subsystem. # echo 1 > /debugfs/tracing/events/enable # echo 0 > /debugfs/tracing/events/irq/enable will enable all events, but then disable just the irq subsystem events. When reading one of these enable files, there are four results: 0 - all events this file affects are disabled 1 - all events this file affects are enabled X - there is a mixture of events enabled and disabled ? - this file does not affect any event Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-05-06tracing: reset ring buffer when removing modules with eventsSteven Rostedt
Li Zefan found that there's a race using the event ids of events and modules. When a module is loaded, an event id is incremented. We only have 16 bits for event ids (65536) and there is a possible (but highly unlikely) race that we could load and unload a module that registers events so many times that the event id counter overflows. When it overflows, it then restarts and goes looking for available ids. An id is available if it was added by a module and released. The race is if you have one module add an id, and then is removed. Another module loaded can use that same event id. But if the old module still had events in the ring buffer, the new module's call back would get bogus data. At best (and most likely) the output would just be garbage. But if the module for some reason used pointers (not recommended) then this could potentially crash. The safest thing to do is just reset the ring buffer if a module that registered events is removed. [ Impact: prevent unpredictable results of event id overflows ] Reported-by: Li Zefan <lizf@cn.fujitsu.com> LKML-Reference: <49FEAFD0.30106@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-05-06ring-buffer: change test to be more latency friendlySteven Rostedt
The ring buffer benchmark/test runs a producer for 10 seconds. This is done with preemption and interrupts enabled. But if the kernel is not compiled with CONFIG_PREEMPT, it basically stops everything but interrupts for 10 seconds. Although this is just a test and is not for production, this attribute can be quite annoying. It can also spawn badness elsewhere. This patch solves the issues by calling "cond_resched" when the system is not compiled with CONFIG_PREEMPT. It also keeps track of the time spent to call cond_resched such that it does not go against the time calculations. That is, if the task schedules away, the time scheduled out is removed from the test data. Note, this only works for non PREEMPT because we do not know when the task is scheduled out if we have PREEMPT enabled. [ Impact: prevent test from stopping the world for 10 seconds ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-05-06ring-buffer: make moving the tail page a separate functionSteven Rostedt
Ingo Molnar thought the code would be cleaner if we used a function call instead of a goto for moving the tail page. After implementing this, it seems that gcc still inlines the result and the output is pretty much the same. Since this is considered a cleaner approach, might as well implement it. [ Impact: code clean up ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>