path: root/kernel/trace
Age | Commit message | Author
2009-04-01 | ring-buffer: do not remove reader page from list on ring buffer free | Steven Rostedt
Impact: prevent possible memory leak

The reader page of the ring buffer is special. Although it points into the ring buffer, it is not part of the actual buffer. It is a page used by the reader to swap with a page in the ring buffer. Once the swap is made, the new reader page is again outside the buffer.

Even though the reader page points into the buffer, it is really pointing to residual data. Note, this data is used by the reader.

                 reader page
                      |
                      v
           (prev)   +---+   (next)
         +----------|   |----------+
         |          +---+          |
         v                         v
       +---+        +---+        +---+
    -->|   |------->|   |------->|   |--->
    <--|   |<-------|   |<-------|   |<---
       +---+        +---+        +---+
         ^            ^            ^
          \           |           /
           ------- Buffer---------

If we perform a list_del_init() on the reader page we will actually remove the last page the reader swapped with and not the reader page itself. This will cause that page to not be freed, and thus is a memory leak.

Luckily, the only user of the ring buffer so far is ftrace. And ftrace will not free its ring buffer after it allocates it. There is no current possible memory leak. But once there are other users, or if ftrace dynamically creates and frees its ring buffer, then this would be a memory leak. This patch fixes the leak for future cases.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
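The pointer surgery behind this leak can be reproduced outside the kernel. The following is a minimal sketch, not the ring buffer code itself: `struct page_node`, `swap_reader()` and the other helper names are hypothetical, and the list is a plain circular doubly linked list of three pages plus one spare reader page.

```c
#include <stddef.h>

/* Hypothetical stand-in for a buffer page: only the list linkage matters. */
struct page_node {
    struct page_node *prev;
    struct page_node *next;
};

/* Link n nodes into a circular doubly linked ring. */
static void link_ring(struct page_node *pages, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        pages[i].next = &pages[(i + 1) % n];
        pages[i].prev = &pages[(i + n - 1) % n];
    }
}

/*
 * Put `reader` into the ring in place of `victim`. The victim becomes the
 * new reader page and keeps its stale prev/next pointers into the ring.
 */
static void swap_reader(struct page_node *reader, struct page_node *victim)
{
    reader->prev = victim->prev;
    reader->next = victim->next;
    victim->prev->next = reader;
    victim->next->prev = reader;
}

/* Same pointer updates as a list_del_init()-style removal. */
static void node_del_init(struct page_node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->prev = n;
    n->next = n;
}

/* Count the nodes reachable by walking the ring from `start`. */
static size_t ring_len(const struct page_node *start)
{
    size_t len = 1;

    for (const struct page_node *p = start->next; p != start; p = p->next)
        len++;
    return len;
}

/* Returns the ring length after deleting the reader page "from the list". */
size_t ring_len_after_reader_del(void)
{
    struct page_node ring[3], spare;

    link_ring(ring, 3);
    swap_reader(&spare, &ring[1]);  /* ring[1] is now the reader page */
    node_del_init(&ring[1]);        /* unlinks `spare`, not ring[1] */
    return ring_len(&ring[0]);
}
```

In this sketch the ring shrinks from three pages to two: the page that was swapped in is no longer reachable from the ring, so code that frees pages by walking the ring would never see it.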
2009-04-01 | function-graph: allow unregistering twice | Steven Rostedt
Impact: fix to permanent disabling of function graph tracer

There should be nothing to prevent a tracer from unregistering a function graph callback more than once. This can simplify error paths. But currently, the counter does not account for multiple unregistering of the function graph callback. If it happens, the function graph tracer will be permanently disabled.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
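A minimal sketch of the counter problem, under the assumption that registration is refused whenever the counter is nonzero (as the "prevent more than one tracer registering" entry in this log suggests). `struct graph_state`, `graph_register()` and `graph_unregister()` are hypothetical names, not the kernel's.

```c
#include <errno.h>

/* Hypothetical registration state; names are illustrative only. */
struct graph_state {
    int active;   /* 0 = free, 1 = one tracer registered */
};

/* Only one tracer may be registered at a time. */
int graph_register(struct graph_state *s)
{
    if (s->active != 0)
        return -EBUSY;
    s->active++;
    return 0;
}

/*
 * Unregistering when nothing is registered is a no-op. Without this guard
 * a second call would drive the counter to -1, and every later
 * graph_register() would fail with -EBUSY.
 */
void graph_unregister(struct graph_state *s)
{
    if (s->active == 0)
        return;
    s->active--;
}
```

With the guard in place an error path can call `graph_unregister()` unconditionally, whether or not the matching register succeeded.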
2009-03-31 | Merge branches 'tracing/docs', 'tracing/filters', 'tracing/ftrace', 'tracing/kprobes', 'tracing/blktrace-v2' and 'tracing/textedit' into tracing/core-v2 | Ingo Molnar
2009-03-31 | trace: make argument 'mem' of trace_seq_putmem() const | Li Zefan
Impact: fix build warning

I passed a const value to trace_seq_putmem() and got a compile warning.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-31 | tracing: add missing 'extern' keywords to trace_output.h | Eduard - Gabriel Munteanu
Impact: cleanup

Many declarations within trace_output.h are missing the 'extern' keyword in an inconsistent manner. This adds 'extern' where it should be.

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-31 | tracing: provide trace_seq_reserve() | Eduard - Gabriel Munteanu
trace_seq_reserve() allows a caller to reserve space in a trace_seq and write directly into it. This makes it easier to export binary data to userspace via the tracing interface, by simply filling in a struct.

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
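The reserve-then-fill idea can be sketched in user space as follows. This is an illustration under stated assumptions, not the kernel's trace_seq: `struct seq_buf_sketch`, `SEQ_CAPACITY`, `seq_reserve()` and `struct sample_record` are all hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define SEQ_CAPACITY 64   /* hypothetical buffer size for the sketch */

struct seq_buf_sketch {
    unsigned char buffer[SEQ_CAPACITY];
    size_t len;           /* bytes already used */
};

/* Reserve `len` bytes and return where they start, or NULL if they do not fit. */
void *seq_reserve(struct seq_buf_sketch *s, size_t len)
{
    void *ret;

    if (len > SEQ_CAPACITY - s->len)
        return NULL;
    ret = s->buffer + s->len;
    s->len += len;
    return ret;
}

/* Hypothetical binary record exported through the buffer. */
struct sample_record {
    unsigned int cpu;
    unsigned int value;
};

/* Fill in a struct and place it in reserved space; returns 0 or -1 when full. */
int seq_put_record(struct seq_buf_sketch *s, unsigned int cpu, unsigned int value)
{
    struct sample_record rec = { cpu, value };
    void *dst = seq_reserve(s, sizeof(rec));

    if (!dst)
        return -1;
    /* memcpy avoids assuming the reserved address is suitably aligned. */
    memcpy(dst, &rec, sizeof(rec));
    return 0;
}
```

The caller never formats text: it reserves exactly `sizeof(rec)` bytes and the reader on the other side interprets them as the same struct.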
2009-03-31 | blktrace: print out BLK_TN_MESSAGE properly | Li Zefan
Impact: improve ftrace plugin output

Before this patch:

    # cat trace
    make-5383  [001]   741.240059:   8,7    P   N [make]
    __trace_note_message: cfq1074

    # echo 1 > options/blk_classic
    # cat trace
      8,7    1     0.692221252     0  C   W 130411392 + 1024 [0]
    Bad pc action 6361
    Bad pc action 283d

    # echo 0 > options/blk_classic
    # echo bin > trace_options
    # cat trace_pipe | blkparse -i -
    (can't parse messages generated by blk_add_trace_msg())

After this patch:

    # cat trace
    <idle>-0     [001]   187.600933:   8,7    C   W 145220224 + 8 [0]
    <idle>-0     [001]   187.600946:   8,7    m   N cfq1076 complete

    # echo 1 > options/blk_classic
    # cat trace
      8,7    1     0.256378996   238  I   W 113190728 + 8 [pdflush]
      8,7    1     0.256378998   238  m   N cfq1076 insert_request

    # echo 0 > options/blk_classic
    # echo bin > trace_options
    # cat trace_pipe | blkparse -i -
      8,7    1        0    22.973250293     0  C   W 102770576 + 8 [0]
      8,7    1        0    22.973259213     0  m   N cfq1076 complete

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-31 | blktrace: extract duplidate code | Li Zefan
Impact: cleanup

blk_trace_event_print() and blk_tracer_print_line() share most of the code.

    text    data     bss     dec     hex filename
    8605     393      12    9010    2332 kernel/trace/blktrace.o.orig

    text    data     bss     dec     hex filename
    8555     393      12    8960    2300 kernel/trace/blktrace.o

This patch also prepares for the next patch, that prints out BLK_TN_MESSAGE.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-31 | blktrace: fix memory leak when freeing struct blk_io_trace | Li Zefan
Impact: fix mixed ioctl and ftrace-plugin blktrace use memory leak

When mixing the use of ioctl-based blktrace and ftrace-based blktrace, we can leak memory in this way:

    # btrace /dev/sda > /dev/null &
    # echo 0 > /sys/block/sda/sda1/trace/enable

now we leak bt->dropped_file, bt->msg_file, bt->rchan...

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-31 | blktrace: fix blk_probes_ref chaos | Li Zefan
Impact: fix mixed ioctl and ftrace-plugin blktrace use refcount bugs

ioctl-based blktrace allocates bt and registers tracepoints on ioctl(BLKTRACESETUP), and does all cleanup on ioctl(BLKTRACETEARDOWN), while ftrace-based blktrace allocates/frees bt when:

    # echo 1/0 > /sys/block/sda/sda1/trace/enable

and registers/unregisters tracepoints when:

    # echo blk/nop > /debugfs/tracing/current_tracer

or

    # echo 1/0 > /debugfs/tracing/tracing_enable

The separation of allocation and registration causes 2 problems:

1. The current user-space blktrace still calls ioctl(TEARDOWN) when ioctl(SETUP) failed:

    # echo 1 > /sys/block/sda/sda1/trace/enable
    # blktrace /dev/sda
    BLKTRACESETUP: Device or resource busy
    ^C

   and now blk_probes_ref == -1

2. Another way to make blk_probes_ref == -1:

    # plugin sdb && mount sdb1
    # echo 1 > /sys/block/sdb/sdb1/trace/enable
    # remove sdb

This patch does the allocation and registration when writing sdaX/trace/enable.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-31 | blktrace: make classic output more classic | Li Zefan
Impact: fix ftrace plugin timestamp output

In the classic user-space blktrace, the output timestamp is sec.nsec, not sec.usec.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
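A sec.nsec timestamp can be derived from a nanosecond counter as sketched below. The function name `format_classic_ts()` and the exact format string are assumptions for illustration; the patch itself is not shown in this log.

```c
#include <stddef.h>
#include <stdio.h>

#define NSEC_PER_SEC_SKETCH 1000000000ULL

/*
 * Format a nanosecond timestamp as "sec.nsec" with a zero-padded,
 * nine-digit fractional part. Returns what snprintf() returns.
 */
int format_classic_ts(char *buf, size_t size, unsigned long long ts_ns)
{
    unsigned long long secs = ts_ns / NSEC_PER_SEC_SKETCH;
    unsigned long long nsecs = ts_ns % NSEC_PER_SEC_SKETCH;

    return snprintf(buf, size, "%llu.%09llu", secs, nsecs);
}
```

The nine-digit padding is what distinguishes sec.nsec from sec.usec output, where the fractional part has only six digits.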
2009-03-31 | blktrace: fix off-by-one bug | Li Zefan
'what' is used as the index into the array what2act, so it must not be >= the array size.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
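The bounds check in question can be illustrated with a small table lookup. The table contents and the names `actions` and `action_name()` are hypothetical; only the `>=` comparison against the array length is the point.

```c
#include <stddef.h>

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical lookup table; index 0 is deliberately unused. */
static const char *const actions[] = { NULL, "queue", "merge" };

/*
 * Return the name for `what`, or NULL when it is out of range.
 * Valid indices are 1 .. ARRAY_LEN(actions) - 1, so the upper check must
 * be '>='. Using '>' would let what == ARRAY_LEN(actions) read one
 * element past the end of the array.
 */
const char *action_name(size_t what)
{
    if (what == 0 || what >= ARRAY_LEN(actions))
        return NULL;
    return actions[what];
}
```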
2009-03-31 | blktrace: fix the original blktrace | Li Zefan
Currently the original blktrace, which is using relay and is used via ioctl, is broken. You can use ftrace to see the output of blktrace, but user-space blktrace is unusable.

It's broken by "blktrace: add ftrace plugin" (c71a896154119f4ca9e89d6078f5f63ad60ef199):

    -	if (unlikely(bt->trace_state != Blktrace_running))
    +	if (unlikely(bt->trace_state != Blktrace_running || !blk_tracer_enabled))
     		return;

With this patch, both ioctl and ftrace can be used, but of course you can't use both of them at the same time.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-31 | blktrace: fix a race when creating blk_tree_root in debugfs | Li Zefan
    t1                                t2
    ------                            ------
    do_blk_trace_setup()              do_blk_trace_setup()
      if (!blk_tree_root) {             if (!blk_tree_root)
        blk_tree_root = create_dir()
                                          blk_tree_root = create_dir();
                                          (now blk_tree_root == NULL)
      ...
      dir = create_dir(name, blk_tree_root);

Due to this race, t1 will create 'dir' in /debugfs but not /debugfs/block.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-31 | blktrace: fix timestamp in binary output | Li Zefan
I found the timestamp is wrong:

    # echo bin > trace_options
    # echo blk > current_tracer
    # cat trace_pipe | blkparse -i -
    8,0    0        0     0.000000000   504  A   W ...
    ...
    8,7    1        0     0.008534097     0  C   R ...
    (should be 8.534097xxx)

User-space blkparse expects the timestamp to be in nanoseconds.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-26 | tracing: filter fix for TRACE_EVENT_FORMAT events | Tom Zanussi
Impact: fix crash (hang) when using TRACE_EVENT_FORMAT filter files

Filters are only hooked up to the tracepoint events defined using TRACE_EVENT, but not to the tracers that use TRACE_EVENT_FORMAT, such as ftrace. Do not display the filter files at all for TRACE_EVENT_FORMAT events for the time being.

Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1237878882.8339.61.camel@charm-linux>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-25 | ftrace: Using FTRACE_WARN_ON() to check "freed record" in ftrace_release() | Zhaolei
"Because when we call ftrace_free_rec we change the rec->ip to point to the next record in the chain. Something is very wrong if rec->ip >= s && rec->ip < e and the record is already free."

"Note, use FTRACE_WARN_ON() macro. This way it shuts down ftrace if it is hit and helps to avoid further damage later."
-- Steven Rostedt <rostedt@goodmis.org>

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-03-25 | trace_workqueues: fix empty line's output | Lai Jiangshan
Empty lines separate the per-CPU stats. After the previous fix (trace_stat: keep original order) was applied, the empty lines are displayed at incorrect positions.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Steven Rostedt <srostedt@redhat.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <49C9F266.2060706@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-25 | trace_stat: keep original order | Lai Jiangshan
Impact: make trace_stat files show items in the original order

The trace_stat tracer reverses the items, which makes the output look a little ugly. For example, when we read trace_stat/workqueues, we get cpu#7's stats first, and then cpu#6 ... cpu#0.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Steven Rostedt <srostedt@redhat.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <49C9F23F.5040307@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-25 | trace_stat: don't call seq_printf() in seq_operation->start() | Lai Jiangshan
Impact: fix incorrect use of the seq_file API

Use SEQ_START_TOKEN instead of calling ->stat_headers() in seq_operation->start().

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Steven Rostedt <srostedt@redhat.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
LKML-Reference: <49C9EAE5.5070202@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-24 | tracing: use union for multi-usages field | Lai Jiangshan
Impact: cleanup

struct dyn_ftrace::ip has different usages over its lifecycle, so we use a union for it. The same goes for struct dyn_ftrace::flags.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <49C871BE.3080405@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-24 | ftrace: show virtual PID | Lai Jiangshan
Impact: fix PID output under namespaces

When the current namespace is not the global namespace, the PID read from set_ftrace_pid is not correct.

    # ~/newpid_namespace_run bash
    # echo $$
    1
    # echo 1 > set_ftrace_pid
    # cat set_ftrace_pid
    3756

Since we write a virtual PID to set_ftrace_pid, we need to get the virtual PID when we read it.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <49C84D65.9050606@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-24 | function-graph: add option for include sleep times | Steven Rostedt
Impact: give user a choice to show times spent while sleeping

The user may want to see the time a function spent sleeping. This patch adds the trace option "sleep-time" to allow that. The "sleep-time" option is on by default.

    echo sleep-time > /debug/tracing/trace_options

produces:

     ------------------------------------------
     2)  avahi-d-3428  =>    <idle>-0
     ------------------------------------------

     2)               |            finish_task_switch() {
     2)   0.621 us    |              _spin_unlock_irq();
     2)   2.202 us    |            }
     2) ! 1002.197 us |          }
     2) ! 1003.521 us |        }

whereas

    echo nosleep-time > /debug/tracing/trace_options

produces:

     0)    <idle>-0    =>  yum-upd-3416
     ------------------------------------------

     0)               |            finish_task_switch() {
     0)   0.643 us    |              _spin_unlock_irq();
     0)   2.342 us    |            }
     0) + 41.302 us   |          }
     0) + 42.453 us   |        }

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-03-24 | function-graph: ignore times across schedule | Steven Rostedt
Impact: more accurate timings

The current method of function graph tracing does not take into account the time spent when a task is not running. This shows functions that call schedule have increased costs:

     3) + 18.664 us   |      }
     ------------------------------------------
     3)    <idle>-0    =>  kblockd-123
     ------------------------------------------

     3)               |      finish_task_switch() {
     3)   1.441 us    |        _spin_unlock_irq();
     3)   3.966 us    |      }
     3) ! 2959.433 us |    }
     3) ! 2961.465 us |  }

This patch uses the tracepoint in the scheduling context switch to account for time that has elapsed while a task is scheduled out. Now we see:

     ------------------------------------------
     3)    <idle>-0    =>  edac-po-1067
     ------------------------------------------

     3)               |      finish_task_switch() {
     3)   0.685 us    |        _spin_unlock_irq();
     3)   2.331 us    |      }
     3) + 41.439 us   |    }
     3) + 42.663 us   |  }

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
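The accounting described here amounts to subtracting the time a task spent scheduled out from the wall-clock span of a call. The sketch below shows only that arithmetic; `call_duration_ns()` and its parameters are hypothetical, and how the scheduled-out time is collected (the context-switch tracepoint) is outside its scope.

```c
/*
 * Duration to report for one traced call.
 *
 * entry_ns / exit_ns: timestamps taken at function entry and return.
 * slept_ns:           time the task was scheduled out in between
 *                     (assumed to be <= exit_ns - entry_ns).
 * count_sleep_time:   nonzero keeps the scheduled-out time in the result,
 *                     zero removes it.
 */
unsigned long long call_duration_ns(unsigned long long entry_ns,
                                    unsigned long long exit_ns,
                                    unsigned long long slept_ns,
                                    int count_sleep_time)
{
    unsigned long long total = exit_ns - entry_ns;

    if (count_sleep_time)
        return total;
    return total - slept_ns;
}
```

For example, a call that spans 3000 ns of wall-clock time, of which 2500 ns were spent scheduled out, is reported as 500 ns when sleep time is excluded.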
2009-03-24 | function-graph: prevent more than one tracer registering | Steven Rostedt
Impact: prevent crash due to multiple function graph tracers

The function graph tracer can currently only handle a single tracer being registered. If another tracer registers with the function graph tracer it can crash the system.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-03-24 | function-graph: moved the timestamp from arch to generic code | Steven Rostedt
This patch moves the timestamp from the arch-specific code into the generic code. This gives the tracer better control over time manipulation.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-03-24 | tracing: fix memory leak in trace_stat | Steven Rostedt
If the function profiler does not have any items recorded and one were to cat the function stat file, the kernel would take a BUG with a NULL pointer dereference.

Looking further into this, I found that returning NULL from stat_start did not stop the stat logic, and would later call stat_next. This breaks from the way seq_file works, so I looked into fixing the stat code.

This is where I noticed that the last next_entry is never freed. It is allocated, and if the stat_next returns NULL, the code breaks out of the loop, unlocks the mutex and exits. We never link the next_entry nor do we free it. Thus it is a real memory leak.

This patch rearranges the code a bit to not only fix the memory leak, but also to act more like seq_file where nothing is printed if there is nothing to print. That is, stat_start returns NULL.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-03-24 | blktrace: print human-readable act_mask | Li Zefan
Impact: new feature, allow symbolic values in /debug/tracing/act_mask

Print stringified act_mask instead of hex value:

    # cat act_mask
    read,write,barrier,sync,queue,requeue,issue,complete,fs,pc,ahead,meta,discard,drv_data
    # echo "meta,write" > act_mask
    # cat act_mask
    write,meta

Also:
- make act_mask accept "ahead", "meta", "discard" and "drv_data"
- use strsep() instead of strchr() to parse user input
- return -EINVAL if a token is not found in the mask map
- fix a bug: 'value' is unsigned, so it can never be < 0
- propagate the error value of blk_trace_mask2str() to userspace instead of always returning -ENXIO

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <49C8AB42.1000802@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
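Parsing a comma-separated list of mask names into a bitmask, with -EINVAL for unknown tokens, can be sketched as below. This is not the patch's code: the bit values in `mask_map` are made up for the example, `parse_mask()` is a hypothetical name, and the tokenizer uses standard `strcspn()` rather than the `strsep()` mentioned in the commit.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

struct mask_entry {
    const char *name;
    unsigned int bit;
};

/* Hypothetical name-to-bit map; the bit values are illustrative only. */
static const struct mask_entry mask_map[] = {
    { "read",    1u << 0 },
    { "write",   1u << 1 },
    { "meta",    1u << 2 },
    { "discard", 1u << 3 },
};

/*
 * Parse "name,name,..." into *mask_out.
 * Returns 0 on success, or -EINVAL (leaving *mask_out untouched) if any
 * token is not found in the map. Empty tokens are skipped.
 */
int parse_mask(const char *input, unsigned int *mask_out)
{
    const size_t n = sizeof(mask_map) / sizeof(mask_map[0]);
    unsigned int mask = 0;
    const char *p = input;

    while (*p) {
        size_t len = strcspn(p, ",");   /* length of the current token */

        if (len > 0) {
            size_t i;

            for (i = 0; i < n; i++) {
                if (strlen(mask_map[i].name) == len &&
                    strncmp(p, mask_map[i].name, len) == 0)
                    break;
            }
            if (i == n)
                return -EINVAL;
            mask |= mask_map[i].bit;
        }
        p += len;
        if (*p == ',')
            p++;
    }
    *mask_out = mask;
    return 0;
}
```

Comparing both the length and the bytes of each token keeps a prefix such as "met" from matching "meta".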
2009-03-24 | blktrace: fix t_error() | Li Zefan
Impact: fix error flag output

t_error() should return t->error, not t->sector.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <49C8945F.5020802@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-24 | blktrace: fix wrong calculation of RWBS | Li Zefan
Impact: fix the output of IO type category characters

Trace categories are the upper 16 bits, not the lower 16 bits.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <49C89432.8010805@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
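Splitting a 32-bit action word into its upper and lower 16 bits looks like this. The names `TC_SHIFT_SKETCH`, `trace_category()` and `trace_action()` are hypothetical; only the 16-bit split comes from the commit message.

```c
/* Categories live in the upper 16 bits of the 32-bit action word. */
#define TC_SHIFT_SKETCH 16

/* Upper 16 bits: the trace category flags. */
unsigned int trace_category(unsigned int action_word)
{
    return action_word >> TC_SHIFT_SKETCH;
}

/* Lower 16 bits: the action itself. */
unsigned int trace_action(unsigned int action_word)
{
    return action_word & ((1u << TC_SHIFT_SKETCH) - 1);
}
```

Masking with the low 16 bits where the category was wanted would test action bits against category flags, which is the kind of mix-up the commit describes.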
2009-03-24 | blktrace: mark ddir_act[] const | Li Zefan
Impact: cleanup

ddir_act and what2act always stay immutable.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <49C89415.5080503@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-24 | tracing/filters: disallow integer values for string filters and vice versa | Tom Zanussi
Impact: fix filter use boundary condition / crash

Make sure filters for string fields don't use integer values and vice versa. Getting it wrong can crash the system or produce bogus results.

Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1237878882.8339.61.camel@charm-linux>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-24 | tracing/filters: use trace_seq_printf() to print filters | Tom Zanussi
Impact: cleanup

Instead of just using the trace_seq buffer to print the filters, use trace_seq_printf() as it was intended to be used.

Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1237878871.8339.59.camel@charm-linux>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-24 | tracing/filters: free pred when clearing filters | Tom Zanussi
Impact: fix (small) per trace filter modification memory leak

Free the current pred when clearing the filters via the filter files.

Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1237878851.8339.58.camel@charm-linux>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-24 | tracing/filters: use list_for_each_entry | Tom Zanussi
Impact: cleanup

No need to use the safe version here, so use list_for_each_entry instead of list_for_each_entry_safe in find_event_field().

Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1237878841.8339.57.camel@charm-linux>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-23 | tracing/function-graph-tracer: fix functions call traces imbalance | Frederic Weisbecker
Impact: fix traces output

Sometimes one can observe an imbalance in the traces between function calls and function return traces:

    func1() {
        }
    }

The curly brace inside func1() is the return of another function nested inside func1. The return trace has been inserted in the buffer but not the entry.

We are storing a return address on the function traces stack while we haven't inserted its entry in the buffer, hence the imbalance in the traces.

This is because the tracers don't check all failures that can happen on buffer insertion. This patch reports the tracing recursion failures and the ring buffer failures. In such cases, we now restore the original return address for the function, giving up its return trace.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <1237843021-11695-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-23 | tracing: Fix TRACING_SUPPORT dependency for PPC32 | Anton Vorontsov
Commit 40ada30f9621fbd831ac2437b9a2a399aa ("tracing: clean up menu"), despite the "clean up" in its purpose, introduced a behavioural change for Kconfig symbols: we are no longer able to select tracing support on PPC32 (because IRQFLAGS_SUPPORT isn't yet implemented).

IRQFLAGS_SUPPORT is not mandatory for most tracers; the tracing core has a special case for platforms without irqflags (which, by the way, has become useless as of the commit above).

Though according to Ingo Molnar, there were periodic build failures on weird, unmaintained architectures that had no irqflags-tracing support and hence didn't know the raw_irqs_save/restore primitives. Thus we'd better not enable irqflags-less tracing for all architectures.

This patch restores the old behaviour for PPC32, and thus brings the tracing back. Other architectures can either add themselves to the exception list or (better) implement TRACE_IRQFLAGS_SUPPORT.

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: linuxppc-dev@ozlabs.org
LKML-Reference: <20090323220724.GA9851@oksana.dev.rtsoft.ru>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-23 | tracing/ftrace: check if debugfs is registered before creating files | Frederic Weisbecker
Impact: fix a crash with ftrace={nop,boot} parameter

If the nop or initcall tracers are launched as boot tracers, they will attempt to create their option directory and files. But these tracers are registered very early and then assigned as "boot tracers" very early if asked to. Since they do this before debugfs has been registered (core initcall), a crash is triggered.

Other early tracers could also come later. So we fix it by checking if debugfs is initialized before creating the root tracing directory.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <1237759847-21025-3-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-23 | tracing/filters: clean up filter_add_subsystem_pred() | Tom Zanussi
Impact: cleanup, memory leak fix

This patch cleans up filter_add_subsystem_pred():
- searches for the field before creating a copy of the pred
- fixes memory leak in the case a predicate isn't applied
- if -ENOMEM, makes sure there's no longer a reference to the pred so the caller can free the half-finished filter
- changes the confusing i == MAX_FILTER_PRED - 1 comparison previously remarked upon

This affects only per-subsystem event filtering.

Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1237796808.7527.40.camel@charm-linux>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-23 | tracing/filters: fix bug in copy_pred() | Tom Zanussi
Impact: fix potential crash on subsystem filter expression freeing

When making a copy of the predicate, pred->field_name needs to be duplicated in the copy as well, otherwise bad things can happen due to later multiple frees of the same string.

This affects only per-subsystem event filtering.

Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1237796802.7527.39.camel@charm-linux>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
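The difference between copying the pointer and duplicating the string can be sketched in user space. `struct pred_sketch`, `pred_copy()` and `pred_free()` are hypothetical stand-ins, not the kernel's filter_pred code.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical predicate that owns a heap-allocated field name. */
struct pred_sketch {
    char *field_name;
    int value;
};

/*
 * Deep copy: the copy gets its own duplicate of field_name, so freeing
 * the original and the copy never frees the same string twice.
 * Returns NULL on allocation failure.
 */
struct pred_sketch *pred_copy(const struct pred_sketch *src)
{
    struct pred_sketch *dst = malloc(sizeof(*dst));
    size_t len;

    if (!dst)
        return NULL;
    *dst = *src;                         /* shallow copy, pointer shared */
    len = strlen(src->field_name) + 1;
    dst->field_name = malloc(len);       /* replace the shared pointer */
    if (!dst->field_name) {
        free(dst);
        return NULL;
    }
    memcpy(dst->field_name, src->field_name, len);
    return dst;
}

/* Free a predicate created by pred_copy(), including its string. */
void pred_free(struct pred_sketch *p)
{
    if (!p)
        return;
    free(p->field_name);
    free(p);
}
```

If `pred_copy()` stopped after the struct assignment, both predicates would hold the same `field_name` pointer and freeing each in turn would be a double free.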
2009-03-23 | tracing/filters: use list_for_each_entry_safe | Tom Zanussi
Impact: cleanup

Use list_for_each_entry_safe instead of list_for_each_entry in find_event_field().

Reported-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <1237796788.7527.35.camel@charm-linux>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-23 | tracing/events: don't discard an event after commit | Frederic Weisbecker
When we want to filter an event, the filter test is done after the event is committed to the ring buffer, and the event is discarded later if needed. But a reader could be reading this event while we are trying to discard it.

Other kinds of races can even happen, because the event is committed and can be read and/or consumed.

What we want is to discard the event before committing it.

Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
LKML-Reference: <1237763919-21505-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-23 | tracing/ftrace: make nop-tracer use polling wait for events on pipe | Frederic Weisbecker
Impact: display events when they arrive

Now that the events don't use wake_up() anymore, we need the nop tracer to poll waiting for events on the pipe.

This matters especially because nop is useful for looking at orphan trace types (trace types that don't rely on specific tracers), since it doesn't produce traces itself. And unlike other tracers that trigger specific traces periodically, nop triggers no traces by itself that could wake it.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <1237759847-21025-5-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-23 | tracing/events: don't use wake up for events | Frederic Weisbecker
Impact: fix hard-lockup with sched switch events

Some ftrace events, such as sched wakeup, can be traced while the runqueue lock is held. Since they are using trace_current_buffer_unlock_commit(), they call wake_up(), which can try to grab the runqueue lock too, resulting in a deadlock.

Now for all events, we call a new helper, trace_nowake_buffer_unlock_commit(), which does pretty much the same as trace_current_buffer_unlock_commit() except that it doesn't call trace_wake_up().

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <1237759847-21025-4-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-23 | tracing/events: make the filter files writable | Frederic Weisbecker
We need the filter files to be writable; the current filter file permissions only allow reading.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tom Zanussi <tzanussi@gmail.com>
LKML-Reference: <1237759847-21025-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-22 | tracing: add run-time field descriptions for event filtering, kfree fix | Ingo Molnar
Impact: fix potential kfree of random data in (rare) failure path

Zero-initialize the field structure.

Reported-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
LKML-Reference: <1237710639.7703.46.camel@charm-linux>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-22 | tracing: add per-subsystem filtering | Tom Zanussi
This patch adds per-subsystem filtering to the event tracing subsystem.

It adds a 'filter' debugfs file to each subsystem directory. This file can be written to to set filters; reading from it will display the current set of filters set for that subsystem.

Basically what it does is propagate the filter down to each event contained in the subsystem. If a particular event doesn't have a field with the name specified in the filter, it simply doesn't get set for that event. You can verify whether or not the filter was set for a particular event by looking at the filter file for that event.

As with per-event filters, compound expressions are supported, echoing '0' to the subsystem's filter file clears all filters in the subsystem, etc.

Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1237710677.7703.49.camel@charm-linux>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-22 | tracing: add per-event filtering | Tom Zanussi
This patch adds per-event filtering to the event tracing subsystem.

It adds a 'filter' debugfs file to each event directory. This file can be written to to set filters; reading from it will display the current set of filters set for that event.

Basically, any field listed in the 'format' file for an event can be filtered on (including strings, but not yet other array types) using either matching ('==') or non-matching ('!=') 'predicates'. A 'predicate' can be either a single expression:

    # echo pid != 0 > filter
    # cat filter
    pid != 0

or a compound expression of up to 8 sub-expressions combined using '&&' or '||':

    # echo comm == Xorg > filter
    # echo "&& sig != 29" > filter
    # cat filter
    comm == Xorg
    && sig != 29

Only events having field values matching an expression will be available in the trace output; non-matching events are discarded.

Note that a compound expression is built up by echoing each sub-expression separately - it's not the most efficient way to do things, but it keeps the parser simple and assumes that compound expressions will be relatively uncommon. In any case, a subsequent patch introducing a way to set filters for entire subsystems should mitigate any need to do this for lots of events.

Setting a filter without an '&&' or '||' clears the previous filter completely and sets the filter to the new expression:

    # cat filter
    comm == Xorg
    && sig != 29
    # echo comm != Xorg > filter
    # cat filter
    comm != Xorg

To clear a filter, echo 0 to the filter file:

    # echo 0 > filter
    # cat filter
    none

The limit of 8 predicates for a compound expression is arbitrary - for efficiency, it's implemented as an array of pointers to predicates, and 8 seemed more than enough for any filter...

Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1237710665.7703.48.camel@charm-linux>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
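One way to picture a compound filter of up to 8 '==' / '!=' predicates joined by '&&' or '||' is the sketch below. It is an illustration only: the event layout, the type names and `filter_match()` are hypothetical, and the strict left-to-right evaluation with no operator precedence is an assumption, not something the commit message specifies.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_PREDS 8   /* the arbitrary limit mentioned in the commit message */

enum field_id  { FIELD_PID, FIELD_SIG, FIELD_COMM };
enum pred_op   { PRED_EQ, PRED_NE };
enum pred_join { JOIN_NONE, JOIN_AND, JOIN_OR };

/* Hypothetical event record with two integer fields and one string field. */
struct event_sketch {
    int pid;
    int sig;
    const char *comm;
};

struct pred {
    enum pred_join join;   /* how this predicate combines with the result so far */
    enum field_id field;
    enum pred_op op;
    int int_val;           /* used for FIELD_PID and FIELD_SIG */
    const char *str_val;   /* used for FIELD_COMM */
};

struct filter {
    struct pred preds[MAX_PREDS];
    size_t n;              /* number of predicates in use */
};

/* Evaluate one '==' or '!=' predicate against an event. */
static bool pred_match(const struct pred *p, const struct event_sketch *ev)
{
    bool equal;

    switch (p->field) {
    case FIELD_PID:
        equal = (ev->pid == p->int_val);
        break;
    case FIELD_SIG:
        equal = (ev->sig == p->int_val);
        break;
    default:
        equal = (strcmp(ev->comm, p->str_val) == 0);
        break;
    }
    return p->op == PRED_EQ ? equal : !equal;
}

/*
 * Combine the predicates strictly left to right (an assumption of this
 * sketch). An empty filter matches every event.
 */
bool filter_match(const struct filter *f, const struct event_sketch *ev)
{
    bool result = true;

    for (size_t i = 0; i < f->n && i < MAX_PREDS; i++) {
        bool m = pred_match(&f->preds[i], ev);

        if (i == 0)
            result = m;                 /* first predicate has no join */
        else if (f->preds[i].join == JOIN_OR)
            result = result || m;
        else
            result = result && m;
    }
    return result;
}
```

With the filter `comm == Xorg && sig != 29` from the text, an Xorg event with any signal other than 29 matches and is kept; every other event is discarded.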
2009-03-22 | tracing: add ring_buffer_event_discard() to ring buffer | Tom Zanussi
This patch overloads RINGBUF_TYPE_PADDING to provide a way to discard events from the ring buffer, for the event-filtering mechanism introduced in a subsequent patch.

I did the initial version but thanks to Steven Rostedt for adding the parts that actually made it work. ;-)

Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-22 | tracing: fix four sparse warnings | Dmitri Vorobiev
Impact: cleanup.

This patch fixes the following sparse warnings:

    kernel/trace/trace.c:385:9: warning: symbol 'trace_seq_to_buffer' was not declared. Should it be static?
    kernel/trace/trace_clock.c:29:13: warning: symbol 'trace_clock_local' was not declared. Should it be static?
    kernel/trace/trace_clock.c:54:13: warning: symbol 'trace_clock' was not declared. Should it be static?
    kernel/trace/trace_clock.c:74:13: warning: symbol 'trace_clock_global' was not declared. Should it be static?

Signed-off-by: Dmitri Vorobiev <dmitri.vorobiev@movial.com>
LKML-Reference: <1237741871-5827-4-git-send-email-dmitri.vorobiev@movial.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>