path: root/tools
Age | Commit message | Author
2009-07-22 | perf_counter tools: Fix vmlinux symbol generation breakage | Mike Galbraith
vmlinux meets the criteria for symbol adjustment, which breaks vmlinux generated symbols. Fix this by exempting vmlinux. This is a bit fragile in that someone could change the kernel dso's name, but currently that name is also hardwired. Signed-off-by: Mike Galbraith <efault@gmx.de> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1248091298.18702.18.camel@marge.simson.net>
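The exemption can be expressed as one predicate. This is a minimal sketch, not perf's code: `dso_wants_adjust()` is hypothetical, the `"[kernel]"` name is an assumption (it is the kernel DSO name shown in the report output elsewhere in this log), and adjustment is assumed to apply to ET_EXEC and prelinked objects.

```c
#include <elf.h>
#include <stdbool.h>
#include <string.h>

/* Assumption: the kernel DSO is named "[kernel]", as in perf report output. */
#define KERNEL_DSO_NAME "[kernel]"

/*
 * Hypothetical predicate: should symbol addresses read from this ELF
 * object be rebased from link-time addresses to file offsets?
 * ET_EXEC and prelinked objects qualify, except the kernel image, whose
 * symbols already match the addresses seen in samples.  Keying the
 * exemption off the name is the fragility the commit mentions.
 */
static bool dso_wants_adjust(const char *dso_name, unsigned int e_type,
                             bool prelinked)
{
    if (strcmp(dso_name, KERNEL_DSO_NAME) == 0)
        return false;
    return e_type == ET_EXEC || prelinked;
}
```

Renaming the kernel DSO would silently disable the exemption, which is why the commit calls the approach fragile.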
2009-07-22 | perf_counter: Detect debugfs location | Jason Baron
If "/sys/kernel/debug" is not a debugfs mount point, search for the debugfs filesystem in /proc/mounts. Also allow the user to specify '--debugfs-dir=blah' or to set the environment variable 'PERF_DEBUGFS_DIR'. Signed-off-by: Jason Baron <jbaron@redhat.com> [ also made it probe "/debug" by default ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20090721181629.GA3094@redhat.com>
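The /proc/mounts fallback amounts to scanning for an entry whose filesystem type is "debugfs". A sketch, assuming the usual "device mountpoint fstype options dump pass" line layout; `find_debugfs_mount()` is a hypothetical helper, and the order in which perf consults the option, the environment variable and the default paths is not modelled here.

```c
#include <stdio.h>
#include <string.h>

/*
 * Scan a /proc/mounts-style stream, one "device mountpoint fstype
 * options dump pass" entry per line, and copy the mount point of the
 * first debugfs entry into 'dir'.  Returns 0 on success, -1 if none.
 * Note: the kernel escapes spaces in paths as "\040"; this sketch does
 * not decode such escapes.
 */
static int find_debugfs_mount(FILE *mounts, char *dir, size_t len)
{
    char line[4096];
    char mnt[1024];
    char type[64];

    while (fgets(line, sizeof(line), mounts)) {
        if (sscanf(line, "%*s %1023s %63s", mnt, type) != 2)
            continue;
        if (strcmp(type, "debugfs") == 0) {
            snprintf(dir, len, "%s", mnt);
            return 0;
        }
    }
    return -1;
}
```

In practice the stream would come from `fopen("/proc/mounts", "r")`.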
2009-07-22 | perf_counter: Add tracepoint support to perf list, perf stat | Jason Baron
Add support to 'perf list' and 'perf stat' for kernel tracepoints. The implementation creates a 'for_each_subsystem' and 'for_each_event' for easy iteration over the tracepoints. Signed-off-by: Jason Baron <jbaron@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <426129bf9fcc8ee63bb094cf736e7316a7dcd77a.1248190728.git.jbaron@redhat.com>
2009-07-22 | perf symbol: C++ demangling | Arnaldo Carvalho de Melo
[acme@doppio ~]$ perf report -s comm,dso,symbol -C firefox -d /usr/lib64/xulrunner-1.9.1/libxul.so | grep :: | head
     2.21%  [.] nsDeque::Push(void*)
     1.78%  [.] GraphWalker::DoWalk(nsDeque&)
     1.30%  [.] GCGraphBuilder::AddNode(void*, nsCycleCollectionParticipant*)
     1.27%  [.] XPCWrappedNative::CallMethod(XPCCallContext&, XPCWrappedNative::CallMode)
     1.18%  [.] imgContainer::DrawFrameTo(gfxIImageFrame*, gfxIImageFrame*, nsRect&)
     1.13%  [.] nsDeque::PopFront()
     1.11%  [.] nsGlobalWindow::RunTimeout(nsTimeout*)
     0.97%  [.] nsXPConnect::Traverse(void*, nsCycleCollectionTraversalCallback&)
     0.95%  [.] nsJSEventListener::cycleCollection::Traverse(void*, nsCycleCollectionTraversalCallback&)
     0.95%  [.] nsCOMPtr_base::~nsCOMPtr_base()
[acme@doppio ~]$
Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Vegard Nossum <vegard.nossum@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Suggested-by: Clark Williams <williams@redhat.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20090720171412.GB10410@ghostprotocols.net>
2009-07-22 | perf: avoid structure size confusion by using a fixed size | Arjan van de Ven
For some reason, this structure gets compiled as 36 bytes in some files (the ones that allocate it) but as 40 bytes in others (the ones that use it). The cause is an off_t type that gets a different size in different compilation units, for some yet-to-be-explained reason. But the effect is disastrous: the size/offset members of the struct are at different offsets, which results in mostly complete garbage. The parser in perf is so robust that this all gets hidden: after skipping a certain number of samples, it recovers, so this bug is not normally noticed... except when you want every sample to be exact. Fix this by just using an explicitly sized type. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <4A655917.9080504@linux.intel.com>
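A fixed-layout descriptor might look like the following. The name `file_section` is illustrative, and the note about `_FILE_OFFSET_BITS` describes one plausible way off_t can change size, not the cause this commit found (it leaves the cause open). The static assertions turn any future layout drift into a build failure.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative on-disk section descriptor (name is hypothetical).
 * With off_t members the layout depends on how each file is compiled:
 * on 32-bit glibc, off_t is 4 bytes by default but 8 bytes when
 * _FILE_OFFSET_BITS=64, which is one plausible way two compilation
 * units end up disagreeing.  Fixed-width members remove the ambiguity.
 */
struct file_section {
    uint64_t offset;
    uint64_t size;
};

/* Break the build, rather than the data file, if the layout ever drifts. */
_Static_assert(sizeof(struct file_section) == 16, "unexpected size");
_Static_assert(offsetof(struct file_section, size) == 8, "unexpected offset");
```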
2009-07-22 | perf_counter: Improve perf stat and perf record option parsing | Anton Blanchard
perf stat and perf record currently look for all options on the command line. This can lead to some confusion:

# perf stat ls -l
Error: unknown switch `l'

While we can work around this by adding '--' before the command, the git option parsing code can instead stop at the first non-option argument:

# perf stat ls -l

 Performance counter stats for 'ls -l':
....

Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20090722130412.GD9029@kryten>
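The "stop at the first non-option" rule is small enough to state in code. git's parse-options, from which perf's option parser comes, requests it with the PARSE_OPT_STOP_AT_NON_OPTION flag; the standalone function below is only a hypothetical illustration of the rule, and it assumes options never take a separate value argument.

```c
#include <string.h>

/*
 * Hypothetical helper: return the index of the first argument that is
 * not an option, so everything from there on is passed to the profiled
 * command untouched.  A literal "--" ends option parsing and is skipped.
 * Simplification: every option is assumed to be a flag without a value.
 */
static int first_command_arg(int argc, const char **argv)
{
    int i;

    for (i = 1; i < argc; i++) {
        if (strcmp(argv[i], "--") == 0)
            return i + 1;
        if (argv[i][0] != '-' || argv[i][1] == '\0')
            break;          /* first non-option (a lone "-" counts as one) */
    }
    return i;
}
```

For `perf stat ls -l`, this returns the index of "ls", so "-l" is never inspected as a perf option.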
2009-07-22 | perf_counter: PERF_SAMPLE_ID and inherited counters | Peter Zijlstra
Anton noted that for inherited counters the counter-id as provided by PERF_SAMPLE_ID isn't mappable to the id found through PERF_RECORD_ID because each inherited counter gets its own id. His suggestion was to always return the parent counter id, since that is the primary counter id as exposed. However, these inherited counters have a unique identifier so that events like PERF_EVENT_PERIOD and PERF_EVENT_THROTTLE can be specific about which counter gets modified, which is important when trying to normalize the sample streams. This patch removes PERF_EVENT_PERIOD in favour of PERF_SAMPLE_PERIOD, which is more useful anyway, since changing periods became a lot more common than initially thought -- rendering PERF_EVENT_PERIOD the less useful solution (also, PERF_SAMPLE_PERIOD reports the more accurate value, since it reports the value used to trigger the overflow, whereas PERF_EVENT_PERIOD simply reports the requested period changed, which might only take effect on the next cycle). This still leaves us PERF_EVENT_THROTTLE to consider, but since that _should_ be a rare occurrence, and linking it to a primary id is the most useful bit to diagnose the problem, we introduce a PERF_SAMPLE_STREAM_ID, for those few cases where the full reconstruction is important. [Does change the ABI a little, but I see no other way out] Suggested-by: Anton Blanchard <anton@samba.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1248095846.15751.8781.camel@twins>
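The split between the two identifiers can be modelled with a parent pointer. This is a toy model with hypothetical names (`struct counter`, `sample_id()`, `sample_stream_id()`), not the kernel's data structures: after this change the ID field names the primary counter, while the stream ID names the individual, possibly inherited, counter.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model, not the kernel's structures: an inherited counter points at its parent. */
struct counter {
    uint64_t id;              /* unique for every counter, inherited ones included */
    struct counter *parent;   /* NULL for the counter the user opened */
};

/* ID-style value: the primary counter's id, shared by all inherited copies. */
static uint64_t sample_id(const struct counter *c)
{
    while (c->parent != NULL)
        c = c->parent;
    return c->id;
}

/* STREAM_ID-style value: this particular counter's own id. */
static uint64_t sample_stream_id(const struct counter *c)
{
    return c->id;
}
```

A tool can group samples by the first value and fall back to the second only when it needs to reconstruct a single child's stream, for example around a throttle event.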
2009-07-22 | Merge commit 'tip/perfcounters/core' into perf-counters-for-linus | Peter Zijlstra
2009-07-18 | perf_counter: Make call graph option consistent | Anton Blanchard
perf record uses -g for logging call graph data but perf report uses -c to print call graph data. Be consistent and use -g everywhere for call graph data. Also update the help text to reflect the current default - fractal,0.5 Signed-off-by: Anton Blanchard <anton@samba.org> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20090716104817.803604373@samba.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-18 | perf_counter: Add perf record option to log addresses | Anton Blanchard
Add the -d or --data option to log event addresses (e.g. page faults). Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20090716104817.697698033@samba.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-18 | perf_counter: Synthesize VDSO mmap event | Anton Blanchard
perf record synthesizes mmap events for the running process. Right now it just catches file mappings, but we can check for the vdso symbol and add that too. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20090716104817.517264409@samba.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
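Spotting the vdso in a process's /proc/<pid>/maps comes down to matching the "[vdso]" pseudo-path. The sketch assumes the standard maps line format and uses a hypothetical helper name; the start and end it extracts are what a synthesized mmap event would need.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: if this /proc/<pid>/maps line describes the
 * "[vdso]" mapping, store its start and end addresses and return 1;
 * otherwise return 0.  Assumed line format:
 *   7fff5c3fe000-7fff5c400000 r-xp 00000000 00:00 0   [vdso]
 */
static int parse_vdso_line(const char *line, unsigned long long *start,
                           unsigned long long *end)
{
    if (strstr(line, "[vdso]") == NULL)
        return 0;
    return sscanf(line, "%llx-%llx", start, end) == 2;
}
```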
2009-07-13 | perf_counter tools: Fix index boundary check | Roel Kluin
Keep index within event_type_descriptors[] Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <4A5A7F0B.4070106@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
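The classic form of this bug is testing an index with `>` instead of `>=` against the array size. The table and helper below are hypothetical, and whether this exact comparison was the one fixed here is an assumption; the sketch only shows the correct check.

```c
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical table standing in for event_type_descriptors[]. */
static const char *const type_names[] = {
    "Hardware event",
    "Software event",
    "Tracepoint event",
};

static const char *type_name(size_t idx)
{
    /* '>=' rather than '>': idx == ARRAY_SIZE() is already one past the end. */
    if (idx >= ARRAY_SIZE(type_names))
        return "unknown";
    return type_names[idx];
}
```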
2009-07-11 | perf report: Introduce -n/--show-nr-samples | Arnaldo Carvalho de Melo
[acme@doppio pahole]$ perf report -ns comm,dso,symbol -d /lib64/libc-2.10.1.so -C pahole | head -17
    21.94%  32101  [.] _int_malloc
    20.10%  29402  [.] __GI_strcmp
    16.77%  24533  [.] __tsearch
    12.61%  18450  [.] malloc_consolidate
     6.42%   9394  [.] _int_free
     6.28%   9191  [.] __tfind
     4.56%   6678  [.] __GI___libc_free
     4.46%   6520  [.] _IO_vfprintf_internal
     2.59%   3786  [.] __malloc
     1.17%   1716  [.] __GI_memcpy
[acme@doppio pahole]$
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <1247325517-12272-5-git-send-email-acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-11 | perf_counter tools: PLT info is stripped in -debuginfo packages | Arnaldo Carvalho de Melo
So we need to get the richer .symtab from the debuginfo packages but the PLT info from the original DSO, where we have just the leaner .dynsym symtab. Example:

| [acme@doppio pahole]$ perf report --sort comm,dso,symbol > before
| [acme@doppio pahole]$ perf report --sort comm,dso,symbol > after
| [acme@doppio pahole]$ diff -U1 before after
| --- before 2009-07-11 11:04:22.688595741 -0300
| +++ after 2009-07-11 11:04:33.380595676 -0300
| @@ -80,3 +80,2 @@
|      0.07%  pahole  ./build/pahole  [.] pahole_stealer
| -    0.06%  pahole  /usr/lib64/libdw-0.141.so  [.] 0x00000000007140
|      0.06%  pahole  /usr/lib64/libdw-0.141.so  [.] __libdw_getabbrev
| @@ -91,2 +90,3 @@
|      0.06%  pahole  [kernel]  [k] free_hot_cold_page
| +    0.06%  pahole  /usr/lib64/libdw-0.141.so  [.] tfind@plt
|      0.05%  pahole  ./build/libdwarves.so.1.0.0  [.] ftype__add_parameter
| @@ -242,2 +242,3 @@
|      0.01%  pahole  [kernel]  [k] account_group_user_time
| +    0.01%  pahole  /usr/lib64/libdw-0.141.so  [.] strlen@plt
|      0.01%  pahole  ./build/pahole  [.] strcmp@plt
| [acme@doppio pahole]$

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <1247325517-12272-4-git-send-email-acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-11 | perf report: Make the output more compact | Arnaldo Carvalho de Melo
When we filter by column content, we may end up with a column that has the same value on every line. So remove that column and show its unique value at the top, as a comment. Example:

[acme@doppio pahole]$ perf report --sort comm,dso,symbol -d ./build/libdwarves.so.1.0.0 -C pahole | head -15
# dso: ./build/libdwarves.so.1.0.0
# comm: pahole
# Samples: 58409
#
# Overhead  Symbol
# ........  ......
#
    20.93%  [.] tag__recode_dwarf_type
    14.94%  [.] namespace__recode_dwarf_types
    10.38%  [.] cu__table_add_tag
     6.69%  [.] __die__process_tag
     5.05%  [.] die__process_function
     4.70%  [.] list__for_all_tags
     3.68%  [.] tag__init
     3.48%  [.] die__create_new_parameter
[acme@doppio pahole]$

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <1247325517-12272-3-git-send-email-acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
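Whether a column can be folded into the header comment comes down to whether all of its rows agree. This sketch (hypothetical `column_is_constant()`) inspects the values themselves; perf appears to key the decision off the user's filter list instead, so treat it as an illustration of the idea rather than perf's implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: true when every row holds the same value, so the
 * column can be dropped and its value printed once as a header comment.
 */
static bool column_is_constant(const char *const *values, size_t n)
{
    if (n == 0)
        return false;
    for (size_t i = 1; i < n; i++)
        if (strcmp(values[i], values[0]) != 0)
            return false;
    return true;
}
```

A caller would then print something like `# comm: pahole` once and skip that column for every row.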
2009-07-11 | strlist: Introduce strlist__entry and strlist__nr_entries methods | Arnaldo Carvalho de Melo
The strlist__entry method allows accessing strlists like an array; it will be used in 'perf report' to access the first entry. We now keep nr_entries so that we can check whether there is just one entry; 'perf report' will use this to improve the output by showing the value once at the top when there is just, say, one DSO. While at it, use nr_entries to optimize strlist__is_empty instead of the far more costly rb_first based implementation. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <1247325517-12272-2-git-send-email-acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-11 | perf report: Tidy up reporting of symbols not found | Arnaldo Carvalho de Melo
Always print the level info (kernel, hypervisor or userspace), as that is available in the hist_entry. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <1247325517-12272-1-git-send-email-acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-11 | perf report: Adjust column width to the values sampled | Arnaldo Carvalho de Melo
Auto-adjust the column widths of perf report output to the longest string occurring in each column. Example:

[acme@doppio pahole]$ perf report --sort comm,dso,symbol | head -13
    12.79%  pahole  /usr/lib64/libdw-0.141.so    [.] __libdw_find_attr
     8.90%  pahole  /lib64/libc-2.10.1.so        [.] _int_malloc
     8.68%  pahole  /usr/lib64/libdw-0.141.so    [.] __libdw_form_val_len
     8.15%  pahole  /lib64/libc-2.10.1.so        [.] __GI_strcmp
     6.80%  pahole  /lib64/libc-2.10.1.so        [.] __tsearch
     5.54%  pahole  ./build/libdwarves.so.1.0.0  [.] tag__recode_dwarf_type
[acme@doppio pahole]$
[acme@doppio pahole]$ perf report --sort comm,dso,symbol -d /lib64/libc-2.10.1.so | head -10
    21.92%  pahole  /lib64/libc-2.10.1.so  [.] _int_malloc
    20.08%  pahole  /lib64/libc-2.10.1.so  [.] __GI_strcmp
    16.75%  pahole  /lib64/libc-2.10.1.so  [.] __tsearch
[acme@doppio pahole]$

Also add these extra options to control the new behaviour:

-w, --field-width
    Force each column width to the provided list, for large terminal readability.

-t, --field-separator
    Use a special separator character and don't pad with spaces, replacing all occurrences of this separator in symbol names (and other output) with a '.' character, so that any remaining occurrence of the separator is a real field boundary.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <20090711014728.GH3452@ghostprotocols.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>
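Both behaviours reduce to small string helpers. The names below are hypothetical, a sketch of the idea rather than perf's code: the width is the longest value (or the header, if that is longer), and with a separator in use every occurrence of it inside a value becomes '.'.

```c
#include <stddef.h>
#include <string.h>

/* Width that fits the header and every value in the column. */
static size_t column_width(const char *const *values, size_t n,
                           size_t header_len)
{
    size_t width = header_len;

    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(values[i]);
        if (len > width)
            width = len;
    }
    return width;
}

/*
 * With a field separator in use, rewrite every occurrence of it inside a
 * value as '.', so any remaining separator is a real field boundary.
 */
static void sanitize_field(char *s, char sep)
{
    for (; *s != '\0'; s++)
        if (*s == sep)
            *s = '.';
}
```

Padding can then be done with `printf("%-*s", (int)width, value)`. With ':' as the separator, a C++ symbol such as nsDeque::Push would be printed as nsDeque..Push.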
2009-07-10 | perf_counter: Add P6 PMU support | Vince Weaver
Add basic P6 PMU support. The P6 uses the EVNTSEL0 EN bit to enable/disable both its counters. We use this for the global enable/disable, and clear all config bits (except EN) to disable individual counters. Actual ia32 hardware doesn't support lfence, so use a locked op without side-effect to implement a full barrier. perf stat and perf record seem to function correctly. [a.p.zijlstra@chello.nl: cleanups and complete the enable/disable code] Signed-off-by: Vince Weaver <vince@deater.net> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <Pine.LNX.4.64.0907081718450.2715@pianoman.cluster.toy> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-10 | perf_counter tools: Rename cache events to remove $ | Anton Blanchard
The cache events contain '$', which is subject to shell variable expansion. To avoid confusion, change it to 'cache', i.e. L1-d$-loads becomes L1-dcache-loads. Signed-off-by: Anton Blanchard <anton@samba.org> Cc: Roland Dreier <rdreier@cisco.com> Cc: Jaswinder Singh Rajput <jaswinder@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <20090706120131.GB4391@kryten> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-05 | perf report: Add "Fractal" mode output - support callchains with relative overhead rate | Frederic Weisbecker
The current callchain output displays the overhead rates as absolute, i.e. relative to the total overhead. This patch provides relative overhead percentages, in which each branch of the callchain tree is an independent instrumented object. This provides a 'fractal' view of the call-chain profile: each sub-graph looks like a profile in itself, relative to its parent. You can produce such output by using the "fractal" mode, which you can abbreviate as f, fr, fra, frac, etc.:

./perf report -s sym -c fractal

Example:

     8.46%  [k] copy_user_generic_string
            |
            |--52.01%-- generic_file_aio_read
            |          do_sync_read
            |          vfs_read
            |          |
            |          |--97.20%-- sys_pread64
            |          |          system_call_fastpath
            |          |          pread64
            |          |
            |           --2.81%-- sys_read
            |                     system_call_fastpath
            |                     __read
            |
            |--39.85%-- generic_file_buffered_write
            |          __generic_file_aio_write_nolock
            |          generic_file_aio_write
            |          do_sync_write
            |          reiserfs_file_write
            |          vfs_write
            |          |
            |          |--97.05%-- sys_pwrite64
            |          |          system_call_fastpath
            |          |          __pwrite64
            |          |
            |           --2.95%-- sys_write
            |                     system_call_fastpath
            |                     __write_nocancel
[...]

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Anton Blanchard <anton@samba.org> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <1246772361-9960-5-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
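The two display modes differ only in the denominator. In this sketch (hypothetical helper names), a branch with 520 hits under a parent whose cumulative count is 1000 is shown as 52% in fractal mode, however many samples the whole profile holds; its absolute percentage depends on the profile total instead.

```c
/*
 * Absolute mode: a branch's share of every sample in the profile.
 * Fractal mode: a branch's share of its parent's cumulative hits.
 * Both return 0 rather than dividing by zero.
 */
static double absolute_percent(unsigned long hits, unsigned long total_hits)
{
    return total_hits ? 100.0 * hits / total_hits : 0.0;
}

static double relative_percent(unsigned long hits, unsigned long parent_cumul)
{
    return parent_cumul ? 100.0 * hits / parent_cumul : 0.0;
}
```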
2009-07-05 | perf_counter tools: callchains: Manage the cumul hits on the fly | Frederic Weisbecker
The cumul hits are the hits of all the children of a node plus the hits of the node itself, and they are required to compute the percentage of a branch. These numbers are calculated while sorting the branches of the callchain tree, using a depth-first postfix traversal, so that cumulative hits are propagated in the right order. But if we plan to implement percentages relative to the parent rather than absolute percentages (relative to the whole overhead), we need to know the cumulative hits of the parent before computing the children, because the relative minimum acceptable number of entries (i.e. the minimum rate against the cumulative hits of the parent) is the basis for filtering the children against a given rate. So handle the cumul hits on the fly, to prepare for the implementation of relative overhead rates. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Anton Blanchard <anton@samba.org> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <1246772361-9960-4-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
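One way to keep cumulative counts current at insertion time, instead of computing them in a postfix pass, is to walk up the parent pointers whenever hits are added. The node layout and `chain_add_hits()` are hypothetical simplifications, not perf's callchain code.

```c
#include <stddef.h>

/* Hypothetical, simplified callchain tree node. */
struct chain_node {
    struct chain_node *parent;
    unsigned long hits;    /* samples that ended exactly at this node */
    unsigned long cumul;   /* hits of this node plus all of its descendants */
};

/*
 * Account 'n' new hits to 'node' and update every ancestor's cumulative
 * count immediately, so a parent's total is already known when its
 * children are later filtered against it.
 */
static void chain_add_hits(struct chain_node *node, unsigned long n)
{
    node->hits += n;
    for (; node != NULL; node = node->parent)
        node->cumul += n;
}
```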
2009-07-05 | perf report: Change default callchain parameters | Frederic Weisbecker
The default callchain parameters are set to use the flat mode and to never filter backtraces by an overhead threshold. But flat mode is boring compared to graph mode, and the number of callchains may be very high if none is filtered. Change this to use the graph view and a minimum overhead of 0.5% as default parameters. Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Anton Blanchard <anton@samba.org> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <1246772361-9960-3-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-05 | perf report: Use a modifiable string for default callchain options | Frederic Weisbecker
If the user doesn't provide options to tune the callchain output (i.e. uses -c without arguments), then the default value passed in the OPT_CALLBACK_DEFAULT() macro is used. But it's parsed later by strtok(), which replaces the comma separators with zero bytes. This may segfault, since we are using a read-only string. Use a modifiable one instead, and also fix the "100%" default minimum threshold value by turning it into 0 (output every callchain), as was originally intended. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Anton Blanchard <anton@samba.org> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <1246772361-9960-2-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
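The failure mode and the fix can be reproduced in miniature. The parser below is a hypothetical stand-in for perf's option callback: it copies the argument into a writable buffer before strtok() touches it, then reads the optional minimum percentage, defaulting to 0 when it is absent.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical parser for "mode[,min_percent]", e.g. "graph,0.5".
 * strtok() overwrites each separator with '\0', so it must never be
 * handed a string literal: copy the argument into a writable buffer.
 * A missing min_percent defaults to 0 (show every callchain).
 */
static int parse_callchain_opt(const char *arg, char *mode, size_t mode_len,
                               double *min_percent)
{
    char buf[64];
    char *tok;

    if (strlen(arg) >= sizeof(buf))
        return -1;
    strcpy(buf, arg);               /* writable copy for strtok() */

    tok = strtok(buf, ",");
    if (tok == NULL)
        return -1;
    snprintf(mode, mode_len, "%s", tok);

    tok = strtok(NULL, ",");
    *min_percent = tok ? strtod(tok, NULL) : 0.0;
    return 0;
}
```

Because of the copy, passing a literal such as "graph,0.5" directly is safe.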
2009-07-05 | perf report: Warn on callchain output request from non-callchain file | Frederic Weisbecker
perf report segfaults while trying to handle callchains from a data file recorded without callchains. Instead of segfaulting, print a useful message to the user. Reported-by: Jens Axboe <jens.axboe@oracle.com> Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Anton Blanchard <anton@samba.org> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <1246772361-9960-1-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-03 | perf report: Annotate variable initialization | Ingo Molnar
Certain versions of GCC don't see the initialization that is done here:

builtin-report.c: In function ‘__cmd_report’:
builtin-report.c:1038: warning: ‘syms’ may be used uninitialized in this function

So annotate it with a NULL initialization. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-03 | perf_counter tools: Adjust symbols in ET_EXEC files too | Arnaldo Carvalho de Melo
Ingo Molnar wrote:
> i just bisected a 'perf report' bug that would cause us to not
> resolve all user-space symbols in a 'git gc' run to:
>
> f5812a7a336fb952d819e4427b9a2dce02368e82 is first bad commit
> commit f5812a7a336fb952d819e4427b9a2dce02368e82
> Author: Arnaldo Carvalho de Melo <acme@redhat.com>
> Date:   Tue Jun 30 11:43:17 2009 -0300
>
>     perf_counter tools: Adjust only prelinked symbol's addresses

Rename ->prelinked to ->adjust_symbols, and apply what was done only for prelinked libraries to ET_EXEC binaries as well, such as /usr/bin/git:

[acme@doppio pahole]$ readelf -h /usr/bin/git | grep Type
  Type:                              EXEC (Executable file)
[acme@doppio pahole]$

And after installing the 'git-debuginfo' package, I get correct results:

[acme@doppio linux-2.6-tip]$ perf report --sort comm,dso,symbol -d /usr/bin/git | head -20
#
# (1139614 samples)
#
# Overhead           Command  Shared Object              Symbol
# ........  ................  .........................  ......
#
    34.98%               git  /usr/bin/git               [.] send_sideband
    33.39%               git  /usr/bin/git               [.] enter_repo
     6.81%               git  /usr/bin/git               [.] diff_opt_parse
     4.95%               git  /usr/bin/git               [.] is_repository_shallow
     3.24%               git  /usr/bin/git               [.] odb_mkstemp
     1.39%               git  /usr/bin/git               [.] output
     1.34%               git  /usr/bin/git               [.] xmmap
     1.25%               git  /usr/bin/git               [.] receive_pack_config
     1.16%               git  /usr/bin/git               [.] git_pathdup
     0.90%               git  /usr/bin/git               [.] read_object_with_reference
     0.86%               git  /usr/bin/git               [.] show_patch_diff
     0.85%               git  /usr/bin/git               0x00000000095e2e
     0.69%               git  /usr/bin/git               [.] display
[acme@doppio linux-2.6-tip]$

I'll check later which cases remain where we can't resolve symbols, like this 0x00000000095e2e.

And I guess this will fix the problems Mike was seeing too:

[acme@doppio linux-2.6-tip]$ readelf -h ../build/perf/vmlinux | grep Type
  Type:                              EXEC (Executable file)
[acme@doppio linux-2.6-tip]$

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-02 | perf_counter tools: Display percents of hits in callchain with overhead colors | Frederic Weisbecker
This adds the use of colors to signal at a glance the important overhead thresholds in callchains hit rates. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Anton Blanchard <anton@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <1246558475-10624-3-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-02 | perf_counter tools: Provide helper to print percents color | Frederic Weisbecker
perf annotate, perf report and perf top all share the same colored printing of percentages, according to the following rules:

High overhead = > 5%, colored in red
Mid overhead = > 0.5%, colored in green
Low overhead = < 0.5%, default color

Factorize these multiple checks into a single function named percent_color_fprintf(), and also provide get_percent_color() for sites which print percentages and other things at the same time. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Anton Blanchard <anton@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <1246558475-10624-2-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
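A possible shape for the two helpers, using the thresholds listed in this commit. The function names come from the commit, but the signatures are assumptions, and the raw ANSI escape sequences stand in for perf's own colour definitions.

```c
#include <stdio.h>

/* Plain ANSI escapes stand in for perf's own colour definitions. */
#define COLOR_RED    "\033[31m"
#define COLOR_GREEN  "\033[32m"
#define COLOR_NORMAL ""
#define COLOR_RESET  "\033[m"

/* Thresholds from the commit: > 5% red, > 0.5% green, else default. */
static const char *get_percent_color(double percent)
{
    if (percent > 5.0)
        return COLOR_RED;
    if (percent > 0.5)
        return COLOR_GREEN;
    return COLOR_NORMAL;
}

/* Print 'percent' with 'fmt' (e.g. "%6.2f%%"), colored by its magnitude. */
static int percent_color_fprintf(FILE *fp, const char *fmt, double percent)
{
    const char *color = get_percent_color(percent);
    int n = fprintf(fp, "%s", color);

    n += fprintf(fp, fmt, percent);
    if (color[0] != '\0')
        n += fprintf(fp, "%s", COLOR_RESET);
    return n;
}
```

Sites that mix a percentage with other text can call get_percent_color() themselves and wrap only the part they want colored.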
2009-07-02 | perf_counter tools: Set the minimum percent for callchains to be displayed | Frederic Weisbecker
Callchain output may become a burden on a trace because even rarely hit sites are exposed. This can be too much information. Let the user set a threshold as a minimum percentage of hits, using the new pattern for the -c option:

-c mode,min_percent

Example:

$ perf report -s sym -c flat,4

     8.25%  [k] copy_user_generic_string
                4.19%
                copy_user_generic_string
                generic_file_aio_read
                do_sync_read
                vfs_read
                sys_pread64
                system_call_fastpath
                pread64

     5.39%  [k] search_by_key
     4.63%  0x00000000009e0a
     2.36%  [k] memcpy_c
[...]

$ perf report -s sym -c graph,2

     8.25%  [k] copy_user_generic_string
            |
            |--4.31%-- generic_file_aio_read
            |          do_sync_read
            |          vfs_read
            |          |
            |           --4.19%-- sys_pread64
            |                     system_call_fastpath
            |                     pread64
            |
             --3.24%-- generic_file_buffered_write
                       __generic_file_aio_write_nolock
                       generic_file_aio_write
                       do_sync_write
                       reiserfs_file_write
                       vfs_write
                       |
                        --3.14%-- sys_pwrite64
                                  system_call_fastpath
                                  __pwrite64

     5.39%  [k] search_by_key
            |
             --2.23%-- reiserfs_update_sd_size

     4.63%  0x00000000009e0a
     2.36%  [k] memcpy_c
[...]

The min_percent part can be omitted, in which case it defaults to 0. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Anton Blanchard <anton@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <1246558475-10624-1-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-02 | perf report: Add support for callchain graph output | Frederic Weisbecker
Currently, callchains are printed on a single vertical level; this is the "flat" mode:

     8.25%  [k] copy_user_generic_string
                4.19%
                copy_user_generic_string
                generic_file_aio_read
                do_sync_read
                vfs_read
                sys_pread64
                system_call_fastpath
                pread64

This patch introduces a new "graph" mode, which provides a hierarchical output of factorized paths, recursively sorted:

     8.25%  [k] copy_user_generic_string
            |
            |--4.31%-- generic_file_aio_read
            |          do_sync_read
            |          vfs_read
            |          |
            |          |--4.19%-- sys_pread64
            |          |          system_call_fastpath
            |          |          pread64
            |          |
            |           --0.12%-- sys_read
            |                     system_call_fastpath
            |                     __read
            |
            |--3.24%-- generic_file_buffered_write
            |          __generic_file_aio_write_nolock
            |          generic_file_aio_write
            |          do_sync_write
            |          reiserfs_file_write
            |          vfs_write
            |          |
            |          |--3.14%-- sys_pwrite64
            |          |          system_call_fastpath
            |          |          __pwrite64
            |          |
            |           --0.10%-- sys_write
[...]

The command line has changed accordingly. With the -c option, callchains are printed in flat mode by default, but you can override it:

perf report -c graph
or
perf report -c flat

You can also pass an abbreviated mode:

perf report -c g
or
perf report -c gra

Both will use the graph mode. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Anton Blanchard <anton@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <1246550301-8954-3-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-02 | perf_counter tools: Add new OPT_CALLBACK_DEFAULT option | Frederic Weisbecker
There is no predefined macro to create an option that can take a custom value, or a default one if none is given. This patch provides a new helper, OPT_CALLBACK_DEFAULT(), which defines this kind of option. For example, considering an option -c, we want to get the default value in the following cases:

perf command -c -d
perf command -d -c

And the foo value when it's given:

perf command -c foo -d
perf command -d -c foo

That's also why PARSE_OPT_LASTARG_DEFAULT is extended here to support default values whatever the position of the option, not only at the end. Should it now be renamed to PARSE_OPT_ARG_DEFAULT? Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Anton Blanchard <anton@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: git@vger.kernel.org LKML-Reference: <1246550301-8954-2-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
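The rule behind the four example command lines can be sketched as: take the next argument as the value unless it is missing or starts with '-'. `optional_value()` is hypothetical and does not reproduce the parse-options API. It would also swallow a following non-option argument, a trade-off any option with an optional separate value has to accept.

```c
/*
 * Hypothetical helper: value for an option whose argument is optional.
 * 'i' is the index of the option itself.  The next argv entry is taken
 * as its value unless it is missing or starts with '-', in which case
 * 'def' is returned.  *last receives the index of the last argument
 * consumed, so the caller knows where to resume.
 */
static const char *optional_value(int argc, const char **argv, int i,
                                  const char *def, int *last)
{
    if (i + 1 < argc && argv[i + 1][0] != '-') {
        *last = i + 1;
        return argv[i + 1];
    }
    *last = i;
    return def;
}
```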
2009-07-02 | perf_counter tools: Create new chain_for_each_child() iterator | Frederic Weisbecker
Iterating through the children of a node in the callchain tree shows something that may be quite confusing at first glance: the head is the children field of the parent, and the list nodes are in the brothers field of the children. This is because the children are linked to the parent as a list of "brothers", using the "children" list of the parent as the head:

 ---------------
| Parent (head) |-------------------------------------
 ---------------                                      |
        |                                             |
     children                                         |
        |                                             |
   -----------               -----------              |
  | 1st child |---brother---| 2nd child |---brother---
   -----------               -----------

This makes the following odd-looking pattern common:

    list_for_each_entry(child, &parent->children, brothers) {
        // do something with children
    }

Abstract it into chain_for_each_child() to factorize and simplify this pattern. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Anton Blanchard <anton@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <1246550301-8954-1-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
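A self-contained version of the iterator, with a cut-down list implementation in the style of the kernel's list.h so that it compiles on its own. The `chain_for_each_child()` definition follows the pattern quoted in this commit; the node structure is a hypothetical minimum, and `__typeof__` assumes GCC or Clang.

```c
#include <stddef.h>

/* Cut-down doubly linked list in the style of the kernel's list.h. */
struct list_head {
    struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *list)
{
    list->next = list;
    list->prev = list;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
    entry->prev = head->prev;
    entry->next = head;
    head->prev->next = entry;
    head->prev = entry;
}

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Walk every entry linked through 'member' on the list headed by 'head'. */
#define list_for_each_entry(pos, head, member)                          \
    for (pos = container_of((head)->next, __typeof__(*pos), member);    \
         &pos->member != (head);                                        \
         pos = container_of(pos->member.next, __typeof__(*pos), member))

/* Hypothetical, minimal callchain node: only the two list links matter here. */
struct callchain_node {
    struct list_head children;  /* head of this node's list of children */
    struct list_head brothers;  /* this node's link in its parent's list */
    int value;
};

/* Hides which field is the list head and which is the per-child link. */
#define chain_for_each_child(child, parent) \
    list_for_each_entry(child, &(parent)->children, brothers)
```

`chain_for_each_child(child, node) { ... }` then visits each direct child once, in insertion order.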
2009-07-02 | perf_counter tools: Enable kernel module symbol loading in tools | Mike Galbraith
Add the -m/--modules option to perf report and perf annotate, which enables live module symbol/image loading. To be used with -k/--vmlinux. (Also give perf annotate a -P/--full-paths option.) Signed-off-by: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <1246514986.13293.48.camel@marge.simson.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-02 | perf_counter tools: Connect module support infrastructure to symbol loading infrastructure | Mike Galbraith
Signed-off-by: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <1246514916.13293.46.camel@marge.simson.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-02 | perf_counter tools: Add infrastructure to support loading of kernel module symbols | Mike Galbraith
Add infrastructure for module path discovery and section load addresses. Signed-off-by: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <1246514830.13293.44.camel@marge.simson.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-02 | perf_counter tools: Make symbol loading consistently return number of loaded symbols | Mike Galbraith
perf_counter tools: Make symbol loading consistently return number of loaded symbols. Signed-off-by: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <1246514758.13293.42.camel@marge.simson.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-01 | perf stat: Handle pipe read failures in perf stat | Frederic Weisbecker
Building builtin-stat.c reports the following errors:

cc1: warnings being treated as errors
builtin-stat.c: In function ‘run_perf_stat’:
builtin-stat.c:242: error: ignoring return value of ‘read’, declared with attribute warn_unused_result
builtin-stat.c:255: error: ignoring return value of ‘read’, declared with attribute warn_unused_result
make: *** [builtin-stat.o] Error 1

This patch handles the possible pipe read failures. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Anton Blanchard <anton@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <1246474930-6088-2-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
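A read helper that does not ignore failures might look like the following. `read_full()` is hypothetical and stricter than this fix requires (it also retries on EINTR and treats an early end of file as an error); it is not the code the patch added.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Read exactly 'len' bytes from 'fd', retrying on EINTR and short reads.
 * Returns 0 on success and -1 on error or premature end of file, so the
 * caller cannot silently ignore a failed read().
 */
static int read_full(int fd, void *buf, size_t len)
{
    char *p = buf;

    while (len > 0) {
        ssize_t n = read(fd, p, len);

        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            return -1;          /* writer closed the pipe early */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```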
2009-07-01perf_counter tools: Share list.h with the kernelArnaldo Carvalho de Melo
The copy we were using came from another copy I made for the dwarves (pahole) package, which itself came from the kernel years ago. The only function used by the perf tools that isn't in the kernel is list_del_range, which I'm leaving in the perf tools only for now. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <20090701174608.GA5823@ghostprotocols.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>
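For readers unfamiliar with the kernel's intrusive list, the sketch below re-implements just enough of it to show what a range delete does. The `list_del_range()` semantics here (unlink the contiguous run from `begin` through `end`, inclusive) are an assumption based on the function's name, not copied from the perf tools header.

```c
/* Minimal intrusive doubly-linked list in the style of <linux/list.h>. */
struct list_head {
	struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert new just before head, i.e. at the tail of the list. */
static void list_add_tail(struct list_head *new, struct list_head *head)
{
	new->prev = head->prev;
	new->next = head;
	head->prev->next = new;
	head->prev = new;
}

/*
 * Assumed semantics: unlink the run begin..end (inclusive) by joining
 * begin's predecessor to end's successor. The removed entries keep their
 * internal links, so the run can be spliced elsewhere afterwards.
 */
static void list_del_range(struct list_head *begin, struct list_head *end)
{
	begin->prev->next = end->next;
	end->next->prev = begin->prev;
}
```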
2009-07-01perf_counter tools: Share rbtree with the kernelArnaldo Carvalho de Melo
The tools/perf/util/rbtree.c copy has already drifted by three csets: 4b324126e0c6c3a5080ca3ec0981e8766ed6f1ee 4c60117811171d867d4f27f17ea07d7419d45dae 16c047add3ceaf0ab882e3e094d1ec904d02312d So remove the copy and use lib/rbtree.c directly, sharing the source code while still generating a separate object file, since tools/perf uses a far more aggressive -O6 switch. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <20090701152837.GG15682@ghostprotocols.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-01perf list: Add cache eventsJaswinder Singh Rajput
After:

$ ./perf list

List of pre-defined events (to be used in -e):

  cpu-cycles OR cycles              [Hardware event]
  instructions                      [Hardware event]
  cache-references                  [Hardware event]
  cache-misses                      [Hardware event]
  branch-instructions OR branches   [Hardware event]
  branch-misses                     [Hardware event]
  bus-cycles                        [Hardware event]

  cpu-clock                         [Software event]
  task-clock                        [Software event]
  page-faults OR faults             [Software event]
  minor-faults                      [Software event]
  major-faults                      [Software event]
  context-switches OR cs            [Software event]
  cpu-migrations OR migrations      [Software event]

  L1-d$-loads                       [Hardware cache event]
  L1-d$-load-misses                 [Hardware cache event]
  L1-d$-stores                      [Hardware cache event]
  L1-d$-store-misses                [Hardware cache event]
  L1-d$-prefetches                  [Hardware cache event]
  L1-d$-prefetch-misses             [Hardware cache event]
  L1-i$-loads                       [Hardware cache event]
  L1-i$-load-misses                 [Hardware cache event]
  L1-i$-prefetches                  [Hardware cache event]
  L1-i$-prefetch-misses             [Hardware cache event]
  LLC-loads                         [Hardware cache event]
  LLC-load-misses                   [Hardware cache event]
  LLC-stores                        [Hardware cache event]
  LLC-store-misses                  [Hardware cache event]
  LLC-prefetches                    [Hardware cache event]
  LLC-prefetch-misses               [Hardware cache event]
  dTLB-loads                        [Hardware cache event]
  dTLB-load-misses                  [Hardware cache event]
  dTLB-stores                       [Hardware cache event]
  dTLB-store-misses                 [Hardware cache event]
  dTLB-prefetches                   [Hardware cache event]
  dTLB-prefetch-misses              [Hardware cache event]
  iTLB-loads                        [Hardware cache event]
  iTLB-load-misses                  [Hardware cache event]
  branch-loads                      [Hardware cache event]
  branch-load-misses                [Hardware cache event]

  rNNN                              [raw hardware event descriptor]

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <1246453578.3072.1.camel@ht.satnam> Signed-off-by: Ingo Molnar <mingo@elte.hu>
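The cache event names in this listing follow a regular `<cache>-<op>s` / `<cache>-<op>-misses` pattern. The sketch below regenerates them, assuming the set of valid cache/op combinations can be read off the listing itself. The `valid` table and both function names are hypothetical; the real tool's tables may be organized differently.

```c
#include <stdio.h>
#include <string.h>

#define NR_TYPES 6
#define NR_OPS   3
#define NAME_LEN 32

static const char *cache_types[NR_TYPES] = {
	"L1-d$", "L1-i$", "LLC", "dTLB", "iTLB", "branch"
};
static const char *cache_ops[NR_OPS] = { "load", "store", "prefetch" };

/* valid[type][op]: combinations that appear in the listing (inferred). */
static const int valid[NR_TYPES][NR_OPS] = {
	{ 1, 1, 1 },	/* L1-d$  */
	{ 1, 0, 1 },	/* L1-i$  */
	{ 1, 1, 1 },	/* LLC    */
	{ 1, 1, 1 },	/* dTLB   */
	{ 1, 0, 0 },	/* iTLB   */
	{ 1, 0, 0 },	/* branch */
};

/* Fill out[] with up to max names, access before miss; return the count. */
static int list_cache_events(char out[][NAME_LEN], int max)
{
	int t, o, n = 0;

	for (t = 0; t < NR_TYPES; t++) {
		for (o = 0; o < NR_OPS; o++) {
			if (!valid[t][o])
				continue;
			if (n + 2 > max)
				return n;
			snprintf(out[n++], NAME_LEN, "%s-%ss",
				 cache_types[t], cache_ops[o]);
			snprintf(out[n++], NAME_LEN, "%s-%s-misses",
				 cache_types[t], cache_ops[o]);
		}
	}
	return n;
}

/* Return the index of name among the generated events, or -1. */
static int find_cache_event(const char *name)
{
	char names[NR_TYPES * NR_OPS * 2][NAME_LEN];
	int i, n = list_cache_events(names, NR_TYPES * NR_OPS * 2);

	for (i = 0; i < n; i++)
		if (strcmp(names[i], name) == 0)
			return i;
	return -1;
}
```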
2009-07-01perf stat: Define MATCH_EVENT for easy attr checkingJaswinder Singh Rajput
MATCH_EVENT is useful for: 1. checking multiple attrs 2. avoiding repetition of the PERF_TYPE_ and PERF_COUNT_ prefixes, which saves space 3. avoiding line breaks Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <1246440909.3403.5.camel@hpdv5.satnam> Signed-off-by: Ingo Molnar <mingo@elte.hu>
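A minimal sketch of such a macro, assuming an `attrs[]` array of counter attributes indexed by counter number. The enum constants and `struct counter_attr` are stand-ins for the real perf_counter ABI definitions, with values chosen for illustration only, and `is_task_clock()` is a hypothetical caller.

```c
#include <stdint.h>

/* Stand-ins for the perf_counter ABI (illustrative values). */
enum { PERF_TYPE_HARDWARE = 0, PERF_TYPE_SOFTWARE = 1 };
enum { PERF_COUNT_HW_CPU_CYCLES = 0, PERF_COUNT_SW_TASK_CLOCK = 1 };

struct counter_attr {
	uint32_t type;
	uint64_t config;
};

static struct counter_attr attrs[8];

/* Token pasting supplies the PERF_TYPE_ and PERF_COUNT_ prefixes. */
#define MATCH_EVENT(t, c, counter)			\
	(attrs[counter].type == PERF_TYPE_##t &&	\
	 attrs[counter].config == PERF_COUNT_##c)

static int is_task_clock(int counter)
{
	return MATCH_EVENT(SOFTWARE, SW_TASK_CLOCK, counter);
}
```

`MATCH_EVENT(SOFTWARE, SW_TASK_CLOCK, counter)` expands to a comparison against `PERF_TYPE_SOFTWARE` and `PERF_COUNT_SW_TASK_CLOCK`, which is where the savings in repetition and line length come from.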
2009-07-01perf_counter tools: Add more warnings and fix/annotate themIngo Molnar
Enable -Wextra. This found a few real bugs plus a number of signed/unsigned type mismatches and uncleanlinesses. It also required a few annotations. All things considered, it was still worth it, so let's try with this enabled for now. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
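The patch's actual annotations aren't shown here, so the following is only an illustration of the two warning classes mentioned: for C, GCC's -Wextra enables -Wsign-compare, and (together with -Wall) -Wunused-parameter. The `MAYBE_UNUSED` macro and `count_positive()` are hypothetical.

```c
#include <stddef.h>

/* Hypothetical annotation for parameters a fixed callback signature forces on us. */
#define MAYBE_UNUSED __attribute__((unused))

static int count_positive(const int *vals, size_t n, void *ctx MAYBE_UNUSED)
{
	/*
	 * i is size_t rather than int: comparing a signed int against the
	 * unsigned n is exactly what -Wsign-compare reports.
	 */
	size_t i;
	int count = 0;

	for (i = 0; i < n; i++)
		if (vals[i] > 0)
			count++;
	return count;
}
```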
2009-07-01perf report: Fix HV bit mismergeIngo Molnar
Fix: builtin-report.c: In function ‘hist_entry__add’: builtin-report.c:1015: error: case label not within a switch statement builtin-report.c:1017: error: break statement not within loop or switch Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-01perf_counter tools: Rework event string parsing/syntaxPaul Mackerras
This reworks the parser for event descriptors to make it more consistent in what it accepts. It is now structured as a recursive descent parser for the following grammar:

events           ::= event ( ("," | space) space* event )*
event            ::= ( raw_event | numeric_event | symbolic_event | generic_hw_event ) [ event_modifier ]
raw_event        ::= "r" hex_number
numeric_event    ::= number ":" number
number           ::= decimal_number | "0x" hex_number | "0" octal_number
symbolic_event   ::= string_from_event_symbols_array
generic_hw_event ::= cache_type ( "-" ( cache_op | cache_result ) )*
event_modifier   ::= ":" ( "u" | "k" | "h" )+

with the extra restriction that you can have at most one cache_op and at most one cache_result.

We pass the current string pointer by reference (i.e. as a const char **) to the various parsing functions so that they can advance the pointer to indicate how much they consumed. They return 0 if they didn't recognize the thing at the pointer, or 1 if they did (and advance the pointer past it).

This also fixes parse_aliases to take the longest matching alias from the table, not the first one. Otherwise "l1-data" would match the "l1-d" alias and the "ata" would not be consumed.

This allows event modifiers indicating which processor modes to count in to be applied to any event, not just numeric events, and adds a ":h" modifier to indicate counting in hypervisor mode. Specifying ":u" now sets both exclude_kernel and exclude_hv, and so on. Multiple modes can be specified, e.g. ":uk" will count in user or kernel mode (i.e. only exclude_hv will be set).

Signed-off-by: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <19018.53826.843815.189847@cargo.ozlabs.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
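A simplified sketch of the two conventions described in this commit: parsers that take a `const char **`, return 1 and advance it on a match (0 and no movement otherwise), and longest-match alias lookup. `struct mode_bits` is a stand-in for the exclude_* fields of the counter attr, and this `parse_aliases()` uses a flat name table rather than the real function's layout; both are assumptions for illustration.

```c
#include <string.h>
#include <strings.h>

/* Stand-in for the exclude_* bits of the counter attr (assumption). */
struct mode_bits {
	int exclude_user;
	int exclude_kernel;
	int exclude_hv;
};

/*
 * event_modifier ::= ":" ( "u" | "k" | "h" )+
 * On a match: set the exclude bits, advance *strp, return 1.
 * Otherwise: return 0 and leave *strp untouched.
 */
static int parse_event_modifier(const char **strp, struct mode_bits *m)
{
	const char *str = *strp;
	int eu = 1, ek = 1, eh = 1;	/* start by excluding every mode */

	if (*str != ':')
		return 0;
	str++;
	if (*str == '\0' || !strchr("ukh", *str))
		return 0;		/* at least one mode letter is required */

	/* Each listed mode is counted, i.e. no longer excluded. */
	while (*str != '\0' && strchr("ukh", *str)) {
		if (*str == 'u')
			eu = 0;
		else if (*str == 'k')
			ek = 0;
		else
			eh = 0;
		str++;
	}

	m->exclude_user = eu;
	m->exclude_kernel = ek;
	m->exclude_hv = eh;
	*strp = str;
	return 1;
}

/*
 * Return the index of the longest entry in names[] that prefixes *strp
 * (case-insensitively) and advance *strp past it; return -1 otherwise.
 */
static int parse_aliases(const char **strp, const char *const names[], int n)
{
	int i, best = -1;
	size_t best_len = 0;

	for (i = 0; i < n; i++) {
		size_t len = strlen(names[i]);

		if (len > best_len && strncasecmp(*strp, names[i], len) == 0) {
			best = i;
			best_len = len;
		}
	}
	if (best >= 0)
		*strp += best_len;
	return best;
}
```

With a first-match lookup, "l1-data" would stop after the "l1-d" entry and leave "ata" behind; keeping the longest match consumes the whole alias. Likewise ":uk" clears exclude_user and exclude_kernel and leaves only exclude_hv set.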
2009-07-01perf_counter tools: Various fixes for callchainsFrederic Weisbecker
The symbol resolving has, of course, revealed some bugs in the callchain tree handling. This patch fixes some of them, including:

- inherit the children from the parents while splitting a node
- fix list range moving
- fix indexes setting in callchains
- create a child on the current node if the path doesn't match in the existing children (this was only done on the root)
- compare using symbols when possible, so that we can match a function using any ip inside it by referring to its start address

The practical effects are:

- remove double callchains
- fix upside-down or otherwise random order of callchains
- fix wrong paths
- fix bad hits and percentage accounting

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Anton Blanchard <anton@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <1246419315-9968-4-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
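A sketch of the first fix, splitting a node while letting the new lower half inherit the original node's children, using a hypothetical fixed-size `struct cnode`. It assumes hits are recorded on the node where a chain ends, so the split moves them to the tail; the real callchain code may account for hits differently.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_IPS      16
#define MAX_CHILDREN 16

/* Hypothetical callchain tree node: a run of ips plus child nodes. */
struct cnode {
	unsigned long ips[MAX_IPS];
	int nr_ips;
	struct cnode *children[MAX_CHILDREN];
	int nr_children;
	unsigned long hit;
};

/*
 * Split node so it keeps only its first `at` ips. The remaining ips move
 * into a new tail node, which inherits all of node's children and hits;
 * node is left with the common prefix and the tail as its only child.
 */
static struct cnode *split_node(struct cnode *node, int at)
{
	struct cnode *tail = calloc(1, sizeof(*tail));

	if (!tail)
		return NULL;

	tail->nr_ips = node->nr_ips - at;
	memcpy(tail->ips, node->ips + at, tail->nr_ips * sizeof(node->ips[0]));

	/* The step the commit fixes: children follow the split-off part. */
	memcpy(tail->children, node->children,
	       node->nr_children * sizeof(node->children[0]));
	tail->nr_children = node->nr_children;
	tail->hit = node->hit;

	node->nr_ips = at;
	node->children[0] = tail;
	node->nr_children = 1;
	node->hit = 0;
	return tail;
}
```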
2009-07-01perf_counter tools: Resolve symbols in callchainsFrederic Weisbecker
This patch resolves the names, when possible, of each ip present in the callchains while using the -c option with perf report.

Example:

5.40%  [k] __d_lookup
          5.37%  perf_callchain
                 perf_counter_overflow
                 intel_pmu_handle_irq
                 perf_counter_nmi_handler
                 notifier_call_chain
                 atomic_notifier_call_chain
                 notify_die
                 do_nmi
                 nmi
                 do_lookup
                 __link_path_walk
                 path_walk
                 do_path_lookup
                 user_path_at
                 sys_faccessat
                 sys_access
                 system_call_fastpath
                 0x7fb609846f77

          0.01%  perf_callchain
                 perf_counter_overflow
                 intel_pmu_handle_irq
                 perf_counter_nmi_handler
                 notifier_call_chain
                 atomic_notifier_call_chain
                 notify_die
                 do_nmi
                 nmi
                 do_lookup
                 __link_path_walk
                 path_walk
                 do_path_lookup
                 user_path_at
                 sys_faccessat

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Anton Blanchard <anton@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <1246419315-9968-3-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
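Resolving an ip amounts to finding the symbol whose address range contains it. A minimal sketch using a sorted array and bsearch(3), rather than the tree-based symbol lookup the perf tools actually use; `struct sym`, `resolve_ip()` and `format_ip()` are hypothetical. Unresolved ips fall back to hex, as with 0x7fb609846f77 in the example.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical symbol entry covering the half-open range [start, end). */
struct sym {
	unsigned long start, end;
	const char *name;
};

/* bsearch comparator: key is the ip, elem is a symbol. */
static int sym_cmp(const void *key, const void *elem)
{
	unsigned long ip = *(const unsigned long *)key;
	const struct sym *s = elem;

	if (ip < s->start)
		return -1;
	if (ip >= s->end)
		return 1;
	return 0;
}

/* syms must be sorted by start and non-overlapping. Returns NULL if no match. */
static const struct sym *resolve_ip(const struct sym *syms, size_t n,
				    unsigned long ip)
{
	return bsearch(&ip, syms, n, sizeof(*syms), sym_cmp);
}

/* Write the symbol name, or the raw ip in hex if it can't be resolved. */
static void format_ip(const struct sym *syms, size_t n, unsigned long ip,
		      char *buf, size_t len)
{
	const struct sym *s = resolve_ip(syms, n, ip);

	if (s)
		snprintf(buf, len, "%s", s->name);
	else
		snprintf(buf, len, "%#lx", ip);
}
```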
2009-07-01perf_counter tools: Fix storage size allocation of callchain listFrederic Weisbecker
Fix a mix-up in the size passed when allocating a callchain list: we were using the wrong structure's size. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Anton Blanchard <anton@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <1246419315-9968-2-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
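A common C idiom for avoiding this class of bug is to take the size from the pointer being assigned, `malloc(sizeof(*ptr))`, instead of naming a type. A sketch with a hypothetical `struct chain_list`; the real callchain structures differ.

```c
#include <stdlib.h>

/* Hypothetical callchain list entry. */
struct chain_list {
	unsigned long ip;
	struct chain_list *next;
};

static struct chain_list *new_chain_list(unsigned long ip)
{
	/*
	 * sizeof(*entry) always matches the type entry points to, so it
	 * cannot silently disagree with it the way an explicitly named
	 * (possibly wrong) struct type can.
	 */
	struct chain_list *entry = malloc(sizeof(*entry));

	if (!entry)
		return NULL;
	entry->ip = ip;
	entry->next = NULL;
	return entry;
}
```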
2009-07-01Merge branch 'linus' into perfcounters/urgentIngo Molnar
Merge reason: this branch was on a .30-ish base; update it to an almost-.31-rc2 upstream base to pick up fixes. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-30Merge branch 'perfcounters-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (47 commits)
  perf report: Add --symbols parameter
  perf report: Add --comms parameter
  perf report: Add --dsos parameter
  perf_counter tools: Adjust only prelinked symbol's addresses
  perf_counter: Provide a way to enable counters on exec
  perf_counter tools: Reduce perf stat measurement overhead/skew
  perf stat: Use percentages for scaling output
  perf_counter, x86: Update x86_pmu after WARN()
  perf stat: Micro-optimize the code: memcpy is only required if no event is selected and !null_run
  perf stat: Improve output
  perf stat: Fix multi-run stats
  perf stat: Add -n/--null option to run without counters
  perf_counter tools: Remove dead code
  perf_counter: Complete counter swap
  perf report: Print sorted callchains per histogram entries
  perf_counter tools: Prepare a small callchain framework
  perf record: Fix unhandled io return value
  perf_counter tools: Add alias for 'l1d' and 'l1i'
  perf-report: Add bare minimum PERF_EVENT_READ parsing
  perf-report: Add modes for inherited stats and no-samples
  ...