aboutsummaryrefslogtreecommitdiff
path: root/kernel/perf_counter.c
AgeCommit message (Collapse)Author
2009-05-25perf_counter: Generic per counter interrupt throttlePeter Zijlstra
Introduce a generic per counter interrupt throttle. This uses the perf_counter_overflow() quick disable to throttle a specific counter when its going too fast when a pmu->unthrottle() method is provided which can undo the quick disable. Power needs to implement both the quick disable and the unthrottle method. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: John Kacur <jkacur@redhat.com> LKML-Reference: <20090525153931.703093461@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-25perf_counter: Fix PERF_COUNTER_CONTEXT_SWITCHES for cpu countersPeter Zijlstra
Ingo noticed that cpu counters had 0 context switches, even though there was plenty scheduling on the cpu. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: John Kacur <jkacur@redhat.com> LKML-Reference: <20090525124600.419025548@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-25perf_counter: Propagate inheritance failures down the fork() pathPeter Zijlstra
Fail fork() when we fail inheritance for some reason (-ENOMEM most likely). Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: John Kacur <jkacur@redhat.com> LKML-Reference: <20090525124600.324656474@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-25perf_counter: Make pctrl() affect inherited counters tooPeter Zijlstra
Paul noted that the new ptcrl() didn't work on child counters. Reported-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: John Kacur <jkacur@redhat.com> LKML-Reference: <20090525124600.203151469@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-24perf_counter: Increase mmap limitIngo Molnar
In a default 'perf top' run the tool will create a counter for each online CPU. With enough CPUs this will eventually exhaust the default limit. So scale it up with the number of online CPUs. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: John Kacur <jkacur@redhat.com> Cc: Mike Galbraith <efault@gmx.de> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-24perf_counter: Remove perf_counter_context::nr_enabledPeter Zijlstra
now that pctrl() no longer disables other people's counters, remove the PMU cache code that deals with that. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: John Kacur <jkacur@redhat.com> LKML-Reference: <20090523163013.032998331@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-24perf_counter: Change pctrl() behaviourPeter Zijlstra
Instead of en/dis-abling all counters acting on a particular task, en/dis- able all counters we created. [ v2: fix crash on first counter enable ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: John Kacur <jkacur@redhat.com> LKML-Reference: <20090523163012.916937244@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-23perf_counter: Simplify context cleanupPeter Zijlstra
Use perf_counter_remove_from_context() to remove counters from the context. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: John Kacur <jkacur@redhat.com> LKML-Reference: <20090523163012.796275849@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-23perf_counter: Sanitize context lockingPeter Zijlstra
Ensure we're consistent with the context locks. context->mutex context->lock list_{add,del}_counter(); so that either lock is sufficient to stabilize the context. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: John Kacur <jkacur@redhat.com> LKML-Reference: <20090523163012.618790733@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-23perf_counter: Sanitize counter->mutexPeter Zijlstra
s/counter->mutex/counter->child_mutex/ and make sure its only used to protect child_list. The usage in __perf_counter_exit_task() doesn't appear to be problematic since ctx->mutex also covers anything related to fd tear-down. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: John Kacur <jkacur@redhat.com> LKML-Reference: <20090523163012.533186528@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-23perf_counter: Fix dynamic irq_period loggingPeter Zijlstra
We call perf_adjust_freq() from perf_counter_task_tick() which is is called under the rq->lock causing lock recursion. However, it's no longer required to be called under the rq->lock, so remove it from under it. Also, fix up some related comments. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: John Kacur <jkacur@redhat.com> LKML-Reference: <20090523163012.476197912@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-22perf_counter: Optimize context switch between identical inherited contextsPaul Mackerras
When monitoring a process and its descendants with a set of inherited counters, we can often get the situation in a context switch where both the old (outgoing) and new (incoming) process have the same set of counters, and their values are ultimately going to be added together. In that situation it doesn't matter which set of counters are used to count the activity for the new process, so there is really no need to go through the process of reading the hardware counters and updating the old task's counters and then setting up the PMU for the new task. This optimizes the context switch in this situation. Instead of scheduling out the perf_counter_context for the old task and scheduling in the new context, we simply transfer the old context to the new task and keep using it without interruption. The new context gets transferred to the old task. This means that both tasks still have a valid perf_counter_context, so no special case is introduced when the old task gets scheduled in again, either on this CPU or another CPU. The equivalence of contexts is detected by keeping a pointer in each cloned context pointing to the context it was cloned from. To cope with the situation where a context is changed by adding or removing counters after it has been cloned, we also keep a generation number on each context which is incremented every time a context is changed. When a context is cloned we take a copy of the parent's generation number, and two cloned contexts are equivalent only if they have the same parent and the same generation number. In order that the parent context pointer remains valid (and is not reused), we increment the parent context's reference count for each context cloned from it. Since we don't have individual fds for the counters in a cloned context, the only thing that can make two clones of a given parent different after they have been cloned is enabling or disabling all counters with prctl. To account for this, we keep a count of the number of enabled counters in each context. Two contexts must have the same number of enabled counters to be considered equivalent. Here are some measurements of the context switch time as measured with the lat_ctx benchmark from lmbench, comparing the times obtained with and without this patch series: -----Unmodified----- With this patch series Counters: none 2 HW 4H+4S none 2 HW 4H+4S 2 processes: Average 3.44 6.45 11.24 3.12 3.39 3.60 St dev 0.04 0.04 0.13 0.05 0.17 0.19 8 processes: Average 6.45 8.79 14.00 5.57 6.23 7.57 St dev 1.27 1.04 0.88 1.42 1.46 1.42 32 processes: Average 5.56 8.43 13.78 5.28 5.55 7.15 St dev 0.41 0.47 0.53 0.54 0.57 0.81 The numbers are the mean and standard deviation of 20 runs of lat_ctx. The "none" columns are lat_ctx run directly without any counters. The "2 HW" columns are with lat_ctx run under perfstat, counting cycles and instructions. The "4H+4S" columns are lat_ctx run under perfstat with 4 hardware counters and 4 software counters (cycles, instructions, cache references, cache misses, task clock, context switch, cpu migrations, and page faults). [ Impact: performance optimization of counter context-switches ] Signed-off-by: Paul Mackerras <paulus@samba.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <18966.10666.517218.332164@cargo.ozlabs.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-22perf_counter: Dynamically allocate tasks' perf_counter_context structPaul Mackerras
This replaces the struct perf_counter_context in the task_struct with a pointer to a dynamically allocated perf_counter_context struct. The main reason for doing is this is to allow us to transfer a perf_counter_context from one task to another when we do lazy PMU switching in a later patch. This has a few side-benefits: the task_struct becomes a little smaller, we save some memory because only tasks that have perf_counters attached get a perf_counter_context allocated for them, and we can remove the inclusion of <linux/perf_counter.h> in sched.h, meaning that we don't end up recompiling nearly everything whenever perf_counter.h changes. The perf_counter_context structures are reference-counted and freed when the last reference is dropped. A context can have references from its task and the counters on its task. Counters can outlive the task so it is possible that a context will be freed well after its task has exited. Contexts are allocated on fork if the parent had a context, or otherwise the first time that a per-task counter is created on a task. In the latter case, we set the context pointer in the task struct locklessly using an atomic compare-and-exchange operation in case we raced with some other task in creating a context for the subject task. This also removes the task pointer from the perf_counter struct. The task pointer was not used anywhere and would make it harder to move a context from one task to another. Anything that needed to know which task a counter was attached to was already using counter->ctx->task. The __perf_counter_init_context function moves up in perf_counter.c so that it can be called from find_get_context, and now initializes the refcount, but is otherwise unchanged. We were potentially calling list_del_counter twice: once from __perf_counter_exit_task when the task exits and once from __perf_counter_remove_from_context when the counter's fd gets closed. This adds a check in list_del_counter so it doesn't do anything if the counter has already been removed from the lists. Since perf_counter_task_sched_in doesn't do anything if the task doesn't have a context, and leaves cpuctx->task_ctx = NULL, this adds code to __perf_install_in_context to set cpuctx->task_ctx if necessary, i.e. in the case where the current task adds the first counter to itself and thus creates a context for itself. This also adds similar code to __perf_counter_enable to handle a similar situation which can arise when the counters have been disabled using prctl; that also leaves cpuctx->task_ctx = NULL. [ Impact: refactor counter context management to prepare for new feature ] Signed-off-by: Paul Mackerras <paulus@samba.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <18966.10075.781053.231153@cargo.ozlabs.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-20perf_counter: Fix context removal deadlockIngo Molnar
Disable the PMU globally before removing a counter from a context. This fixes the following lockup: [22081.741922] ------------[ cut here ]------------ [22081.746668] WARNING: at arch/x86/kernel/cpu/perf_counter.c:803 intel_pmu_handle_irq+0x9b/0x24e() [22081.755624] Hardware name: X8DTN [22081.758903] perfcounters: irq loop stuck! [22081.762985] Modules linked in: [22081.766136] Pid: 11082, comm: perf Not tainted 2.6.30-rc6-tip #226 [22081.772432] Call Trace: [22081.774940] <NMI> [<ffffffff81019aed>] ? intel_pmu_handle_irq+0x9b/0x24e [22081.781993] [<ffffffff81019aed>] ? intel_pmu_handle_irq+0x9b/0x24e [22081.788368] [<ffffffff8104505c>] ? warn_slowpath_common+0x77/0xa3 [22081.794649] [<ffffffff810450d3>] ? warn_slowpath_fmt+0x40/0x45 [22081.800696] [<ffffffff81019aed>] ? intel_pmu_handle_irq+0x9b/0x24e [22081.807080] [<ffffffff814d1a72>] ? perf_counter_nmi_handler+0x3f/0x4a [22081.813751] [<ffffffff814d2d09>] ? notifier_call_chain+0x58/0x86 [22081.819951] [<ffffffff8105b250>] ? notify_die+0x2d/0x32 [22081.825392] [<ffffffff814d1414>] ? do_nmi+0x8e/0x242 [22081.830538] [<ffffffff814d0f0a>] ? nmi+0x1a/0x20 [22081.835342] [<ffffffff8117e102>] ? selinux_file_free_security+0x0/0x1a [22081.842105] [<ffffffff81018793>] ? x86_pmu_disable_counter+0x15/0x41 [22081.848673] <<EOE>> [<ffffffff81018f3d>] ? x86_pmu_disable+0x86/0x103 [22081.855512] [<ffffffff8108fedd>] ? __perf_counter_remove_from_context+0x0/0xfe [22081.862926] [<ffffffff8108fcbc>] ? counter_sched_out+0x30/0xce [22081.868909] [<ffffffff8108ff36>] ? __perf_counter_remove_from_context+0x59/0xfe [22081.876382] [<ffffffff8106808a>] ? smp_call_function_single+0x6c/0xe6 [22081.882955] [<ffffffff81091b96>] ? perf_release+0x86/0x14c [22081.888600] [<ffffffff810c4c84>] ? __fput+0xe7/0x195 [22081.893718] [<ffffffff810c213e>] ? filp_close+0x5b/0x62 [22081.899107] [<ffffffff81046a70>] ? put_files_struct+0x64/0xc2 [22081.905031] [<ffffffff8104841a>] ? do_exit+0x1e2/0x6ef [22081.910360] [<ffffffff814d0a60>] ? _spin_lock_irqsave+0x9/0xe [22081.916292] [<ffffffff8104898e>] ? do_group_exit+0x67/0x93 [22081.921953] [<ffffffff810489cc>] ? sys_exit_group+0x12/0x16 [22081.927759] [<ffffffff8100baab>] ? system_call_fastpath+0x16/0x1b [22081.934076] ---[ end trace 3a3936ce3e1b4505 ]--- And could potentially also fix the lockup reported by Marcelo Tosatti. Also, print more debug info in case of a detected lockup. [ Impact: fix lockup ] Reported-by: Marcelo Tosatti <mtosatti@redhat.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-20perf_counter: Optimize sched in/out of countersPeter Zijlstra
Avoid a function call for !group counters by directly calling the counter function. [ Impact: micro-optimize the code ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: John Kacur <jkacur@redhat.com> LKML-Reference: <20090520102553.511933670@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-20perf_counter: Optimize disable of time based sw countersPeter Zijlstra
Currently we call hrtimer_cancel() unconditionally on disable of time based software counters. Avoid when possible. [ Impact: micro-optimize the code ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: John Kacur <jkacur@redhat.com> LKML-Reference: <20090520102553.388185031@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-20perf_counter: Log irq_period changesPeter Zijlstra
For the dynamic irq_period code, log whenever we change the period so that analyzing code can normalize the event flow. [ Impact: add new feature to allow more precise profiling ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: John Kacur <jkacur@redhat.com> LKML-Reference: <20090520102553.298769743@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-20perf_counter: Solve the rotate_ctx vs inherit race differentlyPeter Zijlstra
Instead of disabling RR scheduling of the counters, use a different list that does not get rotated to iterate the counters on inheritance. [ Impact: cleanup, optimization ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: John Kacur <jkacur@redhat.com> LKML-Reference: <20090520102553.237504544@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-20perf_counter: fix counter inheritance raceIngo Molnar
Context rotation should not occur when we are in the middle of walking the counter list when inheriting counters ... [ Impact: fix occasionally incorrect perf stat results ] Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-20perf_counter: fix counter freeing logicIngo Molnar
Fix counter lifetime bugs which explain the crashes reported by Marcelo Tosatti and Arnaldo Carvalho de Melo. The new rule is: flushing + freeing is only done for a task's own counters, never for other tasks. [ Impact: fix crashes/lockups with inherited counters ] Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Reported-by: Marcelo Tosatti <mtosatti@redhat.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-17perf_counter: Fix inheritance cleanup codePeter Zijlstra
Clean up code that open-coded the list_{add,del}_counter() code in __perf_counter_exit_task() which consequently diverged. This could lead to software counter crashes. Also, fold the ctx->nr_counter inc/dec into those functions and clean up some of the related code. [ Impact: fix potential sw counter crash, cleanup ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-15perf_counter: allow arch to supply event misc flags and instruction pointerPaul Mackerras
At present the values we put in overflow events for the misc flags indicating processor mode and the instruction pointer are obtained using the standard user_mode() and instruction_pointer() functions. Those functions tell you where the performance monitor interrupt was taken, which might not be exactly where the counter overflow occurred, for example because interrupts were disabled at the point where the overflow occurred, or because the processor had many instructions in flight and chose to complete some more instructions beyond the one that caused the counter overflow. Some architectures (e.g. powerpc) can supply more precise information about where the counter overflow occurred and the processor mode at that point. This introduces new functions, perf_misc_flags() and perf_instruction_pointer(), which arch code can override to provide more precise information if available. They have default implementations which are identical to the existing code. This also adds a new misc flag value, PERF_EVENT_MISC_HYPERVISOR, for the case where a counter overflow occurred in the hypervisor. We encode the processor mode in the 2 bits previously used to indicate user or kernel mode; the values for user and kernel mode are unchanged and hypervisor mode is indicated by both bits being set. [ Impact: generalize perfcounter core facilities ] Signed-off-by: Paul Mackerras <paulus@samba.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <18956.1272.818511.561835@cargo.ozlabs.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-15perf_counter: frequency based adaptive irq_period, 32-bit fixPeter Zijlstra
fix: kernel/built-in.o: In function `perf_counter_alloc': perf_counter.c:(.text+0x7ddc7): undefined reference to `__udivdi3' [ Impact: build fix on 32-bit systems ] Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> LKML-Reference: <1242394667.6642.1887.camel@laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-15perf_counter: frequency based adaptive irq_periodPeter Zijlstra
Instead of specifying the irq_period for a counter, provide a target interrupt frequency and dynamically adapt the irq_period to match this frequency. [ Impact: new perf-counter attribute/feature ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <20090515132018.646195868@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-15perf_counter: per user mlock giftPeter Zijlstra
Instead of a per-process mlock gift for perf-counters, use a per-user gift so that there is less of a DoS potential. [ Impact: allow less worst-case unprivileged memory consumption ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <20090515132018.496182835@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-15perf_counter: remove perf_disable/enable exportsPeter Zijlstra
Now that ACPI idle doesn't use it anymore, remove the exports. [ Impact: remove dead code/data ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <20090515132018.429826617@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-15perf_counter: Rework the perf counter disable/enablePeter Zijlstra
The current disable/enable mechanism is: token = hw_perf_save_disable(); ... /* do bits */ ... hw_perf_restore(token); This works well, provided that the use nests properly. Except we don't. x86 NMI/INT throttling has non-nested use of this, breaking things. Therefore provide a reference counter disable/enable interface, where the first disable disables the hardware, and the last enable enables the hardware again. [ Impact: refactor, simplify the PMU disable/enable logic ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-15perf_counter: Fix perf_output_copy() WARN to account for overflowPeter Zijlstra
The simple reservation test in perf_output_copy() failed to take unsigned int overflow into account, fix this. [ Impact: fix false positive warning with more than 4GB of profiling data ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-12perf_counter: call hw_perf_save_disable/restore around group_sched_inPaul Mackerras
I noticed that when enabling a group via the PERF_COUNTER_IOC_ENABLE ioctl on the group leader, the counters weren't enabled and counting immediately on return from the ioctl, but did start counting a little while later (presumably after a context switch). The reason was that __perf_counter_enable calls group_sched_in which calls hw_perf_group_sched_in, which on powerpc assumes that the caller has called hw_perf_save_disable already. Until commit 46d686c6 ("perf_counter: put whole group on when enabling group leader") it was true that all callers of group_sched_in had called hw_perf_save_disable first, and the powerpc hw_perf_group_sched_in relies on that (there isn't an x86 version). This fixes the problem by putting calls to hw_perf_save_disable / hw_perf_restore around the calls to group_sched_in and counter_sched_in in __perf_counter_enable. Having the calls to hw_perf_save_disable/restore around the counter_sched_in call is harmless and makes this call consistent with the other call sites of counter_sched_in, which have all called hw_perf_save_disable first. [ Impact: more precise counter group disable/enable functionality ] Signed-off-by: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> LKML-Reference: <18953.25733.53359.147452@cargo.ozlabs.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-11perf_counter: call atomic64_set for counter->countPaul Mackerras
A compile warning triggered because we are calling atomic_set(&counter->count). But since counter->count is an atomic64_t, we have to use atomic64_set. So the count can be set short, resulting in the reset ioctl only resetting the low word. [ Impact: clear counter properly during the reset ioctl ] Signed-off-by: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> LKML-Reference: <18951.48285.270311.981806@drongo.ozlabs.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-11perf_counter: don't count scheduler ticks as context switchesPaul Mackerras
The context-switch software counter gives inflated values at present because each scheduler tick and each process-wide counter enable/disable prctl gets counted as a context switch. This happens because perf_counter_task_tick, perf_counter_task_disable and perf_counter_task_enable all call perf_counter_task_sched_out, which calls perf_swcounter_event to record a context switch event. This fixes it by introducing a variant of perf_counter_task_sched_out with two underscores in front for internal use within the perf_counter code, and makes perf_counter_task_{tick,disable,enable} call it. This variant doesn't record a context switch event, and takes a struct perf_counter_context *. This adds the new variant rather than changing the behaviour or interface of perf_counter_task_sched_out because that is called from other code. [ Impact: fix inflated context-switch event counts ] Signed-off-by: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> LKML-Reference: <18951.48034.485580.498953@drongo.ozlabs.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-11perf_counter: Put whole group on when enabling group leaderPaul Mackerras
Currently, if you have a group where the leader is disabled and there are siblings that are enabled, and then you enable the leader, we only put the leader on the PMU, and not its enabled siblings. This is incorrect, since the enabled group members should be all on or all off at any given point. This fixes it by adding a call to group_sched_in in __perf_counter_enable in the case where we're enabling a group leader. To avoid the need for a forward declaration this also moves group_sched_in up before __perf_counter_enable. The actual content of group_sched_in is unchanged by this patch. [ Impact: fix bug in counter enable code ] Signed-off-by: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> LKML-Reference: <18951.34946.451546.691693@drongo.ozlabs.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-08perf_counter: add PERF_RECORD_CPUPeter Zijlstra
Allow recording the CPU number the event was generated on. RFC: this leaves a u32 as reserved, should we fill in the node_id() there, or leave this open for future extention, as userspace can already easily do the cpu->node mapping if needed. [ Impact: extend perfcounter output record format ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> LKML-Reference: <20090508170029.008627711@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-08perf_counter: add PERF_RECORD_CONFIGPeter Zijlstra
Much like CONFIG_RECORD_GROUP records the hw_event.config to identify the values, allow to record this for all counters. [ Impact: extend perfcounter output record format ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> LKML-Reference: <20090508170028.923228280@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-08perf_counter: rework ioctl()sPeter Zijlstra
Corey noticed that ioctl()s on grouped counters didn't work on the whole group. This extends the ioctl() interface to take a second argument that is interpreted as a flags field. We then provide PERF_IOC_FLAG_GROUP to toggle the behaviour. Having this flag gives the greatest flexibility, allowing you to individually enable/disable/reset counters in a group, or all together. [ Impact: fix group counter enable/disable semantics ] Reported-by: Corey Ashford <cjashfor@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <20090508170028.837558214@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-08perf_counter: optimize perf_counter_task_tick()Peter Zijlstra
perf_counter_task_tick() does way too much work to find out there's nothing to do. Provide an easy short-circuit for the normal case where there are no counters on the system. [ Impact: micro-optimization ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> LKML-Reference: <20090508170028.750619201@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-05perf_counter: inheritable sample countersPeter Zijlstra
Redirect the output to the parent counter and put in some sanity checks. [ Impact: new perfcounter feature - inherited sampling counters ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> LKML-Reference: <20090505155437.331556171@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-05perf_counter: fix the output lockPeter Zijlstra
Use -1 instead of 0 as unlocked, since 0 is a valid cpu number. ( This is not an issue right now but will be once we allow multiple counters to output to the same mmap area. ) [ Impact: prepare code for multi-counter profile output ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> LKML-Reference: <20090505155437.232686598@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-05perf_counter: provide an mlock thresholdPeter Zijlstra
Provide a threshold to relax the mlock accounting, increasing usability. Each counter gets perf_counter_mlock_kb for free. [ Impact: allow more mmap buffering ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> LKML-Reference: <20090505155437.112113632@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-05perf_counter: add ioctl(PERF_COUNTER_IOC_RESET)Peter Zijlstra
Provide a way to reset an existing counter - this eases PAPI libraries around perfcounters. Similar to read() it doesn't collapse pending child counters. [ Impact: new perfcounter fd ioctl method to reset counters ] Suggested-by: Corey Ashford <cjashfor@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <20090505155437.022272933@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-05perf_counter: uncouple data_head updates from wakeupsPeter Zijlstra
Keep data_head up-to-date irrespective of notifications. This fixes the case where you disable a counter and don't get a notification for the last few pending events, and it also allows polling usage. [ Impact: increase precision of perfcounter mmap-ed fields ] Suggested-by: Corey Ashford <cjashfor@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <20090505155436.925084300@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-04perf_counter: convert perf_resource_mutex to a spinlockIngo Molnar
Now percpu counters can be initialized very early. But the init sequence uses mutex_lock(). Fortunately, perf_resource_mutex should be a spinlock anyway, so convert it. [ Impact: fix crash due to early init mutex use ] LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-04perf_counter: initialize the per-cpu context earlierIngo Molnar
percpu scheduling for perfcounters wants to take the context lock, but that lock first needs to be initialized. Currently it is an early_initcall() - but that is too late, the task tick runs much sooner than that. Call it explicitly from the scheduler init sequence instead. [ Impact: fix access-before-init crash ] LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-04perf_counter: round-robin per-CPU counters tooIngo Molnar
This used to be unstable when we had the rq->lock dependencies, but now that they are that of the past we can turn on percpu counter RR too. [ Impact: handle counter over-commit for per-CPU counters too ] LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-01perf_counter: fix race in perf_output_*Peter Zijlstra
When two (or more) contexts output to the same buffer, it is possible to observe half written output. Suppose we have CPU0 doing perf_counter_mmap(), CPU1 doing perf_counter_overflow(). If CPU1 does a wakeup and exposes head to user-space, then CPU2 can observe the data CPU0 is still writing. [ Impact: fix occasionally corrupted profiling records ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> LKML-Reference: <20090501102533.007821627@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-30perf_counter: update copyright noticePaul Mackerras
This adds my name to the list of copyright holders on the core perf_counter.c, since I have contributed a significant amount of the code in there. Signed-off-by: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Robert Richter <robert.richter@amd.com> LKML-Reference: <18936.59200.888049.746658@cargo.ozlabs.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-29perf_counter: add/update copyrightsIngo Molnar
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-29perfcounters: rename struct hw_perf_counter_ops into struct pmuRobert Richter
This patch renames struct hw_perf_counter_ops into struct pmu. It introduces a structure to describe a cpu specific pmu (performance monitoring unit). It may contain ops and data. The new name of the structure fits better, is shorter, and thus better to handle. Where it was appropriate, names of function and variable have been changed too. [ Impact: cleanup ] Signed-off-by: Robert Richter <robert.richter@amd.com> Cc: Paul Mackerras <paulus@samba.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1241002046-8832-7-git-send-email-robert.richter@amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-16perfcounters: export perf_tpcounter_eventSteven Whitehouse
Needed for modular tracepoint support. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-09perf_counter: log full path namesPeter Zijlstra
Impact: fix perf-report output for /home mounted binaries, etc. dentry_path() only provide path-names up to the mount root, which is unsuited for out purpose, use d_path() instead. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> LKML-Reference: <20090409085524.601794134@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>