aboutsummaryrefslogtreecommitdiff
path: root/arch/powerpc/kernel
AgeCommit message (Collapse)Author
2009-05-18powerpc/ftrace: Fix constraint to be early clobberSteven Rostedt
After upgrading my distcc boxes from gcc 4.2.2 to 4.4.0, the function graph tracer broke. This was discovered on my x86 boxes. The issue is that gcc used the same register for an output as it did for an input in an asm statement. I first thought this was a bug in gcc and reported it. I was notified that gcc was correct and that the output had to be flagged as an "early clobber". I noticed that powerpc had the same issue and this patch fixes it. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-05-18powerpc/ftrace: Use pr_devel() in ftrace.cMichael Ellerman
pr_debug() can now result in code being generated even when #DEBUG is not defined. That's not really desirable in the ftrace code which we want to be snappy. With CONFIG_DYNAMIC_DEBUG=y: size before: text data bss dec hex filename 3334 672 4 4010 faa arch/powerpc/kernel/ftrace.o size after: text data bss dec hex filename 2616 360 4 2980 ba4 arch/powerpc/kernel/ftrace.o Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-05-15perf_counter: powerpc: supply more precise information on counter overflow ↵Paul Mackerras
events This uses values from the MMCRA, SIAR and SDAR registers on powerpc to supply more precise information for overflow events, including a data address when PERF_RECORD_ADDR is specified. Since POWER6 uses different bit positions in MMCRA from earlier processors, this converts the struct power_pmu limited_pmc5_6 field, which only had 0/1 values, into a flags field and defines bit values for its previous use (PPMU_LIMITED_PMC5_6) and a new flag (PPMU_ALT_SIPR) to indicate that the processor uses the POWER6 bit positions rather than the earlier positions. It also adds definitions in reg.h for the new and old positions of the bit that indicates that the SIAR and SDAR values come from the same instruction. For the data address, the SDAR value is supplied if we are not doing instruction sampling. In that case there is no guarantee that the address given in the PERF_RECORD_ADDR subrecord will correspond to the instruction whose address is given in the PERF_RECORD_IP subrecord. If instruction sampling is enabled (e.g. because this counter is counting a marked instruction event), then we only supply the SDAR value for the PERF_RECORD_ADDR subrecord if it corresponds to the instruction whose address is in the PERF_RECORD_IP subrecord. Otherwise we supply 0. [ Impact: support more PMU hardware features on PowerPC ] Signed-off-by: Paul Mackerras <paulus@samba.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <18955.37028.48861.555309@drongo.ozlabs.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-15perf_counter: powerpc: use u64 for event codes internallyPaul Mackerras
Although the perf_counter API allows 63-bit raw event codes, internally in the powerpc back-end we had been using 32-bit event codes. This expands them to 64 bits so that we can add bits for specifying threshold start/stop events and instruction sampling modes later. This also corrects the return value of can_go_on_limited_pmc; we were returning an event code rather than just a 0/1 value in some circumstances. That didn't particularly matter while event codes were 32-bit, but now that event codes are 64-bit it might, so this fixes it. [ Impact: extend PowerPC perfcounter interfaces from u32 to u64 ] Signed-off-by: Paul Mackerras <paulus@samba.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <18955.36874.472452.353104@drongo.ozlabs.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-15perf_counter: frequency based adaptive irq_periodPeter Zijlstra
Instead of specifying the irq_period for a counter, provide a target interrupt frequency and dynamically adapt the irq_period to match this frequency. [ Impact: new perf-counter attribute/feature ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <20090515132018.646195868@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-15perf_counter: Rework the perf counter disable/enablePeter Zijlstra
The current disable/enable mechanism is: token = hw_perf_save_disable(); ... /* do bits */ ... hw_perf_restore(token); This works well, provided that the use nests properly. Except we don't. x86 NMI/INT throttling has non-nested use of this, breaking things. Therefore provide a reference counter disable/enable interface, where the first disable disables the hardware, and the last enable enables the hardware again. [ Impact: refactor, simplify the PMU disable/enable logic ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-15powerpc: Fix PCI ROM accessBenjamin Herrenschmidt
A couple of issues crept in since about 2.6.27 related to accessing PCI device ROMs on various powerpc machines. First, historically, we don't allocate the ROM resource in the resource tree. I'm not entirely certain of why, I susepct they often contained garbage on x86 but it's hard to tell. This causes the current generic code to always call pci_assign_resource() when trying to access the said ROM from sysfs, which will try to re-assign some new address regardless of what the ROM BAR was already set to at boot time. This can be a problem on hypervisor platforms like pSeries where we aren't supposed to move PCI devices around (and in fact probably can't). Second, our code that generates the PCI tree from the OF device-tree (instead of doing config space probing) which we mostly use on pseries at the moment, didn't set the (new) flag IORESOURCE_SIZEALIGN on any resource. That means that any attempt at re-assigning such a resource with pci_assign_resource() would fail due to resource_alignment() returning 0. This fixes this by doing these two things: - The code that calculates resource flags based on the OF device-node is improved to set IORESOURCE_SIZEALIGN on any valid BAR, and while at it also set IORESOURCE_READONLY for ROMs since we were lacking that too - We now allocate ROM resources as part of the resource tree. However to limit the chances of nasty conflicts due to busted firmwares, we only do it on the second pass of our two-passes allocation scheme, so that all valid and enabled BARs get precedence. This brings pSeries back the ability to access PCI ROMs via sysfs (and thus initialize various video cards from X etc...). Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-05-15powerpc/pseries: Really fix the oprofile CPU type on pseriesBenjamin Herrenschmidt
My previous pach for fixing the oprofile CPU type got somewhat mismerged (by my fault) when it collided with another related patch. This should finally (fingers crossed) fix the whole thing. We make sure we keep the -old- oprofile type and CPU type whenever one of them was specified in the first pass through the function. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-05-15powerpc: Allow mem=x cmdline to work with 4G+Becky Bruce
We're currently choking on mem=4g (and above) due to memory_limit being specified as an unsigned long. Make memory_limit phys_addr_t to fix this. Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-05-01powerpc: Fix setting of oprofile cpu typeBenjamin Herrenschmidt
commit 2657dd4e301d4841ed67a4fac7d145ad8f3e1b28 introduced a bug where we would now always override the "real" oprofile CPU type with the "compatible" one provided by a pseudo-PVR in the device-tree which is incorrect and breaks oprofile on all current configs since the "compatible" ones aren't yet recognized. This fixes it. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-05-01powerpc adjust oprofile_cpu_type version 3Michael Wolf
Oprofile is changing the naming it is using for the compatibility modes. Instead of having compat-power<x>, oprofile will go to family naming convention and use ibm-compat-v<x>. Currently only ibm-compat-v1 will be defined. The notion of compatibility events just started with POWER6. So there is no way that any other tool could exist that is using these oprofile_cpu_type strings we want to change. Signed-off-by: Mike Wolf <mjw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-04-29perf_counter: powerpc: allow use of limited-function countersPaul Mackerras
POWER5+ and POWER6 have two hardware counters with limited functionality: PMC5 counts instructions completed in run state and PMC6 counts cycles in run state. (Run state is the state when a hardware RUN bit is 1; the idle task clears RUN while waiting for work to do and sets it when there is work to do.) These counters can't be written to by the kernel, can't generate interrupts, and don't obey the freeze conditions. That means we can only use them for per-task counters (where we know we'll always be in run state; we can't put a per-task counter on an idle task), and only if we don't want interrupts and we do want to count in all processor modes. Obviously some counters can't go on a limited hardware counter, but there are also situations where we can only put a counter on a limited hardware counter - if there are already counters on that exclude some processor modes and we want to put on a per-task cycle or instruction counter that doesn't exclude any processor mode, it could go on if it can use a limited hardware counter. To keep track of these constraints, this adds a flags argument to the processor-specific get_alternatives() functions, with three bits defined: one to say that we can accept alternative event codes that go on limited counters, one to say we only want alternatives on limited counters, and one to say that this is a per-task counter and therefore events that are gated by run state are equivalent to those that aren't (e.g. a "cycles" event is equivalent to a "cycles in run state" event). These flags are computed for each counter and stored in the counter->hw.counter_base field (slightly wonky name for what it does, but it was an existing unused field). Since the limited counters don't freeze when we freeze the other counters, we need some special handling to avoid getting skew between things counted on the limited counters and those counted on normal counters. To minimize this skew, if we are using any limited counters, we read PMC5 and PMC6 immediately after setting and clearing the freeze bit. This is done in a single asm in the new write_mmcr0() function. The code here is specific to PMC5 and PMC6 being the limited hardware counters. Being more general (e.g. having a bitmap of limited hardware counter numbers) would have meant more complex code to read the limited counters when freezing and unfreezing the normal counters, with conditional branches, which would have increased the skew. Since it isn't necessary for the code to be more general at this stage, it isn't. This also extends the back-ends for POWER5+ and POWER6 to be able to handle up to 6 counters rather than the 4 they previously handled. Signed-off-by: Paul Mackerras <paulus@samba.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Robert Richter <robert.richter@amd.com> LKML-Reference: <18936.19035.163066.892208@cargo.ozlabs.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-29perfcounters: rename struct hw_perf_counter_ops into struct pmuRobert Richter
This patch renames struct hw_perf_counter_ops into struct pmu. It introduces a structure to describe a cpu specific pmu (performance monitoring unit). It may contain ops and data. The new name of the structure fits better, is shorter, and thus better to handle. Where it was appropriate, names of function and variable have been changed too. [ Impact: cleanup ] Signed-off-by: Robert Richter <robert.richter@amd.com> Cc: Paul Mackerras <paulus@samba.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1241002046-8832-7-git-send-email-robert.richter@amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-29Merge branch 'linus' into perfcounters/coreIngo Molnar
Merge reason: This brach was on -rc1, refresh it to almost-rc4 to pick up the latest upstream fixes. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-28Merge branch 'merge' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: powerpc/ps3: Fix build error on UP powerpc/cell: Select PCI for IBM_CELL_BLADE AND CELLEB powerpc: ppc32 needs elf_read_implies_exec() powerpc/86xx: Add device_type entry to soc for ppc9a powerpc/44x: Correct memory size calculation for denali-based boards maintainers: Fix PowerPC 4xx git tree powerpc: fix for long standing bug noticed by gcc 4.4.0 Revert "powerpc: Add support for early tlbilx opcode"
2009-04-28powerpc: Revert switch to TEXT_TEXT in linker scriptTim Abbott
Commit edada399 broke the build on 64-bit powerpc because it moved the __ftr_alt_* sections of a file away from the .text section, causing link failures due to relative conditional branch targets being too far away from the branch instructions. This happens on pretty much all 64-bit powerpc configs. This change reverts commit edada399 while preserving the update from the *.refok sections to .ref.text that has happened since. Signed-off-by: Tim Abbott <tabbott@mit.edu> Requested-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-27powerpc: Use TEXT_TEXT macro in linker script.Tim Abbott
Rather than adding .ref.text to the powerpc linker script so that we can use __REF on the powerpc architecture, it seems simpler to switch to using the generic TEXT_TEXT macro. Signed-off-by: Tim Abbott <tabbott@mit.edu> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-26powerpc: convert to use __HEAD and HEAD_TEXT macros.Tim Abbott
This has the consequence of changing the section name use for head code from ".text.head" to ".head.text". Since this commit changes all users in the architecture, this change should be harmless. Signed-off-by: Tim Abbott <tabbott@mit.edu> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-24Merge branch 'merge' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: powerpc: Fix modular build of ide-pmac when mediabay is built in powerpc/pasemi: Fix build error on UP powerpc: Make macintosh/mediabay driver depend on CONFIG_BLOCK maintainers: Fix PS3 patterns powerpc/ps3: Fix CONFIG_PS3_FLASH=n build warning powerpc/32: Don't clobber personality flags on exec powerpc: Fix crash on CPU hotplug powerpc/85xx: Remove defconfigs that mpc85xx_{smp_}defconfig cover powerpc/85xx: Added SMP defconfig powerpc/85xx: Enabled a bunch of FSL specific drivers/options powerpc/85xx: Updated generic mpc85xx_defconfig powerpc: don't disable SATA interrupts on Freescale MPC8610 HPCD fsl_rio: Pass the proper device to dma mapping routines powerpc: Fix of_node_put() exit path in of_irq_map_one() powerpc/5200: defconfig updates powerpc/5200: Add FLASH nodes to lite5200 device tree powerpc/device-tree: Document MTD nodes with multiple "reg" tuples powerpc/of-device-tree: Factor MTD physmap bindings out of booting-without-of powerpc/5200: Bring the legacy fsl_spi_platform_data hooks back
2009-04-23Revert "powerpc: Add support for early tlbilx opcode"Kumar Gala
This reverts commit e9965577406a2148ade97b5e0ce7c448b4ba4ef6. Our HW guys were able to fix this so it never sees the light of day. Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2009-04-22Merge branch 'merge' of git://git.secretlab.ca/git/linux-2.6 into mergePaul Mackerras
2009-04-21clocksource: pass clocksource to read() callbackMagnus Damm
Pass clocksource pointer to the read() callback for clocksources. This allows us to share the callback between multiple instances. [hugh@veritas.com: fix powerpc build of clocksource pass clocksource mods] [akpm@linux-foundation.org: cleanup] Signed-off-by: Magnus Damm <damm@igel.co.jp> Acked-by: John Stultz <johnstul@us.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-20powerpc: Fix of_node_put() exit path in of_irq_map_one()Ilpo Järvinen
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2009-04-15Merge branch 'merge' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: powerpc: pseries/dtl.c should include asm/firmware.h powerpc: Fix data-corrupting bug in __futex_atomic_op powerpc/pseries: Set error_state to pci_channel_io_normal in eeh_report_reset() powerpc: Allow 256kB pages with SHMEM powerpc: Document new FSL I2C bindings and cleanup powerpc/mm: Fix compile warning powerpc/85xx: TQM8548: update defconfig powerpc/85xx: TQM8548: use proper phy-handles for enet2 and enet3 powerpc/85xx: TQM85xx: correct address of LM75 I2C device nodes powerpc: Add support for early tlbilx opcode powerpc: Fix tlbilx opcode
2009-04-09perf_counter: powerpc: add nmi_enter/nmi_exit callsPaul Mackerras
Impact: fix potential deadlocks on powerpc Now that the core is using in_nmi() (added in e30e08f6, "perf_counter: fix NMI race in task clock"), we need the powerpc perf_counter_interrupt to call nmi_enter() and nmi_exit() in those cases where the interrupt happens when interrupts are soft-disabled. If interrupts were soft-enabled, we can treat it as a regular interrupt and do irq_enter/irq_exit around the whole routine. This lets us get rid of the test_perf_counter_pending() call at the end of perf_counter_interrupt, thus simplifying things a little. Signed-off-by: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <18909.31952.873098.336615@cargo.ozlabs.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-08perf_counter: allow for data addresses to be recordedPeter Zijlstra
Paul suggested we allow for data addresses to be recorded along with the traditional IPs as power can provide these. For now, only the software pagefault events provide data addresses, but in the future power might as well for some events. x86 doesn't seem capable of providing this atm. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> LKML-Reference: <20090408130409.394816925@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-08perf_counter: powerpc: set sample enable bit for marked instruction eventsPaul Mackerras
Impact: enable access to hardware feature POWER processors have the ability to "mark" a subset of the instructions and provide more detailed information on what happens to the marked instructions as they flow through the pipeline. This marking is enabled by the "sample enable" bit in MMCRA, and there are synchronization requirements around setting and clearing the bit. This adds logic to the processor-specific back-ends so that they know which events relate to marked instructions and set the sampling enable bit if any event that we want to put on the PMU is a marked instruction event. It also adds logic to the generic powerpc code to do the necessary synchronization if that bit is set. Signed-off-by: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <18908.31930.1024.228867@cargo.ozlabs.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-08perf_counter: fix powerpc buildPaul Mackerras
Commit 4af4998b ("perf_counter: rework context time") changed struct perf_counter_context to have a 'time' field instead of a 'time_now' field, but neglected to fix the place in the powerpc perf_counter.c where the time_now field was accessed. This fixes it. Signed-off-by: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <18908.31922.411398.147810@cargo.ozlabs.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-08Merge commit 'v2.6.30-rc1' into perfcounters/coreIngo Molnar
Conflicts: arch/powerpc/include/asm/systbl.h arch/powerpc/include/asm/unistd.h include/linux/init_task.h Merge reason: the conflicts are non-trivial: PowerPC placement of sys_perf_counter_open has to be mixed with the new preadv/pwrite syscalls. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-07dma-mapping: replace all DMA_32BIT_MASK macro with DMA_BIT_MASK(32)Yang Hongyang
Replace all DMA_32BIT_MASK macro with DMA_BIT_MASK(32) Signed-off-by: Yang Hongyang<yanghy@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-07perf_counter: theres more to overflow than writing eventsPeter Zijlstra
Prepare for more generic overflow handling. The new perf_counter_overflow() method will handle the generic bits of the counter overflow, and can return a !0 return value, in which case the counter should be (soft) disabled, so that it won't count until it's properly disabled. XXX: do powerpc and swcounter Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> LKML-Reference: <20090406094517.812109629@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-07powerpc: Add support for early tlbilx opcodeKumar Gala
During the ISA 2.06 development the opcode for tlbilx changed and some early implementations used to old opcode. Add support for a MMU_FTR fixup to deal with this. Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2009-04-07powerpc/ftrace: Fix printf format warningMichael Ellerman
'tramp' is an unsigned long, so print it with %lx. Fixes the following build warning: arch/powerpc/kernel/ftrace.c:291: error: format ‘%x’ expects type ‘unsigned int’, but argument 2 has type ‘long unsigned int’ Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2009-04-07powerpc/ftrace: Fix #if that should be #ifdefMichael Ellerman
Commit bb7253403f7a4670a128e4c080fd8ea1bd4d5029 ("powerpc64, ftrace: save toc only on modules for function graph"), added an #if CONFIG_PPC64. This changes it to #ifdef. Fixes the following warning on 32-bit builds: arch/powerpc/kernel/ftrace.c:562:5: error: "CONFIG_PPC64" is not defined Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2009-04-07powerpc: Fix ptrace compat wrapper for FPU register accessMichael Neuling
The ptrace compat wrapper mishandles access to the fpu registers. The PTRACE_PEEKUSR and PTRACE_POKEUSR requests miscalculate the index into the fpr array due to the broken FPINDEX macro. The PPC_PTRACE_PEEKUSR_3264 request needs to use the same formula that the native ptrace interface uses when operating on the register number (as opposed to the 4-byte offset). The PPC_PTRACE_POKEUSR_3264 request didn't take TS_FPRWIDTH into account. Signed-off-by: Andreas Schwab <schwab@linux-m68k.org> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2009-04-07powerpc: Print information about mapping hw irqs to virtual irqsMichael Ellerman
The irq remapping layer seems to cause some confusion when people see a different irq number in /proc/interrupts vs the one they request in their driver or DTS. So have the irq remapping layer print out a message when we map an irq. The message is only printed the first time the irq is mapped, and it's KERN_DEBUG so most people won't see it. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Acked-by: Grant Likely <grant.likely@secretlab.ca> Acked-by: Wolfram Sang <w.sang@pengutronix.de> Signed-off-by: Paul Mackerras <paulus@samba.org>
2009-04-07powerpc: Disable VSX or current process in giveup_fpu/altivecMichael Neuling
When we call giveup_fpu, we need to need to turn off VSX for the current process. If we don't, on return to userspace it may execute a VSX instruction before the next FP instruction, and not have its register state refreshed correctly from the thread_struct. Ditto for altivec. This caused a bug where an unaligned lfs or stfs results in fix_alignment calling giveup_fpu so it can use the FPRs (in order to do a single <-> double conversion), and then returning to userspace with FP off but VSX on. Then if a VSX instruction is executed, before another FP instruction, it will proceed without another exception and hence have the incorrect register state for VSX registers 0-31. lfs unaligned <- alignment exception turns FP off but leaves VSX on VSX instruction <- no exception since VSX on, hence we get the wrong VSX register values for VSX registers 0-31, which overlap the FPRs. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2009-04-07powerpc/pseries: Fix ibm,client-architecture commentAnton Blanchard
We specify a 64MB RMO, but the comment says 128MB. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2009-04-07powerpc/pseries: Add dispatch dispersion statisticsAnton Blanchard
PHYP tells us how often a shared processor dispatch changed physical cpus. This can highlight performance problems caused by the hypervisor. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2009-04-07powerpc: Clean up some prom printoutsAnton Blanchard
Make all messages consistent, some have spaces before the "...", some do not. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2009-04-07powerpc: Print progress of ibm,client-architecture methodAnton Blanchard
The ibm,client-architecture method will often cause a reconfiguration reboot. When this happens the last thing we see is: Hypertas detected, assuming LPAR ! Which doesn't explain what just happened. Wrap the ibm,client-architecture so it's clear what is going on: Calling ibm,client-architecture... done In order to maintain the law of conservation of screen real estate, downgrade two other messages to debug. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2009-04-07powerpc: Remove duplicated #include'sHuang Weiyi
Remove duplicated #include's in - arch/powerpc/include/asm/ps3fb.h - arch/powerpc/kernel/setup-common.c Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2009-04-06perf_counter: make it possible for hw_perf_counter_init to return error codesPaul Mackerras
Impact: better error reporting At present, if hw_perf_counter_init encounters an error, all it can do is return NULL, which causes sys_perf_counter_open to return an EINVAL error to userspace. This isn't very informative for userspace; it means that userspace can't tell the difference between "sorry, oprofile is already using the PMU" and "we don't support this CPU" and "this CPU doesn't support the requested generic hardware event". This commit uses the PTR_ERR/ERR_PTR/IS_ERR set of macros to let hw_perf_counter_init return an error code on error rather than just NULL if it wishes. If it does so, that error code will be returned from sys_perf_counter_open to userspace. If it returns NULL, an EINVAL error will be returned to userspace, as before. This also adapts the powerpc hw_perf_counter_init to make use of this to return ENXIO, EINVAL, EBUSY, or EOPNOTSUPP as appropriate. It would be good to add extra error numbers in future to allow userspace to distinguish the various errors that are currently reported as EINVAL, i.e. irq_period < 0, too many events in a group, conflict between exclude_* settings in a group, and PMU resource conflict in a group. [ v2: fix a bug pointed out by Corey Ashford where error returns from hw_perf_counter_init were not handled correctly in the case of raw hardware events.] Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Orig-LKML-Reference: <20090330171023.682428180@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-06perf_counter: powerpc: only reserve PMU hardware when we need itPaul Mackerras
Impact: cooperate with oprofile At present, on PowerPC, if you have perf_counters compiled in, oprofile doesn't work. There is code to allow the PMU to be shared between competing subsystems, such as perf_counters and oprofile, but currently the perf_counter subsystem reserves the PMU for itself at boot time, and never releases it. This makes perf_counter play nicely with oprofile. Now we keep a count of how many perf_counter instances are counting hardware events, and reserve the PMU when that count becomes non-zero, and release the PMU when that count becomes zero. This means that it is possible to have perf_counters compiled in and still use oprofile, as long as there are no hardware perf_counters active. This also means that if oprofile is active, sys_perf_counter_open will fail if the hw_event specifies a hardware event. To avoid races with other tasks creating and destroying perf_counters, we use a mutex. We use atomic_inc_not_zero and atomic_add_unless to avoid having to take the mutex unless there is a possibility of the count going between 0 and 1. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Orig-LKML-Reference: <20090330171023.627912475@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-06perf_counter: unify and fix delayed counter wakeupPeter Zijlstra
While going over the wakeup code I noticed delayed wakeups only work for hardware counters but basically all software counters rely on them. This patch unifies and generalizes the delayed wakeup to fix this issue. Since we're dealing with NMI context bits here, use a cmpxchg() based single link list implementation to track counters that have pending wakeups. [ This should really be generic code for delayed wakeups, but since we cannot use cmpxchg()/xchg() in generic code, I've let it live in the perf_counter code. -- Eric Dumazet could use it to aggregate the network wakeups. ] Furthermore, the x86 method of using TIF flags was flawed in that its quite possible to end up setting the bit on the idle task, loosing the wakeup. The powerpc method uses per-cpu storage and does appear to be sufficient. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Paul Mackerras <paulus@samba.org> Orig-LKML-Reference: <20090330171023.153932974@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-06perf_counter: record time running and time enabled for each counterPaul Mackerras
Impact: new functionality Currently, if there are more counters enabled than can fit on the CPU, the kernel will multiplex the counters on to the hardware using round-robin scheduling. That isn't too bad for sampling counters, but for counting counters it means that the value read from a counter represents some unknown fraction of the true count of events that occurred while the counter was enabled. This remedies the situation by keeping track of how long each counter is enabled for, and how long it is actually on the cpu and counting events. These times are recorded in nanoseconds using the task clock for per-task counters and the cpu clock for per-cpu counters. These values can be supplied to userspace on a read from the counter. Userspace requests that they be supplied after the counter value by setting the PERF_FORMAT_TOTAL_TIME_ENABLED and/or PERF_FORMAT_TOTAL_TIME_RUNNING bits in the hw_event.read_format field when creating the counter. (There is no way to change the read format after the counter is created, though it would be possible to add some way to do that.) Using this information it is possible for userspace to scale the count it reads from the counter to get an estimate of the true count: true_count_estimate = count * total_time_enabled / total_time_running This also lets userspace detect the situation where the counter never got to go on the cpu: total_time_running == 0. This functionality has been requested by the PAPI developers, and will be generally needed for interpreting the count values from counting counters correctly. In the implementation, this keeps 5 time values (in nanoseconds) for each counter: total_time_enabled and total_time_running are used when the counter is in state OFF or ERROR and for reporting back to userspace. When the counter is in state INACTIVE or ACTIVE, it is the tstamp_enabled, tstamp_running and tstamp_stopped values that are relevant, and total_time_enabled and total_time_running are determined from them. (tstamp_stopped is only used in INACTIVE state.) The reason for doing it like this is that it means that only counters being enabled or disabled at sched-in and sched-out time need to be updated. There are no new loops that iterate over all counters to update total_time_enabled or total_time_running. This also keeps separate child_total_time_running and child_total_time_enabled fields that get added in when reporting the totals to userspace. They are separate fields so that they can be atomic. We don't want to use atomics for total_time_running, total_time_enabled etc., because then we would have to use atomic sequences to update them, which are slower than regular arithmetic and memory accesses. It is possible to measure total_time_running by adding a task_clock counter to each group of counters, and total_time_enabled can be measured approximately with a top-level task_clock counter (though inaccuracies will creep in if you need to disable and enable groups since it is not possible in general to disable/enable the top-level task_clock counter simultaneously with another group). However, that adds extra overhead - I measured around 15% increase in the context switch latency reported by lat_ctx (from lmbench) when a task_clock counter was added to each of 2 groups, and around 25% increase when a task_clock counter was added to each of 4 groups. (In both cases a top-level task-clock counter was also added.) In contrast, the code added in this commit gives better information with no overhead that I could measure (in fact in some cases I measured lower times with this code, but the differences were all less than one standard deviation). [ v2: address review comments by Andrew Morton. ] Signed-off-by: Paul Mackerras <paulus@samba.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Andrew Morton <akpm@linux-foundation.org> Orig-LKML-Reference: <18890.6578.728637.139402@cargo.ozlabs.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-06perf_counter: new output ABI - part 1Peter Zijlstra
Impact: Rework the perfcounter output ABI use sys_read() only for instant data and provide mmap() output for all async overflow data. The first mmap() determines the size of the output buffer. The mmap() size must be a PAGE_SIZE multiple of 1+pages, where pages must be a power of 2 or 0. Further mmap()s of the same fd must have the same size. Once all maps are gone, you can again mmap() with a new size. In case of 0 extra pages there is no data output and the first page only contains meta data. When there are data pages, a poll() event will be generated for each full page of data. Furthermore, the output is circular. This means that although 1 page is a valid configuration, its useless, since we'll start overwriting it the instant we report a full page. Future work will focus on the output format (currently maintained) where we'll likey want each entry denoted by a header which includes a type and length. Further future work will allow to splice() the fd, also containing the async overflow data -- splice() would be mutually exclusive with mmap() of the data. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Orig-LKML-Reference: <20090323172417.470536358@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-06perf_counter: add an mmap method to allow userspace to read hardware countersPaul Mackerras
Impact: new feature giving performance improvement This adds the ability for userspace to do an mmap on a hardware counter fd and get access to a read-only page that contains the information needed to translate a hardware counter value to the full 64-bit counter value that would be returned by a read on the fd. This is useful on architectures that allow user programs to read the hardware counters, such as PowerPC. The mmap will only succeed if the counter is a hardware counter monitoring the current process. On my quad 2.5GHz PowerPC 970MP machine, userspace can read a counter and translate it to the full 64-bit value in about 30ns using the mmapped page, compared to about 830ns for the read syscall on the counter, so this does give a significant performance improvement. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Orig-LKML-Reference: <20090323172417.297057964@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-06perf_counter: remove the event config bitfieldsPeter Zijlstra
Since the bitfields turned into a bit of a mess, remove them and rely on good old masks. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Orig-LKML-Reference: <20090323172417.059499915@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-06perf_counter: fix type/event_id layout on big-endian systemsPaul Mackerras
Impact: build fix for powerpc Commit db3a944aca35ae61 ("perf_counter: revamp syscall input ABI") expanded the hw_event.type field into a union of structs containing bitfields. In particular it introduced a type field and a raw_type field, with the intention that the 1-bit raw_type field should overlay the most-significant bit of the 8-bit type field, and in fact perf_counter_alloc() now assumes that (or at least, assumes that raw_type doesn't overlay any of the bits that are 1 in the values of PERF_TYPE_{HARDWARE,SOFTWARE,TRACEPOINT}). Unfortunately this is not true on big-endian systems such as PowerPC, where bitfields are laid out from left to right, i.e. from most significant bit to least significant. This means that setting hw_event.type = PERF_TYPE_SOFTWARE will set hw_event.raw_type to 1. This fixes it by making the layout depend on whether or not __BIG_ENDIAN_BITFIELD is defined. It's a bit ugly, but that's what we get for using bitfields in a user/kernel ABI. Also, that commit didn't fix up some places in arch/powerpc/kernel/ perf_counter.c where hw_event.raw and hw_event.event_id were used. This fixes them too. Signed-off-by: Paul Mackerras <paulus@samba.org>