aboutsummaryrefslogtreecommitdiff
path: root/arch/powerpc
AgeCommit message (Collapse)Author
2009-01-08powerpc/52xx: Use DEFINE_SPINLOCKJulia Lawall
SPIN_LOCK_UNLOCKED is deprecated. The following makes the change suggested in Documentation/spinlocks.txt The semantic patch that makes this change is as follows: (http://www.emn.fr/x-info/coccinelle/) // <smpl> @@ declarer name DEFINE_SPINLOCK; identifier xxx_lock; @@ - spinlock_t xxx_lock = SPIN_LOCK_UNLOCKED; + DEFINE_SPINLOCK(xxx_lock); // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-01-08powerpc: Rewrite sysfs processor cache info codeNathan Lynch
The current code for providing processor cache information in sysfs has the following deficiencies: - several complex functions that are hard to understand - implicit recursion (cache_desc_release -> kobject_put -> cache_desc_release) - explicit recursion (create_cache_index_info) - use of two per-cpu arrays when one would suffice - duplication of work on systems where CPUs share cache Also, when I looked at implementing support for a shared_cpu_map attribute, it was pretty much impossible to handle hotplug without checking every single online CPU's cache_desc list and fixing things up... not that this is a hot path, but it would have introduced O(n^2)-ish behavior during boot. Addressing this involved rethinking the core data structures used, which didn't lend itself to an incremental approach. This implementation maintains a "forest" (potentially more than one tree) of cache objects which reflects the system's cache topology. Cache objects are instantiated as needed as CPUs come online. A per-cpu array is used mainly for sysfs-related bookkeeping; the objects in the array just point to the appropriate points in the forest. This maintains compatibility with the existing code and includes some enhancements: - Implement the shared_cpu_map attribute, which is essential for enabling userspace to discover the system's overall cache topology. - Use cache-block-size properties if cache-line-size is not available. I chose to place this implementation in a new file since it would have roughly doubled the size of sysfs.c, which is already kind of messy. Signed-off-by: Nathan Lynch <ntl@pobox.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-01-08powerpc/iseries: Kexec is known not to work on iseriesMichael Ellerman
The non-zero return from the prepare callback is returned by sys_kexec_load() to userspace, indicating that kexec is not supported on the machine. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-01-08powerpc: Copy bootable images in the default install scriptGrant Likely
This patch makes the default install script (arch/powerpc/boot/install.sh) copy the bootable image files into the install directory. Before this patch only the vmlinux image file was copied. This patch makes the default 'make install' command useful for embedded development when $(INSTALL_PATH) is set in the environment. As a side effect, this patch changes the calling convention of the install.sh script. Instead of a single 5th parameter, the script is now passed a list of all the target images stored in the $(image-y) Makefile variable. This should be backwards compatible with existing install scripts since it just adds additional arguments and does not change existing ones. Signed-off-by: Grant Likely <grant.likely@secretlab.ca> Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-01-08powerpc/mm: Cleanup careful_allocation(): consolidate memset()Dave Hansen
Both users of careful_allocation() immediately memset() the result. So, just do it in one place. Also give careful_allocation() a 'z' prefix to bring it in line with kzmalloc() and friends. Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-01-08powerpc/mm: Make careful_allocation() return virtual addrsDave Hansen
Since we memset() the result in both of the uses here, just make careful_alloc() return a virtual address. Also, add a separate variable to store the physial address that comes back from the lmb_alloc() functions. This makes it less likely that someone will screw it up forgetting to convert before returning since the vaddr is always in a void* and the paddr is always in an unsigned long. I admit this is arbitrary since one of its users needs a paddr and one a vaddr, but it does remove a good number of casts. Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-01-08powerpc/mm:: Cleanup careful_allocation(): bootmem already panicsDave Hansen
If we fail a bootmem allocation, the bootmem code itself panics. No need to redo it here. Also change the wording of the other panic. We don't strictly have to allocate memory on the specified node. It is just a hint and that node may not even *have* any memory on it. In that case we can and do fall back to other nodes. Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-01-08powerpc/mm: Add better comment on careful_allocation()Dave Hansen
The behavior in careful_allocation() really confused me at first. Add a comment to hopefully make it easier on the next doofus that looks at it. Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-01-08powerpc/powermac: Add missing of_node_putNicolas Palix
This patch fixes some unbalanced OF node references in the powermac code Signed-off-by: Nicolas Palix <npalix@diku.dk> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-01-08powerpc/pci: Reserve legacy regions on PCIBenjamin Herrenschmidt
There's a problem on some embedded platforms when we re-assign everything on PCI, such as 44x. The generic code tries to avoid assigning devices to addresses overlapping the low legacy addresses such as VGA hard decoded areas using constants that are unfortunately no good for us, as they don't take into account the address translation we do to access PCI busses. Thus we end up allocating things like IO BARs to 0, which is technically legal, but will shadow hard decoded ports for use by things like VGA cards. This works around it by attempting to reserve legacy regions before we try to assign addresses. NOTE: This may have nasty side effects in cases I haven't tested yet: - We try to use FW mappings (ie. powermac) and the FW has allocated a conflicting address over those legacy regions. This will typically happen. I would expect the new code to just fail with an informative message without harm but I haven't had a chance to test that scenario yet. - A device with fixed BARs overlapping those legacy addresses such as an IDE controller in legacy mode is in the system. I don't know for sure yet what will happen there, I have to test :-) Ideally, we should change PCIBIOS_MIN_IO/MIN_MEM accross the board to take a bus pointer so they can provide appropriate per-bus translated values to the generic code but that's a more invasive patch. I will do that in the future, but in the meantime, this fixes the problem locally Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-01-08Merge commit 'origin/master' into nextBenjamin Herrenschmidt
2009-01-07Merge branch 'linux-next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6 * 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (98 commits) PCI PM: Put PM callbacks in the order of execution PCI PM: Run default PM callbacks for all devices using new framework PCI PM: Register power state of devices during initialization PCI PM: Call pci_fixup_device from legacy routines PCI PM: Rearrange code in pci-driver.c PCI PM: Avoid touching devices behind bridges in unknown state PCI PM: Move pci_has_legacy_pm_support PCI PM: Power-manage devices without drivers during suspend-resume PCI PM: Add suspend counterpart of pci_reenable_device PCI PM: Fix poweroff and restore callbacks PCI: Use msleep instead of cpu_relax during ASPM link retraining PCI: PCIe portdrv: Add kerneldoc comments to remining core funtions PCI: PCIe portdrv: Rearrange code so that related things are together PCI: PCIe portdrv: Fix suspend and resume of PCI Express port services PCI: PCIe portdrv: Add kerneldoc comments to some core functions x86/PCI: Do not use interrupt links for devices using MSI-X net: sfc: Use pci_clear_master() to disable bus mastering PCI: Add pci_clear_master() as opposite of pci_set_master() PCI hotplug: remove redundant test in cpq hotplug PCI: pciehp: cleanup register and field definitions ...
2009-01-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6: (123 commits) wimax/i2400m: add CREDITS and MAINTAINERS entries wimax: export linux/wimax.h and linux/wimax/i2400m.h with headers_install i2400m: Makefile and Kconfig i2400m/SDIO: TX and RX path backends i2400m/SDIO: firmware upload backend i2400m/SDIO: probe/disconnect, dev init/shutdown and reset backends i2400m/SDIO: header for the SDIO subdriver i2400m/USB: TX and RX path backends i2400m/USB: firmware upload backend i2400m/USB: probe/disconnect, dev init/shutdown and reset backends i2400m/USB: header for the USB bus driver i2400m: debugfs controls i2400m: various functions for device management i2400m: RX and TX data/control paths i2400m: firmware loading and bootrom initialization i2400m: linkage to the networking stack i2400m: Generic probe/disconnect, reset and message passing i2400m: host/device procotol and core driver definitions i2400m: documentation and instructions for usage wimax: Makefile, Kconfig and docbook linkage for the stack ...
2009-01-07Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (24 commits) trivial: chack -> check typo fix in main Makefile trivial: Add a space (and a comma) to a printk in 8250 driver trivial: Fix misspelling of "firmware" in docs for ncr53c8xx/sym53c8xx trivial: Fix misspelling of "firmware" in powerpc Makefile trivial: Fix misspelling of "firmware" in usb.c trivial: Fix misspelling of "firmware" in qla1280.c trivial: Fix misspelling of "firmware" in a100u2w.c trivial: Fix misspelling of "firmware" in megaraid.c trivial: Fix misspelling of "firmware" in ql4_mbx.c trivial: Fix misspelling of "firmware" in acpi_memhotplug.c trivial: Fix misspelling of "firmware" in ipw2100.c trivial: Fix misspelling of "firmware" in atmel.c trivial: Fix misspelled firmware in Kconfig trivial: fix an -> a typos in documentation and comments trivial: fix then -> than typos in comments and documentation trivial: update Jesper Juhl CREDITS entry with new email trivial: fix singal -> signal typo trivial: Fix incorrect use of "loose" in event.c trivial: printk: fix indentation of new_text_line declaration trivial: rtc-stk17ta8: fix sparse warning ...
2009-01-07PCI: powerpc: use generic pci_swizzle_interrupt_pin()Bjorn Helgaas
Use the generic pci_swizzle_interrupt_pin() instead of arch-specific code. Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-01-07USB: powerpc: Workaround for the PPC440EPX USBH_23 errata [take 3]Vitaly Bordug
A published errata for ppc440epx states, that when running Linux with both EHCI and OHCI modules loaded, the EHCI module experiences a fatal error when a high-speed device is connected to the USB2.0, and functions normally if OHCI module is not loaded. There used to be recommendation to use only hi-speed or full-speed devices with specific conditions, when respective module was unloaded. Later, it was observed that ohci suspend is enough to keep things going, and it was turned into workaround, as explained below. Quote from original descriprion: The 440EPx USB 2.0 Host controller is an EHCI compliant controller. In USB 2.0 Host controllers, each EHCI controller has one or more companion controllers, which may be OHCI or UHCI. An USB 2.0 Host controller will contain one or more ports. For each port, only one of the controllers is connected at any one time. In the 440EPx, there is only one OHCI companion controller, and only one USB 2.0 Host port. All ports on an USB 2.0 controller default to the companion controller. If you load only an ohci driver, it will have control of the ports and any deviceplugged in will operate, although high speed devices will be forced to operate at full speed. When an ehci driver is loaded, it explicitly takes control of the ports. If there is a device connected, and / or every time there is a new device connected, the ehci driver determines if the device is high speed or not. If it is high speed, the driver retains control of the port. If it is not, the driver explicitly gives the companion controller control of the port. The is a software workaround that uses Initial version of the software workaround was posted to linux-usb-devel: http://www.mail-archive.com/linux-usb-devel@lists.sourceforge.net/msg54019.html and later available from amcc.com: http://www.amcc.com/Embedded/Downloads/download.html?cat=1&family=15&ins=2 The patch below is generally based on the latter, but reworked to powerpc/of_device USB drivers, and uses a few devicetree inquiries to get rid of (some) hardcoded defines. Signed-off-by: Vitaly Bordug <vitb@kernel.crashing.org> Signed-off-by: Stefan Roese <sr@denx.de> Cc: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-06powerpc: introduce asm/swab.hHarvey Harrison
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06kprobes: add kprobe_insn_mutex and cleanup arch_remove_kprobe()Masami Hiramatsu
Add kprobe_insn_mutex for protecting kprobe_insn_pages hlist, and remove kprobe_mutex from architecture dependent code. This allows us to call arch_remove_kprobe() (and free_insn_slot) while holding kprobe_mutex. Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06atomic_t: unify all arch definitionsMatthew Wilcox
The atomic_t type cannot currently be used in some header files because it would create an include loop with asm/atomic.h. Move the type definition to linux/types.h to break the loop. Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Cc: Huang Ying <ying.huang@intel.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06mm: show node to memory section relationship with symlinks in sysfsGary Hade
Show node to memory section relationship with symlinks in sysfs Add /sys/devices/system/node/nodeX/memoryY symlinks for all the memory sections located on nodeX. For example: /sys/devices/system/node/node1/memory135 -> ../../memory/memory135 indicates that memory section 135 resides on node1. Also revises documentation to cover this change as well as updating Documentation/ABI/testing/sysfs-devices-memory to include descriptions of memory hotremove files 'phys_device', 'phys_index', and 'state' that were previously not described there. In addition to it always being a good policy to provide users with the maximum possible amount of physical location information for resources that can be hot-added and/or hot-removed, the following are some (but likely not all) of the user benefits provided by this change. Immediate: - Provides information needed to determine the specific node on which a defective DIMM is located. This will reduce system downtime when the node or defective DIMM is swapped out. - Prevents unintended onlining of a memory section that was previously offlined due to a defective DIMM. This could happen during node hot-add when the user or node hot-add assist script onlines _all_ offlined sections due to user or script inability to identify the specific memory sections located on the hot-added node. The consequences of reintroducing the defective memory could be ugly. - Provides information needed to vary the amount and distribution of memory on specific nodes for testing or debugging purposes. Future: - Will provide information needed to identify the memory sections that need to be offlined prior to physical removal of a specific node. Symlink creation during boot was tested on 2-node x86_64, 2-node ppc64, and 2-node ia64 systems. Symlink creation during physical memory hot-add tested on a 2-node x86_64 system. Signed-off-by: Gary Hade <garyhade@us.ibm.com> Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06mm: report the MMU pagesize in /proc/pid/smapsMel Gorman
The KernelPageSize entry in /proc/pid/smaps is the pagesize used by the kernel to back a VMA. This matches the size used by the MMU in the majority of cases. However, one counter-example occurs on PPC64 kernels whereby a kernel using 64K as a base pagesize may still use 4K pages for the MMU on older processor. To distinguish, this patch reports MMUPageSize as the pagesize used by the MMU in /proc/pid/smaps. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Cc: "KOSAKI Motohiro" <kosaki.motohiro@jp.fujitsu.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06trivial: Fix misspelling of "firmware" in powerpc MakefileNick Andrew
Fix misspelling of "firmware" in powerpc Makefile It's spelled "firmware". Signed-off-by: Nick Andrew <nick@nick-andrew.net> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-01-06trivial: fix then -> than typos in comments and documentationFrederik Schwarzer
- (better, more, bigger ...) then -> (...) than Signed-off-by: Frederik Schwarzer <schwarzerf@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-01-05zero i_uid/i_gid on inode allocationAl Viro
... and don't bother in callers. Don't bother with zeroing i_blocks, while we are at it - it's already been zeroed. i_mode is not worth the effort; it has no common default value. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-01-05Merge commit 'kumar/kumar-next' into nextBenjamin Herrenschmidt
2009-01-03Merge branch 'cpus4096-for-linus-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'cpus4096-for-linus-3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (77 commits) x86: setup_per_cpu_areas() cleanup cpumask: fix compile error when CONFIG_NR_CPUS is not defined cpumask: use alloc_cpumask_var_node where appropriate cpumask: convert shared_cpu_map in acpi_processor* structs to cpumask_var_t x86: use cpumask_var_t in acpi/boot.c x86: cleanup some remaining usages of NR_CPUS where s/b nr_cpu_ids sched: put back some stack hog changes that were undone in kernel/sched.c x86: enable cpus display of kernel_max and offlined cpus ia64: cpumask fix for is_affinity_mask_valid() cpumask: convert RCU implementations, fix xtensa: define __fls mn10300: define __fls m32r: define __fls h8300: define __fls frv: define __fls cris: define __fls cpumask: CONFIG_DISABLE_OBSOLETE_CPUMASK_FUNCTIONS cpumask: zero extra bits in alloc_cpumask_var_node cpumask: replace for_each_cpu_mask_nr with for_each_cpu in kernel/time/ cpumask: convert mm/ ...
2009-01-03Merge branch 'cputime' of git://git390.osdl.marist.edu/pub/scm/linux-2.6Linus Torvalds
* 'cputime' of git://git390.osdl.marist.edu/pub/scm/linux-2.6: [PATCH] fast vdso implementation for CLOCK_THREAD_CPUTIME_ID [PATCH] improve idle cputime accounting [PATCH] improve precision of idle time detection. [PATCH] improve precision of process accounting. [PATCH] idle cputime accounting [PATCH] fix scaled & unscaled cputime accounting
2009-01-03Merge branch 'master' of ↵Mike Travis
git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-cpumask into merge-rr-cpumask Conflicts: arch/x86/kernel/io_apic.c kernel/rcuclassic.c kernel/sched.c kernel/time/tick-sched.c Signed-off-by: Mike Travis <travis@sgi.com> [ mingo@elte.hu: backmerged typo fix for io_apic.c ] Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-02Merge branch 'cpus4096-for-linus-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'cpus4096-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (66 commits) x86: export vector_used_by_percpu_irq x86: use logical apicid in x2apic_cluster's x2apic_cpu_mask_to_apicid_and() sched: nominate preferred wakeup cpu, fix x86: fix lguest used_vectors breakage, -v2 x86: fix warning in arch/x86/kernel/io_apic.c sched: fix warning in kernel/sched.c sched: move test_sd_parent() to an SMP section of sched.h sched: add SD_BALANCE_NEWIDLE at MC and CPU level for sched_mc>0 sched: activate active load balancing in new idle cpus sched: bias task wakeups to preferred semi-idle packages sched: nominate preferred wakeup cpu sched: favour lower logical cpu number for sched_mc balance sched: framework for sched_mc/smt_power_savings=N sched: convert BALANCE_FOR_xx_POWER to inline functions x86: use possible_cpus=NUM to extend the possible cpus allowed x86: fix cpu_mask_to_apicid_and to include cpu_online_mask x86: update io_apic.c to the new cpumask code x86: Introduce topology_core_cpumask()/topology_thread_cpumask() x86: xen: use smp_call_function_many() x86: use work_on_cpu in x86/kernel/cpu/mcheck/mce_amd_64.c ... Fixed up trivial conflict in kernel/time/tick-sched.c manually
2009-01-02Merge branch 'kvm-updates/2.6.29' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm * 'kvm-updates/2.6.29' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm: (140 commits) KVM: MMU: handle large host sptes on invlpg/resync KVM: Add locking to virtual i8259 interrupt controller KVM: MMU: Don't treat a global pte as such if cr4.pge is cleared MAINTAINERS: Maintainership changes for kvm/ia64 KVM: ia64: Fix kvm_arch_vcpu_ioctl_[gs]et_regs() KVM: x86: Rework user space NMI injection as KVM_CAP_USER_NMI KVM: VMX: Fix pending NMI-vs.-IRQ race for user space irqchip KVM: fix handling of ACK from shared guest IRQ KVM: MMU: check for present pdptr shadow page in walk_shadow KVM: Consolidate userspace memory capability reporting into common code KVM: Advertise the bug in memory region destruction as fixed KVM: use cpumask_var_t for cpus_hardware_enabled KVM: use modern cpumask primitives, no cpumask_t on stack KVM: Extract core of kvm_flush_remote_tlbs/kvm_reload_remote_mmus KVM: set owner of cpu and vm file operations anon_inodes: use fops->owner for module refcount x86: KVM guest: kvm_get_tsc_khz: return khz, not lpj KVM: MMU: prepopulate the shadow on invlpg KVM: MMU: skip global pgtables on sync due to cr3 switch KVM: MMU: collapse remote TLB flushes on root sync ...
2009-01-01cpumask: Introduce topology_core_cpumask()/topology_thread_cpumask(): powerpcRusty Russell
Impact: New API The old topology_core_siblings() and topology_thread_siblings() return a cpumask_t; these new ones return a (const) struct cpumask *. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Mike Travis <travis@sgi.com>
2008-12-31take init_fs to saner placeAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-12-31shrink struct dentryNick Piggin
struct dentry is one of the most critical structures in the kernel. So it's sad to see it going neglected. With CONFIG_PROFILING turned on (which is probably the common case at least for distros and kernel developers), sizeof(struct dcache) == 208 here (64-bit). This gives 19 objects per slab. I packed d_mounted into a hole, and took another 4 bytes off the inline name length to take the padding out from the end of the structure. This shinks it to 200 bytes. I could have gone the other way and increased the length to 40, but I'm aiming for a magic number, read on... I then got rid of the d_cookie pointer. This shrinks it to 192 bytes. Rant: why was this ever a good idea? The cookie system should increase its hash size or use a tree or something if lookups are a problem. Also the "fast dcookie lookups" in oprofile should be moved into the dcookie code -- how can oprofile possibly care about the dcookie_mutex? It gets dropped after get_dcookie() returns so it can't be providing any sort of protection. At 192 bytes, 21 objects fit into a 4K page, saving about 3MB on my system with ~140 000 entries allocated. 192 is also a multiple of 64, so we get nice cacheline alignment on 64 and 32 byte line systems -- any given dentry will now require 3 cachelines to touch all fields wheras previously it would require 4. I know the inline name size was chosen quite carefully, however with the reduction in cacheline footprint, it should actually be just about as fast to do a name lookup for a 36 character name as it was before the patch (and faster for other sizes). The memory footprint savings for names which are <= 32 or > 36 bytes long should more than make up for the memory cost for 33-36 byte names. Performance is a feature... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-12-31KVM: Consolidate userspace memory capability reporting into common codeAvi Kivity
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31KVM: ppc: mostly cosmetic updates to the exit timing accounting codeHollis Blanchard
The only significant changes were to kvmppc_exit_timing_write() and kvmppc_exit_timing_show(), both of which were dramatically simplified. Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31KVM: ppc: Implement in-kernel exit timing statisticsHollis Blanchard
Existing KVM statistics are either just counters (kvm_stat) reported for KVM generally or trace based aproaches like kvm_trace. For KVM on powerpc we had the need to track the timings of the different exit types. While this could be achieved parsing data created with a kvm_trace extension this adds too much overhead (at least on embedded PowerPC) slowing down the workloads we wanted to measure. Therefore this patch adds a in-kernel exit timing statistic to the powerpc kvm code. These statistic is available per vm&vcpu under the kvm debugfs directory. As this statistic is low, but still some overhead it can be enabled via a .config entry and should be off by default. Since this patch touched all powerpc kvm_stat code anyway this code is now merged and simplified together with the exit timing statistic code (still working with exit timing disabled in .config). Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com> Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31KVM: ppc: save and restore guest mappings on context switchHollis Blanchard
Store shadow TLB entries in memory, but only use it on host context switch (instead of every guest entry). This improves performance for most workloads on 440 by reducing the guest TLB miss rate. Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31KVM: ppc: directly insert shadow mappings into the hardware TLBHollis Blanchard
Formerly, we used to maintain a per-vcpu shadow TLB and on every entry to the guest would load this array into the hardware TLB. This consumed 1280 bytes of memory (64 entries of 16 bytes plus a struct page pointer each), and also required some assembly to loop over the array on every entry. Instead of saving a copy in memory, we can just store shadow mappings directly into the hardware TLB, accepting that the host kernel will clobber these as part of the normal 440 TLB round robin. When we do that we need less than half the memory, and we have decreased the exit handling time for all guest exits, at the cost of increased number of TLB misses because the host overwrites some guest entries. These savings will be increased on processors with larger TLBs or which implement intelligent flush instructions like tlbivax (which will avoid the need to walk arrays in software). In addition to that and to the code simplification, we have a greater chance of leaving other host userspace mappings in the TLB, instead of forcing all subsequent tasks to re-fault all their mappings. Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31powerpc/44x: declare tlb_44x_index for use in C codeHollis Blanchard
KVM currently ignores the host's round robin TLB eviction selection, instead maintaining its own TLB state and its own round robin index. However, by participating in the normal 44x TLB selection, we can drop the alternate TLB processing in KVM. This results in a significant performance improvement, since that processing currently must be done on *every* guest exit. Accordingly, KVM needs to be able to access and increment tlb_44x_index. (KVM on 440 cannot be a module, so there is no need to export this symbol.) Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com> Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31KVM: ppc: support large host pagesHollis Blanchard
KVM on 440 has always been able to handle large guest mappings with 4K host pages -- we must, since the guest kernel uses 256MB mappings. This patch makes KVM work when the host has large pages too (tested with 64K). Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31KVM: ppc: fix userspace mapping invalidation on context switchHollis Blanchard
We used to defer invalidating userspace TLB entries until jumping out of the kernel. This was causing MMU weirdness most easily triggered by using a pipe in the guest, e.g. "dmesg | tail". I believe the problem was that after the guest kernel changed the PID (part of context switch), the old process's mappings were still present, and so copy_to_user() on the "return to new process" path ended up using stale mappings. Testing with large pages (64K) exposed the problem, probably because with 4K pages, pressure on the TLB faulted all process A's mappings out before the guest kernel could insert any for process B. Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31KVM: ppc: use prefetchable mappings for guest memoryHollis Blanchard
Bare metal Linux on 440 can "overmap" RAM in the kernel linear map, so that it can use large (256MB) mappings even if memory isn't a multiple of 256MB. To prevent the hardware prefetcher from loading from an invalid physical address through that mapping, it's marked Guarded. However, KVM must ensure that all guest mappings are backed by real physical RAM (since a deliberate access through a guarded mapping could still cause a machine check). Accordingly, we don't need to make our mappings guarded, so let's allow prefetching as the designers intended. Curiously this patch didn't affect performance at all on the quick test I tried, but it's clearly the right thing to do anyways and may improve other workloads. Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31KVM: ppc: use MMUCR accessor to obtain TIDHollis Blanchard
We have an accessor; might as well use it. Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31KVM: ppc: fix Kconfig constraintsHollis Blanchard
Make sure that CONFIG_KVM cannot be selected without processor support (currently, 440 is the only processor implementation available). Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31KVM: ppc: improve trap emulationHollis Blanchard
set ESR[PTR] when emulating a guest trap. This allows Linux guests to properly handle WARN_ON() (i.e. detect that it's a non-fatal trap). Also remove debugging printk in trap emulation. Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31KVM: ppc: optimize irq delivery pathHollis Blanchard
In kvmppc_deliver_interrupt is just one case left in the switch and it is a rare one (less than 8%) when looking at the exit numbers. Therefore we can at least drop the switch/case and if an if. I inserted an unlikely too, but that's open for discussion. In kvmppc_can_deliver_interrupt all frequent cases are in the default case. I know compilers are smart but we can make it easier for them. By writing down all options and removing the default case combined with the fact that ithe values are constants 0..15 should allow the compiler to write an easy jump table. Modifying kvmppc_can_deliver_interrupt pointed me to the fact that gcc seems to be unable to reduce priority_exception[x] to a build time constant. Therefore I changed the usage of the translation arrays in the interrupt delivery path completely. It is now using priority without translation to irq on the full irq delivery path. To be able to do that ivpr regs are stored by their priority now. Additionally the decision made in kvmppc_can_deliver_interrupt is already sufficient to get the value of interrupt_msr_mask[x]. Therefore we can replace the 16x4byte array used here with a single 4byte variable (might still be one miss, but the chance to find this in cache should be better than the right entry of the whole array). Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com> Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31KVM: ppc: optimize find first bitHollis Blanchard
Since we use a unsigned long here anyway we can use the optimized __ffs. Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com> Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31KVM: ppc: optimize kvm stat handlingHollis Blanchard
Currently we use an unnecessary if&switch to detect some cases. To be honest we don't need the ligh_exits counter anyway, because we can calculate it out of others. Sum_exits can also be calculated, so we can remove that too. MMIO, DCR and INTR can be counted on other places without these additional control structures (The INTR case was never hit anyway). The handling of BOOKE_INTERRUPT_EXTERNAL/BOOKE_INTERRUPT_DECREMENTER is similar, but we can avoid the additional if when copying 3 lines of code. I thought about a goto there to prevent duplicate lines, but rewriting three lines should be better style than a goto cross switch/case statements (its also not enough code to justify a new inline function). Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com> Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31KVM: ppc: fix set regs to take care of msr changeHollis Blanchard
When changing some msr bits e.g. problem state we need to take special care of that. We call the function in our mtmsr emulation (not needed for wrtee[i]), but we don't call kvmppc_set_msr if we change msr via set_regs ioctl. It's a corner case we never hit so far, but I assume it should be kvmppc_set_msr in our arch set regs function (I found it because it is also a corner case when using pv support which would miss the update otherwise). Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com> Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31KVM: ppc: adjust vcpu types to support 64-bit coresHollis Blanchard
However, some of these fields could be split into separate per-core structures in the future. Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>