aboutsummaryrefslogtreecommitdiff
path: root/arch/powerpc/mm
AgeCommit message (Collapse)Author
2007-05-09[POWERPC] Don't use SLAB/SLUB for PTE pagesHugh Dickins
The SLUB allocator relies on struct page fields first_page and slab, overwritten by ptl when SPLIT_PTLOCK: so the SLUB allocator cannot then be used for the lowest level of pagetable pages. This was obstructing SLUB on PowerPC, which uses kmem_caches for its pagetables. So convert its pte level to use normal gfp pages (whereas pmd, pud and 64k-page pgd want partpages, so continue to use kmem_caches for pmd, pud and pgd). Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-05-09[POWERPC] Add ability to 4K kernel to hash in 64K pagesBenjamin Herrenschmidt
This adds the ability for a kernel compiled with 4K page size to have special slices containing 64K pages and hash the right type of hash PTEs. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-05-09[POWERPC] Introduce address space "slices"Benjamin Herrenschmidt
The basic issue is to be able to do what hugetlbfs does but with different page sizes for some other special filesystems; more specifically, my need is: - Huge pages - SPE local store mappings using 64K pages on a 4K base page size kernel on Cell - Some special 4K segments in 64K-page kernels for mapping a dodgy type of powerpc-specific infiniband hardware that requires 4K MMU mappings for various reasons I won't explain here. The main issues are: - To maintain/keep track of the page size per "segment" (as we can only have one page size per segment on powerpc, which are 256MB divisions of the address space). - To make sure special mappings stay within their allotted "segments" (including MAP_FIXED crap) - To make sure everybody else doesn't mmap/brk/grow_stack into a "segment" that is used for a special mapping Some of the necessary mechanisms to handle that were present in the hugetlbfs code, but mostly in ways not suitable for anything else. The patch relies on some changes to the generic get_unmapped_area() that just got merged. It still hijacks hugetlb callbacks here or there as the generic code hasn't been entirely cleaned up yet but that shouldn't be a problem. So what is a slice ? Well, I re-used the mechanism used formerly by our hugetlbfs implementation which divides the address space in "meta-segments" which I called "slices". The division is done using 256MB slices below 4G, and 1T slices above. Thus the address space is divided currently into 16 "low" slices and 16 "high" slices. (Special case: high slice 0 is the area between 4G and 1T). Doing so simplifies significantly the tracking of segments and avoids having to keep track of all the 256MB segments in the address space. While I used the "concepts" of hugetlbfs, I mostly re-implemented everything in a more generic way and "ported" hugetlbfs to it. Slices can have an associated page size, which is encoded in the mmu context and used by the SLB miss handler to set the segment sizes. The hash code currently doesn't care, it has a specific check for hugepages, though I might add a mechanism to provide per-slice hash mapping functions in the future. The slice code provide a pair of "generic" get_unmapped_area() (bottomup and topdown) functions that should work with any slice size. There is some trickiness here so I would appreciate people to have a look at the implementation of these and let me know if I got something wrong. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-05-09[POWERPC] Small fixes & cleanups in segment page size demotionBenjamin Herrenschmidt
The code for demoting segments to 4K had some issues, like for example, when using _PAGE_4K_PFN flag, the first CPU to hit it would do the demotion, but other CPUs hitting the same page wouldn't properly flush their SLBs if mmu_ci_restriction isn't set. There are also potential issues with hash_preload not handling _PAGE_4K_PFN. All of these are non issues on current hardware but might bite us in the future. This patch thus fixes it by: - Taking the test comparing the mm and current CPU context page sizes to decide to flush SLBs out of the mmu_ci_restrictions test since that can also be triggered by _PAGE_4K_PFN pages - Due to the above being done all the time, demote_segment_4k doesn't need update the context and flush the SLB - demote_segment_4k can be static and doesn't need an EXPORT_SYMBOL - Making hash_preload ignore anything that has either _PAGE_4K_PFN or _PAGE_NO_CACHE set, thus avoiding duplication of the complicated logic in hash_page() (and possibly making hash_preload a little bit faster for the normal case). Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-05-09[POWERPC] Mark pages that don't exist as nosaveJohannes Berg
On some powerpc architectures (notably 64-bit powermac) there is a memory hole, for example on powermacs between 2G and 4G. Since we use the flat memory model regardless, these pages must be marked as nosave (for suspend to disk.) Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-05-08Merge branch 'master' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc * 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (77 commits) [POWERPC] Abolish powerpc_flash_init() [POWERPC] Early serial debug support for PPC44x [POWERPC] Support for the Ebony 440GP reference board in arch/powerpc [POWERPC] Add device tree for Ebony [POWERPC] Add powerpc/platforms/44x, disable platforms/4xx for now [POWERPC] MPIC U3/U4 MSI backend [POWERPC] MPIC MSI allocator [POWERPC] Enable MSI mappings for MPIC [POWERPC] Tell Phyp we support MSI [POWERPC] RTAS MSI implementation [POWERPC] PowerPC MSI infrastructure [POWERPC] Rip out the existing powerpc msi stubs [POWERPC] Remove use of 4level-fixup.h for ppc32 [POWERPC] Add powerpc PCI-E reset API implementation [POWERPC] Holly bootwrapper [POWERPC] Holly DTS [POWERPC] Holly defconfig [POWERPC] Add support for 750CL Holly board [POWERPC] Generalize tsi108 PCI setup [POWERPC] Generalize tsi108 PHY types ... Fixed conflict in include/asm-powerpc/kdebug.h manually Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08header cleaning: don't include smp_lock.h when not usedRandy Dunlap
Remove includes of <linux/smp_lock.h> where it is not used/needed. Suggested by Al Viro. Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc, sparc64, and arm (all 59 defconfigs). Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08move die notifier handling to common codeChristoph Hellwig
This patch moves the die notifier handling to common code. Previous various architectures had exactly the same code for it. Note that the new code is compiled unconditionally, this should be understood as an appel to the other architecture maintainer to implement support for it aswell (aka sprinkling a notify_die or two in the proper place) arm had a notifiy_die that did something totally different, I renamed it to arm_notify_die as part of the patch and made it static to the file it's declared and used at. avr32 used to pass slightly less information through this interface and I brought it into line with the other architectures. [akpm@linux-foundation.org: build fix] [akpm@linux-foundation.org: fix vmalloc_sync_all bustage] [bryan.wu@analog.com: fix vmalloc_sync_all in nommu] Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: <linux-arch@vger.kernel.org> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Bryan Wu <bryan.wu@analog.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08use SLAB_PANIC flag cleanupAkinobu Mita
Use SLAB_PANIC and delete duplicated panic(). Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Ian Molton <spyro@f2s.com> Cc: David Howells <dhowells@redhat.com> Cc: Andi Kleen <ak@suse.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08[POWERPC] Remove use of 4level-fixup.h for ppc32David Gibson
For 32-bit systems, powerpc still relies on the 4level-fixup.h hack, to pretend that the generic pagetable handling stuff is 3-levels rather than 4. This patch removes this, instead using the newer pgtable-nopmd.h to handle the elision of both the pud and pmd pagetable levels (ppc32 pagetables are actually 2 levels). This removes a little extraneous code, and makes it more easily compared to the 64-bit pagetable code. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-05-08Merge branch 'linux-2.6'Paul Mackerras
2007-05-08[POWERPC] Add __init annotations to reserve_mem() and stabs_alloc()Michael Ellerman
reserve_mem() and stabs_alloc() are both called only from other __init routines, so can be marked __init. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-05-07get_unmapped_area handles MAP_FIXED on powerpcBenjamin Herrenschmidt
The current get_unmapped_area code calls the f_ops->get_unmapped_area or the arch one (via the mm) only when MAP_FIXED is not passed. That makes it impossible for archs to impose proper constraints on regions of the virtual address space. To work around that, get_unmapped_area() then calls some hugetlbfs specific hacks. This cause several problems, among others: - It makes it impossible for a driver or filesystem to do the same thing that hugetlbfs does (for example, to allow a driver to use larger page sizes to map external hardware) if that requires applying a constraint on the addresses (constraining that mapping in certain regions and other mappings out of those regions). - Some archs like arm, mips, sparc, sparc64, sh and sh64 already want MAP_FIXED to be passed down in order to deal with aliasing issues. The code is there to handle it... but is never called. This series of patches moves the logic to handle MAP_FIXED down to the various arch/driver get_unmapped_area() implementations, and then changes the generic code to always call them. The hugetlbfs hacks then disappear from the generic code. Since I need to do some special 64K pages mappings for SPEs on cell, I need to work around the first problem at least. I have further patches thus implementing a "slices" layer that handles multiple page sizes through slices of the address space for use by hugetlbfs, the SPE code, and possibly others, but it requires that serie of patches first/ There is still a potential (but not practical) issue due to the fact that filesystems/drivers implemeting g_u_a will effectively bypass all arch checks. This is not an issue in practice as the only filesystems/drivers using that hook are doing so for arch specific purposes in the first place. There is also a problem with mremap that will completely bypass all arch checks. I'll try to address that separately, I'm not 100% certain yet how, possibly by making it not work when the vma has a file whose f_ops has a get_unmapped_area callback, and by making it use is_hugepage_only_range() before expanding into a new area. Also, I want to turn is_hugepage_only_range() into a more generic is_normal_page_range() as that's really what it will end up meaning when used in stack grow, brk grow and mremap. None of the above "issues" however are introduced by this patch, they are already there, so I think the patch can go ini for 2.6.22. This patch: Handle MAP_FIXED in powerpc's arch_get_unmapped_area() in all 3 implementations of it. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: William Irwin <bill.irwin@oracle.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Cc: David Howells <dhowells@redhat.com> Cc: Andi Kleen <ak@suse.de> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Grant Grundler <grundler@parisc-linux.org> Cc: Matthew Wilcox <willy@debian.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Adam Litke <agl@us.ibm.com> Cc: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07slab allocators: remove multiple alignment specificationsChristoph Lameter
It is not necessary to tell the slab allocators to align to a cacheline if an explicit alignment was already specified. It is rather confusing to specify multiple alignments. Make sure that the call sites only use one form of alignment. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07slab allocators: Remove obsolete SLAB_MUST_HWCACHE_ALIGNChristoph Lameter
This patch was recently posted to lkml and acked by Pekka. The flag SLAB_MUST_HWCACHE_ALIGN is 1. Never checked by SLAB at all. 2. A duplicate of SLAB_HWCACHE_ALIGN for SLUB 3. Fulfills the role of SLAB_HWCACHE_ALIGN for SLOB. The only remaining use is in sparc64 and ppc64 and their use there reflects some earlier role that the slab flag once may have had. If its specified then SLAB_HWCACHE_ALIGN is also specified. The flag is confusing, inconsistent and has no purpose. Remove it. Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07[POWERPC] 64K page support for kexecLuke Browning
This fixes a couple of kexec problems related to 64K page support in the kernel. kexec issues a tlbie for each pte. The parameters for the tlbie are the page size and the virtual address. Support was missing for the computation of these two parameters for 64K pages. This adds that support. Signed-off-by: Luke Browning <lukebrowning@us.ibm.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Olof Johansson <olof@lixom.net> Acked-by: Arnd Bergmann <arnd.bergmann@de.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-05-02[POWERPC] Minor fault path optimizationChristoph Hellwig
Call the kprobes pagefault handler directly instead of going through the complex notifier chain. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-05-02[POWERPC] Revise PPC44x MMU code for arch/powerpcDavid Gibson
This patch takes the definitions for the PPC44x MMU (a software loaded TLB) from asm-ppc/mmu.h, cleans them up of things no longer necessary in arch/powerpc and puts them in a new asm-powerpc/mmu_44x.h file. It also substantially simplifies arch/powerpc/mm/44x_mmu.c and makes a couple of small fixes necessary for the 44x MMU code to build and work properly in arch/powerpc. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-05-02[POWERPC] Initialise spinlock in the DEBUG_PAGEALLOC codeMichael Ellerman
Fixes: BUG: spinlock bad magic on CPU#0, swapper/0 lock: c00000000064ec30, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0 Call Trace: [c00000000062b980] [c00000000000f920] .show_stack+0x6c/0x1a0 (unreliable) [c00000000062ba20] [c0000000001c2b40] .spin_bug+0xb0/0xd4 [c00000000062bab0] [c0000000001c2ed0] ._raw_spin_lock+0x44/0x184 [c00000000062bb50] [c0000000003a42b4] ._spin_lock+0x10/0x24 [c00000000062bbd0] [c00000000002b4dc] .kernel_map_pages+0x198/0x278 [c00000000062bc90] [c000000000079720] .free_hot_cold_page+0x124/0x418 [c00000000062bd70] [c000000000530278] .free_all_bootmem_core+0x14c/0x224 [c00000000062be50] [c00000000052a178] .mem_init+0x68/0x170 [c00000000062bee0] [c00000000051d874] .start_kernel+0x2a0/0x37c [c00000000062bf90] [c0000000000084c8] .start_here_common+0x54/0x8c Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-05-02[POWERPC] Remove unneeded page_is_ram exportJohannes Berg
arch/powerpc/mm/mem.c exports page_is_ram, which is not used anywhere that could be modular. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-04-24[POWERPC] Abolish PHYS_FMT macro from arch/powerpcDavid Gibson
32-bit powerpc systems define a macro, PHYS_FMT, giving a printf format string fragment for displaying physical addresses, since most 32-bit powerpc platforms use 32-bit physical addresses but a few use 64-bit physical addresses. This macro is used in exactly one place, a rare error message, where we can solve the problem more simply by just unconditionally casting the address up to 64-bit quantity before formatting it. This patch does so, meaning that as we bring MMU definitions from asm-ppc over to asm-powerpc, cleaning them up in the process, we don't need to implement this ugly macro (which additionally has a very bad name for something global). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-04-24[POWERPC] Cleanup and fix breakage in tlbflush.hDavid Gibson
BenH's commit a741e67969577163a4cfc78d7fd2753219087ef1 in powerpc.git, although (AFAICT) only intended to affect ppc64, also has side-effects which break 44x. I think 40x, 8xx and Freescale Book E are also affected, though I haven't tested them. The problem lies in unconditionally removing flush_tlb_pending() from the versions of flush_tlb_mm(), flush_tlb_range() and flush_tlb_kernel_range() used on ppc64 - which are also used the embedded platforms mentioned above. The patch below cleans up the convoluted #ifdef logic in tlbflush.h, in the process restoring the necessary flushes for the software TLB platforms. There are three sets of definitions for the flushing hooks: the software TLB versions (revised to avoid using names which appear to related to TLB batching), the 32-bit hash based versions (external functions) amd the 64-bit hash based versions (which implement batching). It also moves the declaration of update_mmu_cache() to always be in tlbflush.h (previously it was in tlbflush.h except for PPC64, where it was in pgtable.h). Booted on Ebony (440GP) and compiled for 64-bit and 32-bit multiplatform. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-04-13[POWERPC] DEBUG_PAGEALLOC for 64-bitBenjamin Herrenschmidt
Here's an implementation of DEBUG_PAGEALLOC for 64 bits powerpc. It applies on top of the 32 bits patch. Unlike Anton's previous attempt, I'm not using updatepp. I'm removing the hash entries from the bolted mapping (using a map in RAM of all the slots). Expensive but it doesn't really matter, does it ? :-) Memory hot-added doesn't benefit from this unless it's added at an address that is below end_of_DRAM() as calculated at boot time. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> arch/powerpc/Kconfig.debug | 2 arch/powerpc/mm/hash_utils_64.c | 84 ++++++++++++++++++++++++++++++++++++++-- 2 files changed, 82 insertions(+), 4 deletions(-) Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-04-13[POWERPC] DEBUG_PAGEALLOC for 32-bitBenjamin Herrenschmidt
Here's an implementation of DEBUG_PAGEALLOC for ppc32. It disables BAT mapping and is only tested with Hash table based processor though it shouldn't be too hard to adapt it to others. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> arch/powerpc/Kconfig.debug | 9 ++++++ arch/powerpc/mm/init_32.c | 4 +++ arch/powerpc/mm/pgtable_32.c | 52 +++++++++++++++++++++++++++++++++++++++ arch/powerpc/mm/ppc_mmu_32.c | 4 ++- include/asm-powerpc/cacheflush.h | 6 ++++ 5 files changed, 74 insertions(+), 1 deletion(-) Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-04-13[POWERPC] Fix 32-bit mm operations when not using BATsBenjamin Herrenschmidt
On hash table based 32 bits powerpc's, the hash management code runs with a big spinlock. It's thus important that it never causes itself a hash fault. That code is generally safe (it does memory accesses in real mode among other things) with the exception of the actual access to the code itself. That is, the kernel text needs to be accessible without taking a hash miss exceptions. This is currently guaranteed by having a BAT register mapping part of the linear mapping permanently, which includes the kernel text. But this is not true if using the "nobats" kernel command line option (which can be useful for debugging) and will not be true when using DEBUG_PAGEALLOC implemented in a subsequent patch. This patch fixes this by pre-faulting in the hash table pages that hit the kernel text, and making sure we never evict such a page under hash pressure. Signed-off-by: Benjamin Herrenchmidt <benh@kernel.crashing.org> arch/powerpc/mm/hash_low_32.S | 22 ++++++++++++++++++++-- arch/powerpc/mm/mem.c | 3 --- arch/powerpc/mm/mmu_decl.h | 4 ++++ arch/powerpc/mm/pgtable_32.c | 11 +++++++---- 4 files changed, 31 insertions(+), 9 deletions(-) Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-04-13[POWERPC] Cleanup 32-bit map_pageBenjamin Herrenschmidt
The 32 bits map_page() function is used internally by the mm code for early mmu mappings and for ioremap. It should never be called for an address that already has a valid PTE or hash entry, so we add a BUG_ON for that and remove the useless flush_HPTE call. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> arch/powerpc/mm/pgtable_32.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-04-13[POWERPC] Make tlb flush batch use lazy MMU modeBenjamin Herrenschmidt
The current tlb flush code on powerpc 64 bits has a subtle race since we lost the page table lock due to the possible faulting in of new PTEs after a previous one has been removed but before the corresponding hash entry has been evicted, which can leads to all sort of fatal problems. This patch reworks the batch code completely. It doesn't use the mmu_gather stuff anymore. Instead, we use the lazy mmu hooks that were added by the paravirt code. They have the nice property that the enter/leave lazy mmu mode pair is always fully contained by the PTE lock for a given range of PTEs. Thus we can guarantee that all batches are flushed on a given CPU before it drops that lock. We also generalize batching for any PTE update that require a flush. Batching is now enabled on a CPU by arch_enter_lazy_mmu_mode() and disabled by arch_leave_lazy_mmu_mode(). The code epects that this is always contained within a PTE lock section so no preemption can happen and no PTE insertion in that range from another CPU. When batching is enabled on a CPU, every PTE updates that need a hash flush will use the batch for that flush. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-04-13[POWERPC] Rename get_property to of_get_property: arch/powerpcStephen Rothwell
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-04-13[POWERPC] Allow drivers to map individual 4k pages to userspacePaul Mackerras
Some drivers have resources that they want to be able to map into userspace that are 4k in size. On a kernel configured with 64k pages we currently end up mapping the 4k we want plus another 60k of physical address space, which could contain anything. This can introduce security problems, for example in the case of an infiniband adaptor where the other 60k could contain registers that some other program is using for its communications. This patch adds a new function, remap_4k_pfn, which drivers can use to map a single 4k page to userspace regardless of whether the kernel is using a 4k or a 64k page size. Like remap_pfn_range, it would typically be called in a driver's mmap function. It only maps a single 4k page, which on a 64k page kernel appears replicated 16 times throughout a 64k page. On a 4k page kernel it reduces to a call to remap_pfn_range. The way this works on a 64k kernel is that a new bit, _PAGE_4K_PFN, gets set on the linux PTE. This alters the way that __hash_page_4K computes the real address to put in the HPTE. The RPN field of the linux PTE becomes the 4k RPN directly rather than being interpreted as a 64k RPN. Since the RPN field is 32 bits, this means that physical addresses being mapped with remap_4k_pfn have to be below 2^44, i.e. 0x100000000000. The patch also factors out the code in arch/powerpc/mm/hash_utils_64.c that deals with demoting a process to use 4k pages into one function that gets called in the various different places where we need to do that. There were some discrepancies between exactly what was done in the various places, such as a call to spu_flush_all_slbs in one case but not in others. Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-04-13[POWERPC] Rename prom_n_size_cells to of_n_size_cellsStephen Rothwell
This is more consistent and gets us closer to the Sparc code. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-04-13[POWERPC] Rename prom_n_addr_cells to of_n_addr_cellsStephen Rothwell
This is more consistent and gets us closer to the Sparc code. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-04-13Merge branch 'linux-2.6' into for-2.6.22Paul Mackerras
2007-03-10[POWERPC] Fix spu SLB invalidationsBenjamin Herrenschmidt
The SPU code doesn't properly invalidate SPUs SLBs when necessary, for example when changing a segment size from the hugetlbfs code. In addition, it saves and restores the SLB content on context switches which makes it harder to properly handle those invalidations. This patch removes the saving & restoring for now, something more efficient might be found later on. It also adds a spu_flush_all_slbs(mm) that can be used by the core mm code to flush the SLBs of all SPEs that are running a given mm at the time of the flush. In order to do that, it adds a spinlock to the list of all SPEs and move some bits & pieces from spufs to spu_base.c Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2007-03-08[POWERPC] Allow duplicate lmb_reserve() callsDavid Gibson
At present calling lmb_reserve() (and hence lmb_add_region()) twice for exactly the same memory region will cause strange behaviour. This makes life difficult when booting from a flat device tree with memory reserve map. Which regions are automatically reserved by the kernel has changed over time, so it's quite possible a newer kernel could attempt to auto-reserve a region which is also explicitly listed in the device tree's reserve map, leading to trouble. This patch avoids the problem by making lmb_reserve() ignore a call to reserve a previously reserved region. It also removes a now redundant test designed to avoid one specific case of the problem noted above. At present, this patch deals only with duplicate reservations of an identical region. Attempting to reserve two different, but overlapping regions will still cause problems. I might post another patch later dealing with this case, but I'm avoiding it now since it is substantially more complicated to deal with, less likely to occur and more likely to indicate a genuine bug elsewhere if it does occur. Signed-off-by: David Gibson <dwg@au1.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-02-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivialLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (25 commits) Documentation/kernel-docs.txt update. arch/cris: typo in KERN_INFO Storage class should be before const qualifier kernel/printk.c: comment fix update I/O sched Kconfig help texts - CFQ is now default, not AS. Remove duplicate listing of Cris arch from README kbuild: more doc. cleanups doc: make doc. for maxcpus= more visible drivers/net/eexpress.c: remove duplicate comment add a help text for BLK_DEV_GENERIC correct a dead URL in the IP_MULTICAST help text fix the BAYCOM_SER_HDX help text fix SCSI_SCAN_ASYNC help text trivial documentation patch for platform.txt Fix typos concerning hierarchy Fix comment typo "spin_lock_irqrestore". Fix misspellings of "agressive". drivers/scsi/a100u2w.c: trivial typo patch Correct trivial typo in log2.h. Remove useless FIND_FIRST_BIT() macro from cardbus.c. ...
2007-02-17Fix typos concerning hierarchyUwe Kleine-König
heirarchical, hierachical -> hierarchical heirarchy, hierachy -> hierarchy Signed-off-by: Uwe Kleine-König <zeisberg@informatik.uni-freiburg.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-02-16[POWERPC] Fix bug with early ioremap and 64k pagesBenjamin Herrenschmidt
The code for bolting hash entries for ioremap done before proper mm initialization has a grown a bug when using 64K pages on a machine where non-cacheable mappings are demoted to 4K HW pages. The wrong page size index is being passed to the hash table mapping functions causing a crash at boot on some pSeries machines using bare metal linux. This fixes it. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-02-13[POWERPC] Fix vDSO page count calculationBenjamin Herrenschmidt
The recent vDSO consolidation patches broke powerpc due to a mistake in the definition of MAXPAGES constants. This fixes it by moving to a dynamically allocated array of pages instead as I don't like much hard coded size limits. Also move the vdso initialisation to an initcall since it doesn't really need to be done -that- early. Applogies for not catching the breakage earlier, Roland _did_ CC me on his patches a while ago, I got busy with other things and forgot to test them. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-02-09[POWERPC] Fix is_power_of_4(x) compile errorKumar Gala
When building an 85xx kernel we get: CC arch/powerpc/mm/pgtable_32.o arch/powerpc/mm/pgtable_32.c: In function 'io_block_mapping': arch/powerpc/mm/pgtable_32.c:330: error: expected identifier before '(' token arch/powerpc/mm/pgtable_32.c:330: error: expected statement before ')' token The is_power_of_2(x) fixup patch left an extra ')' on the is_power_of_4 macro. There is a similiar issue on the arch/ppc side. Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2007-02-08[POWERPC] Remove bogus comment about page_is_ramJohannes Berg
arch/powerpc/mm/mem.c states that page_is_ram is called by the code that implements /dev/mem, which isn't true. Remove the comment. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-02-07[POWERPC] Add "is_power_of_2" checking to log2.h.Robert P. J. Day
Add the inline function "is_power_of_2()" to log2.h, where the value zero is *not* considered to be a power of two. Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Acked-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-02-07[POWERPC] 8xx: platform specific mmu updatesVitaly Bordug
This is just a straight port of the same done in arch/ppc by Marcelo Tosatti. One used to be [PATCH] ppc32 8xx: update_mmu_cache() needs unconditional tlbie, commit eb07d964b4491d1bb5864cd3d7e7633ccdda9a53 In a nutshell, the board is nearly stuck without this, yet without any visible failure - being just very slow. Signed-off-by: Vitaly Bordug <vbordug@ru.mvista.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-01-24[POWERPC] TLB insertion cleanupIshizaki Kou
This patch changes handling return value of ppc_md.hpte_insert() into the same way as __hash_page_*(). Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-01-09[POWERPC] Fix bogus BUG_ON() in in hugetlb_get_unmapped_area()David Gibson
The powerpc specific version of hugetlb_get_unmapped_area() makes some unwarranted assumptions about what checks have been made to its parameters by its callers. This will lead to a BUG_ON() if a 32-bit process attempts to make a hugepage mapping which extends above TASK_SIZE (4GB). I'm not sure if these assumptions came about because they were valid with earlier versions of the get_unmapped_area() path, or if it was always broken. Nonetheless this patch fixes the logic, and removes the crash. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-12-13[PATCH] getting rid of all casts of k[cmz]alloc() callsRobert P. J. Day
Run this: #!/bin/sh for f in $(grep -Erl "\([^\)]*\) *k[cmz]alloc" *) ; do echo "De-casting $f..." perl -pi -e "s/ ?= ?\([^\)]*\) *(k[cmz]alloc) *\(/ = \1\(/" $f done And then go through and reinstate those cases where code is casting pointers to non-pointers. And then drop a few hunks which conflicted with outstanding work. Cc: Russell King <rmk@arm.linux.org.uk>, Ian Molton <spyro@f2s.com> Cc: Mikael Starvik <starvik@axis.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jeff Dike <jdike@addtoit.com> Cc: Greg KH <greg@kroah.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Paul Fulghum <paulkf@microgate.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Karsten Keil <kkeil@suse.de> Cc: Mauro Carvalho Chehab <mchehab@infradead.org> Cc: Jeff Garzik <jeff@garzik.org> Cc: James Bottomley <James.Bottomley@steeleye.com> Cc: Ian Kent <raven@themaw.net> Cc: Steven French <sfrench@us.ibm.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Neil Brown <neilb@cse.unsw.edu.au> Cc: Jaroslav Kysela <perex@suse.cz> Cc: Takashi Iwai <tiwai@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-11[POWERPC] Support ibm,dynamic-reconfiguration-memory nodesPaul Mackerras
For PAPR partitions with large amounts of memory, the firmware has an alternative, more compact representation for the information about the memory in the partition and its NUMA associativity information. This adds the code to the kernel to parse this alternative representation. The other part of this patch is telling the firmware that we can handle the alternative representation. There is however a subtlety here, because the firmware will invoke a reboot if the memory representation we request is different from the representation that firmware is currently using. This is because firmware can't change the representation on the fly. Further, some firmware versions used on POWER5+ machines have a bug where this reboot leaves the machine with an altered value of load-base, which will prevent any kernel booting until it is reset to the normal value (0x4000). Because of this bug, we do NOT set fake_elf.rpanote.new_mem_def = 1, and thus we do not request the new representation on POWER5+ and earlier machines. We do request the new representation on POWER6, which uses the ibm,client-architecture-support call. Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-12-07[PATCH] slab: remove kmem_cache_tChristoph Lameter
Replace all uses of kmem_cache_t with struct kmem_cache. The patch was generated using the following script: #!/bin/sh # # Replace one string by another in all the kernel sources. # set -e for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do quilt add $file sed -e "1,\$s/$1/$2/g" $file >/tmp/$$ mv /tmp/$$ $file quilt refresh done The script was run like this sh replace kmem_cache_t "struct kmem_cache" Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07[PATCH] shared page table for hugetlb pageChen, Kenneth W
Following up with the work on shared page table done by Dave McCracken. This set of patch target shared page table for hugetlb memory only. The shared page table is particular useful in the situation of large number of independent processes sharing large shared memory segments. In the normal page case, the amount of memory saved from process' page table is quite significant. For hugetlb, the saving on page table memory is not the primary objective (as hugetlb itself already cuts down page table overhead significantly), instead, the purpose of using shared page table on hugetlb is to allow faster TLB refill and smaller cache pollution upon TLB miss. With PT sharing, pte entries are shared among hundreds of processes, the cache consumption used by all the page table is smaller and in return, application gets much higher cache hit ratio. One other effect is that cache hit ratio with hardware page walker hitting on pte in cache will be higher and this helps to reduce tlb miss latency. These two effects contribute to higher application performance. Signed-off-by: Ken Chen <kenneth.w.chen@intel.com> Acked-by: Hugh Dickins <hugh@veritas.com> Cc: Dave McCracken <dmccr@us.ibm.com> Cc: William Lee Irwin III <wli@holomorphy.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: Adam Litke <agl@us.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-04[POWERPC] Fix cputable.h for combined buildStephen Rothwell
Remove CPU_FTR_16M_PAGE from the cupfeatures mask at runtime on iSeries. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-12-04[POWERPC] ps3: multiplatform build fixesArnd Bergmann
A few code paths need to check whether or not they are running on the PS3's LV1 hypervisor before making hcalls. This introduces a new firmware feature bit for this, FW_FEATURE_PS3_LV1. Now when both PS3 and IBM_CELL_BLADE are enabled, but not PSERIES, FW_FEATURE_PS3_LV1 and FW_FEATURE_LPAR get enabled at compile time, which is a bug. The same problem can also happen for (PPC_ISERIES && !PPC_PSERIES && PPC_SOMETHING_ELSE). In order to solve this, I introduce a new CONFIG_PPC_NATIVE option that is set when at least one platform is selected that can run without a hypervisor and then turns the firmware feature check into a run-time option. The new cell oprofile support that was recently merged does not work on hypervisor based platforms like the PS3, therefore make it depend on PPC_CELL_NATIVE instead of PPC_CELL. This may change if we get oprofile support for PS3. Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>