Kernel - My Linux kernel repository

Age	Commit message	Author
2005-10-29	[PATCH] mm: arches skip ptlock	Hugh Dickins
	Convert those few architectures which are calling pud_alloc, pmd_alloc, pte_alloc_map on a user mm, not to take the page_table_lock first, nor drop it after. Each of these can continue to use pte_alloc_map, no need to change over to pte_alloc_map_lock, they're neither racy nor swappable. In the sparc64 io_remap_pfn_range, flush_tlb_range then falls outside of the page_table_lock: that's okay, on sparc64 it's like flush_tlb_mm, and that has always been called from outside of page_table_lock in dup_mmap. Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-29	[PATCH] core remove PageReserved	Nick Piggin
	Remove PageReserved() calls from core code by tightening VM_RESERVED handling in mm/ to cover PageReserved functionality. PageReserved special casing is removed from get_page and put_page. All setting and clearing of PageReserved is retained, and it is now flagged in the page_alloc checks to help ensure we don't introduce any refcount based freeing of Reserved pages. MAP_PRIVATE, PROT_WRITE of VM_RESERVED regions is tentatively being deprecated. We never completely handled it correctly anyway, and it can be reintroduced in future if required (Hugh has a proof of concept). Once PageReserved() calls are removed from kernel/power/swsusp.c, and all arch/ and driver code, the Set and Clear calls, and the PG_reserved bit can be trivially removed. Last real user of PageReserved is swsusp, which uses PageReserved to determine whether a struct page points to valid memory or not. This still needs to be addressed (a generic page_is_ram() should work). A last caveat: the ZERO_PAGE is now refcounted and managed with rmap (and thus mapcounted and counts towards shared rss). These writes to the struct page could cause excessive cacheline bouncing on big systems. There are a number of ways this could be addressed if it is an issue. Signed-off-by: Nick Piggin <npiggin@suse.de> Refcount bug fix for filemap_xip.c Signed-off-by: Carsten Otte <cotte@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-29	[PATCH] mm: mm_init set_mm_counters	Hugh Dickins
	How is anon_rss initialized? In dup_mmap, and by mm_alloc's memset; but that's not so good if an mm_counter_t is a special type. And how is rss initialized? By set_mm_counter, all over the place. Come on, we just need to initialize them both at once by set_mm_counter in mm_init (which follows the memcpy when forking). Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-29	[PATCH] mm: tlb_finish_mmu forget rss	Hugh Dickins
	zap_pte_range has been counting the pages it frees in tlb->freed, then tlb_finish_mmu has used that to update the mm's rss. That got stranger when I added anon_rss, yet updated it by a different route; and stranger when rss and anon_rss became mm_counters with special access macros. And it would no longer be viable if we're relying on page_table_lock to stabilize the mm_counter, but calling tlb_finish_mmu outside that lock. Remove the mmu_gather's freed field, let tlb_finish_mmu stick to its own business, just decrement the rss mm_counter in zap_pte_range (yes, there was some point to batching the update, and a subsequent patch restores that). And forget the anal paranoia of first reading the counter to avoid going negative - if rss does go negative, just fix that bug. Remove the mmu_gather's flushes and avoided_flushes from arm and arm26: no use was being made of them. But arm26 alone was actually using the freed, in the way some others use need_flush: give it a need_flush. arm26 seems to prefer spaces to tabs here: respect that. Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-29	[PATCH] mm: tlb_is_full_mm was obscure	Hugh Dickins
	tlb_is_full_mm? What does that mean? The TLB is full? No, it means that the mm's last user has gone and the whole mm is being torn down. And it's an inline function because sparc64 uses a different (slightly better) "tlb_frozen" name for the flag others call "fullmm". And now the ptep_get_and_clear_full macro used in zap_pte_range refers directly to tlb->fullmm, which would be wrong for sparc64. Rather than correct that, I'd prefer to scrap tlb_is_full_mm altogether, and change sparc64 to just use the same poor name as everyone else - is that okay? Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-28	[PATCH] gfp_t: remaining bits of arch/*	Al Viro
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-14	[SPARC64]: Fix powering off on SMP.	David S. Miller
	Doing a "SUNW,stop-self" firmware call on the other cpus is not the correct thing to do when dropping into the firmware for a halt, reboot, or power-off. For now, just do nothing to quiet the other cpus, as the system should be quiescent enough. Later we may decide to implement smp_send_stop() like the other SMP platforms do. Based upon a report from Christopher Zimmermann. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-13	[SPARC64]: Eliminate PCI IOMMU dma mapping size limit.	David S. Miller
	The hairy fast allocator in the sparc64 PCI IOMMU code has a hard limit of 256 pages. Certain devices can exceed this when performing very large I/Os. So replace it with a simpler allocator, based largely upon the arch/ppc64/kernel/iommu.c code. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-13	[SPARC64]: Consolidate common PCI IOMMU init code.	David S. Miller
	All the PCI controller drivers were doing the same thing setting up the IOMMU software state, put it all in one spot. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-12	[SPARC64]: Fix boot failures on SunBlade-150	David S. Miller
	The sequence to move over to the Linux trap tables from the firmware ones needs to be more airtight. It turns out that to be 100% safe we do need to be able to translate OBP mappings in our TLB miss handlers early. In order not to eat up a lot of kernel image memory with static page tables, just use the translations array in the OBP TLB miss handlers. That solves the bulk of the problem. Furthermore, to make sure the OBP TLB miss path will work even before the fixed MMU globals are loaded, explicitly load %g1 to TLB_SFSR at the beginning of the i-TLB and d-TLB miss handlers. To ease the OBP TLB miss walking of the prom_trans[] array, we sort it then delete all of the non-OBP entries in there (for example, there are entries for the kernel image itself which we're not interested in at all). We also save about 32K of kernel image size with this change. Not a bad side effect :-) There are still some reasons why trampoline.S can't use the setup_trap_table() yet. The most noteworthy are: 1) OBP boots secondary processors with a non-bias'd stack for some reason. This is easily fixed by using a small bootup stack in the kernel image explicitly for this purpose. 2) Doing a firmware call via the normal C call prom_set_trap_table() goes through the whole OBP enter/exit sequence that saves and restores OBP and Linux kernel state in the MMUs. This path unfortunately does a "flush %g6" while loading up the OBP locked TLB entries for the firmware call. If we set up %g6 in the trampoline.S code properly, it is in the PAGE_OFFSET linear mapping, but we're not on the kernel trap table yet, so those addresses won't translate properly. One idea is to do a by-hand firmware call like we do in the early bootup code and elsewhere here in trampoline.S. But this fails as well, as apparently the secondary processors are not booted with OBP's special locked TLB entries loaded. These are necessary for the firmware to process TLB misses correctly up until the point where we take over the trap table. This does need to be resolved at some point. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-11	[SPARC64]: Fix net booting on Ultra5	David S. Miller
	We were not doing alignment properly when remapping the kernel image. What we want is a 4MB aligned physical address to map at KERNBASE. Mistakenly we were 4MB aligning the virtual address where the kernel initially sits, which is wrong. Instead, we should PAGE align the virtual address, then 4MB align the physical address result the prom gives to us. Signed-off-by: David S. Miller <davem@davemloft.net>
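The corrected order of operations in the Ultra5 net booting fix can be sketched in plain C. This is only an illustration, not the kernel's assembler: the 8KB page size is an assumption about the sparc64 base page size, and `fake_prom_translate()` is a hypothetical stand-in for the firmware translation call.

```c
#include <stdint.h>

/* Assumed sizes: 8KB base pages; the 4MB figure comes from the commit text. */
#define SKETCH_PAGE_SIZE (UINT64_C(1) << 13)
#define SKETCH_4MB       (UINT64_C(1) << 22)

/* Round addr down to a multiple of size; size must be a power of two. */
static uint64_t align_down(uint64_t addr, uint64_t size)
{
    return addr & ~(size - 1);
}

/* Hypothetical firmware translation: virtual address plus a fixed offset. */
static uint64_t fake_prom_translate(uint64_t vaddr)
{
    return vaddr + UINT64_C(0x10000000);
}

/*
 * PAGE align the virtual address first, translate it, and only then
 * 4MB align the physical address that comes back.
 */
static uint64_t kernel_map_phys(uint64_t vaddr,
                                uint64_t (*translate)(uint64_t))
{
    uint64_t paddr = translate(align_down(vaddr, SKETCH_PAGE_SIZE));

    return align_down(paddr, SKETCH_4MB);
}
```

Aligning the virtual address to 4MB before translating, as the old code did, could hand the firmware an address below where the kernel image actually sits.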
2005-10-10	[SPARC64]: Fix Ultra5, Ultra60, et al. boot failures.	David S. Miller
	On the boot processor, we need to do the move onto the Linux trap table a little bit differently, or else we'll take unhandlable faults in the firmware address space. Previously we would do the following: 1) Disable PSTATE_IE in %pstate. 2) Set %tba by hand to sparc64_ttable_tl0 3) Initialize alternate, mmu, and interrupt global trap registers. 4) Call prom_set_traptable() That doesn't work very well actually with the way we boot the kernel VM these days. It worked by luck on many systems because the firmware accesses for the prom_set_traptable() call happened to be loaded into the TLB already, something we cannot assume. So the new scheme is this: 1) Clear PSTATE_IE in %pstate and set %pil to 15 2) Call prom_set_traptable() 3) Initialize alternate, mmu, and interrupt global trap registers. and this works quite well. This sequence has been moved into a callable function in assembler named setup_trap_table(). The idea is that eventually trampoline.S can use this code as well. That isn't possible currently due to some complications, but eventually we should be able to do it. Thanks to Meelis Roos for the Ultra5 boot failure report. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-08	[SPARC64]: Fix compile error in irq.c	Sven Hartge
	irq.c is missing the inclusion of asm/io.h, which causes readb() and writeb() to be undefined. Signed-off-by: Sven Hartge <hartge@ds9.argh.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-07	[SPARC64]: Fix userland FPU state corruption.	David S. Miller
	We need to use stricter memory barriers around the block load and store instructions we use to save and restore the FPU register file. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-06	[SPARC64]: Probe for power device on ISA bus too.	David S. Miller
	Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-05	[SPARC64]: Fix initrd when net booting.	David S. Miller
	By allocating early memory for the firmware page tables, we can write over the beginning of the initrd image. So what we do now is: 1) Read in firmware translations table while still on the firmware's trap table. 2) Switch to Linux trap table. 3) Init bootmem. 4) Build firmware page tables using __alloc_bootmem(). And this keeps the initrd from being clobbered. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-04	[SPARC64]: Replace cheetah+ code patching with variables.	David S. Miller
	Instead of code patching to handle the page size fields in the context registers, just use variables from which we get the proper values. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-29	[SPARC64]: Fix several bugs in flush_ptrace_access().	David S. Miller
	1) Use cpudata cache line sizes, not magic constants. 2) Align start address in cheetah case so we do not get unaligned address traps. (pgrep was good at triggering this, via /proc/${pid}/cmdline accesses) Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-29	[SPARC64]: Kill arch/sparc64/prom/memory.c	David S. Miller
	No longer used. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-29	[SPARC64]: Rewrite convoluted physical memory probing.	David S. Miller
	Delete all of the code working with sp_banks[] and replace with clean acquisition and sorting of physical memory parameters from the firmware. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-28	[SPARC64]: Solidify check in cheetah_check_main_memory().	David S. Miller
	Need to make sure the address is below high_memory before passing it to kern_addr_valid(). Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-28	[SPARC64]: Kill all external references to sp_banks[]	David S. Miller
	Thus, we can mark sp_banks[] static in arch/sparc64/mm/init.c Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-28	[SPARC64]: Move phys_base, kern_{base,size}, and sp_banks[] init to paging_init	David S. Miller
	Also, move prom_probe_memory() into arch/sparc64/mm/init.c Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-28	[SPARC]: Declare paging_init() in asm/pgtable.h	David S. Miller
	Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-28	[SPARC64]: Simplify user fault fixup handling.	David S. Miller
	Instead of doing byte-at-a-time user accesses to figure out where the fault occurred, read the saved fault_address from the current thread structure. For the sake of defensive programming, if the fault_address does not fall into the user buffer range, simply assume the whole area faulted. This will cause the fixup for copy_from_user() to clear the entire kernel side buffer. Signed-off-by: David S. Miller <davem@davemloft.net>
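The defensive range check described for the user fault fixup might look roughly like the sketch below. The function name and the exact arithmetic are assumptions made for illustration; the commit text only states that an out-of-range fault_address means the whole area is treated as faulted.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return how many bytes of the user buffer are treated as not copied.
 * If the recorded fault address lies outside [buf_start, buf_start + len),
 * assume defensively that the whole area faulted.
 */
static size_t bytes_left_after_fault(uintptr_t buf_start, size_t len,
                                     uintptr_t fault_addr)
{
    uintptr_t buf_end = buf_start + len;

    if (fault_addr < buf_start || fault_addr >= buf_end)
        return len;

    return (size_t)(buf_end - fault_addr);
}
```

When the function returns `len`, a copy_from_user() style fixup would clear the entire kernel side buffer, which matches the behaviour the commit describes.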
2005-09-28	[SPARC64]: Fix fault handling in unaligned trap handler.	David S. Miller
	We were not calling kernel_mna_trap_fault() correctly. Instead of being fancy, just return 0 vs. -EFAULT from the assembler stubs, and handle that return value as appropriate. Create an "__retl_efault" stub for assembler exception table entries and use it where possible. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-28	[SPARC64]: Convert to use generic exception table support.	David S. Miller
	The funny "range" exception table entries we had were only used by the compat layer socketcall assembly, and it wasn't even needed there. For free we now get proper exception table sorting and fast binary searching. Signed-off-by: David S. Miller <davem@davemloft.net>
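To illustrate what sorting and binary searching an exception table buys, here is a small userspace sketch using the C library's qsort() and bsearch(). The struct layout and the `sketch_` names are hypothetical and are not the kernel's generic exception table code.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical entry: faulting instruction address and its fixup address. */
struct sketch_extable_entry {
    uintptr_t insn;
    uintptr_t fixup;
};

/* Order entries by instruction address. */
static int sketch_cmp_entry(const void *a, const void *b)
{
    const struct sketch_extable_entry *x = a;
    const struct sketch_extable_entry *y = b;

    return (x->insn > y->insn) - (x->insn < y->insn);
}

/* Sort once, so that later lookups can use binary search. */
static void sketch_sort_extable(struct sketch_extable_entry *table, size_t n)
{
    qsort(table, n, sizeof(*table), sketch_cmp_entry);
}

/* Find the entry for a faulting instruction, or NULL if there is none. */
static const struct sketch_extable_entry *
sketch_search_extable(const struct sketch_extable_entry *table, size_t n,
                      uintptr_t insn)
{
    struct sketch_extable_entry key = { insn, 0 };

    return bsearch(&key, table, n, sizeof(*table), sketch_cmp_entry);
}
```

With exact-match entries like these, a "range" entry type is no longer needed, and each lookup takes a logarithmic number of comparisons.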
2005-09-28	[SPARC64]: Fix bug in unaligned load endianness swapping	David S. Miller
	The in-memory value was being swapped, not the value we loaded into the register. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-27	[SPARC64]: Add missing IDs for newer cpus.	David S. Miller
	Also, the us3_cpufreq driver can work on Ultra-IV and IV+. They use the SAFARI bus register to control the clock divider just like Ultra-III and III+ do. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-26	[SPARC64]: Do not do TLB pre-filling any more.	David S. Miller
	In order to do it correctly on UltraSPARC-III+ and later we'd need to add some complicated code to set the TAG access extension register before loading the TLB. Since this optimization gives questionable gains, it's best to just remove it for now instead of adding the fix for Ultra-III+ Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-26	[SPARC64]: Simplify Spitfire D-cache page flush.	David S. Miller
	It tries to batch up the tag loads and comparisons, and then the stores. And this is just complicated instead of efficient. Also, make the symbol of the Cheetah version more grepable. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-26	[SPARC64]: Probe D/I/E-cache config and use.	David S. Miller
	At boot time, determine the D-cache, I-cache and E-cache size and line-size. Use them in cache flushes when appropriate. This change was motivated by discovering that the D-cache on UltraSparc-IIIi and later is 64K not 32K, and the flushes done by the Cheetah error handlers were assuming a 32K size. There are still some pieces of code that are hard coding things and will need to be fixed up at some point. While we're here, fix the D-cache and I-cache parity error handlers to run with interrupts disabled, and when the trap occurs at trap level > 1 log the event via a counter displayed in /proc/cpuinfo. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-25	[SPARC64]: Add CONFIG_DEBUG_PAGEALLOC support.	David S. Miller
	The trick is that we do the kernel linear mapping TLB miss starting with an instruction sequence like this: ba,pt %xcc, kvmap_load xor %g2, %g4, %g5 succeeded by an instruction sequence which performs a full page table walk starting at swapper_pg_dir. We first take over the trap table from the firmware. Then, using this constant PTE generation for the linear mapping area above, we build the kernel page tables for the linear mapping. After this is setup, we patch that branch above into a "nop", which will cause TLB misses to fall through to the full page table walk. With this, the page unmapping for CONFIG_DEBUG_PAGEALLOC is trivial. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-24	[SPARC64]: Fix mask formation in tomatillo_wsync_handler()	David S. Miller
	"1" needs to be "1UL", this is a 64-bit mask we're creating. Signed-off-by: David S. Miller <davem@davemloft.net>
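The class of bug fixed in tomatillo_wsync_handler() can be shown with a tiny helper. The function name is hypothetical, and `UINT64_C(1)` is used here as a portable equivalent of the "1UL" in the commit, which is 64 bits wide on sparc64.

```c
#include <stdint.h>

/*
 * Build a single-bit mask for a 64-bit value. Writing "1 << bit" would
 * perform the shift in 32-bit int arithmetic, so bits 32 and above could
 * never be set correctly; the constant must be 64 bits wide before the shift.
 */
static uint64_t bit_mask64(unsigned int bit)
{
    return UINT64_C(1) << bit;
}
```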
2005-09-23	[SPARC64]: Mark functions called by paging_init() as __init.	David S. Miller
	Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-23	[SPARC64]: Kill unused variable in setup_arch()	David S. Miller
	'highest_paddr' is set, but never actually used. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-22	[SPARC64]: Fix comment typo in head.S	David S. Miller
	Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-22	[SPARC64]: Rewrite bootup sequence.	David S. Miller
	Instead of all of this cpu-specific code to remap the kernel to the correct location, use portable firmware calls to do this instead. What we do now is the following in position independent assembler: chosen_node = prom_finddevice("/chosen"); prom_mmu_ihandle_cache = prom_getint(chosen_node, "mmu"); vaddr = 4MB_ALIGN(current_text_addr()); prom_translate(vaddr, &paddr_high, &paddr_low, &mode); prom_boot_mapping_mode = mode; prom_boot_mapping_phys_high = paddr_high; prom_boot_mapping_phys_low = paddr_low; prom_map(-1, 8 * 1024 * 1024, KERNBASE, paddr_low); and that replaces the massive amount of by-hand TLB probing and programming we used to do here. The new code should also handle properly the case where the kernel is mapped at the correct address already (think: future kexec support). Consequently, the bulk of remap_kernel() dies as does the entirety of arch/sparc64/prom/map.S We try to share some strings in the PROM library with the ones used at bootup, and while we're here mark input strings to oplib.h routines with "const" when appropriate. There are many more simplifications now possible. For one thing, we can consolidate the two copies we now have of a lot of cpu setup code sitting in head.S and trampoline.S. This is a significant step towards CONFIG_DEBUG_PAGEALLOC support. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-22	[SPARC64]: Kill readjust_prom_translations()	David S. Miller
	Testing shows that the prom_unmap() calls do absolutely nothing. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-22	[SPARC64]: Remove unnecessary paging_init() cruft.	David S. Miller
	Because we don't access the PAGE_OFFSET linear mappings any longer before we take over the trap table from the firmware, we don't need to load dummy mappings there into the TLB and we don't need the bootmap_base hack any longer either. While we are here, check for a larger than 8MB kernel and halt the boot with an error message. We know that doesn't work, so instead of failing mysteriously we should let the user know exactly what's wrong. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-22	[SPARC64]: Do not allocate OBP page tables using bootmem	David S. Miller
	Just allocate them physically starting from the end of the kernel image. This incredibly simplifies our MM bootstrap in that we don't need any mappings in the linear PAGE_OFFSET area working in order to bootstrap ourselves and take over the trap table from the firmware. Many further simplifications are possible now, and this also sets the stage for CONFIG_DEBUG_PAGEALLOC support. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-22	[SPARC64]: Break up inherit_prom_mappings() into its constituent parts.	David S. Miller
	This thing was just a huge monolithic mess, so chop it up. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-21	[SPARC64]: Do not allocate prom translations using bootmem.	David S. Miller
	Use __initdata instead. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-21	[SPARC64]: Remove ktlb.S instruction patching.	David S. Miller
	This was kind of ugly, and actually buggy. The bug was that we didn't handle a machine with memory starting > 4GB. If the 'prompmd' was allocated in physical memory > 4GB we'd croak because the obp_iaddr_patch and obp_daddr_patch things only supported a 32-bit physical address. So fix this by just loading the appropriate values from two variables in the kernel image, which is locked into the TLB and thus accesses to them can't cause a recursive TLB miss. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-21	[SPARC64]: Kill SZ_BITS define from dtlb_backend.S	David S. Miller
	This is just a replica of the existing _PAGE_SZBITS, and thus unnecessary. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-21	[SPARC64]: Move kernel TLB miss handling into a separate file.	David S. Miller
	Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-20	[SPARC64]: Verify vmalloc TLB misses more strictly.	David S. Miller
	Arrange the modules, OBP, and vmalloc areas such that a range verification can be done quite minimally. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-19	[SPARC64]: Move DCACHE_ALIASING_POSSIBLE define to asm/page.h	David S. Miller
	This showed that arch/sparc64/kernel/ptrace.c was not getting the define properly, and thus the code protected by this ifdef was never actually compiled before. So fix that too. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-19	[SPARC64]: Handle little-endian unaligned loads/stores correctly.	David S. Miller
	Because we use byte loads/stores to cons up the value in and out of registers, we can't expect the ASI endianness setting to take care of this for us. So do it by hand. This case is triggered by drivers/block/aoe/aoecmd.c in the ataid_complete() function where it goes: /* word 100: number lba48 sectors */ ssize = le64_to_cpup((__le64 *) &id[100<<1]); This &id[100<<1] address is 4 byte, rather than 8 byte aligned, thus triggering the unaligned exception. Signed-off-by: David S. Miller <davem@davemloft.net>
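Doing the endianness handling "by hand" for a byte-wise load can be sketched as follows. This is a portable C illustration with a hypothetical function name, not the trap handler's assembler: because the value is assembled one byte at a time, the byte order has to be chosen explicitly when the bytes are combined.

```c
#include <stdint.h>

/*
 * Assemble a 64-bit little-endian value from eight single-byte loads.
 * The pointer needs no particular alignment, and the result is the same
 * regardless of the host's native byte order.
 */
static uint64_t load_le64_bytewise(const unsigned char *p)
{
    uint64_t value = 0;

    /* Start at the most significant byte, which is stored last. */
    for (int i = 7; i >= 0; i--)
        value = (value << 8) | p[i];

    return value;
}
```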
2005-09-14	[LIB]: Consolidate _atomic_dec_and_lock()	David S. Miller
	Several implementations were essentially a common piece of C code using the cmpxchg() macro. Put the implementation in one spot that everyone can share, and convert sparc64 over to using this. Alpha is the lone arch-specific implementation, which codes up a special fast path for the common case in order to avoid GP reloading which a pure C version would require. Signed-off-by: David S. Miller <davem@davemloft.net>
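The general shape of a compare-and-exchange based "decrement, and take the lock only when reaching zero" helper can be pictured with the userspace sketch below. It uses C11 atomics and a toy spinlock instead of the kernel's cmpxchg() macro and spinlock_t, and every `sketch_` name is hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical minimal spinlock built on atomic_flag. */
typedef struct {
    atomic_flag held;
} sketch_lock_t;

static void sketch_lock(sketch_lock_t *lock)
{
    while (atomic_flag_test_and_set_explicit(&lock->held, memory_order_acquire))
        ; /* spin until the flag was previously clear */
}

static void sketch_unlock(sketch_lock_t *lock)
{
    atomic_flag_clear_explicit(&lock->held, memory_order_release);
}

/*
 * Decrement *count. Returns true with the lock held if the counter reached
 * zero, and false with the lock not held otherwise.
 */
static bool sketch_atomic_dec_and_lock(atomic_int *count, sketch_lock_t *lock)
{
    int old = atomic_load(count);

    /* Fast path: while the result would be non-zero, decrement lock-free. */
    while (old != 1) {
        if (atomic_compare_exchange_weak(count, &old, old - 1))
            return false;
        /* On failure, old now holds the current value; retry. */
    }

    /* Slow path: the counter may hit zero, so decide under the lock. */
    sketch_lock(lock);
    if (atomic_fetch_sub(count, 1) == 1)
        return true;
    sketch_unlock(lock);
    return false;
}
```

The fast path avoids touching the lock in the common case, which is the property the shared implementation and the Alpha-specific one both aim for.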