Kernel - My Linux kernel repository

Age	Commit message	Author
2005-06-23	[PATCH] kprobes: Temporary disarming of reentrant probe for x86_64	Prasanna S Panchamukhi
	This patch includes x86_64 architecture specific changes to support temporary disarming on reentrancy of probes. Signed-off-by: Prasanna S Panchamukhi <prasanna@in.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23	[PATCH] Move kprobe [dis]arming into arch specific code	Rusty Lynch
	The architecture-independent code of the current kprobes implementation arms and disarms kprobes at registration time. The problem is that the code assumes that arming and disarming is just done by a simple write of some magic value to an address. This is problematic for ia64, where our instructions look more like structures, and we cannot insert breakpoints by just doing something like:
    *p->addr = BREAKPOINT_INSTRUCTION;
The following patch to 2.6.12-rc4-mm2 adds two new architecture-dependent functions:
    void arch_arm_kprobe(struct kprobe *p)
    void arch_disarm_kprobe(struct kprobe *p)
and then adds the new functions for each of the architectures that already implement kprobes (sparc64/ppc64/i386/x86_64). I thought arch_[dis]arm_kprobe was the most descriptive of what was really happening, but each of the architectures already had a disarm_kprobe() function that was really a "disarm and do some other clean-up items as needed when you stumble across a recursive kprobe." So... I took the liberty of changing the code that was calling disarm_kprobe() to call arch_disarm_kprobe(), and then do the cleanup in the block of code dealing with the recursive kprobe case. So far this patch has been tested on i386, x86_64, and ppc64, but still needs to be tested on sparc64. Signed-off-by: Rusty Lynch <rusty.lynch@intel.com> Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
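For the x86 case, the two hooks are easy to picture. The userspace sketch below assumes the usual one-byte int3 breakpoint (0xcc) and models kernel text as a plain byte array; `struct kprobe` is cut down to the two fields the sketch needs, and `save_original_opcode()` is a hypothetical helper standing in for the arch's prepare step. On ia64 the same hooks would instead have to rewrite a slot inside an instruction bundle, which is why the patch moves them into architecture code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch only: simplified stand-ins, not the kernel's definitions. */
typedef uint8_t kprobe_opcode_t;      /* x86 breakpoints are one byte */
#define BREAKPOINT_INSTRUCTION 0xcc   /* int3 */

struct kprobe {
	kprobe_opcode_t *addr;    /* first byte of the probed instruction */
	kprobe_opcode_t opcode;   /* original byte, saved before arming */
};

/* Hypothetical helper: remember the byte we are about to overwrite. */
static void save_original_opcode(struct kprobe *p)
{
	assert(p->addr != NULL);
	p->opcode = *p->addr;
}

/* x86-style arming: a single byte write of int3. */
static void arch_arm_kprobe(struct kprobe *p)
{
	*p->addr = BREAKPOINT_INSTRUCTION;
	/* a real kernel also has to flush the icache for this address */
}

/* x86-style disarming: restore the saved byte. */
static void arch_disarm_kprobe(struct kprobe *p)
{
	*p->addr = p->opcode;
}
```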
2005-06-23	[PATCH] x86_64 specific function return probes	Rusty Lynch
	The following patch adds the x86_64 architecture specific implementation for function return probes. Function return probes is a mechanism built on top of kprobes that allows a caller to register a handler to be called when a given function exits. For example, to instrument the return path of sys_mkdir:

    static int sys_mkdir_exit(struct kretprobe_instance *i, struct pt_regs *regs)
    {
            printk("sys_mkdir exited\n");
            return 0;
    }
    static struct kretprobe return_probe = {
            .handler = sys_mkdir_exit,
    };

    <inside setup function>

    return_probe.kp.addr = (kprobe_opcode_t *) kallsyms_lookup_name("sys_mkdir");
    if (register_kretprobe(&return_probe)) {
            printk(KERN_DEBUG "Unable to register return probe!\n");
            /* do error path */
    }

    <inside cleanup function>

    unregister_kretprobe(&return_probe);

The way this works is that:
* At system initialization time, kernel/kprobes.c installs a kprobe on a function called kretprobe_trampoline() that is implemented in arch/x86_64/kernel/kprobes.c (more on this later).
* When a return probe is registered using register_kretprobe(), kernel/kprobes.c will install a kprobe on the first instruction of the targeted function with the pre handler set to arch_prepare_kretprobe(), which is implemented in arch/x86_64/kernel/kprobes.c.
* arch_prepare_kretprobe() will prepare a kretprobe instance that stores:
  - nodes for hanging this instance in an empty or free list
  - a pointer to the return probe
  - the original return address
  - a pointer to the stack address
  With all this stowed away, arch_prepare_kretprobe() then sets the return address for the targeted function to a special trampoline function called kretprobe_trampoline(), implemented in arch/x86_64/kernel/kprobes.c.
* The kprobe completes as normal, with control passing back to the target function that executes as normal, and eventually returns to our trampoline function.
* Since a kprobe was installed on kretprobe_trampoline() during system initialization, control passes back to kprobes via the architecture specific function trampoline_probe_handler(), which will look up the instance in an hlist maintained by kernel/kprobes.c and then call the handler function.
* When trampoline_probe_handler() is done, the kprobes infrastructure single-steps the original instruction (in this case just a nop) and then calls trampoline_post_handler(). trampoline_post_handler() then looks up the instance again, puts the instance back on the free list, and then makes a long jump back to the original return instruction.

So to recap, to instrument the exit path of a function this implementation will cause four interruptions:
  - a breakpoint at the very beginning of the function, allowing us to switch out the return address
  - a single-step interruption to execute the original instruction that we replaced with the break instruction (normal kprobe flow)
  - a breakpoint in the trampoline function where our instrumented function returned to
  - a single-step interruption to execute the original instruction that we replaced with the break instruction (normal kprobe flow)
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
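The bookkeeping described above (a free list of instances, swapping the saved return address for the trampoline, and finding the instance again when the trampoline is hit) can be modelled in plain C. In this sketch the "stack slot" is just a variable, `TRAMPOLINE_ADDR` is an arbitrary stand-in for kretprobe_trampoline(), and instances are found by stack slot rather than via the kernel's hlist; the pool size, the `nmissed` counter and all function names are assumptions for illustration, not the patch's data layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace model of the return-probe flow; all names are simplified
 * stand-ins, and "addresses" are plain integers rather than real code. */
#define MAXACTIVE       4                        /* hypothetical pool size */
#define TRAMPOLINE_ADDR ((uintptr_t)0xdead0000)  /* stands in for kretprobe_trampoline */

struct kretprobe_instance {
	uintptr_t ret_addr;                 /* original return address */
	uintptr_t *stack_addr;              /* stack slot we rewrote */
	struct kretprobe_instance *next;    /* free-list or used-list link */
};

struct kretprobe {
	int (*handler)(struct kretprobe_instance *ri);
	struct kretprobe_instance pool[MAXACTIVE];
	struct kretprobe_instance *free_list, *used_list;
	int nmissed;                        /* returns we could not track */
};

static void kretprobe_init(struct kretprobe *rp)
{
	rp->free_list = rp->used_list = NULL;
	rp->nmissed = 0;
	for (int i = 0; i < MAXACTIVE; i++) {
		rp->pool[i].next = rp->free_list;
		rp->free_list = &rp->pool[i];
	}
}

/* Entry breakpoint: stash the real return address, plant the trampoline. */
static void prepare_kretprobe(struct kretprobe *rp, uintptr_t *sp)
{
	struct kretprobe_instance *ri = rp->free_list;
	if (ri == NULL) {
		rp->nmissed++;                  /* pool exhausted: skip this call */
		return;
	}
	rp->free_list = ri->next;
	ri->ret_addr = *sp;
	ri->stack_addr = sp;
	*sp = TRAMPOLINE_ADDR;
	ri->next = rp->used_list;
	rp->used_list = ri;
}

/* Trampoline breakpoint: find the instance, run the user handler,
 * recycle the instance, and return where execution should resume. */
static uintptr_t trampoline_handler(struct kretprobe *rp, uintptr_t *sp)
{
	struct kretprobe_instance **pp = &rp->used_list;
	while (*pp != NULL && (*pp)->stack_addr != sp)
		pp = &(*pp)->next;
	struct kretprobe_instance *ri = *pp;
	assert(ri != NULL);
	rp->handler(ri);
	uintptr_t orig = ri->ret_addr;
	*pp = ri->next;                     /* unlink from the used list */
	ri->next = rp->free_list;           /* back onto the free list */
	rp->free_list = ri;
	return orig;
}

/* Example handler, analogous to sys_mkdir_exit() in the message. */
static int exit_count;
static int count_exit(struct kretprobe_instance *ri)
{
	(void)ri;
	exit_count++;
	return 0;
}
```

Keying instances by the rewritten stack slot keeps nested or recursive activations apart in this model; the real implementation's lookup may differ.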
2005-06-23	[PATCH] xen: x86_64: use more usermode macro	Vincent Hanquez
	Make use of the user_mode macro where possible. This is useful for Xen because it will then only need to redefine the macro as a hypervisor call. Signed-off-by: Vincent Hanquez <vincent.hanquez@cl.cam.ac.uk> Cc: Ian Pratt <m+Ian.Pratt@cl.cam.ac.uk> Cc: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23	[PATCH] xen: x86_64: Add macro for debugreg	Vincent Hanquez
	Add 2 macros to set and get debugreg on x86_64. This is useful for Xen because it will then only need to redefine each macro as a hypervisor call. Signed-off-by: Vincent Hanquez <vincent.hanquez@cl.cam.ac.uk> Cc: Ian Pratt <m+Ian.Pratt@cl.cam.ac.uk> Cc: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23	[PATCH] x86_64: avoid wasting IRQs	Natalie Protasevich
	I suggest changing the way IRQs are handed out to PCI devices. Currently, each I/O APIC pin gets associated with an IRQ, no matter whether the pin is used or not. It is expected that each pin can potentially be engaged by a device inserted into the corresponding PCI slot. However, this imposes a severe limitation on systems with designs that employ many I/O APICs while only utilizing a couple of lines on each, such as the P64H2 chipset. It is used in the ES7000, and currently there is no way to boot the system with more than 9 I/O APICs. The simple change below allows booting a system with, say, 64 (or more) I/O APICs, each providing 1 slot, which is otherwise impossible because of the IRQ gaps created for unused lines on each I/O APIC. It does not resolve the problem of the number of devices exceeding the number of possible IRQs, but it eases the pressure on IRQs on any large system with a potentially large number of devices. I only implemented this for ACPI boot, since if the system is this big and using newer chipsets it is probably (better be!) an ACPI-based system :). The change is completely "mechanical" and does not alter any internal structures or the interrupt model/implementation. The patch works for both the i386 and x86_64 archs. It works with MSIs just fine, and should not interfere with implementations like shared vectors, when they get worked out and incorporated.
To illustrate, below is the interrupt distribution for a 2-cell ES7000 with 20 I/O APICs and an Ethernet card in the last slot, which should be eth1 and which was not configured because its IRQ exceeded the allowable number (it actually turned out huge - 480!):

    zorro-tb2:~ # cat /proc/interrupts
                 CPU0    CPU1    CPU2    CPU3    CPU4    CPU5    CPU6    CPU7
       0:   65716   30012   30007   30002   30009   30010   30010   30010   IO-APIC-edge   timer
       4:     373       0     725     280       0       0       0       0   IO-APIC-edge   serial
       8:       0       0       0       0       0       0       0       0   IO-APIC-edge   rtc
       9:       0       0       0       0       0       0       0       0   IO-APIC-level  acpi
      14:      39       3       0       0       0       0       0       0   IO-APIC-edge   ide0
      16:     108      13       0       0       0       0       0       0   IO-APIC-level  uhci_hcd:usb1
      18:       0       0       0       0       0       0       0       0   IO-APIC-level  uhci_hcd:usb3
      19:      15       0       0       0       0       0       0       0   IO-APIC-level  uhci_hcd:usb2
      23:       3       0       0       0       0       0       0       0   IO-APIC-level  ehci_hcd:usb4
      96:    4240     397      18       0       0       0       0       0   IO-APIC-level  aic7xxx
      97:      15       0       0       0       0       0       0       0   IO-APIC-level  aic7xxx
     192:     847       0       0       0       0       0       0       0   IO-APIC-level  eth0
     NMI:       0       0       0       0       0       0       0       0
     LOC:  273423  274528  272829  274228  274092  273761  273827  273694
     ERR:       7
     MIS:       0

Even though the system doesn't have that many devices, some don't get enabled only because of the IRQ numbering model.
This is the IRQ picture after the patch was applied:

    zorro-tb2:~ # cat /proc/interrupts
                 CPU0    CPU1    CPU2    CPU3    CPU4    CPU5    CPU6    CPU7
       0:   44169   10004   10004   10001   10004   10003   10004    6135   IO-APIC-edge   timer
       4:     345       0       0       0       0     244       0       0   IO-APIC-edge   serial
       8:       0       0       0       0       0       0       0       0   IO-APIC-edge   rtc
       9:       0       0       0       0       0       0       0       0   IO-APIC-level  acpi
      14:      39       0       3       0       0       0       0       0   IO-APIC-edge   ide0
      17:    4425       0       9       0       0       0       0       0   IO-APIC-level  aic7xxx
      18:      15       0       0       0       0       0       0       0   IO-APIC-level  aic7xxx, uhci_hcd:usb3
      21:     231       0       0       0       0       0       0       0   IO-APIC-level  uhci_hcd:usb1
      22:      26       0       0       0       0       0       0       0   IO-APIC-level  uhci_hcd:usb2
      23:       3       0       0       0       0       0       0       0   IO-APIC-level  ehci_hcd:usb4
      24:     348       0       0       0       0       0       0       0   IO-APIC-level  eth0
      25:       6     192       0       0       0       0       0       0   IO-APIC-level  eth1
     NMI:       0       0       0       0       0       0       0       0
     LOC:  107981  107636  108899  108698  108489  108326  108331  108254
     ERR:       7
     MIS:       0

Not only do we see the card in the last I/O APIC, but we are not even close to using up the available IRQs, since we didn't waste any. Signed-off-by: Natalie Protasevich <Natalie.Protasevich@unisys.com> Acked-by: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
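A toy model of the two numbering schemes: the old one gives every pin a fixed IRQ (the base of its I/O APIC plus the pin number), so an unused pin still consumes a number, while the new one hands out numbers sequentially as pins are actually registered. The pin counts, the reservation of IRQs 0-15 for legacy ISA, and the function names are assumptions for illustration; the real change, which per the message applies only to ACPI boot, is not reproduced here.

```c
#include <assert.h>

/* Hypothetical limits and numbering for illustration only. */
#define MAX_IOAPICS       64
#define MAX_PINS          32
#define FIRST_DYNAMIC_IRQ 16   /* leave the legacy ISA IRQs 0-15 alone */

/* Old model: every pin of every I/O APIC owns an IRQ, used or not. */
static int irq_per_pin(const int *pins_per_ioapic, int ioapic, int pin)
{
	int base = 0;
	for (int i = 0; i < ioapic; i++)
		base += pins_per_ioapic[i];
	return base + pin;
}

/* New model: hand out the next IRQ only when a pin is actually used. */
struct irq_allocator {
	int used;                          /* IRQs handed out so far */
	int map[MAX_IOAPICS][MAX_PINS];    /* 0 = unassigned, else irq + 1 */
};

static int irq_on_demand(struct irq_allocator *a, int ioapic, int pin)
{
	assert(ioapic >= 0 && ioapic < MAX_IOAPICS);
	assert(pin >= 0 && pin < MAX_PINS);
	if (a->map[ioapic][pin] == 0)
		a->map[ioapic][pin] = FIRST_DYNAMIC_IRQ + 1 + a->used++;
	return a->map[ioapic][pin] - 1;
}
```

With an assumed 24 pins per I/O APIC, the old scheme puts pin 0 of the twentieth I/O APIC at IRQ 456 even when almost nothing is wired, while the new scheme gives the second registered device the second free number.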
2005-06-23	[PATCH] x86_64: never block forced SIGSEGV	Roland McGrath
	This is the x86_64 version of the signal fix I just posted for i386. This problem was first noticed on PPC and has already been fixed there. But the exact same issue applies to other platforms in the same way. The signal blocking for sa_mask and the handled signal takes place after the handler setup. When the stack is bogus, the handler setup forces a SIGSEGV. But then this will be blocked, and returning to user mode will fault again and iterate. This patch fixes the problem by checking whether signal handler setup failed, and not doing the signal-blocking if so. This copies what was done in the ppc code. I think all architectures' signal handler setup code follows this pattern and needs the change. Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
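The control-flow change is small enough to model directly. The sketch below uses a simplified per-task bitmask instead of sigset_t, and `setup_frame()`, `struct task` and its `stack_ok` flag are hypothetical stand-ins for the architecture's frame-setup code. The point is only the ordering: the sa_mask/signal blocking happens after setup, and is skipped when setup failed and forced a SIGSEGV.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

/* Simplified model: one bit per signal, bit (n - 1) for signal n. */
typedef unsigned long sigbits_t;
#define SIGBIT(sig) (1UL << ((sig) - 1))

struct task {
	sigbits_t blocked;
	bool stack_ok;     /* hypothetical: can the signal frame be written? */
	int forced_sig;    /* signal forced by a failed frame setup, or 0 */
};

/* Stand-in for the arch frame-setup code: a bogus stack forces SIGSEGV. */
static int setup_frame(struct task *t, int sig)
{
	(void)sig;
	if (!t->stack_ok) {
		t->forced_sig = SIGSEGV;
		return -1;
	}
	return 0;
}

/* Model of handle_signal() with the fix: block only if setup succeeded. */
static int handle_signal(struct task *t, int sig, sigbits_t sa_mask)
{
	int ret = setup_frame(t, sig);
	if (ret == 0)
		t->blocked |= sa_mask | SIGBIT(sig);
	return ret;
}
```

Without the `ret == 0` test, a handler whose sa_mask includes SIGSEGV would leave the forced SIGSEGV blocked, which is the fault-and-retry loop the message describes.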
2005-06-23	[PATCH] x86_64: fix hpet for systems that don't support legacy replacement	john stultz
	Currently the x86-64 HPET code assumes the entire HPET implementation from the spec is present. This breaks on boxes that do not implement the optional legacy timer replacement portion of the spec. This patch fixes this issue, allowing x86-64 systems that cannot use the HPET for the timer interrupt and RTC to still use the HPET as a time source. I've tested this patch on systems without HPET, with HPET but without legacy timer replacement, and with HPET with legacy timer replacement. This version adds a minor check to cap the HPET counter value in gettimeoffset_hpet to avoid possible time inconsistencies. Please ignore the A2 version I sent to you earlier. Acked-by: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23	[PATCH] x86_64: i8259.c iso99 structure initialization	Alexander Nyberg
	Cc: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23	[PATCH] i386: Selectable Frequency of the Timer Interrupt	Christoph Lameter
	Make the timer frequency selectable. The timer interrupt may cause bus and memory contention in large NUMA systems since the interrupt occurs on each processor HZ times per second. Signed-off-by: Christoph Lameter <christoph@lameter.com> Signed-off-by: Shai Fultheim <shai@scalex86.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23	[PATCH] allow early printk to use more than 25 lines	Jan Beulich
	Allow early printk code to take advantage of the full size of the screen, not just the first 25 lines. Signed-off-by: Jan Beulich <jbeulich@novell.com> Acked-by: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23	[PATCH] x86/x86_64: pcibus_to_node	Christoph Lameter
	Define pcibus_to_node to be able to figure out which NUMA node contains a given PCI device. This defines pcibus_to_node(bus) in include/linux/topology.h and adjusts the macros for i386 and x86_64 that already provided a way to determine the cpumask of a PCI device. x86_64 was changed to no longer build an array of cpumasks. Instead, an array of nodes is built, which can be used to generate the cpumask via node_to_cpumask. Signed-off-by: Christoph Lameter <christoph@lameter.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
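The data-structure change in the x86_64 part can be sketched as follows: keep a small per-bus node table and derive the CPU mask through the node, instead of storing a full cpumask per bus. The table names, sizes, the `unsigned long` mask and the helper names are assumptions for illustration; the kernel's own pcibus_to_node() and node_to_cpumask() use their own types.

```c
#include <assert.h>

/* Hypothetical sizes and tables for illustration. */
#define MAX_PCI_BUSES 256
#define MAX_NODES     4

typedef unsigned long cpumask_bits;       /* simplified: one bit per CPU */

static int bus_node[MAX_PCI_BUSES];       /* would be filled from firmware */
static cpumask_bits node_cpus[MAX_NODES];

/* One small integer per bus instead of one cpumask per bus. */
static int pcibus_node(int bus)
{
	assert(bus >= 0 && bus < MAX_PCI_BUSES);
	return bus_node[bus];
}

static cpumask_bits node_cpumask(int node)
{
	assert(node >= 0 && node < MAX_NODES);
	return node_cpus[node];
}

/* The per-bus cpumask is now derived on demand through the node. */
static cpumask_bits pcibus_cpumask(int bus)
{
	return node_cpumask(pcibus_node(bus));
}
```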
2005-06-23	[PATCH] Platform SMIs and their interference with tsc based delay calibration	Venkatesh Pallipadi
	Issue: The current TSC-based delay calibration can result in significant errors in the loops_per_jiffy count when platform events like SMIs (System Management Interrupts, which are non-maskable) are present. This could potentially lead to a kernel panic(). This issue is becoming more visible with the 2.6 kernel (as the default HZ is 1000) and on platforms with higher SMI handling latencies. During boot, SMIs are mostly used by the BIOS (for things like legacy keyboard emulation).
Description: The pseudocode for the current delay calibration with TSC-based delay looks like:
  (0) Estimate a value for loops_per_jiffy
  (1) While (loops_per_jiffy estimate is not accurate enough)
  (2)     wait for jiffy transition (jiffy1)
  (3)     note down current tsc (tsc1)
  (4)     loop until tsc becomes tsc1 + loops_per_jiffy
  (5)     check whether jiffy changed since jiffy1 or not, and refine the loops_per_jiffy estimate
Consider the following cases:
Case 1: If SMIs happen between (2) and (3) above, we can end up with a loops_per_jiffy value that is too low. This results in shorter delays, and the kernel can panic() during boot (mostly at IOAPIC timer initialization in timer_irq_works(), as we don't get enough timer interrupts in a specified interval).
Case 2: If SMIs happen between (3) and (4) above, we can end up with a loops_per_jiffy value that is too high. With the current i386 code, a too-high lpj value (greater than 17M) can cause an overflow in delay.c:__const_udelay(), again resulting in shorter delays and a panic().
Solution: The patch below makes the calibration routine aware of asynchronous events like SMIs. We increase the delay calibration time, and also identify any significant errors (greater than 12.5%) in the calibration and report them to the user. The patch below changes both the i386 and x86-64 architectures to use this new and improved calibrate_delay_direct() routine.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
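One way to make calibration robust against SMIs, roughly in the spirit of the fix described above, is to bracket each jiffy edge with TSC reads taken just before and just after the edge is observed. An SMI inside a bracket widens the gap between the shortest and longest possible interval, so such samples can be rejected using the 12.5% (1/8) threshold from the message. The struct, the tick count and the averaging are assumptions for illustration, not the kernel's calibrate_delay_direct(); here "loops" are TSC cycles per jiffy.

```c
#include <assert.h>
#include <stdint.h>

#define CALIBRATION_TICKS 12   /* hypothetical: jiffies per measurement */

/* TSC readings bracketing the first and last jiffy transitions. */
struct calib_sample {
	uint64_t pre_start, start;  /* just before / after seeing edge 1 */
	uint64_t pre_end, end;      /* just before / after seeing the last edge */
};

/*
 * The true interval lies between (pre_end - start) and (end - pre_start).
 * An SMI inside either bracket widens that gap, so samples whose
 * uncertainty exceeds 1/8 (12.5%) of the interval are discarded.
 * Returns the averaged cycles-per-jiffy estimate, or 0 if none passed.
 */
static uint64_t estimate_lpj(const struct calib_sample *s, int n, int ticks)
{
	uint64_t sum = 0;
	int good = 0;

	assert(ticks > 0);
	for (int i = 0; i < n; i++) {
		uint64_t lo = s[i].pre_end - s[i].start;
		uint64_t hi = s[i].end - s[i].pre_start;
		if (hi - lo > lo / 8)
			continue;                   /* disturbed, e.g. by an SMI */
		sum += (lo + hi) / 2 / ticks;
		good++;
	}
	return good ? sum / good : 0;
}
```

For example, 10-cycle brackets around a 12,000-cycle interval leave an uncertainty far below 1/8, whereas an SMI that delays the first post-edge read by 5,000 cycles makes that sample fail the check; if every sample fails, the 0 return is the cue to warn the user.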
2005-06-23	[PATCH] use ${CROSS_COMPILE}installkernel in arch/*/boot/install.sh	Ian Campbell
	The attached patch causes the various arch specific install.sh scripts to look for ${CROSS_COMPILE}installkernel rather than just installkernel (in both /sbin/ and ~/bin/, where the script already did this). This allows you to have e.g. arm-linux-installkernel as a handy way to install on your cross target. It also prevents the script from picking up the host's /sbin/installkernel, so the script falls through and does the install itself (which is what I actually use myself, with $INSTALL_PATH set). I don't believe it causes backward-compatibility problems, since calling the host installkernel was never likely to work or be what you wanted when cross compiling anyway. If $CROSS_COMPILE isn't set then nothing changes. I only use ARM and i386 myself, but I figured it couldn't hurt to do the whole lot. I've cc'd those who I hope are the arch maintainers for files that I've touched. Signed-off-by: Ian Campbell <icampbell@arcom.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23	[PATCH] add x86-64 specific support for sparsemem	Matt Tolentino
	This patch adds the necessary support for sparsemem such that x86-64 kernels may use sparsemem as an alternative to discontigmem for NUMA kernels. Note that this does not preclude one from continuing to build NUMA kernels using discontigmem, but merely allows the option to build NUMA kernels with sparsemem. Interestingly, the use of sparsemem in lieu of discontigmem in NUMA kernels results in reduced text size for otherwise equivalent kernels, as shown in the example builds below:

         text     data      bss      dec      hex  filename
      2371036   765884  1237108  4374028   42be0c  vmlinux.discontig
      2366549   776484  1302772  4445805   43d66d  vmlinux.sparse

Signed-off-by: Matt Tolentino <matthew.e.tolentino@intel.com> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
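The core idea of sparsemem is to split physical memory into fixed-size sections and track presence per section, so holes between NUMA nodes cost almost nothing. A minimal sketch, assuming 4 KiB pages, 128 MiB sections (SECTION_SIZE_BITS = 27) and a 40-bit physical address space; these values are chosen for illustration and may not match the patch. `mark_present()` and `pfn_is_valid()` are hypothetical names that loosely mirror the roles of the kernel's memory_present() and pfn_valid().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for illustration only. */
#define PAGE_SHIFT        12
#define SECTION_SIZE_BITS 27                    /* 128 MiB per section */
#define MAX_PHYSMEM_BITS  40
#define PFN_SECTION_SHIFT (SECTION_SIZE_BITS - PAGE_SHIFT)
#define PAGES_PER_SECTION (1ULL << PFN_SECTION_SHIFT)
#define NR_MEM_SECTIONS   (1ULL << (MAX_PHYSMEM_BITS - SECTION_SIZE_BITS))

static bool section_present[NR_MEM_SECTIONS];   /* one flag per section */

static uint64_t pfn_to_section(uint64_t pfn)
{
	return pfn >> PFN_SECTION_SHIFT;
}

/* Mark every section that overlaps [start_pfn, end_pfn). */
static void mark_present(uint64_t start_pfn, uint64_t end_pfn)
{
	uint64_t pfn = start_pfn & ~(PAGES_PER_SECTION - 1);
	for (; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
		assert(pfn_to_section(pfn) < NR_MEM_SECTIONS);
		section_present[pfn_to_section(pfn)] = true;
	}
}

static bool pfn_is_valid(uint64_t pfn)
{
	uint64_t nr = pfn_to_section(pfn);
	return nr < NR_MEM_SECTIONS && section_present[nr];
}
```

With two nodes at 0-2 GiB and 4-6 GiB, only 32 of the 8,192 section flags are set, and a PFN in the 2-4 GiB hole is rejected without any per-node range search.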
2005-06-23	[PATCH] reorganize x86-64 NUMA and DISCONTIGMEM config options	Matt Tolentino
	In order to use the alternative sparsemem implementation for NUMA kernels, we need to reorganize the config options. This patch effectively abstracts out the CONFIG_DISCONTIGMEM options to CONFIG_NUMA in most cases. Thus, the discontigmem implementation may be employed as always, but the sparsemem implementation may be used alternatively. Signed-off-by: Matt Tolentino <matthew.e.tolentino@intel.com> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23	[PATCH] add x86-64 Kconfig options for sparsemem	Matt Tolentino
	Add the requisite arch specific Kconfig options to enable the use of the sparsemem implementation for NUMA kernels on x86-64. Signed-off-by: Matt Tolentino <matthew.e.tolentino@intel.com> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23	[PATCH] remove direct ref to contig_page_data for x86-64	Matt Tolentino
	This patch pulls out all remaining direct references to contig_page_data from arch/x86_64, thus saving an ifdef in one case. Signed-off-by: Matt Tolentino <matthew.e.tolentino@intel.com> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-23	[PATCH] make each arch use mm/Kconfig	Dave Hansen
	For all architectures, this just means that you'll see a "Memory Model" choice in your architecture menu. For those that implement DISCONTIGMEM, you may eventually want to make your ARCH_DISCONTIGMEM_ENABLE a "def_bool y" and make your users select DISCONTIGMEM right out of the new choice menu. The only disadvantage might be if you have some specific things that you need in your help option to explain something about DISCONTIGMEM. Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21	[PATCH] Avoiding mmap fragmentation	Wolfgang Wander
	Ingo recently introduced a great speedup for allocating new mmaps using the free_area_cache pointer, which boosts the specweb SSL benchmark by 4-5% and causes huge performance increases in thread creation. The downside of this patch is that it does lead to fragmentation in the mmap-ed areas (visible via /proc/self/maps), such that some applications that work fine under 2.4 kernels quickly run out of memory on any 2.6 kernel. The problem is twofold:
1) The free_area_cache is used to continue a search for memory where the last search ended. Before the change, new areas were always searched from the base address on. So now new small areas are cluttering holes of all sizes throughout the whole mmap-able region, whereas before, small requests tended to fill holes near the base, leaving holes far from the base large and available for larger requests.
2) The free_area_cache is also set to the location of the last munmap-ed area, so in scenarios where we allocate e.g. five regions of 1K each, then free regions 4, 2 and 3 in this order, the next request for 1K will be placed in the position of the old region 3, whereas before we appended it to the still active region 1, placing it at the location of the old region 2. Before, we had 1 free region of 2K; now we only get two free regions of 1K -> fragmentation.
The patch addresses these issues by introducing yet another cache descriptor, cached_hole_size, that contains the largest known hole size below the current free_area_cache. When a new request comes in, its size is compared against cached_hole_size, and if the request can be filled with a hole below free_area_cache, the search is started from the base instead.
The results look promising: whereas 2.6.12-rc4 fragments quickly and my (earlier posted) leakme.c test program terminates after 50000+ iterations with 96 distinct and fragmented maps in /proc/self/maps, it performs nicely (as expected) with thread creation: Ingo's test_str02 with 20000 threads requires 0.7s system time.
Taking out Ingo's patch (un-patch available on request) by basically deleting all mentions of free_area_cache from the kernel and always starting the search for new memory at the respective bases, we observe: leakme terminates successfully with 11 distinct, hardly fragmented areas in /proc/self/maps, but thread creation is grindingly slow: 30+s(!) system time for Ingo's test_str02 with 20000 threads. Now - drumroll ;-) the appended patch works fine with leakme: it ends with only 7 distinct areas in /proc/self/maps, and thread creation also seems sufficiently fast with 0.71s for 20000 threads. Signed-off-by: Wolfgang Wander <wwc@rentec.com> Credit-to: "Richard Purdie" <rpurdie@rpsys.net> Signed-off-by: Ken Chen <kenneth.w.chen@intel.com> Acked-by: Ingo Molnar <mingo@elte.hu> (partly) Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
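The allocation policy described above can be modelled with a sorted array of areas. This is a simplified, bottom-up sketch under stated assumptions: sizes are in KiB, the address space is a made-up 0-1024 range standing in for TASK_UNMAPPED_BASE..TASK_SIZE, and `get_area()`, `unmap_area()` and `insert_vma()` are hypothetical names. The search follows the message's rule: resume at free_area_cache unless cached_hole_size says a hole below the cache can satisfy the request, in which case rescan from the base. The unmap side (reset the cache and force a rescan when a hole opens below it) is this sketch's reading of the scheme, not a copy of the patch.

```c
#include <assert.h>

#define MAX_VMAS   64
#define AREA_BASE  0UL      /* stands in for TASK_UNMAPPED_BASE */
#define AREA_LIMIT 1024UL   /* stands in for TASK_SIZE; units are KiB */

struct vma { unsigned long start, end; };

struct mm {
	struct vma vmas[MAX_VMAS];       /* kept sorted by start address */
	int nr;
	unsigned long free_area_cache;   /* where the last search ended */
	unsigned long cached_hole_size;  /* largest hole seen below the cache */
};

static void insert_vma(struct mm *mm, unsigned long start, unsigned long len)
{
	int i = mm->nr;
	assert(mm->nr < MAX_VMAS);
	while (i > 0 && mm->vmas[i - 1].start > start) {
		mm->vmas[i] = mm->vmas[i - 1];
		i--;
	}
	mm->vmas[i] = (struct vma){ start, start + len };
	mm->nr++;
}

static long get_area(struct mm *mm, unsigned long len)
{
	unsigned long start, addr;
	int i;

	if (len > mm->cached_hole_size) {
		start = mm->free_area_cache;    /* no known hole fits: resume */
	} else {
		start = AREA_BASE;              /* a hole below the cache fits */
		mm->cached_hole_size = 0;
	}
full_search:
	addr = start;
	for (i = 0; i < mm->nr && mm->vmas[i].end <= addr; i++)
		;                               /* skip areas wholly below addr */
	for (;; i++) {
		if (addr + len > AREA_LIMIT) {
			if (start != AREA_BASE) {   /* retry once from the base */
				start = AREA_BASE;
				mm->cached_hole_size = 0;
				goto full_search;
			}
			return -1;                  /* stands in for -ENOMEM */
		}
		if (i == mm->nr || addr + len <= mm->vmas[i].start) {
			insert_vma(mm, addr, len);
			mm->free_area_cache = addr + len;
			return (long)addr;
		}
		if (addr + mm->cached_hole_size < mm->vmas[i].start)
			mm->cached_hole_size = mm->vmas[i].start - addr;
		addr = mm->vmas[i].end;
	}
}

static void unmap_area(struct mm *mm, unsigned long addr)
{
	int i = 0;
	while (i < mm->nr && mm->vmas[i].start != addr)
		i++;
	assert(i < mm->nr);
	for (; i + 1 < mm->nr; i++)
		mm->vmas[i] = mm->vmas[i + 1];
	mm->nr--;
	if (addr < mm->free_area_cache) {
		mm->free_area_cache = addr;     /* new lowest hole */
		mm->cached_hole_size = ~0UL;    /* force the next search to rescan */
	}
}
```

Replaying the message's scenario in this model (five 1K regions, then freeing regions 4, 2 and 3) places the next 1K request at the old region 2, adjacent to region 1, and the one after that at the old region 3, leaving a single free hole rather than two scattered ones.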
2005-06-21	[PATCH] smp_processor_id() cleanup	Ingo Molnar
	This patch implements a number of smp_processor_id() cleanup ideas that Arjan van de Ven and I came up with. The previous __smp_processor_id/_smp_processor_id/smp_processor_id API spaghetti was hard to follow both on the implementation and on the usage side. Some of the complexity arose from picking wrong names; some of it comes from the fact that not all architectures defined __smp_processor_id. In the new code, there are two externally visible symbols:
- smp_processor_id(): debug variant.
- raw_smp_processor_id(): nondebug variant. Replaces all existing uses of _smp_processor_id() and __smp_processor_id(). Defined by every SMP architecture in include/asm-*/smp.h.
There is one new internal symbol, dependent on DEBUG_PREEMPT:
- debug_smp_processor_id(): internal debug variant, mapped to smp_processor_id().
Also, I moved debug_smp_processor_id() from lib/kernel_lock.c into a new lib/smp_processor_id.c file. All related comments got updated and/or clarified. I have build/boot tested the following 8 .config combinations on x86: {SMP,UP} x {PREEMPT,!PREEMPT} x {DEBUG_PREEMPT,!DEBUG_PREEMPT}. I have also build/boot tested x86_64 on UP/PREEMPT/DEBUG_PREEMPT. (Other architectures are untested, but should work just fine.) Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
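The split between the two variants can be modelled in userspace: the debug variant complains when the caller could be preempted and migrated (so the CPU number it gets may already be stale), while the raw variant returns the number without checks. The globals below are hypothetical stand-ins for per-CPU and preemption state, and the check is a simplification; the kernel's debug check covers more cases.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for per-CPU and preemption state. */
static int preempt_count;       /* > 0 while preemption is disabled */
static int irqs_disabled_flag;  /* nonzero while local interrupts are off */
static int this_cpu = 2;        /* pretend we are running on CPU 2 */
static int warnings;            /* unsafe uses caught so far */

/* Nondebug variant: just report the CPU, no checks. */
static int raw_smp_processor_id(void)
{
	return this_cpu;
}

/* Debug variant: the answer is only stable if we cannot migrate. */
static int debug_smp_processor_id(void)
{
	if (preempt_count == 0 && !irqs_disabled_flag) {
		warnings++;
		fprintf(stderr, "BUG: using smp_processor_id() in preemptible code\n");
	}
	return raw_smp_processor_id();
}

/* With DEBUG_PREEMPT, the public name maps to the debug variant. */
#define smp_processor_id() debug_smp_processor_id()
```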
2005-06-21	[PATCH] x86_64: TASK_SIZE fixes for compatibility mode processes	Suresh Siddha
	The appended patch will set up the compatibility mode TASK_SIZE properly. This will fix at least three known bugs that can be encountered while running compatibility mode apps. a) A malicious 32bit app can have an ELF section at 0xffffe000. During exec of this app, we will have a memory leak, as insert_vm_struct() is not checking for a return value in syscall32_setup_pages() and thus not freeing the vma allocated for the vsyscall page. And instead of exec failing (as it has addresses > TASK_SIZE), we were allowing it to succeed previously. b) With a 32bit app, hugetlb_get_unmapped_area/arch_get_unmapped_area may return addresses beyond 32 bits, ultimately causing corruption because of wrap-around and resulting in a SEGFAULT, instead of returning ENOMEM. c) A 32bit app doing the mmap below will now fail: mmap((void *)(0xFFFFE000UL), 0x10000UL, PROT_READ|PROT_WRITE, MAP_FIXED|MAP_PRIVATE|MAP_ANON, 0, 0); Signed-off-by: Zou Nan hai <nanhai.zou@intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
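The effect on case c) can be expressed as a range check against a per-task TASK_SIZE. In this sketch the 32-bit limit is assumed to be 0xffffe000 (where the message places the vsyscall page) and the 64-bit limit is an assumed 47-bit value; `task_size()` and `check_fixed_range()` are hypothetical helpers, written so the check itself cannot suffer the wrap-around described in case b).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed limits for illustration. */
#define IA32_TASK_SIZE   0xffffe000ULL       /* below the compat vsyscall page */
#define X86_64_TASK_SIZE 0x800000000000ULL   /* 47-bit user space */

static uint64_t task_size(bool ia32)
{
	return ia32 ? IA32_TASK_SIZE : X86_64_TASK_SIZE;
}

/* Reject [addr, addr + len) unless it fits below the task's own limit.
 * Comparing addr against limit - len avoids overflow in addr + len. */
static int check_fixed_range(bool ia32, uint64_t addr, uint64_t len)
{
	uint64_t limit = task_size(ia32);
	if (len > limit || addr > limit - len)
		return -ENOMEM;
	return 0;
}
```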
2005-06-08	[PATCH] revert x86_64-use-the-e820-hole-to-map-the-iommu-agp-aperture	Andrew Morton
	Martin Bligh determined that this patch is causing his test box to not boot. Revert. Cc: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-31	[PATCH] x86_64 CONFIG_ACPI=n build fix	Andi Kleen
	Make CONFIG_X86_PM_TIMER dependent on CONFIG_ACPI Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-31	[PATCH] x86_64: More fixes for compilation without CONFIG_ACPI	Andi Kleen
	Suggested by Alexander Nyberg Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-28	[PATCH] x86_64: signal.c build fix	Oliver Korpilla
	For unspecified reasons, arch/x86_64/kernel/signal.c apparently needs ia32_unistd.h. Cc: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-27	[PATCH] Note on ACPI build fix	Alexander Nyberg
	Even after the previous fix you can still set CONFIG_ACPI_BOOT indirectly even without CONFIG_ACPI by choosing CONFIG_PCI and CONFIG_PCI_MMCONFIG. That doesn't build very well either. This makes PCI_MMCONFIG depend on ACPI, fixing that hole. [ I guess in theory Kconfig could follow the whole chain of dependencies for things that get selected, but that sounds insanely complicated, so we'll just fix up these things by hand. --Linus ] Signed-off-by: Alexander Nyberg <alexn@telia.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-25	[PATCH] x86_64: CONFIG_BUG=n fixes	Alexander Nyberg
	Fixes some !CONFIG_BUG warnings: include/asm/mmu_context.h: In function `switch_mm': include/asm/mmu_context.h:57: warning: implicit declaration of function `out_of_line_bug' Signed-off-by: Alexander Nyberg <alexn@telia.com> Cc: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-20	[PATCH] x86_64: i386/x86-64: Export cpu_core_map	Andi Kleen
	Needed for the powernow k8 driver for dual core support. Signed-off-by: Andi Kleen <ak@suse.de> Cc: <mark.langsdorf@amd.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-20	[PATCH] x86_64: Add option to disable timer check	Andi Kleen
	This works around the too fast timer seen on some ATI boards. I don't feel confident enough about it yet to enable it by default, but give users the option. Patch and debugging from Christopher Allen Wing <wingc@engin.umich.edu>, with minor tweaks (renamed the option and documented it) Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-20	[PATCH] x86_64: Fix 32bit system call restart	Andi Kleen
	The test case at http://cvs.sourceforge.net/viewcvs.py/posixtest/posixtestsuite/conformance/interfaces/clock_nanosleep/1-5.c fails if it runs as a 32-bit process on x86_64 machines. The root cause is that the 32-bit process fails to restart the syscall after it is interrupted by a signal. The syscall number of sys_restart_syscall in the table sys_call_table is __NR_restart_syscall (219), while it's __NR_ia32_restart_syscall (0) in ia32_sys_call_table. When regs->rax == (unsigned long)-ERESTART_RESTARTBLOCK, the function do_signal doesn't distinguish whether the process is 64-bit or 32-bit, and always sets the restart syscall number to __NR_restart_syscall (219). Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
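The fix amounts to choosing the restart number from the table the interrupted task actually uses. A minimal sketch using the two numbers quoted in the message; `struct regs`, `restart_block_fixup()` and the `ia32_task` flag are hypothetical, and the 2-byte rewind assumes the task entered the kernel through `syscall` or `int $0x80`, both two-byte instructions.

```c
#include <assert.h>
#include <stdbool.h>

#define ERESTART_RESTARTBLOCK   516   /* kernel-internal errno value */
#define NR_RESTART_SYSCALL_64   219   /* __NR_restart_syscall, per the message */
#define NR_RESTART_SYSCALL_IA32 0     /* __NR_ia32_restart_syscall, per the message */

struct regs { unsigned long rax, rip; };   /* hypothetical register subset */

static void restart_block_fixup(struct regs *r, bool ia32_task)
{
	if (r->rax == (unsigned long)-ERESTART_RESTARTBLOCK) {
		/* the bug was using 219 here even for 32-bit tasks */
		r->rax = ia32_task ? NR_RESTART_SYSCALL_IA32 : NR_RESTART_SYSCALL_64;
		r->rip -= 2;   /* back up over the 2-byte syscall instruction */
	}
}
```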
2005-05-20	[PATCH] x86_64: Fixed guard page handling again in iounmap	Andi Kleen
	Caused oopses again. Also fix potential mismatch in checking if change_page_attr was needed. To do it without races I needed to change mm/vmalloc.c to export a __remove_vm_area that does not take vmlist lock. Noticed by Terence Ripperda and based on a patch of his. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-20	[PATCH] x86_64: Don't allow accesses below register frame in ptrace	Andi Kleen
	There was an "off by one quad word" error in there. I don't think it is exploitable, because it will only store into an unused area, but it is better to plug it. Found and fixed by John Blackwood. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-20	[PATCH] x86_64: 386/x86-64 Further AMD dual core fixes	Andi Kleen
	- Remove duplicated ifdef
- Make core_id match what Intel uses
- Initialize phys_proc_id correctly for the non-DC case
- Handle non power of two core numbers
Fixes for both i386 and x86-64. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-17	[PATCH] x86_64: Update defconfig	Andi Kleen
	Update defconfig Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-17	[PATCH] x86_64: Don't assume BSP has ID 0 in new smp bootup	Andi Kleen
	This patch removes the assumption that LAPIC entries contain the BSP as their first entry. This is a slight improvement to the temporary fix submitted by Suresh Siddha.
- Removes the assumption that LAPIC entries contain the BSP first.
- Builds x86_acpiid_to_apicid[] and bios_cpu_apicid[] properly, with the BSP as the first entry.
- Makes maxcpus=1 boot on these systems. Since the parsing earlier in arch/x86_64/kernel/mpparse.c stopped after maxcpus entries, the other entries were not processed, which kept the kernel from booting on these systems.
TBD: x86_acpiid_to_apicid[] and bios_cpu_apicid[] seem to be exactly the same. One of them could be removed, but that might need more cleanup work. Signed-off-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-17	[PATCH] x86_64: Collected NMI watchdog fixes.	Andi Kleen
	Collected NMI watchdog fixes.
- Fix call of check_nmi_watchdog
- Remove earlier move of check_nmi_watchdog to later. It does not fully fix the race it was supposed to fix.
- Remove unused P6 definitions
- Add support for performance counter based watchdog on P4 systems. This allows it to run only once per second, which saves some CPU time. Previously it would run at 1000Hz, which was too much. Code ported from i386. Make this the default on Intel systems.
- Use check_nmi_watchdog with local APIC based NMI
- Fix race in touch_nmi_watchdog
- Fix bug that caused incorrect performance counters to be programmed in a few cases on K8.
- Remove useless check for local APIC
- Use local_t and per_cpu variables for per CPU data.
- Keep other CPUs busy during check_nmi_watchdog to make sure they really tick when in lapic mode.
- Only check CPUs that are actually online.
- Various other fixes.
- Fix fallback path when MSRs are unimplemented
Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-17	[PATCH] x86_64: Make vsyscall.c compile without CONFIG_SYSCTL	Andi Kleen
	Originally from Matt Tolentino Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-17	[PATCH] x86_64: Fix OEM hpet check	Suresh Siddha
	Use bitmap_zero instead of bitmap_empty to initialise the CPU mask. This makes the check actually run reliably instead of relying on leftover stack contents. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-17	[PATCH] x86_64: When checking vmalloc mappings don't use pte_page	Andi Kleen
	The PTEs can point to ioremap mappings too, and these are often outside mem_map. The NUMA hash page lookup functions cannot handle out of bounds accesses properly. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-17	[PATCH] x86_64: Fix canonical checking for segment registers in ptrace	Andi Kleen
	ptrace allowed user programs to set a non-canonical segment base, which would later cause oopses in the kernel. Credit-to: Alexander Nyberg <alexn@dsv.su.se> for identifying and reporting this bug. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-17	[PATCH] x86_64: check if ptrace RIP is canonical	Andi Kleen
	This works around an AMD Erratum. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-17	[PATCH] x86_64: Add pmtimer support	Andi Kleen
	There are unfortunately more and more multiprocessor Opteron systems that don't have HPET timer support in the southbridge, in particular those with Nvidia and VIA chipsets. They also don't guarantee that the TSCs are synchronized between CPUs, and especially with MP PowerNow the systems are nearly unusable because the time becomes very inconsistent between CPUs. The x86-64 timer code was originally written under the assumption that we could fall back to the HPET timer on such systems, but these systems have no HPET to fall back to. Another alternative is to use the ACPI PM timer as the primary time source, which is what this patch does. The kernel only uses the PM timer when there is no other choice, because it has some disadvantages. Ported over from i386. It should be faster than the i386 version because I dropped the "read three times" workaround, but it is still considerably slower than HPET and does not work with vsyscalls, which have to be disabled. Cc: <mark.langsdorf@amd.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-17	[PATCH] x86_64: Remove unique APIC/IO-APIC ID check	Andi Kleen
	It is unnecessary on modern Intel or AMD systems, which is all we support on x86-64. It also causes problems on various systems. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-17	[PATCH] x86_64: Don't print the internal k8c+ flag in /proc/cpuinfo	Andi Kleen
	It is not very useful to the user and is more of a kernel-internal implementation detail, so hide it. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-17	[PATCH] x86_64: Remove x86_apicid field	Andi Kleen
	Remove x86_apicid field Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-17	[PATCH] x86_64: Update TSC sync algorithm	Andi Kleen
	The new TSC sync algorithm recently submitted did not work too well: on some MP machines whose TSCs came out of the BIOS badly unsynchronized and that had no HPET support, the time would jump forwards and backwards between CPUs, making them nearly unusable. After a lot of research ;-) and some more prototypes I ended up just using the one from IA64, which looks best. It has some internal self-tuning that should adapt to changing interconnect latencies. It holds up in my tests so far. I believe it was originally written by David Mosberger; I just ported it over to x86-64. See the inline comment for a description. This cleans up the code because it uses smp_call_function for syncing instead of custom hooks in SMP bootup. Please note that the cycle numbers it outputs are too optimistic because they do not take into account the latency of WRMSR and RDTSC, which can be hundreds of cycles. It seems to be able to sync a dual Opteron to within 200-300 cycles, which is probably good enough. There is a timing window during AP bootup where interrupts can see inconsistent time before the TSC is synced. It is unfortunately hard to avoid, because we can only do the TSC sync after some setup, and we need to enable interrupts before that. I just ignored it for now. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-17	[PATCH] x86_64: Don't look up struct page pointer of physical address in iounmap	Andi Kleen
	It could be in a memory hole not mapped in mem_map and that causes the hash lookup to go off to nirvana. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-17	[PATCH] x86_64/i386: fix defaults for physical/core id in /proc/cpuinfo	Andi Kleen
	Hopefully the last round of cpu_core_id changes for now: - Always initialize cpu_core_id for all CPUs, even when no dual-core setup is detected. This prevents odd /proc/cpuinfo output. - Do the same with phys_proc_id[] even when there is no HyperThreading, for the same reason. - Use the CPU APIC ID from CPUID 1 instead of the Linux virtual CPU number to identify the core on AMD dual-core setups. Patch for i386/x86-64. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-17	[PATCH] x86_64: Readd missing tests in entry.S	Andi Kleen
	Cleans up the system exit call slightly and synchronizes with my tree again. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>