aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2007-07-21x86_64: lower printk severityDan Aloni
Signed-off-by: Dan Aloni <da-x@monatomic.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21x86_64: fix typo in acpi_pm.cAlessio Igor Bogani
Signed-off-by: Alessio Igor Bogani <abogani@texware.it> Cc: john stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21x86_64: use the global PIT lockThomas Gleixner
Replace the pcspkr private PIT lock by the global PIT lock to serialize the PIT access all over the place. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Dmitry Torokhov <dtor@mail.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21x86_64: During VM oom condition, kill all threads in process groupWill Schmidt
During a VM oom condition, kill all threads in the process group. We have had complaints where a threaded application is left in a bad state after one of it's threads is killed when we hit a VM: out_of_memory condition. Killing just one of the process threads can leave the application in a bad state, whereas killing the entire process group would allow for the application to restart, or otherwise handled, and makes it very obvious that something has gone wrong. This change allows the entire process group to be taken down, rather than just the one thread. Signed-off-by: Will <will_schmidt@vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21x86_64: Move functions declarations to header fileGlauber de Oliveira Costa
Some interrupt entry points are currently defined in i8259.c They probably belong in a header. Right now, their only user is init_IRQ, justifying their declaration in-file. But when virtualization comes in, we may be interested in using that functions in late initializations. Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21i386: move the kernel to 16MB for NUMA-QAndy Whitcroft
We are seeing corruption of the decompressed kernel. It is suspected that this is platform specific as it has yet to be seen on any other x86. Move the kernel to the 16MB boundary. Signed-off-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21i386: Remove unneeded test of 'task' in dump_trace()Jesper Juhl
Remove unneeded test of task != NULL from arch/i386/kernel/traps.c::dump_trace() At the start of the function we have this test: if (!task) task = current; so further down there's no need to test 'task'. Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21i386: divorce CONFIG_X86_PAE from CONFIG_HIGHMEM64GWilliam Lee Irwin III
PAE is useful for more than supporting more than 4GB RAM. It supports expanded swapspace and NX executable protections. Some users may want NX or expanded swapspace support without the overhead or instability of highmem. For these reasons, the following patch divorces CONFIG_X86_PAE from CONFIG_HIGHMEM64G. Cc: Mark Lord <lkml@rtr.ca> Signed-off-by: William Irwin <wli@holomorphy.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21i386: HPET, check if the counter worksThomas Gleixner
Some systems have a HPET which is not incrementing, which leads to a complete hang. Detect it during HPET setup. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: john stultz <johnstul@us.ibm.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21i386: DMI_MATCH patch in reboot.c for SFF Dell OptiPlex 745 - fixes hang on ↵James Jarvis
reboot The following patch enables reboot through BIOS on the Dell Optiplex 745 Small Form Factor base, on which reboot hangs. The larger form factor does not require this, hence the match on DMI_BOARD_NAME. Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21i386: do not restore reserved memory after hibernationRafael J. Wysocki
On some systems the ACPI NVS area is located in the first 1 MB of RAM and it is overwritten by the i386 code during the restore after hibernation. This confuses the ACPI platform firmware that doesn't update the AC adapter status appropriately as a result (http://bugzilla.kernel.org/show_bug.cgi?id=7995). The solution is to register the reserved memory in the first 1 MB as 'nosave', so that swsusp doesn't touch it during the restore. Also, this has been done on x86_64 for a long time now, so this patch makes the i386 restore code behave like the x86_64 one. [akpm@linux-foundation.org: build fix] Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21i386: fix section mismatch warning in intel_cacheinfoSam Ravnborg
Fix following warning: WARNING: arch/i386/kernel/built-in.o(.init.text+0x3818): Section mismatch: reference to .exit.text:cache_remove_dev (between 'cacheinfo_cpu_callback' and 'cache_sysfs_init') It points out that a function marked __cpuexit is calling a function marked __cpuinit => oops. The call happens only in an error-condition which may explain why we have not seen it before. The offending function was not used anywhere else - so marked it __cpuexit. Note: This warning triggers only with a local copy of modpost but that version will soon be pushed out. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21i386: pgd_{c,d}tor() staticAdrian Bunk
pgd_{c,d}tor() can now become static. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21x86_64: Calgary - fold in redundant functionsMuli Ben-Yehuda
After the bitmap changes we can get rid of the unlocked versions of calgary_unmap_sg and iommu_free. Fold __calgary_unmap_sg and __iommu_free into their calgary_unmap_sg and iommu_free, respectively. Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21x86_64: Calgary - change _map_single, etc to staticYinghai Lu
there function are called via dma_ops->.., so change them to static Signed-off-by: Yinghai Lu <yinghai.lu@sun.com> Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21x86_64: Calgary - tighten up the bitmap lockingMuli Ben-Yehuda
Currently the IOMMU table's lock protects both the bitmap and access to the hardware's TCE table. Access to the TCE table is synchronized through the bitmap; therefore, only hold the lock while modifying the bitmap. This gives a yummy 10-15% reduction in CPU utilization for netperf on a large SMP machine. Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21x86_64: Calgary - fix few style problems pointed out by checkpatch.plMuli Ben-Yehuda
No actual code was harmed in the production of this patch. Thanks to Andrew Morton <akpm@linux-foundation.org> for telling me about checkpatch.pl. Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21x86_64: tidy up debug printksMuli Ben-Yehuda
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21x86_64: only reserve the first 1MB of IO space for CalIOC2Muli Ben-Yehuda
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21x86_64: tabify and trim trailing whitespaceMuli Ben-Yehuda
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21x86_64: cleanup of unneeded macrosGuillaume Thouvenin
Cleanup unneeded macros used for register space address calculation. Now we are using the EBDA to find the space address. Signed-off-by: Guillaume Thouvenin <guillaume.thouvenin@bull.net> Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21x86_64: reserve TCEs with the same address as MEM regionsMuli Ben-Yehuda
This works around a bug where DMAs that have the same addresses as some MEM regions do not go through. Not clear yet if this is due to a mis-configuration or something deeper. [akpm@linux-foundation.org: coding style fixlet] Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21x86_64: grab PLSSR too when a DMA error occursMuli Ben-Yehuda
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21x86_64: make dump_error_regs a chip opMuli Ben-Yehuda
Provide seperate versions for Calgary and CalIOC2 Also print out the PCIe Root Complex Status on CalIOC2 errors Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21x86_64: implement CalIOC2 TCE cache flush sequenceMuli Ben-Yehuda
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21x86_64: add chip_ops and a quirk function for CalIOC2Muli Ben-Yehuda
[akpm@linux-foundation.org>: make calioc2_chip_ops static] Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21x86_64: introduce CalIOC2 supportMuli Ben-Yehuda
CalIOC2 is a PCI-e implementation of the Calgary logic. Most of the programming details are the same, but some differ, e.g., TCE cache flush. This patch introduces CalIOC2 support - detection and various support routines. It's not expected to work yet (but will with follow-on patches). Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21x86_64: abstract how we find the iommu_table for a deviceMuli Ben-Yehuda
... in preparation for doing it differently for CalIOC2. Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com> Cc: Jeff Garzik <jeff@garzik.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21x86_64: introduce chipset specific opsMuli Ben-Yehuda
Calgary and CalIOC2 share most of the same logic. Introduce struct cal_chipset_ops for quirks and tce flush logic which are [akpm@linux-foundation.org: make calgary_chip_ops static] Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21x86_64: introduce handle_quirks() for various chipset quirksMuli Ben-Yehuda
Move the aic94xx split completion timeout handling there. Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21x86_64: update copyright noticeMuli Ben-Yehuda
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21x86_64: generalize calgary_increase_split_completion_timeoutMuli Ben-Yehuda
... will be used by CalIOC2 later Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21x86: remove support for the Rise CPUAdrian Bunk
The Rise CPUs were only very short-lived, and there are no reports of anyone both owning one and running Linux on it. Googling for the printk string "CPU: Rise iDragon" didn't find any dmesg available online. If it turns out that against all expectations there are actually users reverting this patch would be easy. This patch will make the kernel images smaller by a few bytes for all i386 users. Signed-off-by: Adrian Bunk <bunk@stusta.de> Acked-by: Dave Jones <davej@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21x86_64: check remote IRR bit before migrating level triggered irqEric W. Biederman
On x86_64 kernel, level triggered irq migration gets initiated in the context of that interrupt(after executing the irq handler) and following steps are followed to do the irq migration. 1. mask IOAPIC RTE entry; // write to IOAPIC RTE 2. EOI; // processor EOI write 3. reprogram IOAPIC RTE entry // write to IOAPIC RTE with new destination and // and interrupt vector due to per cpu vector // allocation. 4. unmask IOAPIC RTE entry; // write to IOAPIC RTE Because of the per cpu vector allocation in x86_64 kernels, when the irq migrates to a different cpu, new vector(corresponding to the new cpu) will get allocated. An EOI write to local APIC has a side effect of generating an EOI write for level trigger interrupts (normally this is a broadcast to all IOAPICs). The EOI broadcast generated as a side effect of EOI write to processor may be delayed while the other IOAPIC writes (step 3 and 4) can go through. Normally, the EOI generated by local APIC for level trigger interrupt contains vector number. The IOAPIC will take this vector number and search the IOAPIC RTE entries for an entry with matching vector number and clear the remote IRR bit (indicate EOI). However, if the vector number is changed (as in step 3) the IOAPIC will not find the RTE entry when the EOI is received later. This will cause the remote IRR to get stuck causing the interrupt hang (no more interrupt from this RTE). Current x86_64 kernel assumes that remote IRR bit is cleared by the time IOAPIC RTE is reprogrammed. Fix this assumption by checking for remote IRR bit and if it still set, delay the irq migration to the next interrupt arrival event(hopefully, next time remote IRR bit will get cleared before the IOAPIC RTE is reprogrammed). Initial analysis and patch from Nanhai. Clean up patch from Suresh. Rewritten to be less intrusive, and to contain a big fat comment by Eric. [akpm@linux-foundation.org: fix comments] Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Nanhai Zou <nanhai.zou@intel.com> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Asit Mallick <asit.k.mallick@intel.com> Cc: Keith Packard <keith.packard@intel.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21x86: round_jiffies() for i386 and x86-64 non-critical/corrected MCE pollingVenki Pallipadi
This helps to reduce the frequency at which the CPU must be taken out of a lower-power state. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Acked-by: Tim Hockin <thockin@hockin.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21i386: add reference to the argumentsAndrew Morton
Prevent stuff like this: mm/vmalloc.c: In function 'unmap_kernel_range': mm/vmalloc.c:75: warning: unused variable 'start' Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21x86: Make Alt-SysRq-p display the debug register contentsAlan Stern
This patch (as921) adds code to the show_regs() routine in i386 and x86_64 to print the contents of the debug registers along with all the others. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21x86: PM_TRACE supportNigel Cunningham
Signed-off-by: Nigel Cunningham <nigel@nigel.suspend2.net> Cc: Randy Dunlap <rdunlap@xenotime.net> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Pavel Machek <pavel@ucw.cz> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21i386: fix section mismatch warnings in mtrrSam Ravnborg
Following section mismatch warnings were reported by Andrey Borzenkov: WARNING: arch/i386/kernel/built-in.o - Section mismatch: reference to .init.text:amd_init_mtrr from .text between 'mtrr_bp_init' (at offset 0x967a) and 'mtrr_attrib_to_str' WARNING: arch/i386/kernel/built-in.o - Section mismatch: reference to .init.text:cyrix_init_mtrr from .text between 'mtrr_bp_init' (at offset 0x967f) and 'mtrr_attrib_to_str' WARNING: arch/i386/kernel/built-in.o - Section mismatch: reference to .init.text:centaur_init_mtrr from .text between 'mtrr_bp_init' (at offset 0x9684) and 'mtrr_attrib_to_str' WARNING: arch/i386/kernel/built-in.o - Section mismatch: reference to .init.text: from .text between 'get_mtrr_state' (at offset 0xa735) and 'generic_get_mtrr' WARNING: arch/i386/kernel/built-in.o - Section mismatch: reference to .init.text: from .text between 'get_mtrr_state' (at offset 0xa749) and 'generic_get_mtrr' WARNING: arch/i386/kernel/built-in.o - Section mismatch: reference to .init.text: from .text between 'get_mtrr_state' (at offset 0xa770) and 'generic_get_mtrr' It was tracked down to a few functions missing __init tag. Compile tested only. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21i386: fix machine rebootingTruxton Fulton
Commit 59f4e7d572980a521b7bdba74ab71b21f5995538 fixed machine rebooting on Truxton's machine (when no keyboard was present). But it broke it on Lee's machine. The patch reinstates the old (pre-59f4e7d572980a521b7bdba74ab71b21f5995538) code and if that doesn't work out, try the new, post-59f4e7d572980a521b7bdba74ab71b21f5995538 code instead. Cc: Lee Garrett <lee-in-berlin@web.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21x86_64: mcelog tolerant level cleanupTim Hockin
Background: The MCE handler has several paths that it can take, depending on various conditions of the MCE status and the value of the 'tolerant' knob. The exact semantics are not well defined and the code is a bit twisty. Description: This patch makes the MCE handler's behavior more clear by documenting the behavior for various 'tolerant' levels. It also fixes or enhances several small things in the handler. Specifically: * If RIPV is set it is not safe to restart, so set the 'no way out' flag rather than the 'kill it' flag. * Don't panic() on correctable MCEs. * If the _OVER bit is set *and* the _UC bit is set (meaning possibly dropped uncorrected errors), set the 'no way out' flag. * Use EIPV for testing whether an app can be killed (SIGBUS) rather than RIPV. According to docs, EIPV indicates that the error is related to the IP, while RIPV simply means the IP is valid to restart from. * Don't clear the MCi_STATUS registers until after the panic() path. This leaves the status bits set after the panic() so clever BIOSes can find them (and dumb BIOSes can do nothing). This patch also calls nonseekable_open() in mce_open (as suggested by akpm). Result: Tolerant levels behave almost identically to how they always have, but not it's well defined. There's a slightly higher chance of panic()ing when multiple errors happen (a good thing, IMHO). If you take an MBE and panic(), the error status bits are not cleared. Alternatives: None. Testing: I used software to inject correctable and uncorrectable errors. With tolerant = 3, the system usually survives. With tolerant = 2, the system usually panic()s (PCC) but not always. With tolerant = 1, the system always panic()s. When the system panic()s, the BIOS is able to detect that the cause of death was an MC4. I was not able to reproduce the case of a non-PCC error in userspace, with EIPV, with (tolerant < 3). That will be rare at best. Signed-off-by: Tim Hockin <thockin@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21x86_64: support poll() on /dev/mcelogTim Hockin
Background: /dev/mcelog is typically polled manually. This is less than optimal for situations where accurate accounting of MCEs is important. Calling poll() on /dev/mcelog does not work. Description: This patch adds support for poll() to /dev/mcelog. This results in immediate wakeup of user apps whenever the poller finds MCEs. Because the exception handler can not take any locks, it can not call the wakeup itself. Instead, it uses a thread_info flag (TIF_MCE_NOTIFY) which is caught at the next return from interrupt or exit from idle, calling the mce_user_notify() routine. This patch also disables the "fake panic" path of the mce_panic(), because it results in printk()s in the exception handler and crashy systems. This patch also does some small cleanup for essentially unused variables, and moves the user notification into the body of the poller, so it is only called once per poll, rather than once per CPU. Result: Applications can now poll() on /dev/mcelog. When an error is logged (whether through the poller or through an exception) the applications are woken up promptly. This should not affect any previous behaviors. If no MCEs are being logged, there is no overhead. Alternatives: I considered simply supporting poll() through the poller and not using TIF_MCE_NOTIFY at all. However, the time between an uncorrectable error happening and the user application being notified is *the*most* critical window for us. Many uncorrectable errors can be logged to the network if given a chance. I also considered doing the MCE poll directly from the idle notifier, but decided that was overkill. Testing: I used an error-injecting DIMM to create lots of correctable DRAM errors and verified that my user app is woken up in sync with the polling interval. I also used the northbridge to inject uncorrectable ECC errors, and verified (printk() to the rescue) that the notify routine is called and the user app does wake up. I built with PREEMPT on and off, and verified that my machine survives MCEs. [wli@holomorphy.com: build fix] Signed-off-by: Tim Hockin <thockin@google.com> Signed-off-by: William Irwin <bill.irwin@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21x86_64: O_EXCL on /dev/mcelogTim Hockin
Background: /dev/mcelog is a clear-on-read interface. It is currently possible for multiple users to open and read() the device. Users are protected from each other during any one read, but not across reads. Description: This patch adds support for O_EXCL to /dev/mcelog. If a user opens the device with O_EXCL, no other user may open the device (EBUSY). Likewise, any user that tries to open the device with O_EXCL while another user has the device will fail (EBUSY). Result: Applications can get exclusive access to /dev/mcelog. Applications that do not care will be unchanged. Alternatives: A simpler choice would be to only allow one open() at all, regardless of O_EXCL. Testing: I wrote an application that opens /dev/mcelog with O_EXCL and observed that any other app that tried to open /dev/mcelog would fail until the exclusive app had closed the device. Caveats: None. Signed-off-by: Tim Hockin <thockin@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21i386: insert unclaimed MMCONFIG resourcesAaron Durbin
Insert the unclaimed MMCONFIG resources into the resource tree without the IORESOURCE_BUSY flag during late initialization. This allows the MMCONFIG regions to be visible in the iomem resource tree without interfering with other system resources that were discovered during PCI initialization. [akpm@linux-foundation.org: nanofixes] Signed-off-by: Aaron Durbin <adurbin@google.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21x86_64: fake apicid_to_node mapping for fake numaDavid Rientjes
When we are in the emulated NUMA case, we need to make sure that all existing apicid_to_node mappings that point to real node ID's now point to the equivalent fake node ID's. If we simply iterate over all apicid_to_node[] members for each node, we risk remapping an entry if it shares a node ID with a real node. Since apicid's may not be consecutive, we're forced to create an automatic array of apicid_to_node mappings and then copy it over once we have finished remapping fake to real nodes. Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21x86_64: fake pxm-to-node mapping for fake numaDavid Rientjes
For NUMA emulation, our SLIT should represent the true NUMA topology of the system but our proximity domain to node ID mapping needs to reflect the emulated state. When NUMA emulation has successfully setup fake nodes on the system, a new function, acpi_fake_nodes() is called. This function determines the proximity domain (_PXM) for each true node found on the system. It then finds which emulated nodes have been allocated on this true node as determined by its starting address. The node ID to PXM mapping is changed so that each fake node ID points to the PXM of the true node that it is located on. If the machine failed to register a SLIT, then we assume there is no special requirement for emulated node affinity so we use the default LOCAL_DISTANCE, which is newly exported to this code, as our measurement if the emulated nodes appear in the same PXM. Otherwise, we use REMOTE_DISTANCE. PXM_INVAL and NID_INVAL are also exported to the ACPI header file so that we can compare node_to_pxm() results in generic code (in this case, the SRAT code). Cc: Len Brown <lenb@kernel.org> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21x86_64: extract helper function from e820_register_active_regionsDavid Rientjes
The logic in e820_find_active_regions() for determining the true active regions for an e820 entry given a range of PFN's is needed for e820_hole_size() as well. e820_hole_size() is called from the NUMA emulation code to determine the reserved area within an address range on a per-node basis. Its logic should duplicate that of finding active regions in an e820 entry because these are the only true ranges we may register anyway. [akpm@linux-foundation.org: cleanup] Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21x86_64: Quicklist support for x86_64Christoph Lameter
This adds caching of pgds and puds, pmds, pte. That way we can avoid costly zeroing and initialization of special mappings in the pgd. A second quicklist is useful to separate out PGD handling. We can carry the initialized pgds over to the next process needing them. Also clean up the pgd_list handling to use regular list macros. There is no need anymore to avoid the lru field. Move the add/removal of the pgds to the pgdlist into the constructor / destructor. That way the implementation is congruent with i386. Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: "Luck, Tony" <tony.luck@intel.com> Acked-by: William Lee Irwin III <wli@holomorphy.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21i386: timer_irq_works() static againAdrian Bunk
timer_irq_works() needlessly became global. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21i386: arch/i386/kernel/i8253.c should #include <asm/timer.h>Adrian Bunk
Every file should include the headers containing the prototypes for its global functions. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>