Kernel - My Linux kernel repository

Age	Commit message (Collapse)	Author
2007-09-26	Revert "x86-64: Disable local APIC timer use on AMD systems with C1E"	Linus Torvalds
	This reverts commit e66485d747505e9d960b864fc6c37f8b2afafaf0, since Rafael Wysocki noticed that the change only works for his in -mm, not in mainline (and that both "noapictimer" _and_ "apicmaintimer" are broken on his hardware, but that's apparently not a regression, just a symptom of the same issue that causes the automatic apic timer disable to not work). It turns out that it really doesn't work correctly on x86-64, since x86-64 doesn't use the generic clock events for timers yet. Thanks to Rafal for testing, and here's the ugly details on x86-64 as per Thomas: "I just looked into the code and the logic vs. noapictimer on SMP is completely broken. On i386 the noapictimer option not only disables the local APIC timer, it also registers the CPUs for broadcasting via IPI on SMP systems. The x86-64 code uses the broadcast only when the local apic timer is active, i.e. "noapictimer" is not on the command line. This defeats the whole purpose of "noapictimer". It should be there to make boxen work, where the local APIC timer actually has a hardware problem, e.g. the nx6325. The current implementation of x86_64 only fixes the ACPI c-states related problem where the APIC timer stops in C3(2), nothing else. On nx6325 and other AMD X2 equipped systems which have the C1E enabled we run into the following: PIT keeps jiffies (and the system) running, but the local APIC timer interrupts can get out of sync due to this C1E effect. I don't think this is a critical problem, but it is wrong nevertheless. I think it's safe to revert the C1E patch and postpone the fix to the clock events conversion." On further reflection, Thomas noted: "It's even worse than I thought on the first check: "noapictimer" on the command line of an SMP box prevents _ONLY_ the boot CPU apic timer from being used. But the secondary CPU is still unconditionally setting up the APIC timer and uses the non calibrated variable calibration_result, which is of course 0, to setup the APIC timer. Wreckage guaranteed." so we'll just have to wait for the x86 merge to hopefully fix this up for x86-64. Tested-and-requested-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-09-26	x86-64: Disable local APIC timer use on AMD systems with C1E	Thomas Gleixner
	commit 3556ddfa9284a86a59a9b78fe5894430f6ab4eef titled [PATCH] x86-64: Disable local APIC timer use on AMD systems with C1E solves a problem with AMD dual core laptops e.g. HP nx6325 (Turion 64 X2) with C1E enabled: When both cores go into idle at the same time, then the system switches into C1E state, which is basically the same as C3. This stops the local apic timer. This was debugged right after the dyntick merge on i386 and despite the patch title it fixes only the 32 bit path. x86_64 is still missing this fix. It seems that mainline is not really affected by this issue, as the PIT is running and keeps jiffies incrementing, but that's just waiting for trouble. -mm suffers from this problem due to the x86_64 high resolution timer patches. This is a quick and dirty port of the i386 code to x86_64. I spent quite a time with Rafael to debug the -mm / hrt wreckage until someone pointed us to this. I really had forgotten that we debugged this half a year ago already. Sigh, is it just me or is there something yelling arch/x86 into my ear? Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-09-21	Revert "x86_64: Quicklist support for x86_64"	Linus Torvalds
	This reverts commit 34feb2c83beb3bdf13535a36770f7e50b47ef299. Suresh Siddha points out that this one breaks the fundamental requirement that you cannot free page table pages before the TLB caches are flushed. The quicklists do not give the same kinds of guarantees that the mmu_gather structure does, at least not in NUMA configurations. Requested-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Andi Kleen <ak@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Christoph Lameter <clameter@sgi.com> Cc: Asit Mallick <asit.k.mallick@intel.com> Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-08-24	Pull bugzilla-1641 into release branch	Len Brown

2007-08-21	ACPI: boot correctly with "nosmp" or "maxcpus=0"	Len Brown
	In MPS mode, "nosmp" and "maxcpus=0" boot a UP kernel with IOAPIC disabled. However, in ACPI mode, these parameters didn't completely disable the IO APIC initialization code and boot failed. init/main.c: Disable the IO_APIC if "nosmp" or "maxcpus=0" undefine disable_ioapic_setup() when it doesn't apply. i386: delete ioapic_setup(), it was a duplicate of parse_noapic() delete undefinition of disable_ioapic_setup() x86_64: rename disable_ioapic_setup() to parse_noapic() to match i386 define disable_ioapic_setup() in header to match i386 http://bugzilla.kernel.org/show_bug.cgi?id=1641 Acked-by: Andi Kleen <ak@suse.de> Signed-off-by: Len Brown <len.brown@intel.com>
2007-08-18	x86_64: Fix to keep watchdog disabled by default for i386/x86_64	Daniel Gollub
	Fixed wrong expression which enabled watchdogs even if nmi_watchdog kernel parameter wasn't set. This regression got slightly introduced with commit b7471c6da94d30d3deadc55986cc38d1ff57f9ca. Introduced NMI_DISABLED (-1) which allows to switch the value of NMI_DEFAULT without breaking the APIC NMI watchdog code (again). Fixes: https://bugzilla.novell.com/show_bug.cgi?id=298084 http://bugzilla.kernel.org/show_bug.cgi?id=7839 And likely some more nmi_watchdog=0 related issues. Signed-off-by: Daniel Gollub <dgollub@suse.de> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-08-11	finish i386 and x86-64 sysdata conversion	Muli Ben-Yehuda
	This patch finishes the i386 and x86-64 ->sysdata conversion and hopefully also fixes Riku's and Andy's observed bugs. It is based on Yinghai Lu's and Andy Whitcroft's patches (thanks!) with some changes: - introduce pci_scan_bus_with_sysdata() and use it instead of pci_scan_bus() where appropriate. pci_scan_bus_with_sysdata() will allocate the sysdata structure and then call pci_scan_bus(). - always allocate pci_sysdata dynamically. The whole point of this sysdata work is to make it easy to do root-bus specific things (e.g., support PCI domains and IOMMU's). I dislike using a default struct pci_sysdata in some places and a dynamically allocated pci_sysdata elsewhere - the potential for someone indavertantly changing the default structure is too high. - this patch only makes the minimal changes necessary, i.e., the NUMA node is always initialized to -1. Patches to do the right thing with regards to the NUMA node can build on top of this (either add a 'node' parameter to pci_scan_bus_with_sysdata() or just update the node when it becomes known). The patch was compile tested with various configurations (e.g., NUMAQ, VISWS) and run-time tested on i386 and x86-64. Unfortunately none of my machines exhibited the bugs so caveat emptor. Andy, could you please see if this fixes the NUMA issues you've seen? Riku, does this fix "pci=noacpi" on your laptop? Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com> Cc: Yinghai Lu <yhlu.kernel@gmail.com> Cc: Andi Kleen <ak@suse.de> Cc: Chuck Ebbert <cebbert@redhat.com> Cc: <riku.seppala@kymp.net> Cc: Andy Whitcroft <apw@shadowen.org> Cc: Jeff Garzik <jeff@garzik.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-31	revert "x86, serial: convert legacy COM ports to platform devices"	Andrew Morton
	Revert 7e92b4fc345f5b6f57585fbe5ffdb0f24d7c9b26. It broke Sébastien Dugué's machine and Jeff said (persuasively) This seems like it will break decades-long-working stuff, in favor of breaking new ground in our favorite area, "trusting the BIOS." It's just not worth it for serial ports, IMO. Serial ports are something that just shouldn't break at this late stage in the game. My new Intel platform boxes don't even have serial ports, so I question the value of messing with serial port probing even more... because... just wait a year, and your box won't have a serial port either! :) I certainly don't object to the use of platform devices (or isa_driver), but the probe change seems questionable. That's sorta analagous to rewriting the floppy driver probe routine. Sure you could do it... but why risk all that damage and go through debugging all over again? It seems clear from this report that we cannot, should not, trust BIOS for something (a) so simple and (b) that has been working for over a decade. Much discussion ensued and we've decided to have another go at all of this. Cc: Sébastien Dugué <sebastien.dugue@bull.net> Cc: Bjorn Helgaas <bjorn.helgaas@hp.com> Cc: Len Brown <lenb@kernel.org> Cc: Adam Belay <ambx1@neo.rr.com> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Jeff Garzik <jeff@garzik.org> Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Michal Piotrowski <michal.k.k.piotrowski@gmail.com> Cc: Sascha Sommer <saschasommer@freenet.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-31	remove unused TIF_NOTIFY_RESUME flag	Stephane Eranian
	Remove unused TIF_NOTIFY_RESUME flag for all processor architectures. The flag was not used excecpt on IA-64 where the patch replaces it with TIF_PERFMON_WORK. Signed-off-by: stephane eranian <eranian@hpl.hp.com> Cc: <linux-arch@vger.kernel.org> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-31	x86-64: Calgary: fix section mismatch warnings in tce	Randy Dunlap
	Fix section mismatch warnings: these functions are called only from __init functions. WARNING: vmlinux.o(.text+0x1861c): Section mismatch: reference to .init.text:free_bootmem (between 'free_tce_table' and 'build_tce_table') WARNING: vmlinux.o(.text+0x187e5): Section mismatch: reference to .init.text:__alloc_bootmem_low (between 'alloc_tce_table' and 'kretprobe_trampoline_holder') Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com> Cc: Andi Kleen <ak@suse.de> Cc: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-26	amd64: fix get_user() on bitwise	Al Viro
	We really need force-cast when converting to final result type; unsigned long can be silently converted to integer types and to pointers, but not to bitwise. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-25	[x86 setup] Make struct ist_info cross-architecture, and use in setup code	H. Peter Anvin
	Make "struct ist_info" valid on both i386 and x86-64, and use the structure by name in the setup code. Additionally, "Intel SpeedStep IST" is redundant, refer to it as IST consistently. Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2007-07-25	ACPI: Kconfig: remove CONFIG_ACPI_SLEEP from source	Len Brown
	As it was a synonym for (CONFIG_ACPI && CONFIG_X86), the ifdefs for it were more clutter than they were worth. For ia64, just add a few stubs in anticipation of future S3 or S4 support. Signed-off-by: Len Brown <len.brown@intel.com>
2007-07-22	x86_64: Share msidef.h and hypertransport.h includes with i386	Andi Kleen
	They are identical Indirectly pointed out by Thomas Gleixner Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-22	x86: Replace NSC/Cyrix specific chipset access macros by inlined functions.	Juergen Beisert
	Due to index register access ordering problems, when using macros a line like this fails (and does nothing): setCx86(CX86_CCR2, getCx86(CX86_CCR2) \| 0x88); With inlined functions this line will work as expected. Note about a side effect: Seems on Geode GX1 based systems the "suspend on halt power saving feature" was never enabled due to this wrong macro expansion. With inlined functions it will be enabled, but this will stop the TSC when the CPU runs into a HLT instruction. Kernel output something like this: Clocksource tsc unstable (delta = -472746897 ns) This is the 3rd version of this patch. - Adding missed arch/i386/kernel/cpu/mtrr/state.c Thanks to Andres Salomon - Adding some big fat comments into the new header file Suggested by Andi Kleen AK: fixed x86-64 compilation Signed-off-by: Juergen Beisert <juergen@kreuzholzen.de> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-22	x86_64: x86_64 - Use non locked version for local_cmpxchg()	Mathieu Desnoyers
	local_cmpxchg() should not use any LOCK prefix. This change probably got lost in the move to cmpxchg.h. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Acked-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-22	x86: Stop MCEs and NMIs during code patching	Andi Kleen
	When a machine check or NMI occurs while multiple byte code is patched the CPU could theoretically see an inconsistent instruction and crash. Prevent this by temporarily disabling MCEs and returning early in the NMI handler. Based on discussion with Mathieu Desnoyers. Cc: Mathieu Desnoyers <compudj@krystal.dyndns.org> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-22	x86: Fix alternatives and kprobes to remap write-protected kernel text	Andi Kleen
	Reenable kprobes and alternative patching when the kernel text is write protected by DEBUG_RODATA Add a general utility function to change write protected text. The new function remaps the code using vmap to write it and takes care of CPU synchronization. It also does CLFLUSH to make icache recovery faster. There are some limitations on when the function can be used, see the comment. This is a newer version that also changes the paravirt_ops code. text_poke also supports multi byte patching now. Contains bug fixes from Zach Amsden and suggestions from Mathieu Desnoyers. Cc: Jan Beulich <jbeulich@novell.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Mathieu Desnoyers <compudj@krystal.dyndns.org> Cc: Zach Amsden <zach@vmware.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-22	x86_64: Use read and write crX in .c files	Glauber de Oliveira Costa
	This patch uses the read and write functions provided at system.h for control registers instead of writting raw assembly over and over again in .c files. Functions to manipulate cr2 and cr8 were provided, as they were lacking. Also, removed some extra space after closing brackets Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-22	x86: i386-show-unhandled-signals-v3	Masoud Asgharifard Sharbiani
	This patch makes the i386 behave the same way that x86_64 does when a segfault happens. A line gets printed to the kernel log so that tools that need to check for failures can behave more uniformly between debug.show_unhandled_signals sysctl variable to 0 (or by doing echo 0 > /proc/sys/debug/exception-trace) Also, all of the lines being printed are now using printk_ratelimit() to deny the ability of DoS from a local user with a program like the following: main() { while (1) if (!fork()) (int )0 = 0; } This new revision also includes the fix that Andrew did which got rid of new sysctl that was added to the system in earlier versions of this. Also, 'show-unhandled-signals' sysctl has been renamed back to the old 'exception-trace' to avoid breakage of people's scripts. AK: Enabling by default for i386 will be likely controversal, but let's see what happens AK: Really folks, before complaining just fix your segfaults AK: I bet this will find a lot of silent issues Signed-off-by: Masoud Sharbiani <masouds@google.com> Signed-off-by: Andi Kleen <ak@suse.de> [ Personally, I've found the complaints useful on x86-64, so I'm all for this. That said, I wonder if we could do it more prettily.. -Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21	x86-64: introduce struct pci_sysdata to facilitate sharing of ->sysdata	Muli Ben-Yehuda
	This patch introduces struct pci_sysdata to x86 and x86-64, and converts the existing two users (NUMA, Calgary) to use it. This lays the groundwork for having other users of sysdata, such as the PCI domains work. The Calgary bits are tested, the NUMA bits just look ok. Signed-off-by: Jeff Garzik <jeff@garzik.org> Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21	x86_64: move iommu declaration from proto to iommu.h	Yinghai Lu
	[akpm@linux-foundation.org: build fix] Signed-off-by: Yinghai Lu <yinghai.lu@sun.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Muli Ben-Yehuda <muli@il.ibm.com> Cc: Vivek Goyal <vgoyal@in.ibm.com> Cc: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21	x86_64: disable the GART in shutdown	Yinghai Lu
	For K8 system: 4G RAM with memory hole remapping enabled, or more than 4G RAM installed. when using kexec to load second kernel. In the second kernel, when mem is allocated for GART, it will do the memset for clear, it will cause restart, because some device still used that for dma. solution will be: in second kernel: disable that at first before we try to allocate mem for it. or in the first kernel: do disable that before shutdown. Andi/Eric/Alan prefer to second one for clean shutdown in first kernel. Andi also point out need to consider to AGP enable but mem less 4G case too. Signed-off-by: Yinghai Lu <yinghai.lu@sun.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Muli Ben-Yehuda <muli@il.ibm.com> Cc: Vivek Goyal <vgoyal@in.ibm.com> Cc: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21	x86_64: flush_tlb_kernel_range() warning fix	Andrew Morton
	mm/vmalloc.c: In function 'unmap_kernel_range': mm/vmalloc.c:75: warning: unused variable 'start' make it a C function so that the compiler thinks it used its arguments. Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21	x86_64: fix wrong comment regarding set_fixmap()	Jiri Kosina
	The function name is set_fixmap(), not fixmap_set() as stated in the comment. Also fix a typo, punctuation and lower/uppercase a bit. Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21	x86_64: use the global PIT lock	Thomas Gleixner
	Replace the pcspkr private PIT lock by the global PIT lock to serialize the PIT access all over the place. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Dmitry Torokhov <dtor@mail.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21	x86_64: Move functions declarations to header file	Glauber de Oliveira Costa
	Some interrupt entry points are currently defined in i8259.c They probably belong in a header. Right now, their only user is init_IRQ, justifying their declaration in-file. But when virtualization comes in, we may be interested in using that functions in late initializations. Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21	x86_64: make dump_error_regs a chip op	Muli Ben-Yehuda
	Provide seperate versions for Calgary and CalIOC2 Also print out the PCIe Root Complex Status on CalIOC2 errors Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21	x86_64: introduce chipset specific ops	Muli Ben-Yehuda
	Calgary and CalIOC2 share most of the same logic. Introduce struct cal_chipset_ops for quirks and tce flush logic which are [akpm@linux-foundation.org: make calgary_chip_ops static] Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21	x86: remove support for the Rise CPU	Adrian Bunk
	The Rise CPUs were only very short-lived, and there are no reports of anyone both owning one and running Linux on it. Googling for the printk string "CPU: Rise iDragon" didn't find any dmesg available online. If it turns out that against all expectations there are actually users reverting this patch would be easy. This patch will make the kernel images smaller by a few bytes for all i386 users. Signed-off-by: Adrian Bunk <bunk@stusta.de> Acked-by: Dave Jones <davej@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21	x86: PM_TRACE support	Nigel Cunningham
	Signed-off-by: Nigel Cunningham <nigel@nigel.suspend2.net> Cc: Randy Dunlap <rdunlap@xenotime.net> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Pavel Machek <pavel@ucw.cz> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21	x86_64: support poll() on /dev/mcelog	Tim Hockin
	Background: /dev/mcelog is typically polled manually. This is less than optimal for situations where accurate accounting of MCEs is important. Calling poll() on /dev/mcelog does not work. Description: This patch adds support for poll() to /dev/mcelog. This results in immediate wakeup of user apps whenever the poller finds MCEs. Because the exception handler can not take any locks, it can not call the wakeup itself. Instead, it uses a thread_info flag (TIF_MCE_NOTIFY) which is caught at the next return from interrupt or exit from idle, calling the mce_user_notify() routine. This patch also disables the "fake panic" path of the mce_panic(), because it results in printk()s in the exception handler and crashy systems. This patch also does some small cleanup for essentially unused variables, and moves the user notification into the body of the poller, so it is only called once per poll, rather than once per CPU. Result: Applications can now poll() on /dev/mcelog. When an error is logged (whether through the poller or through an exception) the applications are woken up promptly. This should not affect any previous behaviors. If no MCEs are being logged, there is no overhead. Alternatives: I considered simply supporting poll() through the poller and not using TIF_MCE_NOTIFY at all. However, the time between an uncorrectable error happening and the user application being notified is themost* critical window for us. Many uncorrectable errors can be logged to the network if given a chance. I also considered doing the MCE poll directly from the idle notifier, but decided that was overkill. Testing: I used an error-injecting DIMM to create lots of correctable DRAM errors and verified that my user app is woken up in sync with the polling interval. I also used the northbridge to inject uncorrectable ECC errors, and verified (printk() to the rescue) that the notify routine is called and the user app does wake up. I built with PREEMPT on and off, and verified that my machine survives MCEs. [wli@holomorphy.com: build fix] Signed-off-by: Tim Hockin <thockin@google.com> Signed-off-by: William Irwin <bill.irwin@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21	x86_64: fake pxm-to-node mapping for fake numa	David Rientjes
	For NUMA emulation, our SLIT should represent the true NUMA topology of the system but our proximity domain to node ID mapping needs to reflect the emulated state. When NUMA emulation has successfully setup fake nodes on the system, a new function, acpi_fake_nodes() is called. This function determines the proximity domain (_PXM) for each true node found on the system. It then finds which emulated nodes have been allocated on this true node as determined by its starting address. The node ID to PXM mapping is changed so that each fake node ID points to the PXM of the true node that it is located on. If the machine failed to register a SLIT, then we assume there is no special requirement for emulated node affinity so we use the default LOCAL_DISTANCE, which is newly exported to this code, as our measurement if the emulated nodes appear in the same PXM. Otherwise, we use REMOTE_DISTANCE. PXM_INVAL and NID_INVAL are also exported to the ACPI header file so that we can compare node_to_pxm() results in generic code (in this case, the SRAT code). Cc: Len Brown <lenb@kernel.org> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21	x86_64: Quicklist support for x86_64	Christoph Lameter
	This adds caching of pgds and puds, pmds, pte. That way we can avoid costly zeroing and initialization of special mappings in the pgd. A second quicklist is useful to separate out PGD handling. We can carry the initialized pgds over to the next process needing them. Also clean up the pgd_list handling to use regular list macros. There is no need anymore to avoid the lru field. Move the add/removal of the pgds to the pgdlist into the constructor / destructor. That way the implementation is congruent with i386. Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: "Luck, Tony" <tony.luck@intel.com> Acked-by: William Lee Irwin III <wli@holomorphy.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21	x86: share hpet.h with i386	Thomas Gleixner
	hpet.h in asm-i386 and asm-x86_64 contain tons of duplicated stuff. Consolidate into one shared header file. AK: Fix i386 compilation with !X86_IO_APIC Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21	x86_64: Fix APIC typo	Thomas Gleixner
	Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21	x86_64: Untangle asm/hpet.h from asm/timex.h	Chris Wright
	When making changes to x86_64 timers, I noticed that touching hpet.h triggered an unreasonably large rebuild. Untangling it from timex.h quiets the extra rebuild quite a bit. Cc: john stultz <johnstul@us.ibm.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21	x86_64: remove extra extern declaring about dmi_ioremap	Yinghai Lu
	Signed-off-by: Yinghai Lu <yinghai.lu@sun.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21	x86_64: Add vDSO for x86-64 with gettimeofday/clock_gettime/getcpu	Andi Kleen
	This implements new vDSO for x86-64. The concept is similar to the existing vDSOs on i386 and PPC. x86-64 has had static vsyscalls before, but these are not flexible enough anymore. A vDSO is a ELF shared library supplied by the kernel that is mapped into user address space. The vDSO mapping is randomized for each process for security reasons. Doing this was needed for clock_gettime, because clock_gettime always needs a syscall fallback and having one at a fixed address would have made buffer overflow exploits too easy to write. The vdso can be disabled with vdso=0 It currently includes a new gettimeofday implemention and optimized clock_gettime(). The gettimeofday implementation is slightly faster than the one in the old vsyscall. clock_gettime is significantly faster than the syscall for CLOCK_MONOTONIC and CLOCK_REALTIME. The new calls are generally faster than the old vsyscall. Advantages over the old x86-64 vsyscalls: - Extensible - Randomized - Cleaner - Easier to virtualize (the old static address range previously causes overhead e.g. for Xen because it has to create special page tables for it) Weak points: - glibc support still to be written The VM interface is partly based on Ingo Molnar's i386 version. Includes compile fix from Joachim Deguara Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21	x86_64: Always use builtin memcpy on gcc 4.3	Andi Kleen
	Jan asked to always use the builtin memcpy on gcc 4.3 mainline because it should generate better code than the old macro. Let's try it. Cc: Jan Hubicka <jh@suse.cz> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21	x86_64: asm/ptrace.h needs linux/compiler.h	Jean Delvare
	On x86_64, <asm/ptrace.h> uses __user but doesn't include <linux/compiler.h>. This could lead to build failures. Signed-off-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-21	x86_64: wbinvd macro fix	Nick Piggin
	Too many semicolons in this macro. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19	[PATCH] sched: sched_cacheflush is now unused	Ralf Baechle
	Since Ingo's recent scheduler rewrite which was merged as commit 0437e109e1841607f2988891eaa36c531c6aa6ac sched_cacheflush is unused. Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-07-19	arch: personality independent stack top	Peter Zijlstra
	New arch macro STACK_TOP_MAX it gives the larges valid stack address for the architecture in question. It differs from STACK_TOP in that it will not distinguish between personalities but will always return the largest possible address. This is used to create the initial stack on execve, which we will move down to the proper location once the binfmt code has figured out where that is. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ollie Wild <aaw@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19	define new percpu interface for shared data	Fenghua Yu
	per cpu data section contains two types of data. One set which is exclusively accessed by the local cpu and the other set which is per cpu, but also shared by remote cpus. In the current kernel, these two sets are not clearely separated out. This can potentially cause the same data cacheline shared between the two sets of data, which will result in unnecessary bouncing of the cacheline between cpus. One way to fix the problem is to cacheline align the remotely accessed per cpu data, both at the beginning and at the end. Because of the padding at both ends, this will likely cause some memory wastage and also the interface to achieve this is not clean. This patch: Moves the remotely accessed per cpu data (which is currently marked as ____cacheline_aligned_in_smp) into a different section, where all the data elements are cacheline aligned. And as such, this differentiates the local only data and remotely accessed data cleanly. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Christoph Lameter <clameter@sgi.com> Cc: <linux-arch@vger.kernel.org> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19	jprobes: remove JPROBE_ENTRY()	Michael Ellerman
	AFAICT now that jprobe.entry is a void *, JPROBE_ENTRY doesn't do anything useful - so remove it .. I've left a do-nothing version so that out-of-tree jprobes code will still compile without modifications. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com> Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17	sys_fallocate() implementation on i386, x86_64 and powerpc	Amit Arora
	fallocate() is a new system call being proposed here which will allow applications to preallocate space to any file(s) in a file system. Each file system implementation that wants to use this feature will need to support an inode operation called ->fallocate(). Applications can use this feature to avoid fragmentation to certain level and thus get faster access speed. With preallocation, applications also get a guarantee of space for particular file(s) - even if later the the system becomes full. Currently, glibc provides an interface called posix_fallocate() which can be used for similar cause. Though this has the advantage of working on all file systems, but it is quite slow (since it writes zeroes to each block that has to be preallocated). Without a doubt, file systems can do this more efficiently within the kernel, by implementing the proposed fallocate() system call. It is expected that posix_fallocate() will be modified to call this new system call first and incase the kernel/filesystem does not implement it, it should fall back to the current implementation of writing zeroes to the new blocks. ToDos: 1. Implementation on other architectures (other than i386, x86_64, and ppc). Patches for s390(x) and ia64 are already available from previous posts, but it was decided that they should be added later once fallocate is in the mainline. Hence not including those patches in this take. 2. Changes to glibc, a) to support fallocate() system call b) to make posix_fallocate() and posix_fallocate64() call fallocate() Signed-off-by: Amit Arora <aarora@in.ibm.com>
2007-07-17	fbdev: detect primary display device	Antonino A. Daplas
	Add function helper, fb_is_primary_device(). Given struct fb_info, it will return a nonzero value if the device is the primary display. Currently, only the i386 is supported where the function checks for the IORESOURCE_ROM_SHADOW flag. Signed-off-by: Antonino Daplas <adaplas@gmail.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17	fbdev: move arch-specific bits to their respective subdirectories	Antonino A. Daplas
	Move arch-specific bits of fb_mmap() to their respective subdirectories [bob.picco@hp.com: efi_range_is_wc is referenced but not declared] [bunk@stusta.de: fix include/asm-m68k/fb.h] Signed-off-by: Antonino Daplas <adaplas@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17	Add __GFP_MOVABLE for callers to flag allocations from high memory that may ↵	Mel Gorman
	be migrated It is often known at allocation time whether a page may be migrated or not. This patch adds a flag called __GFP_MOVABLE and a new mask called GFP_HIGH_MOVABLE. Allocations using the __GFP_MOVABLE can be either migrated using the page migration mechanism or reclaimed by syncing with backing storage and discarding. An API function very similar to alloc_zeroed_user_highpage() is added for __GFP_MOVABLE allocations called alloc_zeroed_user_highpage_movable(). The flags used by alloc_zeroed_user_highpage() are not changed because it would change the semantics of an existing API. After this patch is applied there are no in-kernel users of alloc_zeroed_user_highpage() so it probably should be marked deprecated if this patch is merged. Note that this patch includes a minor cleanup to the use of __GFP_ZERO in shmem.c to keep all flag modifications to inode->mapping in the shmem_dir_alloc() helper function. This clean-up suggestion is courtesy of Hugh Dickens. Additional credit goes to Christoph Lameter and Linus Torvalds for shaping the concept. Credit to Hugh Dickens for catching issues with shmem swap vector and ramfs allocations. [akpm@linux-foundation.org: build fix] [hugh@veritas.com: __GFP_ZERO cleanup] Signed-off-by: Mel Gorman <mel@csn.ul.ie> Cc: Andy Whitcroft <apw@shadowen.org> Cc: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>