Kernel - My Linux kernel repository

Age	Commit message (Collapse)	Author
2007-05-08	utimensat implementation	Ulrich Drepper
	Implement utimensat(2) which is an extension to futimesat(2) in that it a) supports nano-second resolution for the timestamps b) allows to selectively ignore the atime/mtime value c) allows to selectively use the current time for either atime or mtime d) supports changing the atime/mtime of a symlink itself along the lines of the BSD lutimes(3) functions For this change the internally used do_utimes() functions was changed to accept a timespec time value and an additional flags parameter. Additionally the sys_utime function was changed to match compat_sys_utime which already use do_utimes instead of duplicating the work. Also, the completely missing futimensat() functionality is added. We have such a function in glibc but we have to resort to using /proc/self/fd/* which not everybody likes (chroot etc). Test application (the syscall number will need per-arch editing): #include <errno.h> #include <fcntl.h> #include <time.h> #include <sys/time.h> #include <stddef.h> #include <syscall.h> #define __NR_utimensat 280 #define UTIME_NOW ((1l << 30) - 1l) #define UTIME_OMIT ((1l << 30) - 2l) int main(void) { int status = 0; int fd = open("ttt", O_RDWR\|O_CREAT\|O_EXCL, 0666); if (fd == -1) error (1, errno, "failed to create test file \"ttt\""); struct stat64 st1; if (fstat64 (fd, &st1) != 0) error (1, errno, "fstat failed"); struct timespec t[2]; t[0].tv_sec = 0; t[0].tv_nsec = 0; t[1].tv_sec = 0; t[1].tv_nsec = 0; if (syscall(__NR_utimensat, AT_FDCWD, "ttt", t, 0) != 0) error (1, errno, "utimensat failed"); struct stat64 st2; if (fstat64 (fd, &st2) != 0) error (1, errno, "fstat failed"); if (st2.st_atim.tv_sec != 0 \|\| st2.st_atim.tv_nsec != 0) { puts ("atim not reset to zero"); status = 1; } if (st2.st_mtim.tv_sec != 0 \|\| st2.st_mtim.tv_nsec != 0) { puts ("mtim not reset to zero"); status = 1; } if (status != 0) goto out; t[0] = st1.st_atim; t[1].tv_sec = 0; t[1].tv_nsec = UTIME_OMIT; if (syscall(__NR_utimensat, AT_FDCWD, "ttt", t, 0) != 0) error (1, errno, "utimensat failed"); if (fstat64 (fd, &st2) != 0) error (1, errno, "fstat failed"); if (st2.st_atim.tv_sec != st1.st_atim.tv_sec \|\| st2.st_atim.tv_nsec != st1.st_atim.tv_nsec) { puts ("atim not set"); status = 1; } if (st2.st_mtim.tv_sec != 0 \|\| st2.st_mtim.tv_nsec != 0) { puts ("mtim changed from zero"); status = 1; } if (status != 0) goto out; t[0].tv_sec = 0; t[0].tv_nsec = UTIME_OMIT; t[1] = st1.st_mtim; if (syscall(__NR_utimensat, AT_FDCWD, "ttt", t, 0) != 0) error (1, errno, "utimensat failed"); if (fstat64 (fd, &st2) != 0) error (1, errno, "fstat failed"); if (st2.st_atim.tv_sec != st1.st_atim.tv_sec \|\| st2.st_atim.tv_nsec != st1.st_atim.tv_nsec) { puts ("mtim changed from original time"); status = 1; } if (st2.st_mtim.tv_sec != st1.st_mtim.tv_sec \|\| st2.st_mtim.tv_nsec != st1.st_mtim.tv_nsec) { puts ("mtim not set"); status = 1; } if (status != 0) goto out; sleep (2); t[0].tv_sec = 0; t[0].tv_nsec = UTIME_NOW; t[1].tv_sec = 0; t[1].tv_nsec = UTIME_NOW; if (syscall(__NR_utimensat, AT_FDCWD, "ttt", t, 0) != 0) error (1, errno, "utimensat failed"); if (fstat64 (fd, &st2) != 0) error (1, errno, "fstat failed"); struct timeval tv; gettimeofday(&tv,NULL); if (st2.st_atim.tv_sec <= st1.st_atim.tv_sec \|\| st2.st_atim.tv_sec > tv.tv_sec) { puts ("atim not set to NOW"); status = 1; } if (st2.st_mtim.tv_sec <= st1.st_mtim.tv_sec \|\| st2.st_mtim.tv_sec > tv.tv_sec) { puts ("mtim not set to NOW"); status = 1; } if (symlink ("ttt", "tttsym") != 0) error (1, errno, "cannot create symlink"); t[0].tv_sec = 0; t[0].tv_nsec = 0; t[1].tv_sec = 0; t[1].tv_nsec = 0; if (syscall(__NR_utimensat, AT_FDCWD, "tttsym", t, AT_SYMLINK_NOFOLLOW) != 0) error (1, errno, "utimensat failed"); if (lstat64 ("tttsym", &st2) != 0) error (1, errno, "lstat failed"); if (st2.st_atim.tv_sec != 0 \|\| st2.st_atim.tv_nsec != 0) { puts ("symlink atim not reset to zero"); status = 1; } if (st2.st_mtim.tv_sec != 0 \|\| st2.st_mtim.tv_nsec != 0) { puts ("symlink mtim not reset to zero"); status = 1; } if (status != 0) goto out; t[0].tv_sec = 1; t[0].tv_nsec = 0; t[1].tv_sec = 1; t[1].tv_nsec = 0; if (syscall(__NR_utimensat, fd, NULL, t, 0) != 0) error (1, errno, "utimensat failed"); if (fstat64 (fd, &st2) != 0) error (1, errno, "fstat failed"); if (st2.st_atim.tv_sec != 1 \|\| st2.st_atim.tv_nsec != 0) { puts ("atim not reset to one"); status = 1; } if (st2.st_mtim.tv_sec != 1 \|\| st2.st_mtim.tv_nsec != 0) { puts ("mtim not reset to one"); status = 1; } if (status == 0) puts ("all OK"); out: close (fd); unlink ("ttt"); unlink ("tttsym"); return status; } [akpm@linux-foundation.org: add missing i386 syscall table entry] Signed-off-by: Ulrich Drepper <drepper@redhat.com> Cc: Alexey Dobriyan <adobriyan@openvz.org> Cc: Michael Kerrisk <mtk-manpages@gmx.net> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08	Clean up mostly unused IOSPACE macros	David Gibson
	Most architectures defined three macros, MK_IOSPACE_PFN(), GET_IOSPACE() and GET_PFN() in pgtable.h. However, the only callers of any of these macros are in Sparc specific code, either in arch/sparc, arch/sparc64 or drivers/sbus. This patch removes the redundant macros from all architectures except sparc and sparc64. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08	i386: sched.h inclusion from module.h is baack	Alexey Dobriyan
	linux/module.h -> linux/elf.h -> asm-i386/elf.h -> linux/utsname.h -> linux/sched.h Noticeably cut the number of files which are rebuild upon touching sched.h and cut down pulled junk from every module.h inclusion. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08	kdump/kexec: calculate note size at compile time	Simon Horman
	Currently the size of the per-cpu region reserved to save crash notes is set by the per-architecture value MAX_NOTE_BYTES. Which in turn is currently set to 1024 on all supported architectures. While testing ia64 I recently discovered that this value is in fact too small. The particular setup I was using actually needs 1172 bytes. This lead to very tedious failure mode where the tail of one elf note would overwrite the head of another if they ended up being alocated sequentially by kmalloc, which was often the case. It seems to me that a far better approach is to caclculate the size that the area needs to be. This patch does just that. If a simpler stop-gap patch for ia64 to be squeezed into 2.6.21(.X) is needed then this should be as easy as making MAX_NOTE_BYTES larger in arch/asm-ia64/kexec.h. Perhaps 2048 would be a good choice. However, I think that the approach in this patch is a much more robust idea. Acked-by: Vivek Goyal <vgoyal@in.ibm.com> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08	move die notifier handling to common code	Christoph Hellwig
	This patch moves the die notifier handling to common code. Previous various architectures had exactly the same code for it. Note that the new code is compiled unconditionally, this should be understood as an appel to the other architecture maintainer to implement support for it aswell (aka sprinkling a notify_die or two in the proper place) arm had a notifiy_die that did something totally different, I renamed it to arm_notify_die as part of the patch and made it static to the file it's declared and used at. avr32 used to pass slightly less information through this interface and I brought it into line with the other architectures. [akpm@linux-foundation.org: build fix] [akpm@linux-foundation.org: fix vmalloc_sync_all bustage] [bryan.wu@analog.com: fix vmalloc_sync_all in nommu] Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: <linux-arch@vger.kernel.org> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Bryan Wu <bryan.wu@analog.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08	tty: i386/x86_64 arbitary speed support	Alan Cox
	Adds the needed TCGETS2/TCSETS2 ioctl calls, structures, defines and the like. Tested against the test suite and passes. Other platforms should need roughly the same change. Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08	ipmi: add new IPMI nmi watchdog handling	Corey Minyard
	Convert over to the new NMI handling for getting IPMI watchdog timeouts via an NMI. This add config options to know if there is the ability to receive NMIs and if it has an NMI post processing call. Then it modifies the IPMI watchdog to take advantage of this so that it can know if an NMI comes in. It also adds testing that the IPMI NMI watchdog works. Signed-off-by: Corey Minyard <minyard@acm.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08	i386: Add safe variants of rdmsr_on_cpu and wrmsr_on_cpu	Rudolf Marek
	Add safe (exception handled) variants of rdmsr_on_cpu and wrmsr_on_cpu. You should use these when the target MSR may not actually exist, as doing so could trigger an exception which the regular functions do not handle. The safe variants are slower, though. The upcoming coretemp hardware monitoring driver will need this. Signed-off-by: Rudolf Marek <r.marek@assembler.cz> Cc: Alexey Dobriyan <adobriyan@openvz.org> Cc: Dave Jones <davej@redhat.com> Signed-off-by: Jean Delvare <khali@linux-fr.org>
2007-05-07	i386: use page allocator to allocate thread_info structure	Christoph Lameter
	i386 uses kmalloc to allocate the threadinfo structure assuming that the allocations result in a page sized aligned allocation. That has worked so far because SLAB exempts page sized slabs from debugging and aligns them in special ways that goes beyond the restrictions imposed by KMALLOC_ARCH_MINALIGN valid for other slabs in the kmalloc array. SLUB also works fine without debugging since page sized allocations neatly align at page boundaries. However, if debugging is switched on then SLUB will extend the slab with debug information. The resulting slab is not longer of page size. It will only be aligned following the requirements imposed by KMALLOC_ARCH_MINALIGN. As a result the threadinfo structure may not be page aligned which makes i386 fail to boot with SLUB debug on. Replace the calls to kmalloc with calls into the page allocator. An alternate solution may be to create a custom slab cache where the alignment is set to PAGE_SIZE. That would allow slub debugging to be applied to the threadinfo structure. Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: William Lee Irwin III <wli@holomorphy.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07	i386: use pte_update_defer in ptep_test_and_clear_{dirty,young}	Zachary Amsden
	If you actually clear the bit, you need to: + pte_update_defer(vma->vm_mm, addr, ptep); The reason is, when updating PTEs, the hypervisor must be notified. Using atomic operations to do this is fine for all hypervisors I am aware of. However, for hypervisors which shadow page tables, if these PTE modifications are not trapped, you need a post-modification call to fulfill the update of the shadow page table. Acked-by: Zachary Amsden <zach@vmware.com> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07	i386: add ptep_test_and_clear_{dirty,young}	David Rientjes
	Add ptep_test_and_clear_{dirty,young} to i386. They advertise that they have it and there is at least one place where it needs to be called without the page table lock: to clear the accessed bit on write to /proc/pid/clear_refs. ptep_clear_flush_{dirty,young} are updated to use the new functions. The overall net effect to current users of ptep_clear_flush_{dirty,young} is that we introduce an additional branch. Cc: Hugh Dickins <hugh@veritas.com> Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: David Rientjes <rientjes@google.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-05	Merge branch 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6	Linus Torvalds
	* 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6: (231 commits) [PATCH] i386: Don't delete cpu_devs data to identify different x86 types in late_initcall [PATCH] i386: type may be unused [PATCH] i386: Some additional chipset register values validation. [PATCH] i386: Add missing !X86_PAE dependincy to the 2G/2G split. [PATCH] x86-64: Don't exclude asm-offsets.c in Documentation/dontdiff [PATCH] i386: avoid redundant preempt_disable in __unlazy_fpu [PATCH] i386: white space fixes in i387.h [PATCH] i386: Drop noisy e820 debugging printks [PATCH] x86-64: Fix allnoconfig error in genapic_flat.c [PATCH] x86-64: Shut up warnings for vfat compat ioctls on other file systems [PATCH] x86-64: Share identical video.S between i386 and x86-64 [PATCH] x86-64: Remove CONFIG_REORDER [PATCH] x86-64: Print type and size correctly for unknown compat ioctls [PATCH] i386: Remove copy_*_user BUG_ONs for (size < 0) [PATCH] i386: Little cleanups in smpboot.c [PATCH] x86-64: Don't enable NUMA for a single node in K8 NUMA scanning [PATCH] x86: Use RDTSCP for synchronous get_cycles if possible [PATCH] i386: Add X86_FEATURE_RDTSCP [PATCH] i386: Implement X86_FEATURE_SYNC_RDTSC on i386 [PATCH] i386: Implement alternative_io for i386 ... Fix up trivial conflict in include/linux/highmem.h manually. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-05	Merge master.kernel.org:/pub/scm/linux/kernel/git/jejb/voyager-2.6	Linus Torvalds
	* master.kernel.org:/pub/scm/linux/kernel/git/jejb/voyager-2.6: [VOYAGER] add smp alternatives [VOYAGER] Use modern techniques to setup and teardown low identiy mappings. [VOYAGER] Convert the monitor thread to use the kthread API [VOYAGER] clockevents driver: bring voyager in to line [VOYAGER] clockevents: correct boot cpu is zero assumption [VOYAGER] add smp_call_function_single
2007-05-04	Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6	Linus Torvalds
	* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6: (59 commits) PCI: Free resource files in error path of pci_create_sysfs_dev_files() pci-quirks: disable MSI on RS400-200 and RS480 PCI hotplug: Use menuconfig objects PCI: ZT5550 CPCI Hotplug driver fix PCI: rpaphp: Remove semaphores PCI: rpaphp: Ensure more pcibios_add/pcibios_remove symmetry PCI: rpaphp: Use pcibios_remove_pci_devices() symmetrically PCI: rpaphp: Document is_php_dn() PCI: rpaphp: Document find_php_slot() PCI: rpaphp: Rename rpaphp_register_pci_slot() to rpaphp_enable_slot() PCI: rpaphp: refactor tail call to rpaphp_register_slot() PCI: rpaphp: remove rpaphp_set_attention_status() PCI: rpaphp: remove print_slot_pci_funcs() PCI: rpaphp: Remove setup_pci_slot() PCI: rpaphp: remove a call that does nothing but a pointer lookup PCI: rpaphp: Remove another wrappered function PCI: rpaphp: Remve another call that is a wrapper PCI: rpaphp: remove a function that does nothing but wrap debug printks PCI: rpaphp: Remove un-needed goto PCI: rpaphp: Fix a memleak; slot->location string was never freed ...
2007-05-04	Merge master.kernel.org:/pub/scm/linux/kernel/git/davej/agpgart	Linus Torvalds
	* master.kernel.org:/pub/scm/linux/kernel/git/davej/agpgart: [AGPGART] sworks-agp: Switch to PCI ref counting APIs [AGPGART] Nvidia AGP: Use refcount aware PCI interfaces [AGPGART] Fix sparse warning in sgi-agp.c [AGPGART] Intel-agp adjustments [AGPGART] Move [un]map_page_into_agp into asm/agp.h [AGPGART] Add missing calls to global_flush_tlb() to ali-agp [AGPGART] prevent probe collision of sis-agp and amd64_agp
2007-05-02	PCI: scatterlist.h needs types.h	Jean Delvare
	Most architectures' scatterlist.h use the type dma_addr_t, but omit to include <asm/types.h> which defines it. This could lead to build failures, so let's add the missing includes. Signed-off-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-05-02	[PATCH] i386: avoid redundant preempt_disable in __unlazy_fpu	Jan Kiszka
	There are two callers of __unlazy_fpu, unlazy_fpu and __switch_to, and none of them appear to require additional preempt_disable/enable here. Let's open-code save_init_fpu in __unlazy_fpu to save a few ops. Signed-off-by: Jan Kiszka <jan.kiszka@web.de> Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02	[PATCH] i386: white space fixes in i387.h	Jan Kiszka
	Signed-off-by: Jan Kiszka <jan.kiszka@web.de> Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02	[PATCH] x86: Use RDTSCP for synchronous get_cycles if possible	Andi Kleen
	RDTSCP is already synchronous and doesn't need an explicit CPUID. This is a little faster and more importantly avoids VMEXITs on Hypervisors. Original patch from Joerg Roedel, but reworked by AK Also includes miscompilation fix by Eric Biederman Cc: "Joerg Roedel" <joerg.roedel@amd.com> Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02	[PATCH] i386: Add X86_FEATURE_RDTSCP	Andi Kleen
	Following x86-64 Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02	[PATCH] i386: Implement X86_FEATURE_SYNC_RDTSC on i386	Andi Kleen
	Syncs up with x86-64. Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02	[PATCH] i386: Implement alternative_io for i386	Andi Kleen
	Ported from x86-64. Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02	[PATCH] i386: Evaluate constant cpu features at runtime	Andi Kleen
	Redefine cpu_has() to evaluate cpu features already checked in early boot at compile time. This way the compiler might eliminate some dead code. Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02	[PATCH] i386: Verify important CPUID bits in real mode	Andi Kleen
	Check some CPUID bits that are needed for compiler generated early in boot. When the system is still in real mode before changing the VESA BIOS mode it is possible to still display an visible error message on the screen. Similar to x86-64. Includes cleanups from Eric Biederman Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02	[PATCH] i386: Clean up NMI watchdog code	Andi Kleen
	- Introduce a wd_ops structure - Convert the various nmi watchdogs over to it - This allows to split the perfctr reservation from the watchdog setup cleanly. - Do perfctr reservation globally as it should have always been - Remove dead code referenced only by unused EXPORT_SYMBOLs Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02	[PATCH] i386: pte simplify ops	Zachary Amsden
	Add comment and condense code to make use of native_local_ptep_get_and_clear function. Also, it turns out the 2-level and 3-level paging definitions were identical, so move the common definition into pgtable.h Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02	[PATCH] i386: pte xchg optimization	Zachary Amsden
	In situations where page table updates need only be made locally, and there is no cross-processor A/D bit races involved, we need not use the heavyweight xchg instruction to atomically fetch and clear page table entries. Instead, we can just read and clear them directly. This introduces a neat optimization for non-SMP kernels; drop the atomic xchg operations from page table updates. Thanks to Michel Lespinasse for noting this potential optimization. Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02	[PATCH] i386: pte clear optimization	Zachary Amsden
	When exiting from an address space, no special hypervisor notification of page table updates needs to occur; direct page table hypervisors, such as Xen, switch to another address space first (init_mm) and unprotects the page tables to avoid the cost of trapping to the hypervisor for each pte_clear. Shadow mode hypervisors, such as VMI and lhype don't need to do the extra work of calling through paravirt-ops, and can just directly clear the page table entries without notifiying the hypervisor, since all the page tables are about to be freed. So introduce native_pte_clear functions which bypass any paravirt-ops notification. This results in a significant performance win for VMI and removes some indirect calls from zap_pte_range. Note the 3-level paging already had a native_pte_clear function, thus demanding argument conformance and extra args for the 2-level definition. Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02	[PATCH] i386: safe_apic_wait_icr_idle - i386	Fernando Luis VazquezCao
	apic_wait_icr_idle looks like this: static __inline__ void apic_wait_icr_idle(void) { while (apic_read(APIC_ICR) & APIC_ICR_BUSY) cpu_relax(); } The busy loop in this function would not be problematic if the corresponding status bit in the ICR were always updated, but that does not seem to be the case under certain crash scenarios. Kdump uses an IPI to stop the other CPUs in the event of a crash, but when any of the other CPUs are locked-up inside the NMI handler the CPU that sends the IPI will end up looping forever in the ICR check, effectively hard-locking the whole system. Quoting from Intel's "MultiProcessor Specification" (Version 1.4), B-3: "A local APIC unit indicates successful dispatch of an IPI by resetting the Delivery Status bit in the Interrupt Command Register (ICR). The operating system polls the delivery status bit after sending an INIT or STARTUP IPI until the command has been dispatched. A period of 20 microseconds should be sufficient for IPI dispatch to complete under normal operating conditions. If the IPI is not successfully dispatched, the operating system can abort the command. Alternatively, the operating system can retry the IPI by writing the lower 32-bit double word of the ICR. This “time-out” mechanism can be implemented through an external interrupt, if interrupts are enabled on the processor, or through execution of an instruction or time-stamp counter spin loop." Intel's documentation suggests the implementation of a time-out mechanism, which, by the way, is already being open-coded in some parts of the kernel that tinker with ICR. Create a apic_wait_icr_idle replacement that implements the time-out mechanism and that can be used to solve the aforementioned problem. AK: moved both functions out of line AK: added improved loop from Keith Owens Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp> Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02	[PATCH] i386: Enable support for fixed-range IORRs to keep RdMem & WrMem in sync	Bernhard Kaindl
	If our copy of the MTRRs of the BSP has RdMem or WrMem set, and we are running on an AMD64/K8 system, the boot CPU must have had MtrrFixDramEn and MtrrFixDramModEn set (otherwise our RDMSR would have copied these bits cleared), so we set them on this CPU as well. This allows us to keep the AMD64/K8 RdMem and WrMem bits in sync across the CPUs of SMP systems in order to fullfill the duty of system software to "initialize and maintain MTRR consistency across all processors." as written in the AMD and Intel manuals. If an WRMSR instruction fails because MtrrFixDramModEn is not set, I expect that also the Intel-style MTRR bits are not updated. AK: minor cleanup, moved MSR defines around Signed-off-by: Bernhard Kaindl <bk@suse.de> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andi Kleen <ak@suse.de> Cc: Dave Jones <davej@codemonkey.org.uk>
2007-05-02	[PATCH] x86: Save the MTRRs of the BSP before booting an AP	Bernhard Kaindl
	Applied fix by Andew Morton: http://lkml.org/lkml/2007/4/8/88 - Fix `make headers_check'. AMD and Intel x86 CPU manuals state that it is the responsibility of system software to initialize and maintain MTRR consistency across all processors in Multi-Processing Environments. Quote from page 188 of the AMD64 System Programming manual (Volume 2): 7.6.5 MTRRs in Multi-Processing Environments "In multi-processing environments, the MTRRs located in all processors must characterize memory in the same way. Generally, this means that identical values are written to the MTRRs used by the processors." (short omission here) "Failure to do so may result in coherency violations or loss of atomicity. Processor implementations do not check the MTRR settings in other processors to ensure consistency. It is the responsibility of system software to initialize and maintain MTRR consistency across all processors." Current Linux MTRR code already implements the above in the case that the BIOS does not properly initialize MTRRs on the secondary processors, but the case where the fixed-range MTRRs of the boot processor are changed after Linux started to boot, before the initialsation of a secondary processor, is not handled yet. In this case, secondary processors are currently initialized by Linux with MTRRs which the boot processor had very early, when mtrr_bp_init() did run, but not with the MTRRs which the boot processor uses at the time when that secondary processors is actually booted, causing differing MTRR contents on the secondary processors. Such situation happens on Acer Ferrari 1000 and 5000 notebooks where the BIOS enables and sets AMD-specific IORR bits in the fixed-range MTRRs of the boot processor when it transitions the system into ACPI mode. The SMI handler of the BIOS does this in SMM, entered while Linux ACPI code runs acpi_enable(). Other occasions where the SMI handler of the BIOS may change bits in the MTRRs could occur as well. To initialize newly booted secodary processors with the fixed-range MTRRs which the boot processor uses at that time, this patch saves the fixed-range MTRRs of the boot processor before new secondary processors are started. When the secondary processors run their Linux initialisation code, their fixed-range MTRRs will be updated with the saved fixed-range MTRRs. If CONFIG_MTRR is not set, we define mtrr_save_state as an empty statement because there is nothing to do. Possible TODOs: ) CPU-hotplugging outside of SMP suspend/resume is not yet tested with this patch. ) If, even in this case, an AP never runs i386/do_boot_cpu or x86_64/cpu_up, then the calls to mtrr_save_state() could be replaced by calls to mtrr_save_fixed_ranges(NULL) and mtrr_save_state() would not be needed. That would need either verification of the CPU-hotplug code or at least a test on a >2 CPU machine. *) The MTRRs of other running processors are not yet checked at this time but it might be interesting to syncronize the MTTRs of all processors before booting. That would be an incremental patch, but of rather low priority since there is no machine known so far which would require this. AK: moved prototypes on x86-64 around to fix warnings Signed-off-by: Bernhard Kaindl <bk@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de> Cc: Dave Jones <davej@codemonkey.org.uk>
2007-05-02	[PATCH] x86: Adds mtrr_save_fixed_ranges() for use in two later patches.	Bernhard Kaindl
	In this current implementation which is used in other patches, mtrr_save_fixed_ranges() accepts a dummy void pointer because in the current implementation of one of these patches, this function may be called from smp_call_function_single() which requires that this function takes a void pointer argument. This function calls get_fixed_ranges(), passing mtrr_state.fixed_ranges which is the element of the static struct which stores our current backup of the fixed-range MTRR values which all CPUs shall be using. Because mtrr_save_fixed_ranges calls get_fixed_ranges after kernel initialisation time, __init needs to be removed from the declaration of get_fixed_ranges(). If CONFIG_MTRR is not set, we define mtrr_save_fixed_ranges as an empty statement because there is nothing to do. AK: Moved prototypes for x86-64 around to fix warnings Signed-off-by: Bernhard Kaindl <bk@suse.de> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andi Kleen <ak@suse.de> Cc: Dave Jones <davej@codemonkey.org.uk>
2007-05-02	[PATCH] x86: PARAVIRT: Jeremy Fitzhardinge <jeremy@goop.org>	Jeremy Fitzhardinge
	The other symbols used to delineate the alt-instructions sections have the form __foo/__foo_end. Rename parainstructions to match. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-05-02	[PATCH] i386: Convert VMI timer to use clock events	Zachary Amsden
	Convert VMI timer to use clock events, making it properly able to use the NO_HZ infrastructure. On UP systems, with no local APIC, we just continue to route these events through the PIT. On systems with a local APIC, or SMP, we provide a single source interrupt chip which creates the local timer IRQ. It actually gets delivered by the APIC hardware, but we don't want to use the same local APIC clocksource processing, so we create our own handler here. Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Andi Kleen <ak@suse.de> CC: Dan Hecht <dhecht@vmware.com> CC: Ingo Molnar <mingo@elte.hu> CC: Thomas Gleixner <tglx@linutronix.de>
2007-05-02	[PATCH] x86: update for i386 and x86-64 check_bugs	Jeremy Fitzhardinge
	Remove spurious comments, headers and keywords from x86-64 bugs.[ch]. Use identify_boot_cpu() AK: merged with other patch Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02	[PATCH] i386: Fix UP gdt bugs	Jeremy Fitzhardinge
	Fixes two problems with the GDT when compiling for uniprocessor: - There's no percpu segment, so trying to load its selector into %fs fails. Use a null selector instead. - The real gdt needs to be loaded at some point. Do it in cpu_init(). Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Rusty Russell <rusty@rustcorp.com.au>
2007-05-02	[PATCH] i386: Define per_cpu_offset	Jeremy Fitzhardinge
	Define per_cpu_offset in asm-i386/percpu.h when SMP defined, like asm-generic/percpu.h does for UP. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Andi Kleen <ak@suse.de>
2007-05-02	[PATCH] i386: cleanups to help using per-cpu variables from asm	Jeremy Fitzhardinge
	This patch does a few small cleanups: - use PER_CPU_NAME to generate the names of per-cpu variables - use lea to add the per_cpu offset in PER_CPU(), because it doesn't affect condition flags - add PER_CPU_VAR which allows direct access to pre-cpu variables with the %fs: prefix on SMP. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Andi Kleen <ak@suse.de>
2007-05-02	[PATCH] i386: Convert PDA into the percpu section	Jeremy Fitzhardinge
	Currently x86 (similar to x84-64) has a special per-cpu structure called "i386_pda" which can be easily and efficiently referenced via the %fs register. An ELF section is more flexible than a structure, allowing any piece of code to use this area. Indeed, such a section already exists: the per-cpu area. So this patch: (1) Removes the PDA and uses per-cpu variables for each current member. (2) Replaces the __KERNEL_PDA segment with __KERNEL_PERCPU. (3) Creates a per-cpu mirror of __per_cpu_offset called this_cpu_off, which can be used to calculate addresses for this CPU's variables. (4) Simplifies startup, because %fs doesn't need to be loaded with a special segment at early boot; it can be deferred until the first percpu area is allocated (or never for UP). The result is less code and one less x86-specific concept. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de>
2007-05-02	[PATCH] i386: Page-align the GDT	Jeremy Fitzhardinge
	Xen wants a dedicated page for the GDT. I believe VMI likes it too. lguest, KVM and native don't care. Simple transformation to page-aligned "struct gdt_page". Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andi Kleen <ak@suse.de> Acked-by: Jeremy Fitzhardinge <jeremy@xensource.com>
2007-05-02	[PATCH] i386: PARAVIRT: drop unused ptep_get_and_clear	Jeremy Fitzhardinge
	In shadow mode hypervisors, ptep_get_and_clear achieves the desired purpose of keeping the shadows in sync by issuing a native_get_and_clear, followed by a call to pte_update, which indicates the PTE has been modified. Direct mode hypervisors (Xen) have no need for this anyway, and will trap the update using writable pagetables. This means no hypervisor makes use of ptep_get_and_clear; there is no reason to have it in the paravirt-ops structure. Change confusing terminology about raw vs. native functions into consistent use of native_pte_xxx for operations which do not invoke paravirt-ops. Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02	[PATCH] i386: PARAVIRT: Clean up paravirt patchable wrappers	Jeremy Fitzhardinge
	Replace all the open-coded macros for generating calls with a pair of more general macros (__PVOP_CALL/VCALL), and redefine all the PVOP_V?CALL[0-4] in terms of them. [ Andrew, Andi: this should slot in immediately after "Document asm-i386/paravirt.h" (paravirt_ops-document-asm-i386-paravirth.patch) ] Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Ingo Molnar <mingo@elte.hu>
2007-05-02	[PATCH] i386: PARAVIRT: Use enums for paravirt lazy flush modi	Jeremy Fitzhardinge
	Remove #defines, add enum for PARAVIRT_LAZY_FLUSH. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02	[PATCH] i386: PARAVIRT: add kmap_atomic_pte for mapping highpte pages	Jeremy Fitzhardinge
	Xen and VMI both have special requirements when mapping a highmem pte page into the kernel address space. These can be dealt with by adding a new kmap_atomic_pte() function for mapping highptes, and hooking it into the paravirt_ops infrastructure. Xen specifically wants to map the pte page RO, so this patch exposes a helper function, kmap_atomic_prot, which maps the page with the specified page protections. This also adds a kmap_flush_unused() function to clear out the cached kmap mappings. Xen needs this to clear out any potential stray RW mappings of pages which will become part of a pagetable. [ Zach - vmi.c will need some attention after this patch. It wasn't immediately obvious to me what needs to be done. ] Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Zachary Amsden <zach@vmware.com>
2007-05-02	[PATCH] i386: PARAVIRT: revert map_pt_hook.	Jeremy Fitzhardinge
	Back out the map_pt_hook to clear the way for kmap_atomic_pte. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Zachary Amsden <zach@vmware.com>
2007-05-02	[PATCH] i386: PARAVIRT: add flush_tlb_others paravirt_op	Jeremy Fitzhardinge
	This patch adds a pv_op for flush_tlb_others. Linux running on native hardware uses cross-CPU IPIs to flush the TLB on any CPU which may have a particular mm's pagetable entries cached in its TLB. This is inefficient in a paravirtualized environment, since the hypervisor knows which real CPUs actually contain cached mappings, which may be a small subset of a guest's VCPUs. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de>
2007-05-02	[PATCH] i386: PARAVIRT: add common patching machinery	Jeremy Fitzhardinge
	Implement the actual patching machinery. paravirt_patch_default() contains the logic to automatically patch a callsite based on a few simple rules: - if the paravirt_op function is paravirt_nop, then patch nops - if the paravirt_op function is a jmp target, then jmp to it - if the paravirt_op function is callable and doesn't clobber too much for the callsite, call it directly paravirt_patch_default is suitable as a default implementation of paravirt_ops.patch, will remove most of the expensive indirect calls in favour of either a direct call or a pile of nops. Backends may implement their own patcher, however. There are several helper functions to help with this: paravirt_patch_nop nop out a callsite paravirt_patch_ignore leave the callsite as-is paravirt_patch_call patch a call if the caller and callee have compatible clobbers paravirt_patch_jmp patch in a jmp paravirt_patch_insns patch some literal instructions over the callsite, if they fit This patch also implements more direct patches for the native case, so that when running on native hardware many common operations are implemented inline. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Zachary Amsden <zach@vmware.com> Cc: Anthony Liguori <anthony@codemonkey.ws> Acked-by: Ingo Molnar <mingo@elte.hu>
2007-05-02	[PATCH] i386: PARAVIRT: Document asm-i386/paravirt.h	Jeremy Fitzhardinge
	Clean things up, and broadly document: - the paravirt_ops functions themselves - the patching mechanism Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Rusty Russell <rusty@rustcorp.com.au>
2007-05-02	[PATCH] i386: PARAVIRT: Consistently wrap paravirt ops callsites to make ↵	Jeremy Fitzhardinge
	them patchable Wrap a set of interesting paravirt_ops calls in a wrapper which makes the callsites available for patching. Unfortunately this is pretty ugly because there's no way to get gcc to generate a function call, but also wrap just the callsite itself with the necessary labels. This patch supports functions with 0-4 arguments, and either void or returning a value. 64-bit arguments must be split into a pair of 32-bit arguments (lower word first). Small structures are returned in registers. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Zachary Amsden <zach@vmware.com> Cc: Anthony Liguori <anthony@codemonkey.ws>
2007-05-02	[PATCH] i386: PARAVIRT: Fix patch site clobbers to include return register	Jeremy Fitzhardinge
	Fix a few clobbers to include the return register. The clobbers set is the set of all registers modified (or may be modified) by the code snippet, regardless of whether it was deliberate or accidental. Also, make sure that callsites which are used in contexts which don't allow clobbers actually save and restore all clobberable registers. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Zachary Amsden <zach@vmware.com>