Kernel - My Linux kernel repository

Age	Commit message (Collapse)	Author
2006-02-27	Revert "[PATCH] x86_64: Only do the clustered systems have unsynchronized ↵	Linus Torvalds
	TSC assumption on IBM systems" This reverts commit 13a229abc25640813f1480c0478dfc6bdbc1c19e. Quoth Andi: "After some consideration and feedback from various people it turns out this wasn't that good an idea. It has some problems and needs more work. Since it was only an optimization anyways it's best to just back it out again for now." Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-26	[PATCH] x86_64: Move the SMP time selection earlier	Andi Kleen
	SMP time selection originally ran after all CPUs were brought up because it needed to know the number of CPUs to decide if it needs an MP safe timer or not. This is not needed anymore because we know present CPUs early. This fixes a couple of problems: - apicmaintimer didn't always work because it relied on state that was set up time_init_gtod too late. - The output for the used timer in early kernel log was misleading because time_init_gtod could actually change it later. Now always print the final timer choice Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-26	[PATCH] x86_64: Fix the additional_cpus=.. option	Andi Kleen
	It didn't set up the CPU possible map early enough, so the option didn't actually work. Noticed by Heiko Carstens Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-26	[PATCH] x86_64: Only do the clustered systems have unsynchronized TSC ↵	Andi Kleen
	assumption on IBM systems Big Unisys systems have multiple clusters too, but they have an synchronized TSC. I'm using the SMBIOS to check for vendor == IBM. Cc: Chris McDermott <lcm@us.ibm.com> Cc: "Protasevich, Natalie" <Natalie.Protasevich@unisys.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-26	[PATCH] x86_64: fix USER_PTRS_PER_PGD	Jan Beulich
	The value, while currently unused in the native kernel, was off by one. Signed-Off-By: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-26	[PATCH] x86_64: no_iommu removal in pci-gart.c	Jon Mason
	In previous versions of pci-gart.c, no_iommu was used to determine if IOMMU was disabled in the GART DMA mapping functions. This changed in 2.6.16 and now gart_xxx() functions are only called if gart is enabled. Therefore, uses of no_iommu in the GART code are no longer necessary and can be removed. Also, it removes double deceleration of no_iommu and force_iommu in pci.h and proto.h, by removing the deceleration in pci.h. Lastly, end_pfn off by one error. Tested (along with patch 1/2) on dual opteron with gart enabled, iommu=soft, and iommu=off. Signed-off-by: Jon Mason <jdmason@us.ibm.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-17	[PATCH] x86_64: Disable tsc when apicpmtimer is active	Andi Kleen
	Otherwise it has no effect anyways. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-15	[PATCH] add asm-generic/mman.h	Michael S. Tsirkin
	Make new MADV_REMOVE, MADV_DONTFORK, MADV_DOFORK consistent across all arches. The idea is to make it possible to use them portably even before distros include them in libc headers. Move common flags to asm-generic/mman.h Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Cc: Roland Dreier <rolandd@cisco.com> Cc: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-14	[PATCH] madvise MADV_DONTFORK/MADV_DOFORK	Michael S. Tsirkin
	Currently, copy-on-write may change the physical address of a page even if the user requested that the page is pinned in memory (either by mlock or by get_user_pages). This happens if the process forks meanwhile, and the parent writes to that page. As a result, the page is orphaned: in case of get_user_pages, the application will never see any data hardware DMA's into this page after the COW. In case of mlock'd memory, the parent is not getting the realtime/security benefits of mlock. In particular, this affects the Infiniband modules which do DMA from and into user pages all the time. This patch adds madvise options to control whether memory range is inherited across fork. Useful e.g. for when hardware is doing DMA from/into these pages. Could also be useful to an application wanting to speed up its forks by cutting large areas out of consideration. Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Acked-by: Hugh Dickins <hugh@veritas.com> Cc: Michael Kerrisk <mtk-manpages@gmx.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-11	[PATCH] x86-64: Fix HPET timer on x460	Chris McDermott
	[description from AK] The IBM Summit 3 chipset doesn't implement the HPET timer replacement option. Since the current Linux code relies on it use a mixed mode with both PIT for the interrupt and HPET counters for the time keeping. That was already implemented, but didn't work properly because it was still using the last interrupt offset in HPET. This resulted in x460 not booting. Fix this up by using the free running HPET counter. Shouldn't affect any other machine because they either use full HPET mode or no HPET at all. TBD needs a similar 32bit fix. Signed-off-by: Andi Kleen <ak@suse.de> Cc: Pallipadi, Venkatesh" <venkatesh.pallipadi@intel.com> Cc: Bob Picco <bob.picco@hp.com> Cc: Bjorn Helgaas <bjorn.helgaas@hp.com> Cc: john stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-11	[PATCH] fstatat64 support	Ulrich Drepper
	The *at patches introduced fstatat and, due to inusfficient research, I used the newfstat functions generally as the guideline. The result is that on 32-bit platforms we don't have all the information needed to implement fstatat64. This patch modifies the code to pass up 64-bit information if __ARCH_WANT_STAT64 is defined. I renamed the syscall entry point to make this clear. Other archs will continue to use the existing code. On x86-64 the compat code is implemented using a new sys32_ function. this is what is done for the other stat syscalls as well. This patch might break some other archs (those which define __ARCH_WANT_STAT64 and which already wired up the syscall). Yet others might need changes to accomodate the compatibility mode. I really don't want to do that work because all this stat handling is a mess (more so in glibc, but the kernel is also affected). It should be done by the arch maintainers. I'll provide some stand-alone test shortly. Those who are eager could compile glibc and run 'make check' (no installation needed). The patch below has been tested on x86 and x86-64. Signed-off-by: Ulrich Drepper <drepper@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-08	[PATCH] x86-64: Add sys_unshare	Andi Kleen
	Add unshare syscall for x86-64 ppoll/pselect are not ready yet, but add reservations. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-07	[PATCH] x86_64: Fix the node cpumask of a cpu going down	Ravikiran G Thirumalai
	Currently, x86_64 and ia64 arches do not clear the corresponding bits in the node's cpumask when a cpu goes down or cpu bring up is cancelled. This is buggy since there are pieces of common code where the cpumask is checked in the cpu down code path to decide on things (like in the slab down path). PPC does the right thing, but x86_64 and ia64 don't (This was the reason Sonny hit upon a slab bug during cpu offline on ppc and could not reproduce on other arches). This patch fixes it for x86_64. I won't attempt ia64 as I cannot test it. Credit for spotting this should go to Alok. (akpm: this was applied, then reverted. But it's OK now because we now use for_each_cpu() in the right places). Signed-off-by: Alok N Kataria <alokk@calsoftinc.com> Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org> Signed-off-by: Shai Fultheim <shai@scalex86.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-05	[PATCH] Fix "value computed is not used" compile warnings with gcc-4.1	Takashi Iwai
	Fix gcc4.1 compile warnings "value computed is not used" with set_current_state() and set_task_state() on i386/SMP and x86-64. Signed-off-by: Takashi Iwai <tiwai@suse.de> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-05	Revert "[PATCH] x86_64: Fix the node cpumask of a cpu going down"	Linus Torvalds
	This reverts commit 10f4dc8b27ac42f930ac55adb8c521264dc997f8. Quoth Andi Kleen: "Kiran decided that it makes the problem worse than it was before. Fixing it fully requires more work which is too much for 2.6.16. So please revert that commit for now." Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-04	[PATCH] i386/x86-64: Don't ack the APIC for bad interrupts when the APIC is ↵	Andi Kleen
	not enabled It's bad juju to touch the APIC when it hasn't been enabled. I also moved ack_bad_irq for x86-64 out of line following i386. Signed-off-by: Andi Kleen <ak@suse.de> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-04	[PATCH] x86_64: Calibrate APIC timer using PM timer	Andi Kleen
	On some broken motherboards (at least one NForce3 based AMD64 laptop) the PIT timer runs at a incorrect frequency. This patch adds a new option "apicpmtimer" that allows to use the APIC timer and calibrate it using the PMTimer. It requires the earlier patch that allows to run the main timer from the APIC. Specifying apicpmtimer implies apicmaintimer. The option defaults to off for now. I tested it on a few systems and the resulting APIC timer frequencies were usually a bit off, but always <1%, which should be tolerable. TBD figure out heuristic to enable this automatically on the affected systems TBD perhaps do it on all NForce3s or using DMI? Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-04	[PATCH] x86_64: Fix the node cpumask of a cpu going down	Ravikiran G Thirumalai
	Currently, x86_64 and ia64 arches do not clear the corresponding bits in the node's cpumask when a cpu goes down or cpu bring up is cancelled. This is buggy since there are pieces of common code where the cpumask is checked in the cpu down code path to decide on things (like in the slab down path). PPC does the right thing, but x86_64 and ia64 don't (This was the reason Sonny hit upon a slab bug during cpu offline on ppc and could not reproduce on other arches). This patch fixes it for x86_64. I won't attempt ia64 as I cannot test it. Credit for spotting this should go to Alok. Signed-off-by: Alok N Kataria <alokk@calsoftinc.com> Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org> Signed-off-by: Shai Fultheim <shai@scalex86.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-04	[PATCH] x86_64: Undo the earlier changes to remove unrolled copy/memset ↵	Andi Kleen
	functions They cause quite bad performance regressions on Netburst This is temporary until we can get new optimized functions for these CPUs. This undoes changes that were done in 2.6.15 and in 2.6.16-rc1, essentially bringing the code back to 2.6.14 level. Only change is I renamed the X86_FEATURE_K8_C flag to X86_FEATURE_REP_GOOD and fixed the check for the flag and also fixed some comments. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-04	[PATCH] x86_64: [PATCH] timer resume	Shaohua Li
	At resume time, TSC's value or something similar might be changed a lot against suspend time. This could make system gets a very big lost ticks. See http://bugzilla.kernel.org/show_bug.cgi?id=5825 Signed-off-by: Shaohua Li<shaohua.li@intel.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-04	[PATCH] x86_64: Allow to run main time keeping from the local APIC interrupt	Andi Kleen
	Another piece from the no-idle-tick patch. This can be enabled with the "apicmaintimer" option. This is mainly useful when the PIT/HPET interrupt is unreliable. Note there are some systems that are known to stop the APIC timer in C3. For those it will never work, but this case should be automatically detected. It also only works with PM timer right now. When HPET is used the way the main timer handler computes the delay doesn't work. It should be a bit more efficient because there is one less regular interrupt to process on the boot processor. Requires earlier bugfix from Venkatesh Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-04	[PATCH] x86_64: Define pmtmr_ioport to 0 when PM_TIMER is not available	Andi Kleen
	Avoids some ifdef mess later. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-03	[PATCH] Compilation of kexec/kdump broken	Fernando Luis Vazquez Cao
	The compilation of kexec/kdump seems to be broken for x86_64. Remove the dependency of kexec on CONFIG_IA32_EMULATION. Signed-off-by: Fernando Vazquez <fernando@intellilink.co.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-03	[PATCH] Export cpu topology in sysfs	Zhang, Yanmin
	The patch implements cpu topology exportation by sysfs. Items (attributes) are similar to /proc/cpuinfo. 1) /sys/devices/system/cpu/cpuX/topology/physical_package_id: represent the physical package id of cpu X; 2) /sys/devices/system/cpu/cpuX/topology/core_id: represent the cpu core id to cpu X; 3) /sys/devices/system/cpu/cpuX/topology/thread_siblings: represent the thread siblings to cpu X in the same core; 4) /sys/devices/system/cpu/cpuX/topology/core_siblings: represent the thread siblings to cpu X in the same physical package; To implement it in an architecture-neutral way, a new source file, driver/base/topology.c, is to export the 5 attributes. If one architecture wants to support this feature, it just needs to implement 4 defines, typically in file include/asm-XXX/topology.h. The 4 defines are: #define topology_physical_package_id(cpu) #define topology_core_id(cpu) #define topology_thread_siblings(cpu) #define topology_core_siblings(cpu) The type of **_id is int. The type of siblings is cpumask_t. To be consistent on all architectures, the 4 attributes should have deafult values if their values are unavailable. Below is the rule. 1) physical_package_id: If cpu has no physical package id, -1 is the default value. 2) core_id: If cpu doesn't support multi-core, its core id is 0. 3) thread_siblings: Just include itself, if the cpu doesn't support HT/multi-thread. 4) core_siblings: Just include itself, if the cpu doesn't support multi-core and HT/Multi-thread. So be careful when declaring the 4 defines in include/asm-XXX/topology.h. If an attribute isn't defined on an architecture, it won't be exported. Thank Nathan, Greg, Andi, Paul and Venki. The patch provides defines for i386/x86_64/ia64. Signed-off-by: Zhang, Yanmin <yanmin.zhang@intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-01	Merge branch 'release' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6
2006-02-01	[PATCH] prototypes for *at functions & typo fix	Ulrich Drepper
	Here's the follow-up patch which introduces the prototypes for the new syscalls. There was also a typo in one of the new symbols. Signed-off-by: Ulrich Drepper <drepper@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-24	[ACPI] merge 3549 4320 4485 4588 4980 5483 5651 acpica asus fops pnpacpi ↵	Len Brown
	branches into release Signed-off-by: Len Brown <len.brown@intel.com>
2006-01-18	[PATCH] EDAC: core EDAC support code	Alan Cox
	This is a subset of the bluesmoke project core code, stripped of the NMI work which isn't ready to merge and some of the "interesting" proc functionality that needs reworking or just has no place in kernel. It requires no core kernel changes except the added scrub functions already posted. The goal is to merge further functionality only after the core code is accepted and proven in the base kernel, and only at the point the upstream extras are really ready to merge. From: doug thompson <norsk5@xmission.com> This converts EDAC to sysfs and is the final chunk neccessary before EDAC has a stable user space API and can be considered for submission into the base kernel. Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Signed-off-by: doug thompson <norsk5@xmission.com> Signed-off-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-18	[PATCH] EDAC: atomic scrub operations	Alan Cox
	EDAC requires a way to scrub memory if an ECC error is found and the chipset does not do the work automatically. That means rewriting memory locations atomically with respect to all CPUs _and_ bus masters. That means we can't use atomic_add(foo, 0) as it gets optimised for non-SMP This adds a function to include/asm-foo/atomic.h for the platforms currently supported which implements a scrub of a mapped block. It also adjusts a few other files include order where atomic.h is included before types.h as this now causes an error as atomic_scrub uses u32. Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-18	[PATCH] vfs: *at functions: x86_64	Ulrich Drepper
	Wire up the x86_64 syscalls. Signed-off-by: Ulrich Drepper <drepper@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-16	[PATCH] x86_64: Fix VSMP build	Ravikiran G Thirumalai
	Patch fixes a build problem with CONFIG_X86_VSMP. The vSMP bits probably gathered some fuzz on its way to mainline, and safe_halt() which was outside the #endif (CONFIG_X86_VSMP) somehow got inside the !CONFIG_X86_VSMP condition, hence being undefined and breaking CONFIG_X86_VSMP builds. Patch takes safe_halt() and halt() macros out of the #endif Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-16	[PATCH] x86_64: Flexmap for 32bit and randomized mappings for 64bit	Andi Kleen
	Another try at this. For 32bit follow the 32bit implementation from Ingo - mappings are growing down from the end of stack now and vary randomly by 1GB. Randomized mappings for 64bit just vary the normal mmap break by 1TB. I didn't bother implementing full flex mmap for 64bit because it shouldn't be needed there. Cc: mingo@elte.hu Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-16	[PATCH] x86_64: Increase NR_IRQ_VECTORS to 32 * NR_CPUS	Andi Kleen
	This prevents running out of GSIs on large Unisys ES7000 machines. Follows i386 Cc: "Protasevich, Natalie" <Natalie.Protasevich@unisys.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-16	[PATCH] x86_64: Allow nesting of int3 by default for kprobes	Andi Kleen
	This unbreaks recursive kprobes which didn't work anymore due to an earlier patch which converted the debug entry point to use an IST. This also allows nesting of the debug entry point too. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-14	[PATCH] mark several functions __always_inline	Ingo Molnar
	Arjan van de Ven <arjan@infradead.org> Mark a number of functions as 'must inline'. The functions affected by this patch need to be inlined because they use knowledge that their arguments are constant so that most of the function optimizes away. At this point this patch does not change behavior, it's for documentation only (and for future patches in the inline series) Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-12	[PATCH] death of get_thread_info/put_thread_info	Al Viro
	{get,put}_thread_info() were introduced in 2.5.4 and never had been called by anything in the tree. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-12	[PATCH] amd64: task_pt_regs()	Al Viro
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-12	[PATCH] amd64: task_thread_info()	Al Viro
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-12	[PATCH] scheduler cache-hot-autodetect	akpm@osdl.org
	) From: Ingo Molnar <mingo@elte.hu> This is the latest version of the scheduler cache-hot-auto-tune patch. The first problem was that detection time scaled with O(N^2), which is unacceptable on larger SMP and NUMA systems. To solve this: - I've added a 'domain distance' function, which is used to cache measurement results. Each distance is only measured once. This means that e.g. on NUMA distances of 0, 1 and 2 might be measured, on HT distances 0 and 1, and on SMP distance 0 is measured. The code walks the domain tree to determine the distance, so it automatically follows whatever hierarchy an architecture sets up. This cuts down on the boot time significantly and removes the O(N^2) limit. The only assumption is that migration costs can be expressed as a function of domain distance - this covers the overwhelming majority of existing systems, and is a good guess even for more assymetric systems. [ People hacking systems that have assymetries that break this assumption (e.g. different CPU speeds) should experiment a bit with the cpu_distance() function. Adding a ->migration_distance factor to the domain structure would be one possible solution - but lets first see the problem systems, if they exist at all. Lets not overdesign. ] Another problem was that only a single cache-size was used for measuring the cost of migration, and most architectures didnt set that variable up. Furthermore, a single cache-size does not fit NUMA hierarchies with L3 caches and does not fit HT setups, where different CPUs will often have different 'effective cache sizes'. To solve this problem: - Instead of relying on a single cache-size provided by the platform and sticking to it, the code now auto-detects the 'effective migration cost' between two measured CPUs, via iterating through a wide range of cachesizes. The code searches for the maximum migration cost, which occurs when the working set of the test-workload falls just below the 'effective cache size'. I.e. real-life optimized search is done for the maximum migration cost, between two real CPUs. This, amongst other things, has the positive effect hat if e.g. two CPUs share a L2/L3 cache, a different (and accurate) migration cost will be found than between two CPUs on the same system that dont share any caches. (The reliable measurement of migration costs is tricky - see the source for details.) Furthermore i've added various boot-time options to override/tune migration behavior. Firstly, there's a blanket override for autodetection: migration_cost=1000,2000,3000 will override the depth 0/1/2 values with 1msec/2msec/3msec values. Secondly, there's a global factor that can be used to increase (or decrease) the autodetected values: migration_factor=120 will increase the autodetected values by 20%. This option is useful to tune things in a workload-dependent way - e.g. if a workload is cache-insensitive then CPU utilization can be maximized by specifying migration_factor=0. I've tested the autodetection code quite extensively on x86, on 3 P3/Xeon/2MB, and the autodetected values look pretty good: Dual Celeron (128K L2 cache): --------------------- migration cost matrix (max_cache_size: 131072, cpu: 467 MHz): --------------------- [00] [01] [00]: - 1.7(1) [01]: 1.7(1) - --------------------- cacheflush times [2]: 0.0 (0) 1.7 (1784008) --------------------- Here the slow memory subsystem dominates system performance, and even though caches are small, the migration cost is 1.7 msecs. Dual HT P4 (512K L2 cache): --------------------- migration cost matrix (max_cache_size: 524288, cpu: 2379 MHz): --------------------- [00] [01] [02] [03] [00]: - 0.4(1) 0.0(0) 0.4(1) [01]: 0.4(1) - 0.4(1) 0.0(0) [02]: 0.0(0) 0.4(1) - 0.4(1) [03]: 0.4(1) 0.0(0) 0.4(1) - --------------------- cacheflush times [2]: 0.0 (33900) 0.4 (448514) --------------------- Here it can be seen that there is no migration cost between two HT siblings (CPU#0/2 and CPU#1/3 are separate physical CPUs). A fast memory system makes inter-physical-CPU migration pretty cheap: 0.4 msecs. 8-way P3/Xeon [2MB L2 cache]: --------------------- migration cost matrix (max_cache_size: 2097152, cpu: 700 MHz): --------------------- [00] [01] [02] [03] [04] [05] [06] [07] [00]: - 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) [01]: 19.2(1) - 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) [02]: 19.2(1) 19.2(1) - 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) [03]: 19.2(1) 19.2(1) 19.2(1) - 19.2(1) 19.2(1) 19.2(1) 19.2(1) [04]: 19.2(1) 19.2(1) 19.2(1) 19.2(1) - 19.2(1) 19.2(1) 19.2(1) [05]: 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) - 19.2(1) 19.2(1) [06]: 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) - 19.2(1) [07]: 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) - --------------------- cacheflush times [2]: 0.0 (0) 19.2 (19281756) --------------------- This one has huge caches and a relatively slow memory subsystem - so the migration cost is 19 msecs. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Ken Chen <kenneth.w.chen@intel.com> Cc: <wilder@us.ibm.com> Signed-off-by: John Hawkes <hawkes@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-12	[PATCH] sched: add cacheflush() asm	Ingo Molnar
	Add per-arch sched_cacheflush() which is a write-back cacheflush used by the migration-cost calibration code at bootup time. Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-11	[PATCH] x86_64: Some housekeeping in local APIC code	Andi Kleen
	Remove support for obsolete hardware and cleanup. - Remove checks for non integrated APICs - Replace apic_write_around with apic_write. - Remove apic_read_around - Remove APIC version reads used by old workarounds - Remove old workaround for Simics - Fix indentation Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-11	[PATCH] x86_64: Display meaningful part of filename during BUG()	Jan Beulich
	When building in a separate objtree, file names produced by BUG() & Co. can get fairly long; printing only the first 50 characters may thus result in (almost) no useful information. The following change makes it so that rather the last 50 characters of the filename get printed. Signed-Off-By: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-11	[PATCH] x86_64: Remove unused AMD K8 C stepping flag	Andi Kleen
	X86_FEATURE_K8_C was a synthetic Linux CPUID flag that was used for some code optimizations in Opteron C stepping or later. But support for pre C stepping optimizations has been removed, so this isn't needed anymore. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-11	[PATCH] x86_64: sparse warning cleanups	Stephen Hemminger
	Fix some trivial sparse warnings in x86_64 code. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-11	[PATCH] x86_64: Move NUMA page_to_pfn/pfn_to_page functions out of line	Andi Kleen
	Saves about ~18K .text in defconfig There would be more optimization potential, but that's for later. Suggestion originally from Bill Irwin. Fix from Andy Whitcroft. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-11	[PATCH] x86_64: Remove unused segments	Andi Kleen
	They used to be used by the reboot code, but not anymore. Noticed by Jan Beulich Cc: JBeulich@novell.com Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-11	[PATCH] x86_64: Inclusion of ScaleMP vSMP architecture patches - vsmp_arch	Ravikiran G Thirumalai
	Introduce vSMP arch to the kernel. This patch: 1. Adds CONFIG_X86_VSMP 2. Adds machine specific macros for local_irq_disabled, local_irq_enabled and irqs_disabled 3. Writes to the vSMP CTL device to indicate kernel compiled with CONFIG_VSMP Signed-off-by: Ravikiran Thirumalai <kiran@scalemp.com> Signed-off-by: Shai Fultheim <shai@scalemp.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-11	[PATCH] x86_64: Inclusion of ScaleMP vSMP architecture patches - vsmp_align	Ravikiran G Thirumalai
	vSMP specific alignment patch to 1. Define INTERNODE_CACHE_SHIFT for vSMP 2. Use this for alignment of critical structures 3. Use INTERNODE_CACHE_SHIFT for ARCH_MIN_TASKALIGN, and let the slab align task_struct allocations to the internode cacheline size 4. Introduce and use ARCH_MIN_MMSTRUCT_ALIGN for mm_struct slab allocations. Signed-off-by: Ravikiran Thirumalai <kiran@scalemp.com> Signed-off-by: Shai Fultheim <shai@scalemp.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-11	[PATCH] x86_64: Make sure BITS_PER_ATOMIC is defined in asm-generic/atomic.h	Andi Kleen
	Fixes CC fs/nfsctl.o In file included from include2/asm/atomic.h:427, from /home/lsrc/quilt/linux/include/linux/file.h:8, from /home/lsrc/quilt/linux/fs/nfsctl.c:8: /home/lsrc/quilt/linux/include/asm-generic/atomic.h:20:5: warning: "BITS_PER_LONG" is not defined Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-11	[PATCH] x86_64: cleanup enter_lazy_tlb()	Brian Gerst
	Move the #ifdef into the function body. Signed-off-by: Brian Gerst <bgerst@didntduck.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>