aboutsummaryrefslogtreecommitdiff
path: root/arch/x86/kernel
AgeCommit message (Collapse)Author
2009-06-25x86, mce: percpu mcheck_timer should be pinnedHidetoshi Seto
If CONFIG_NO_HZ + CONFIG_SMP, timer added via add_timer() might be migrated on other cpu. Use add_timer_on() instead. Avoids the following failure: Maciej Rutecki wrote: > > After normal boot I try: > > > > echo 1 > /sys/devices/system/machinecheck/machinecheck0/check_interval > > > > I found this in dmesg: > > > > [ 141.704025] ------------[ cut here ]------------ > > [ 141.704039] WARNING: at arch/x86/kernel/cpu/mcheck/mce.c:1102 > > mcheck_timer+0xf5/0x100() Reported-by: Maciej Rutecki <maciej.rutecki@gmail.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Tested-by: Maciej Rutecki <maciej.rutecki@gmail.com> Acked-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-25x86: Add sysctl to allow panic on IOCK NMI errorKurt Garloff
This patch introduces a new sysctl: /proc/sys/kernel/panic_on_io_nmi which defaults to 0 (off). When enabled, the kernel panics when the kernel receives an NMI caused by an IO error. The IO error triggered NMI indicates a serious system condition, which could result in IO data corruption. Rather than contiuing, panicing and dumping might be a better choice, so one can figure out what's causing the IO error. This could be especially important to companies running IO intensive applications where corruption must be avoided, e.g. a bank's databases. [ SuSE has been shipping it for a while, it was done at the request of a large database vendor, for their users. ] Signed-off-by: Kurt Garloff <garloff@suse.de> Signed-off-by: Roberto Angelino <robertangelino@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Cc: "Eric W. Biederman" <ebiederm@xmission.com> LKML-Reference: <20090624213211.GA11291@kroah.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-24x86: Fix uv bau sending buffer initializationCliff Wickman
The initialization of the UV Broadcast Assist Unit's sending buffers was making an invalid assumption about the initialization of an MMR that defines its address. The BIOS will not be providing that MMR. So uv_activation_descriptor_init() should unconditionally set it. Tested on UV simulator. Signed-off-by: Cliff Wickman <cpw@sgi.com> Cc: <stable@kernel.org> # for v2.6.30.x LKML-Reference: <E1MJTfj-0005i1-W8@eag09.americas.sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-23x86: Move init_gbpages() to setup_arch()Pekka J Enberg
The init_gbpages() function is conditionally called from init_memory_mapping() function. There are two call-sites where this 'after_bootmem' condition can be true: setup_arch() and mem_init() via pci_iommu_alloc(). Therefore, it's safe to move the call to init_gbpages() to setup_arch() as it's always called before mem_init(). This removes an after_bootmem use - paving the way to remove all uses of that state variable. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <Pine.LNX.4.64.0906221731210.19474@melkki.cs.Helsinki.FI> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-22Merge branch 'for-tip' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu into x86/urgent
2009-06-22x86: ensure percpu lpage doesn't consume too much vmalloc spaceTejun Heo
On extreme configuration (e.g. 32bit 32-way NUMA machine), lpage percpu first chunk allocator can consume too much of vmalloc space. Make it fall back to 4k allocator if the consumption goes over 20%. [ Impact: add sanity check for lpage percpu first chunk allocator ] Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Jan Beulich <JBeulich@novell.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ingo Molnar <mingo@elte.hu>
2009-06-22x86: implement percpu_alloc kernel parameterTejun Heo
According to Andi, it isn't clear whether lpage allocator is worth the trouble as there are many processors where PMD TLB is far scarcer than PTE TLB. The advantage or disadvantage probably depends on the actual size of percpu area and specific processor. As performance degradation due to TLB pressure tends to be highly workload specific and subtle, it is difficult to decide which way to go without more data. This patch implements percpu_alloc kernel parameter to allow selecting which first chunk allocator to use to ease debugging and testing. While at it, make sure all the failure paths report why something failed to help determining why certain allocator isn't working. Also, kill the "Great future plan" comment which had already been realized quite some time ago. [ Impact: allow explicit percpu first chunk allocator selection ] Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Jan Beulich <JBeulich@novell.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ingo Molnar <mingo@elte.hu>
2009-06-22x86: fix pageattr handling for lpage percpu allocator and re-enable itTejun Heo
lpage allocator aliases a PMD page for each cpu and returns whatever is unused to the page allocator. When the pageattr of the recycled pages are changed, this makes the two aliases point to the overlapping regions with different attributes which isn't allowed and known to cause subtle data corruption in certain cases. This can be handled in simliar manner to the x86_64 highmap alias. pageattr code should detect if the target pages have PMD alias and split the PMD alias and synchronize the attributes. pcpur allocator is updated to keep the allocated PMD pages map sorted in ascending address order and provide pcpu_lpage_remapped() function which binary searches the array to determine whether the given address is aliased and if so to which address. pageattr is updated to use pcpu_lpage_remapped() to detect the PMD alias and split it up as necessary from cpa_process_alias(). Jan Beulich spotted the original problem and incorrect usage of vaddr instead of laddr for lookup. With this, lpage percpu allocator should work correctly. Re-enable it. [ Impact: fix subtle lpage pageattr bug and re-enable lpage ] Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Jan Beulich <JBeulich@novell.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ingo Molnar <mingo@elte.hu>
2009-06-22x86: prepare setup_pcpu_lpage() for pageattr fixTejun Heo
Make the following changes in preparation of coming pageattr updates. * Define and use array of struct pcpul_ent instead of array of pointers. The only difference is ->cpu field which is set but unused yet. * Rename variables according to the above change. * Rename local variable vm to pcpul_vm and move it out of the function. [ Impact: no functional difference ] Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jan Beulich <JBeulich@novell.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ingo Molnar <mingo@elte.hu>
2009-06-22x86: rename remap percpu first chunk allocator to lpageTejun Heo
The "remap" allocator remaps large pages to build the first chunk; however, the name isn't very good because 4k allocator remaps too and the whole point of the remap allocator is using large page mapping. The allocator will be generalized and exported outside of x86, rename it to lpage before that happens. percpu_alloc kernel parameter is updated to accept both "remap" and "lpage" for lpage allocator. [ Impact: code cleanup, kernel parameter argument updated ] Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@elte.hu>
2009-06-22x86: fix duplicate free in setup_pcpu_remap() failure pathTejun Heo
In the failure path, setup_pcpu_remap() tries to free the area which has already been freed to make holes in the large page. Fix it. [ Impact: fix duplicate free in failure path ] Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@elte.hu>
2009-06-21x86: Set cpu_llc_id on AMD CPUsAndreas Herrmann
This counts when building sched domains in case NUMA information is not available. ( See cpu_coregroup_mask() which uses llc_shared_map which in turn is created based on cpu_llc_id. ) Currently Linux builds domains as follows: (example from a dual socket quad-core system) CPU0 attaching sched-domain: domain 0: span 0-7 level CPU groups: 0 1 2 3 4 5 6 7 ... CPU7 attaching sched-domain: domain 0: span 0-7 level CPU groups: 7 0 1 2 3 4 5 6 Ever since that is borked for multi-core AMD CPU systems. This patch fixes that and now we get a proper: CPU0 attaching sched-domain: domain 0: span 0-3 level MC groups: 0 1 2 3 domain 1: span 0-7 level CPU groups: 0-3 4-7 ... CPU7 attaching sched-domain: domain 0: span 4-7 level MC groups: 7 4 5 6 domain 1: span 0-7 level CPU groups: 4-7 0-3 This allows scheduler to assign tasks to cores on different sockets (i.e. that don't share last level cache) for performance reasons. Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> LKML-Reference: <20090619085909.GJ5218@alberich.amd.com> Cc: <stable@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-20Merge branch 'perfcounters-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (49 commits) perfcounter: Handle some IO return values perf_counter: Push perf_sample_data through the swcounter code perf_counter tools: Define and use our own u64, s64 etc. definitions perf_counter: Close race in perf_lock_task_context() perf_counter, x86: Improve interactions with fast-gup perf_counter: Simplify and fix task migration counting perf_counter tools: Add a data file header perf_counter: Update userspace callchain sampling uses perf_counter: Make callchain samples extensible perf report: Filter to parent set by default perf_counter tools: Handle lost events perf_counter: Add event overlow handling fs: Provide empty .set_page_dirty() aop for anon inodes perf_counter: tools: Makefile tweaks for 64-bit powerpc perf_counter: powerpc: Add processor back-end for MPC7450 family perf_counter: powerpc: Make powerpc perf_counter code safe for 32-bit kernels perf_counter: powerpc: Change how processor-specific back-ends get selected perf_counter: powerpc: Use unsigned long for register and constraint values perf_counter: powerpc: Enable use of software counters on 32-bit powerpc perf_counter tools: Add and use isprint() ...
2009-06-20Merge branch 'sched-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: Fix out of scope variable access in sched_slice() sched: Hide runqueues from direct refer at source code level sched: Remove unneeded __ref tag sched, x86: Fix cpufreq + sched_clock() TSC scaling
2009-06-20Merge branch 'tracing-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (24 commits) tracing/urgent: warn in case of ftrace_start_up inbalance tracing/urgent: fix unbalanced ftrace_start_up function-graph: add stack frame test function-graph: disable when both x86_32 and optimize for size are configured ring-buffer: have benchmark test print to trace buffer ring-buffer: do not grab locks in nmi ring-buffer: add locks around rb_per_cpu_empty ring-buffer: check for less than two in size allocation ring-buffer: remove useless compile check for buffer_page size ring-buffer: remove useless warn on check ring-buffer: use BUF_PAGE_HDR_SIZE in calculating index tracing: update sample event documentation tracing/filters: fix race between filter setting and module unload tracing/filters: free filter_string in destroy_preds() ring-buffer: use commit counters for commit pointer accounting ring-buffer: remove unused variable ring-buffer: have benchmark test handle discarded events ring-buffer: prevent adding write in discarded area tracing/filters: strloc should be unsigned short tracing/filters: operand can be negative ... Fix up kmemcheck-induced conflict in kernel/trace/ring_buffer.c manually
2009-06-20Merge branch 'x86-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (45 commits) x86, mce: fix error path in mce_create_device() x86: use zalloc_cpumask_var for mce_dev_initialized x86: fix duplicated sysfs attribute x86: de-assembler-ize asm/desc.h i386: fix/simplify espfix stack switching, move it into assembly i386: fix return to 16-bit stack from NMI handler x86, ioapic: Don't call disconnect_bsp_APIC if no APIC present x86: Remove duplicated #include's x86: msr.h linux/types.h is only required for __KERNEL__ x86: nmi: Add Intel processor 0x6f4 to NMI perfctr1 workaround x86, mce: mce_intel.c needs <asm/apic.h> x86: apic/io_apic.c: dmar_msi_type should be static x86, io_apic.c: Work around compiler warning x86: mce: Don't touch THERMAL_APIC_VECTOR if no active APIC present x86: mce: Handle banks == 0 case in K7 quirk x86, boot: use .code16gcc instead of .code16 x86: correct the conversion of EFI memory types x86: cap iomem_resource to addressable physical memory x86, mce: rename _64.c files which are no longer 64-bit-specific x86, mce: mce.h cleanup ... Manually fix up trivial conflict in arch/x86/mm/fault.c
2009-06-20Merge branch 'x86/mce3' into x86/urgentIngo Molnar
2009-06-19perf_counter: Make callchain samples extensiblePeter Zijlstra
Before exposing upstream tools to a callchain-samples ABI, tidy it up to make it more extensible in the future: Use markers in the IP chain to denote context, use (u64)-1..-4095 range for these context markers because we use them for ERR_PTR(), so these addresses are unlikely to be mapped. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-18function-graph: add stack frame testSteven Rostedt
In case gcc does something funny with the stack frames, or the return from function code, we would like to detect that. An arch may implement passing of a variable that is unique to the function and can be saved on entering a function and can be tested when exiting the function. Usually the frame pointer can be used for this purpose. This patch also implements this for x86. Where it passes in the stack frame of the parent function, and will test that frame on exit. There was a case in x86_32 with optimize for size (-Os) where, for a few functions, gcc would align the stack frame and place a copy of the return address into it. The function graph tracer modified the copy and not the actual return address. On return from the funtion, it did not go to the tracer hook, but returned to the parent. This broke the function graph tracer, because the return of the parent (where gcc did not do this funky manipulation) returned to the location that the child function was suppose to. This caused strange kernel crashes. This test detected the problem and pointed out where the issue was. This modifies the parameters of one of the functions that the arch specific code calls, so it includes changes to arch code to accommodate the new prototype. Note, I notice that the parsic arch implements its own push_return_trace. This is now a generic function and the ftrace_push_return_trace should be used instead. This patch does not touch that code. Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Helge Deller <deller@gmx.de> Cc: Kyle McMartin <kyle@mcmartin.ca> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-06-18gcov: enable GCOV_PROFILE_ALL for x86_64Peter Oberparleiter
Enable gcov profiling of the entire kernel on x86_64. Required changes include disabling profiling for: * arch/kernel/acpi/realmode and arch/kernel/boot/compressed: not linked to main kernel * arch/vdso, arch/kernel/vsyscall_64 and arch/kernel/hpet: profiling causes segfaults during boot (incompatible context) Signed-off-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Huang Ying <ying.huang@intel.com> Cc: Li Wei <W.Li@Sun.COM> Cc: Michael Ellerman <michaele@au1.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Heiko Carstens <heicars2@linux.vnet.ibm.com> Cc: Martin Schwidefsky <mschwid2@linux.vnet.ibm.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: WANG Cong <xiyou.wangcong@gmail.com> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Jeff Dike <jdike@addtoit.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-18x86, mce: fix error path in mce_create_device()Hidetoshi Seto
Don't skip removing mce_attrs in route from error2. Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Huang Ying <ying.huang@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-17x86: use zalloc_cpumask_var for mce_dev_initializedYinghai Lu
We need a cleared cpu_mask to record if mce is initialized, especially when MAXSMP is used. used zalloc_... instead Signed-off-by: Yinghai Lu <yinghai@kernel.org> Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: stable@kernel.org Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-17x86: fix duplicated sysfs attributeYinghai Lu
The sysfs attribute cmci_disabled was accidentall turned into a duplicate of ignore_ce, breaking all other attributes. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Acked-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-17x86: de-assembler-ize asm/desc.hAlexander van Heukelum
asm/desc.h is included in three assembly files, but the only macro it defines, GET_DESC_BASE, is never used. This patch removes the includes, removes the macro GET_DESC_BASE and the ASSEMBLY guard from asm/desc.h. Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-17i386: fix/simplify espfix stack switching, move it into assemblyAlexander van Heukelum
The espfix code triggers if we have a protected mode userspace application with a 16-bit stack. On returning to userspace, with iret, the CPU doesn't restore the high word of the stack pointer. This is an "official" bug, and the work-around used in the kernel is to temporarily switch to a 32-bit stack segment/pointer pair where the high word of the pointer is equal to the high word of the userspace stackpointer. The current implementation uses THREAD_SIZE to determine the cut-off, but there is no good reason not to use the more natural 64kb... However, implementing this by simply substituting THREAD_SIZE with 65536 in patch_espfix_desc crashed the test application. patch_espfix_desc tries to do what is described above, but gets it subtly wrong if the userspace stack pointer is just below a multiple of THREAD_SIZE: an overflow occurs to bit 13... With a bit of luck, when the kernelspace stackpointer is just below a 64kb-boundary, the overflow then ripples trough to bit 16 and userspace will see its stack pointer changed by 65536. This patch moves all espfix code into entry_32.S. Selecting a 16-bit cut-off simplifies the code. The game with changing the limit dynamically is removed too. It complicates matters and I see no value in it. Changing only the top 16-bit word of ESP is one instruction and it also implies that only two bytes of the ESPFIX GDT entry need to be changed and this can be implemented in just a handful simple to understand instructions. As a side effect, the operation to compute the original ESP from the ESPFIX ESP and the GDT entry simplifies a bit too, and the remaining three instructions have been expanded inline in entry_32.S. impact: can now reliably run userspace with ESP=xxxxfffc on 16-bit stack segment Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> Acked-by: Stas Sergeev <stsp@aknet.ru> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-17i386: fix return to 16-bit stack from NMI handlerAlexander van Heukelum
Returning to a task with a 16-bit stack requires special care: the iret instruction does not restore the high word of esp in that case. The espfix code fixes this, but currently is not invoked on NMIs. This means that a running task gets the upper word of esp clobbered due intervening NMIs. To reproduce, compile and run the following program with the nmi watchdog enabled (nmi_watchdog=2 on the command line). Using gdb you can see that the high bits of esp contain garbage, while the low bits are still correct. This patch puts the espfix code back into the NMI code path. The patch is slightly complicated due to the irqtrace infrastructure not being NMI-safe. The NMI return path cannot call TRACE_IRQS_IRET. Otherwise, the tail of the normal iret-code is correct for the nmi code path too. To be able to share this code-path, the TRACE_IRQS_IRET was move up a bit. The espfix code exists after the TRACE_IRQS_IRET, but this code explicitly disables interrupts. This short interrupts-off section is now not traced anymore. The return-to-kernel path now always includes the preliminary test to decide if the espfix code should be called. This is never the case, but doing it this way keeps the patch as simple as possible and the few extra instructions should not affect timing in any significant way. #define _GNU_SOURCE #include <stdio.h> #include <sys/types.h> #include <sys/mman.h> #include <unistd.h> #include <sys/syscall.h> #include <asm/ldt.h> int modify_ldt(int func, void *ptr, unsigned long bytecount) { return syscall(SYS_modify_ldt, func, ptr, bytecount); } /* this is assumed to be usable */ #define SEGBASEADDR 0x10000 #define SEGLIMIT 0x20000 /* 16-bit segment */ struct user_desc desc = { .entry_number = 0, .base_addr = SEGBASEADDR, .limit = SEGLIMIT, .seg_32bit = 0, .contents = 0, /* ??? */ .read_exec_only = 0, .limit_in_pages = 0, .seg_not_present = 0, .useable = 1 }; int main(void) { setvbuf(stdout, NULL, _IONBF, 0); /* map a 64 kb segment */ char *pointer = mmap((void *)SEGBASEADDR, SEGLIMIT+1, PROT_EXEC|PROT_READ|PROT_WRITE, MAP_SHARED|MAP_ANONYMOUS, -1, 0); if (pointer == NULL) { printf("could not map space\n"); return 0; } /* write ldt, new mode */ int err = modify_ldt(0x11, &desc, sizeof(desc)); if (err) { printf("error modifying ldt: %i\n", err); return 0; } for (int i=0; i<1000; i++) { asm volatile ( "pusha\n\t" "mov %ss, %eax\n\t" /* preserve ss:esp */ "mov %esp, %ebp\n\t" "push $7\n\t" /* index 0, ldt, user mode */ "push $65536-4096\n\t" /* esp */ "lss (%esp), %esp\n\t" /* switch to new stack */ "push %eax\n\t" /* save old ss:esp on new stack */ "push %ebp\n\t" "add $17*65536, %esp\n\t" /* set high bits */ "mov %esp, %edx\n\t" "mov $10000000, %ecx\n\t" /* wait... */ "1: loop 1b\n\t" /* ... a bit */ "cmp %esp, %edx\n\t" "je 1f\n\t" "ud2\n\t" /* esp changed inexplicably! */ "1:\n\t" "sub $17*65536, %esp\n\t" /* restore high bits */ "lss (%esp), %esp\n\t" /* restore old ss:esp */ "popa\n\t"); printf("\rx%ix", i); } return 0; } Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> Acked-by: Stas Sergeev <stsp@aknet.ru> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-17x86, ioapic: Don't call disconnect_bsp_APIC if no APIC presentCyrill Gorcunov
Vegard Nossum reported: [ 503.576724] ACPI: Preparing to enter system sleep state S5 [ 503.710857] Disabling non-boot CPUs ... [ 503.716853] Power down. [ 503.717770] ------------[ cut here ]------------ [ 503.717770] WARNING: at arch/x86/kernel/apic/apic.c:249 native_apic_write_du) [ 503.717770] Hardware name: OptiPlex GX100 [ 503.717770] Modules linked in: [ 503.717770] Pid: 2136, comm: halt Not tainted 2.6.30 #443 [ 503.717770] Call Trace: [ 503.717770] [<c154d327>] ? printk+0x18/0x1a [ 503.717770] [<c1017358>] ? native_apic_write_dummy+0x38/0x50 [ 503.717770] [<c10360fc>] warn_slowpath_common+0x6c/0xc0 [ 503.717770] [<c1017358>] ? native_apic_write_dummy+0x38/0x50 [ 503.717770] [<c1036165>] warn_slowpath_null+0x15/0x20 [ 503.717770] [<c1017358>] native_apic_write_dummy+0x38/0x50 [ 503.717770] [<c1017173>] disconnect_bsp_APIC+0x63/0x100 [ 503.717770] [<c1019e48>] disable_IO_APIC+0xb8/0xc0 [ 503.717770] [<c1214231>] ? acpi_power_off+0x0/0x29 [ 503.717770] [<c1015e55>] native_machine_shutdown+0x65/0x80 [ 503.717770] [<c1015c36>] native_machine_power_off+0x26/0x30 [ 503.717770] [<c1015c49>] machine_power_off+0x9/0x10 [ 503.717770] [<c1046596>] kernel_power_off+0x36/0x40 [ 503.717770] [<c104680d>] sys_reboot+0xfd/0x1f0 [ 503.717770] [<c109daa0>] ? perf_swcounter_event+0xb0/0x130 [ 503.717770] [<c109db7d>] ? perf_counter_task_sched_out+0x5d/0x120 [ 503.717770] [<c102dfc6>] ? finish_task_switch+0x56/0xd0 [ 503.717770] [<c154da1e>] ? schedule+0x49e/0xb40 [ 503.717770] [<c10444b0>] ? sys_kill+0x70/0x160 [ 503.717770] [<c119d9db>] ? selinux_file_ioctl+0x3b/0x50 [ 503.717770] [<c10dd443>] ? sys_ioctl+0x63/0x70 [ 503.717770] [<c1003024>] sysenter_do_call+0x12/0x22 [ 503.717770] ---[ end trace 8157b5d0ed378f15 ]--- | | That's including this commit: | | commit 103428e57be323c3c5545db8ad12667099bc6005 |Author: Cyrill Gorcunov <gorcunov@openvz.org> |Date: Sun Jun 7 16:48:40 2009 +0400 | | x86, apic: Fix dummy apic read operation together with broken MP handling | If we have apic disabled we don't even switch to APIC mode and do not calling for connect_bsp_APIC. Though on SMP compiled kernel the native_machine_shutdown does try to write the apic register anyway. Fix it with explicit check if we really should touch apic registers. Reported-by: Vegard Nossum <vegard.nossum@gmail.com> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <20090617181322.GG10822@lenovo> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-17perf_counter: x86: Set the period in the intel overflow handlerPeter Zijlstra
Commit 9e350de37ac960 ("perf_counter: Accurate period data") missed a spot, which caused all Intel-PMU samples to have a period of 0. This broke auto-freq sampling. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-17x86: Remove duplicated #include'sHuang Weiyi
Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com> LKML-Reference: <1244895686-2348-1-git-send-email-weiyi.huang@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-17Merge branch 'fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq: [CPUFREQ] cpumask: new cpumask operators for arch/x86/kernel/cpu/cpufreq/powernow-k8.c [CPUFREQ] cpumask: avoid playing with cpus_allowed in powernow-k8.c [CPUFREQ] cpumask: avoid cpumask games in arch/x86/kernel/cpu/cpufreq/speedstep-centrino.c [CPUFREQ] cpumask: avoid playing with cpus_allowed in speedstep-ich.c [CPUFREQ] powernow-k8: get drv data for correct CPU [CPUFREQ] powernow-k8: read P-state from HW [CPUFREQ] reduce scope of ACPI_PSS_BIOS_BUG_MSG[] [CPUFREQ] Clean up convoluted code in arch/x86/kernel/tsc.c:time_cpufreq_notifier() [CPUFREQ] minor correction to cpu-freq documentation [CPUFREQ] powernow-k8.c: mess cleanup [CPUFREQ] Only set sampling_rate_max deprecated, sampling_rate_min is useful [CPUFREQ] powernow-k8: Set transition latency to 1 if ACPI tables export 0 [CPUFREQ] ondemand: Uncouple minimal sampling rate from HZ in NO_HZ case
2009-06-17Merge branch 'x86/urgent' into x86/mce3Ingo Molnar
Conflicts: arch/x86/kernel/cpu/mcheck/mce_intel.c Merge reason: merge with an urgent-branch MCE fix. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-17x86: nmi: Add Intel processor 0x6f4 to NMI perfctr1 workaroundPrarit Bhargava
Expand Intel NMI perfctr1 workaround to include a Core2 processor stepping (cpuid family-6, model-f, stepping-4). Resolves a situation where the NMI would not enable on these processors. Signed-off-by: Prarit Bhargava <prarit@redhat.com> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: prarit@redhat.com Cc: suresh.b.siddha@intel.com Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-17x86, mce: mce_intel.c needs <asm/apic.h>H. Peter Anvin
mce_intel.c uses apic_write() and lapic_get_maxlvt(), and so it needs <asm/apic.h>. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
2009-06-17x86: apic/io_apic.c: dmar_msi_type should be staticJaswinder Singh Rajput
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-17x86, io_apic.c: Work around compiler warningFigo.zhang
This compiler warning: arch/x86/kernel/apic/io_apic.c: In function ‘ioapic_write_entry’: arch/x86/kernel/apic/io_apic.c:466: warning: ‘eu’ is used uninitialized in this function arch/x86/kernel/apic/io_apic.c:465: note: ‘eu’ was declared here Is bogus as 'eu' is always initialized. But annotate it away by initializing the variable, to make it easier for people to notice real warnings. A compiler that sees through this logic will optimize away the initialization. Signed-off-by: Figo.zhang <figo1802@gmail.com> LKML-Reference: <1245248720.3312.27.camel@myhost> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-17x86: mce: Don't touch THERMAL_APIC_VECTOR if no active APIC presentCyrill Gorcunov
If APIC was disabled (for some reason) and as result it's not even mapped we should not try to enable thermal interrupts at all. Reported-by: Simon Holm Thøgersen <odie@cs.aau.dk> Tested-by: Simon Holm Thøgersen <odie@cs.aau.dk> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> LKML-Reference: <20090615182633.GA7606@lenovo> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-17sched, x86: Fix cpufreq + sched_clock() TSC scalingPeter Zijlstra
For freqency dependent TSCs we only scale the cycles, we do not account for the discrepancy in absolute value. Our current formula is: time = cycles * mult (where mult is a function of the cpu-speed on variable tsc machines) Suppose our current cycle count is 10, and we have a multiplier of 5, then our time value would end up being 50. Now cpufreq comes along and changes the multiplier to say 3 or 7, which would result in our time being resp. 30 or 70. That means that we can observe random jumps in the time value due to frequency changes in both fwd and bwd direction. So what this patch does is change the formula to: time = cycles * frequency + offset And we calculate offset so that time_before == time_after, thereby ridding us of these jumps in time. [ Impact: fix/reduce sched_clock() jumps across frequency changing events ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu> Chucked-on-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2009-06-17Merge branch 'linus' into perfcounters/coreIngo Molnar
Conflicts: arch/x86/include/asm/kmap_types.h include/linux/mm.h include/asm-generic/kmap_types.h Merge reason: We crossed changes with kmap_types.h cleanups in mainline. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-17x86: mce: Handle banks == 0 case in K7 quirkAndi Kleen
Vegard Nossum reported: > I get an MCE-related crash like this in latest linus tree: > > [ 0.115341] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line) > [ 0.116396] CPU: L2 Cache: 512K (64 bytes/line) > [ 0.120570] mce: CPU supports 0 MCE banks > [ 0.124870] BUG: unable to handle kernel NULL pointer dereference at 00000000 00000010 > [ 0.128001] IP: [<ffffffff813b98ad>] mcheck_init+0x278/0x320 > [ 0.128001] PGD 0 > [ 0.128001] Thread overran stack, or stack corrupted > [ 0.128001] Oops: 0002 [#1] PREEMPT SMP > [ 0.128001] last sysfs file: > [ 0.128001] CPU 0 > [ 0.128001] Modules linked in: > [ 0.128001] Pid: 0, comm: swapper Not tainted 2.6.30 #426 > [ 0.128001] RIP: 0010:[<ffffffff813b98ad>] [<ffffffff813b98ad>] mcheck_init+0x278/0x320 > [ 0.128001] RSP: 0018:ffffffff81595e38 EFLAGS: 00000246 > [ 0.128001] RAX: 0000000000000010 RBX: ffffffff8158f900 RCX: 0000000000000000 > [ 0.128001] RDX: 0000000000000000 RSI: 00000000000000ff RDI: 0000000000000010 > [ 0.128001] RBP: ffffffff81595e68 R08: 0000000000000001 R09: 0000000000000000 > [ 0.128001] R10: 0000000000000010 R11: 0000000000000000 R12: 0000000000000000 > [ 0.128001] R13: 00000000ffffffff R14: 0000000000000000 R15: 0000000000000000 > [ 0.128001] FS: 0000000000000000(0000) GS:ffff880002288000(0000) knlGS:00000 > 00000000000 > [ 0.128001] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b > [ 0.128001] CR2: 0000000000000010 CR3: 0000000001001000 CR4: 00000000000006b0 > [ 0.128001] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > [ 0.128001] DR3: 0000000000000000 DR6: 0000000000000000 DR7: 0000000000000000 > [ 0.128001] Process swapper (pid: 0, threadinfo ffffffff81594000, task ffffff > ff8152a4a0) > [ 0.128001] Stack: > [ 0.128001] 0000000081595e68 5aa50ed3b4ddbe6e ffffffff8158f900 ffffffff8158f > 914 > [ 0.128001] ffffffff8158f948 0000000000000000 ffffffff81595eb8 ffffffff813b8 > 69c > [ 0.128001] 5aa50ed3b4ddbe6e 00000001078bfbfd 0000062300000800 5aa50ed3b4ddb > e6e > [ 0.128001] Call Trace: > [ 0.128001] [<ffffffff813b869c>] identify_cpu+0x331/0x392 > [ 0.128001] [<ffffffff815a1445>] identify_boot_cpu+0x23/0x6e > [ 0.128001] [<ffffffff815a14ac>] check_bugs+0x1c/0x60 > [ 0.128001] [<ffffffff8159c075>] start_kernel+0x403/0x46e > [ 0.128001] [<ffffffff8159b2ac>] x86_64_start_reservations+0xac/0xd5 > [ 0.128001] [<ffffffff8159b3ea>] x86_64_start_kernel+0x115/0x14b > [ 0.128001] [<ffffffff8159b140>] ? early_idt_handler+0x0/0x71 This happens on QEMU which reports MCA capability, but no banks. Without this patch there is a buffer overrun and boot ops because the code would try to initialize the 0 element of a zero length kmalloc() buffer. Reported-by: Vegard Nossum <vegard.nossum@gmail.com> Tested-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andi Kleen <ak@linux.intel.com> LKML-Reference: <20090615125200.GD31969@one.firstfloor.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-17Merge branch 'linus' into x86/urgentIngo Molnar
Merge reason: pull in latest to fix a bug in it. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-16Merge branch 'akpm'Linus Torvalds
* akpm: (182 commits) fbdev: bf54x-lq043fb: use kzalloc over kmalloc/memset fbdev: *bfin*: fix __dev{init,exit} markings fbdev: *bfin*: drop unnecessary calls to memset fbdev: bfin-t350mcqb-fb: drop unused local variables fbdev: blackfin has __raw I/O accessors, so use them in fb.h fbdev: s1d13xxxfb: add accelerated bitblt functions tcx: use standard fields for framebuffer physical address and length fbdev: add support for handoff from firmware to hw framebuffers intelfb: fix a bug when changing video timing fbdev: use framebuffer_release() for freeing fb_info structures radeon: P2G2CLK_ALWAYS_ONb tested twice, should 2nd be P2G2CLK_DAC_ALWAYS_ONb? s3c-fb: CPUFREQ frequency scaling support s3c-fb: fix resource releasing on error during probing carminefb: fix possible access beyond end of carmine_modedb[] acornfb: remove fb_mmap function mb862xxfb: use CONFIG_OF instead of CONFIG_PPC_OF mb862xxfb: restrict compliation of platform driver to PPC Samsung SoC Framebuffer driver: add Alpha Channel support atmel-lcdc: fix pixclock upper bound detection offb: use framebuffer_alloc() to allocate fb_info struct ... Manually fix up conflicts due to kmemcheck in mm/slab.c
2009-06-16use printk_once() in several placesMinchan Kim
There are some places to be able to use printk_once instead of hard coding. Signed-off-by: Minchan Kim <minchan.kim@gmail.com> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: David S. Miller <davem@davemloft.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-16mm: consolidate init_mm definitionAlexey Dobriyan
* create mm/init-mm.c, move init_mm there * remove INIT_MM, initialize init_mm with C99 initializer * unexport init_mm on all arches: init_mm is already unexported on x86. One strange place is some OMAP driver (drivers/video/omap/) which won't build modular, but it's already wants get_vm_area() export. Somebody should look there. [akpm@linux-foundation.org: add missing #includes] Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Mike Frysinger <vapier.adi@gmail.com> Cc: Americo Wang <xiyou.wangcong@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-16time: move PIT_TICK_RATE to linux/timex.hArnd Bergmann
PIT_TICK_RATE is currently defined in four architectures, but in three different places. While linux/timex.h is not the perfect place for it, it is still a reasonable replacement for those drivers that traditionally use asm/timex.h to get CLOCK_TICK_RATE and expect it to be the PIT frequency. Note that for Alpha, the actual value changed from 1193182UL to 1193180UL. This is unlikely to make a difference, and probably can only improve accuracy. There was a discussion on the correct value of CLOCK_TICK_RATE a few years ago, after which every existing instance was getting changed to 1193182. According to the specification, it should be 1193181.818181... Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Len Brown <lenb@kernel.org> Cc: john stultz <johnstul@us.ibm.com> Cc: Dmitry Torokhov <dtor@mail.ru> Cc: Takashi Iwai <tiwai@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-16x86: correct the conversion of EFI memory typesCliff Wickman
This patch causes all the EFI_RESERVED_TYPE memory reservations to be recorded in the e820 table as type E820_RESERVED. (This patch replaces one called 'x86: vendor reserved memory type'. This version has been discussed a bit with Peter and Yinghai but not given a final opinion.) Without this patch EFI_RESERVED_TYPE memory reservations may be marked usable in the e820 table. There may be a collision between kernel use and some reserver's use of this memory. (An example use of this functionality is the UV system, which will access extremely large areas of memory with a memory engine that allows a user to address beyond the processor's range. Such areas are reserved in the EFI table by the BIOS. Some loaders have a restricted number of entries possible in the e820 table, hence the need to record the reservations in the unrestricted EFI table.) The call to do_add_efi_memmap() is only made if "add_efi_memmap" is specified on the kernel command line. Signed-off-by: Cliff Wickman <cpw@sgi.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-16x86: cap iomem_resource to addressable physical memoryH. Peter Anvin
iomem_resource is by default initialized to -1, which means 64 bits of physical address space if 64-bit resources are enabled. However, x86 CPUs cannot address 64 bits of physical address space. Thus, we want to cap the physical address space to what the union of all CPU can actually address. Without this patch, we may end up assigning inaccessible values to uninitialized 64-bit PCI memory resources. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Matthew Wilcox <matthew@wil.cx> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Cc: Martin Mares <mj@ucw.cz> Cc: stable@kernel.org
2009-06-16x86, mce: rename _64.c files which are no longer 64-bit-specificHidetoshi Seto
Rename files that are no longer 64bit specific: mce_amd_64.c => mce_amd.c mce_intel_64.c => mce_intel.c Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-16x86, mce: remove therm_throt.hHidetoshi Seto
Now all symbols in the header are static. Remove the header. Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-16x86, mce: remove intel_set_thermal_handler()Hidetoshi Seto
and make intel_thermal_interrupt() static. Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-16x86, mce: squash mce_intel.c into therm_throt.cHidetoshi Seto
move intel_init_thermal() into therm_throt.c Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>