aboutsummaryrefslogtreecommitdiff
path: root/arch/x86/include
AgeCommit message (Collapse)Author
2009-06-03x86, mce: define MCE_VECTORAndi Kleen
Add MCE_VECTOR for the #MC exception. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-03x86, mce: rename mce_notify_user to mce_notify_irqAndi Kleen
Rename the mce_notify_user function to mce_notify_irq. The next patch will split the wakeup handling of interrupt context and of process context and it's better to give it a clearer name for this. Contains a fix from Ying Huang [ Impact: cleanup ] Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: Huang Ying <ying.huang@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-03x86: fix panic with interrupts off (needed for MCE)Andi Kleen
For some time each panic() called with interrupts disabled triggered the !irqs_disabled() WARN_ON in smp_call_function(), producing ugly backtraces and confusing users. This is a common situation with machine checks for example which tend to call panic with interrupts disabled, but will also hit in other situations e.g. panic during early boot. In fact it means that panic cannot be called in many circumstances, which would be bad. This all started with the new fancy queued smp_call_function, which is then used by the shutdown path to shut down the other CPUs. On closer examination it turned out that the fancy RCU smp_call_function() does lots of things not suitable in a panic situation anyways, like allocating memory and relying on complex system state. I originally tried to patch this over by checking for panic there, but it was quite complicated and the original patch was also not very popular. This also didn't fix some of the underlying complexity problems. The new code in post 2.6.29 tries to patch around this by checking for oops_in_progress, but that is not enough to make this fully safe and I don't think that's a real solution because panic has to be reliable. So instead use an own vector to reboot. This makes the reboot code extremly straight forward, which is definitely a big plus in a panic situation where it is important to avoid relying on too much kernel state. The new simple code is also safe to be called from interupts off region because it is very very simple. There can be situations where it is important that panic is reliable. For example on a fatal machine check the panic is needed to get the system up again and running as quickly as possible. So it's important that panic is reliable and all function it calls simple. This is why I came up with this simple vector scheme. It's very hard to beat in simplicity. Vectors are not particularly precious anymore since all big systems are using per CPU vectors. Another possibility would have been to use an NMI similar to kdump, but there is still the problem that NMIs don't work reliably on some systems due to BIOS issues. NMIs would have been able to stop CPUs running with interrupts off too. In the sake of universal reliability I opted for using a non NMI vector for now. I put the reboot vector into the highest priority bucket of the APIC vectors and moved the 64bit UV_BAU message down instead into the next lower priority. [ Impact: bug fix, fixes an old regression ] Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-03x86, mce: implement new status bitsAndi Kleen
The x86 architecture recently added some new machine check status bits: S(ignalled) and AR (Action-Required). Signalled allows to check if a specific event caused an exception or was just logged through CMCI. AR allows the kernel to decide if an event needs immediate action or can be delayed or ignored. Implement support for these new status bits. mce_severity() uses the new bits to grade the machine check correctly and decide what to do. The exception handler uses AR to decide to kill or not. The S bit is used to separate events between the poll/CMCI handler and the exception handler. Classical UC always leads to panic. That was true before anyways because the existing CPUs always passed a PCC with it. Also corrects the rules whether to kill in user or kernel context and how to handle missing RIPV. The machine check handler largely uses the mce-severity grading engine now instead of making its own decisions. This means the logic is centralized in one place. This is useful because it has to be evaluated multiple times. v2: Some rule fixes; Add AO events Fix RIPV, RIPV|EIPV order (Ying Huang) Fix UCNA with AR=1 message (Ying Huang) Add comment about panicing in m_c_p. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-03x86, mce: implement bootstrapping for machine check wakeupsAndi Kleen
Machine checks support waking up the mcelog daemon quickly. The original wake up code for this was pretty ugly, relying on a idle notifier and a special process flag. The reason it did it this way is that the machine check handler is not subject to normal interrupt locking rules so it's not safe to call wake_up(). Instead it set a process flag and then either did the wakeup in the syscall return or in the idle notifier. This patch adds a new "bootstraping" method as replacement. The idea is that the handler checks if it's in a state where it is unsafe to call wake_up(). If it's safe it calls it directly. When it's not safe -- that is it interrupted in a critical section with interrupts disables -- it uses a new "self IPI" to trigger an IPI to its own CPU. This can be done safely because IPI triggers are atomic with some care. The IPI is raised once the interrupts are reenabled and can then safely call wake_up(). When APICs are disabled the event is just queued and will be picked up eventually by the next polling timer. I think that's a reasonable compromise, since it should only happen quite rarely. Contains fixes from Ying Huang. [ solve conflict on irqinit, make it work on 32bit (entry_arch.h) - HS ] Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-03x86, mce: extend struct mce user interface with more information.Andi Kleen
Experience has shown that struct mce which is used to pass an machine check to the user space daemon currently a few limitations. Also some data which is useful to print at panic level is also missing. This patch addresses most of them. The same information is also printed out together with mce panic. struct mce can be painlessly extended in a compatible way, the mcelog user space code just ignores additional fields with a warning. - It doesn't provide a wall time timestamp. There have been a few complaints about that. Fix that by adding a 64bit time_t - It doesn't provide the exact CPU identification. This makes it awkward for mcelog to decode the event correctly, especially when there are variations in the supported MCE codes on different CPU models or when mcelog is running on a different host after a panic. Previously the administrator had to specify the correct CPU when mcelog ran on a different host, but with the more variation in machine checks now it's better to auto detect that. It's also useful for more detailed analysis of CPU events. Pass CPUID 1.EAX and the cpu vendor (as encoded in processor.h) instead. - Socket ID and initial APIC ID are useful to report because they allow to identify the failing CPU in some (not all) cases. This is also especially useful for the panic situation. This addresses one of the complaints from Thomas Gleixner earlier. - The MCG capabilities MSR needs to be reported for some advanced error processing in mcelog Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-03x86, mce: support more than 256 CPUs in struct mceAndi Kleen
The old struct mce had a limitation to 256 CPUs. But x86 Linux supports more than that now with x2apic. Add a new field extcpu to report the extended number. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-03x86, mce: store record length into memory struct mce anchorAndi Kleen
This makes it easier for tools who want to extract the mcelog out of crash images or memory dumps to adapt to changing struct mce size. The length field replaces padding, so it's fully compatible. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-03x86, mce: add MCE poll count to /proc/interruptsAndi Kleen
Keep a count of the machine check polls (or CMCI events) in /proc/interrupts. Andi needs this for debugging, but it's also useful in general to see what's going in by the kernel. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-03x86, mce: add machine check exception count in /proc/interruptsAndi Kleen
Useful for debugging, but it's also good general policy to have a counter for all special interrupts there. This makes it easier to diagnose where a CPU is spending its time. [ Impact: feature, debugging tool ] Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-01Merge branch 'irq/numa' into x86/mce3H. Peter Anvin
Merge reason: arch/x86/kernel/irqinit_{32,64}.c unified in irq/numa and modified in x86/mce3; this merge resolves the conflict. Conflicts: arch/x86/kernel/irqinit.c Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-06-01Merge branch 'x86/cpufeature' into irq/numaIngo Molnar
Merge reason: irq/numa didnt build because this commit: 2759c32: x86: don't call read_apic_id if !cpu_has_apic Had a dependency on x86/cpufeature changes. Pull in that (small) branch to fix the dependency. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-01Merge branch 'linus' into irq/numaIngo Molnar
Conflicts: arch/mips/sibyte/bcm1480/irq.c arch/mips/sibyte/sb1250/irq.c Merge reason: we gathered a few conflicts plus update to latest upstream fixes. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-28x86, mce: drop "extern" from function prototypes in asm/mce.hH. Peter Anvin
Function prototypes don't need to be prefixed by "extern". [ Impact: cleanup ] Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-05-28x86: trivial clean up for irq_vectors.hAndi Kleen
Fix a wrong comment. Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-05-28x86, mce: add basic error injection infrastructureAndi Kleen
Allow user programs to write mce records into /dev/mcelog. When they do that a fake machine check is triggered to test the machine check code. This uses the MCE MSR wrappers added earlier. The implementation is straight forward. There is a struct mce record per CPU and the MCE MSR accesses get data from there if there is valid data injected there. This allows to test the machine check code relatively realistically because only the lowest layer of hardware access is intercepted. The test suite and injector are available at git://git.kernel.org/pub/scm/utils/cpu/mce/mce-test.git git://git.kernel.org/pub/scm/utils/cpu/mce/mce-inject.git Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-05-28x86, mce: enable MCE_INTEL for 32bit new MCEAndi Kleen
Enable the 64bit MCE_INTEL code (CMCI, thermal interrupts) for 32bit NEW_MCE. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-05-28x86, mce: use 64bit machine check code on 32bitAndi Kleen
The 64bit machine check code is in many ways much better than the 32bit machine check code: it is more specification compliant, is cleaner, only has a single code base versus one per CPU, has better infrastructure for recovery, has a cleaner way to communicate with user space etc. etc. Use the 64bit code for 32bit too. This is the second attempt to do this. There was one a couple of years ago to unify this code for 32bit and 64bit. Back then this ran into some trouble with K7s and was reverted. I believe this time the K7 problems (and some others) are addressed. I went over the old handlers and was very careful to retain all quirks. But of course this needs a lot of testing on old systems. On newer 64bit capable systems I don't expect much problems because they have been already tested with the 64bit kernel. I made this a CONFIG for now that still allows to select the old machine check code. This is mostly to make testing easier, if someone runs into a problem we can ask them to try with the CONFIG switched. The new code is default y for more coverage. Once there is confidence the 64bit code works well on older hardware too the CONFIG_X86_OLD_MCE and the associated code can be easily removed. This causes a behaviour change for 32bit installations. They now have to install the mcelog package to be able to log corrected machine checks. The 64bit machine check code only handles CPUs which support the standard Intel machine check architecture described in the IA32 SDM. The 32bit code has special support for some older CPUs which have non standard machine check architectures, in particular WinChip C3 and Intel P5. I made those a separate CONFIG option and kept them for now. The WinChip variant could be probably removed without too much pain, it doesn't really do anything interesting. P5 is also disabled by default (like it was before) because many motherboards have it miswired, but according to Alan Cox a few embedded setups use that one. Forward ported/heavily changed version of old patch, original patch included review/fixes from Thomas Gleixner, Bert Wesarg. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-05-28x86, mce: rename 64bit mce_dont_init to mce_disabledAndi Kleen
Give it the same name as on 32bit. This makes further merging easier. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-05-28x86, mce: Cleanup MCG definitionsThomas Gleixner
Decode more magic constants and turn them into symbols. [ Sort definitions bitwise, introduce MCG_EXT_CNT - HS ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-05-28x86, mce: Cleanup symbols in intel thermal codesThomas Gleixner
Decode magic constants and turn them into symbols. [ Cleanup to use symbols already exists - HS ] [ Impact: cleanup ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-05-28x86, mce: Rename sysfs variablesIngo Molnar
Shorten variable names. This also compacts the code a bit. device_mce => mce_dev mce_device_initialized => mce_dev_initialized mce_attribute => mce_attrs [ Impact: cleanup ] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-05-28x86, mce: unify, prepare 64bit in mce.hIngo Molnar
Prepare mce.h for unification, so that it will build on 32-bit x86 kernels too. [ Impact: cleanup ] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-05-28x86: enable_update_mptable should be a macroYinghai Lu
instead of declaring one variant as an inline function... because other case is a variable Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4A13B344.7030307@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-18x86, irq: don't call mp_config_acpi_gsi() if update_mptable is not enabledYinghai Lu
Len expressed concern that the update_mptable feature has side-effects on the ACPI code. Make it sure explicitly that the code only ever gets called if the (default disabled) update_mptable boot quirk option is disabled. [ Impact: isolate the update_mptable feature from ACPI code more ] Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Len Brown <lenb@kernel.org> LKML-Reference: <4A0DC832.5090200@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-18x86, apic: introduce io_apic_irq_attrYinghai Lu
according to Ingo, io_apic irq-setup related functions have too many parameters with a repetitive signature. So reduce related funcs to get less params by passing a pointer to a newly defined io_apic_irq_attr structure. v2: io_apic_irq ==> irq_attr triggering ==> trigger v3: add set_io_apic_irq_attr [ Impact: cleanup ] Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Cc: Len Brown <lenb@kernel.org> LKML-Reference: <4A08ACD3.2070401@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-15x86: Fix performance regression caused by paravirt_ops on native kernelsJeremy Fitzhardinge
Xiaohui Xin and some other folks at Intel have been looking into what's behind the performance hit of paravirt_ops when running native. It appears that the hit is entirely due to the paravirtualized spinlocks introduced by: | commit 8efcbab674de2bee45a2e4cdf97de16b8e609ac8 | Date: Mon Jul 7 12:07:51 2008 -0700 | | paravirt: introduce a "lock-byte" spinlock implementation The extra call/return in the spinlock path is somehow causing an increase in the cycles/instruction of somewhere around 2-7% (seems to vary quite a lot from test to test). The working theory is that the CPU's pipeline is getting upset about the call->call->locked-op->return->return, and seems to be failing to speculate (though I haven't seen anything definitive about the precise reasons). This doesn't entirely make sense, because the performance hit is also visible on unlock and other operations which don't involve locked instructions. But spinlock operations clearly swamp all the other pvops operations, even though I can't imagine that they're nearly as common (there's only a .05% increase in instructions executed). If I disable just the pv-spinlock calls, my tests show that pvops is identical to non-pvops performance on native (my measurements show that it is actually about .1% faster, but Xiaohui shows a .05% slowdown). Summary of results, averaging 10 runs of the "mmperf" test, using a no-pvops build as baseline: nopv Pv-nospin Pv-spin CPU cycles 100.00% 99.89% 102.18% instructions 100.00% 100.10% 100.15% CPI 100.00% 99.79% 102.03% cache ref 100.00% 100.84% 100.28% cache miss 100.00% 90.47% 88.56% cache miss rate 100.00% 89.72% 88.31% branches 100.00% 99.93% 100.04% branch miss 100.00% 103.66% 107.72% branch miss rt 100.00% 103.73% 107.67% wallclock 100.00% 99.90% 102.20% The clear effect here is that the 2% increase in CPI is directly reflected in the final wallclock time. (The other interesting effect is that the more ops are out of line calls via pvops, the lower the cache access and miss rates. Not too surprising, but it suggests that the non-pvops kernel is over-inlined. On the flipside, the branch misses go up correspondingly...) So, what's the fix? Paravirt patching turns all the pvops calls into direct calls, so _spin_lock etc do end up having direct calls. For example, the compiler generated code for paravirtualized _spin_lock is: <_spin_lock+0>: mov %gs:0xb4c8,%rax <_spin_lock+9>: incl 0xffffffffffffe044(%rax) <_spin_lock+15>: callq *0xffffffff805a5b30 <_spin_lock+22>: retq The indirect call will get patched to: <_spin_lock+0>: mov %gs:0xb4c8,%rax <_spin_lock+9>: incl 0xffffffffffffe044(%rax) <_spin_lock+15>: callq <__ticket_spin_lock> <_spin_lock+20>: nop; nop /* or whatever 2-byte nop */ <_spin_lock+22>: retq One possibility is to inline _spin_lock, etc, when building an optimised kernel (ie, when there's no spinlock/preempt instrumentation/debugging enabled). That will remove the outer call/return pair, returning the instruction stream to a single call/return, which will presumably execute the same as the non-pvops case. The downsides arel 1) it will replicate the preempt_disable/enable code at eack lock/unlock callsite; this code is fairly small, but not nothing; and 2) the spinlock definitions are already a very heavily tangled mass of #ifdefs and other preprocessor magic, and making any changes will be non-trivial. The other obvious answer is to disable pv-spinlocks. Making them a separate config option is fairly easy, and it would be trivial to enable them only when Xen is enabled (as the only non-default user). But it doesn't really address the common case of a distro build which is going to have Xen support enabled, and leaves the open question of whether the native performance cost of pv-spinlocks is worth the performance improvement on a loaded Xen system (10% saving of overall system CPU when guests block rather than spin). Still it is a reasonable short-term workaround. [ Impact: fix pvops performance regression when running native ] Analysed-by: "Xin Xiaohui" <xiaohui.xin@intel.com> Analysed-by: "Li Xin" <xin.li@intel.com> Analysed-by: "Nakajima Jun" <jun.nakajima@intel.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Acked-by: H. Peter Anvin <hpa@zytor.com> Cc: Nick Piggin <npiggin@suse.de> Cc: Xen-devel <xen-devel@lists.xensource.com> LKML-Reference: <4A0B62F7.5030802@goop.org> [ fixed the help text ] Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-12x86: read apic ID in the !acpi_lapic caseYinghai Lu
Ed found that on 32-bit, boot_cpu_physical_apicid is not read right, when the mptable is broken. Interestingly, actually three paths use/set it: 1. acpi: at that time that is already read from reg 2. mptable: only read from mptable 3. no madt, and no mptable, that use default apic id 0 for 64-bit, -1 for 32-bit so we could read the apic id for the 2/3 path. We trust the hardware register more than we trust a BIOS data structure (the mptable). We can also avoid the double set_fixmap() when acpi_lapic is used, and also need to move cpu_has_apic earlier and call apic_disable(). Also when need to update the apic id, we'd better read and set the apic version as well - so that quirks are applied precisely. v2: make path 3 with 64bit, use -1 as apic id, so could read it later. v3: fix whitespace problem pointed out by Ed Swierk v5: fix boot crash [ Impact: get correct apic id for bsp other than acpi path ] Reported-by: Ed Swierk <eswierk@aristanetworks.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> LKML-Reference: <49FC85A9.2070702@kernel.org> [ v4: sanity-check in the ACPI case too ] Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-12Merge branch 'x86/apic' into irq/numaIngo Molnar
Merge reason: both topics modify the APIC code but were able to do it in parallel so far. An upcoming patch generates a conflict so merge them to avoid the conflict. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-12x86, 32-bit: fix kernel_trap_sp()Masami Hiramatsu
Use &regs->sp instead of regs for getting the top of stack in kernel mode. (on x86-64, regs->sp always points the top of stack) [ Impact: Oprofile decodes only stack for backtracing on i386 ] Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> [ v2: rename the API to kernel_stack_pointer(), move variable inside ] Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: systemtap@sources.redhat.com Cc: Harvey Harrison <harvey.harrison@gmail.com> Cc: Jan Blunck <jblunck@suse.de> Cc: Christoph Hellwig <hch@infradead.org> LKML-Reference: <20090511210300.17332.67549.stgit@localhost.localdomain> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-11x86: fix percpu_{to,from}_op()Jan Beulich
- the byte operand constraints were wrong for 32-bit - the to-op's input operands weren't properly parenthesized [ Impact: fix possible miscompilation or build failure ] Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-05-11x86: display extended apic registers with print_local_APIC and cpu_debug codeAndreas Herrmann
Both print_local_APIC (used when apic=debug kernel param is set) and cpu_debug code missed support for some extended APIC registers that I'd like to see. This adds support to show: - extended APIC feature register - extended APIC control register - extended LVT registers [ Impact: print more debug info ] Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Jaswinder Singh Rajput <jaswinder@kernel.org> Cc: Cyrill Gorcunov <gorcunov@openvz.org> LKML-Reference: <20090508162350.GO29045@alberich.amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-11x86: clean up and fix setup_clear/force_cpu_cap handlingYinghai Lu
setup_force_cpu_cap() only have one user (Xen guest code), but it should not reuse cleared_cpu_cpus, otherwise it will have problems on SMP. Need to have a separate cpu_cpus_set array too, for forced-on flags, beyond the forced-off flags. Also need to setup handling before all cpus caps are combined. [ Impact: fix the forced-set CPU feature flag logic ] Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Yinghai Lu <yinghai.lu@kernel.org> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-11x86/acpi: move setup io apic routing out of CONFIG_ACPI scopeYinghai Lu
So we could set io apic routing when ACPI is not enabled. [ Impact: prepare for new functionality ] Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Cc: Len Brown <lenb@kernel.org> LKML-Reference: <4A01C422.5070400@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-11x86/pci: add 4 more return parameters to IO_APIC_get_PCI_irq_vector()Yinghai Lu
To prepare those params for pcibios_irq_enable() to call setup_io_apic_routing(). [ Impact: extend function call API to prepare for new functionality ] Signed-off-by: Yinghai Lu <yinghai@kernel.org> Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org> Cc: Len Brown <lenb@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> LKML-Reference: <4A01C406.2040303@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-11x86/acpi: call mp_config_acpi_gsi() in mp_register_gsi()Yinghai Lu
The patch to call mp_config_acpi_gsi() from the ACPI IRQ registration code never got mainline because there were open discussions about it. This call is needed to properly update the kernel's copy of the mptable, when the update_mptable boot parameter is needed. Now that the dust has settled with the APIC unification, and since there were no objections when the patch was re-submitted, try this again. [ Impact: fix the update_mptable boot parameter ] Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Cc: Len Brown <lenb@kernel.org> LKML-Reference: <4A01C387.7090103@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-11Merge commit 'v2.6.30-rc5' into x86/apicIngo Molnar
Merge reason: this branch was on a .30-rc2 base - sync it up with all the latest fixes. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-03x86: cpufeature.h fix name for X86_FEATURE_MCEJaswinder Singh Rajput
X86_FEATURE_MCE = Machine Check Exception X86_FEATURE_MCA = Machine Check Architecture [ Impact: cleanup ] Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> LKML-Reference: <1241329295.6321.1.camel@localhost.localdomain> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-02Merge branch 'x86-mce-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip * 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip: x86, mce: fix boot logging logic x86, mce: make polling timer interval per CPU
2009-05-01Merge branch 'x86/apic' into irq/numaIngo Molnar
Conflicts: arch/x86/kernel/apic/io_apic.c Merge reason: non-trivial interaction between ongoing work in io_apic.c and the NUMA migration feature in the irq tree. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-28irq: change ACPI GSI APIs to also take a device argumentYinghai Lu
We want to use dev_to_node() later on, to be aware of the 'home node' of the GSI in question. [ Impact: cleanup, prepare the IRQ code to be more NUMA aware ] Signed-off-by: Yinghai Lu <yinghai@kernel.org> Acked-by: Len Brown <lenb@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Len Brown <lenb@kernel.org> Cc: Bjorn Helgaas <bjorn.helgaas@hp.com> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-acpi@vger.kernel.org Cc: linux-ia64@vger.kernel.org LKML-Reference: <49F65560.20904@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-22x86/PCI: set_pci_bus_resources_arch_default cleanupsYinghai Lu
Rename set_pci_bus_resources_arch_default to x86_pci_root_bus_res_quirks, move the weak version from common.c to i386.c, and before calling, make sure it's a root bus. Reviewed-by: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-04-22x86, mce: fix boot logging logicAndi Kleen
The earlier patch to change the poller to a separate function subtly broke the boot logging logic. This could lead to machine checks getting logged at boot even when disabled or defaulting to off on some systems. Fix that. [ Impact: bug fix - avoid spurious MCE in log ] Signed-off-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2009-04-22x86: x2apic, IR: remove reinit_intr_remapped_IO_APIC()Suresh Siddha
When interrupt-remapping is enabled, we are relying on setup_IO_APIC_irqs() to configure remapped entries in the IO-APIC, which comes little bit later after enabling interrupt-remapping. Meanwhile, restoration of old io-apic entries after enabling interrupt-remapping will not make the interrupts through io-apic functional anyway. So remove the unnecessary reinit_intr_remapped_IO_APIC() step. The longer story: When interrupt-remapping is enabled, IO-APIC entries need to be setup in the re-mappable format (pointing to interrupt-remapping table entries setup by the OS). This remapping configuration is happening in the same place where we traditionally configure IO-APIC (i.e., in setup_IO_APIC_irqs()). So when we enable interrupt-remapping successfully, there is no need to restore old io-apic RTE entries before we actually do a complete configuration shortly in setup_IO_APIC_irqs(). Old IO-APIC RTE's may be in traditional format (non re-mappable) or in re-mappable format pointing to interrupt-remapping table entries setup by BIOS. Restoring both of these will not make IO-APIC functional. We have to rely on setup_IO_APIC_irqs() for proper configuration by OS. So I am removing this unnecessary and broken step. [ Impact: remove unnecessary/broken IO-APIC setup step ] Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Weidong Han <weidong.han@intel.com> Cc: dwmw2@infradead.org LKML-Reference: <20090420200450.552359000@linux-os.sc.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-21FRV: Fix the section attribute on UP DECLARE_PER_CPU()David Howells
In non-SMP mode, the variable section attribute specified by DECLARE_PER_CPU() does not agree with that specified by DEFINE_PER_CPU(). This means that architectures that have a small data section references relative to a base register may throw up linkage errors due to too great a displacement between where the base register points and the per-CPU variable. On FRV, the .h declaration says that the variable is in the .sdata section, but the .c definition says it's actually in the .data section. The linker throws up the following errors: kernel/built-in.o: In function `release_task': kernel/exit.c:78: relocation truncated to fit: R_FRV_GPREL12 against symbol `per_cpu__process_counts' defined in .data section in kernel/built-in.o kernel/exit.c:78: relocation truncated to fit: R_FRV_GPREL12 against symbol `per_cpu__process_counts' defined in .data section in kernel/built-in.o To fix this, DECLARE_PER_CPU() should simply apply the same section attribute as does DEFINE_PER_CPU(). However, this is made slightly more complex by virtue of the fact that there are several variants on DEFINE, so these need to be matched by variants on DECLARE. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-21x86: x2apic, IR: Clean up X86_X2APIC and INTR_REMAP config checksSuresh Siddha
Add x2apic_supported() to clean up CONFIG_X86_X2APIC checks. Fix CONFIG_INTR_REMAP checks. [ Impact: cleanup ] Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: dwmw2@infradead.org Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Weidong Han <weidong.han@intel.com> LKML-Reference: <20090420200450.128993000@linux-os.sc.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-19lguest: fix guest crash on non-linear addresses in gdt pvopsRusty Russell
Fixes guest crash 'lguest: bad read address 0x4800000 len 256' The new per-cpu allocator ends up handing a non-linear address to write_gdt_entry. We do __pa() on it, and hand it to the host, which kills us. I've long wanted to make the hypercall "LOAD_GDT_ENTRY" to match the IDT code, but had no pressing reason until now. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: lguest@ozlabs.org
2009-04-19x86, intr-remap: enable interrupt remapping earlyWeidong Han
Currently, when x2apic is not enabled, interrupt remapping will be enabled in init_dmars(), where it is too late to remap ioapic interrupts, that is, ioapic interrupts are really in compatibility mode, not remappable mode. This patch always enables interrupt remapping before ioapic setup, it guarantees all interrupts will be remapped when interrupt remapping is enabled. Thus it doesn't need to set the compatibility interrupt bit. [ Impact: refactor intr-remap init sequence, enable fuller remap mode ] Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Weidong Han <weidong.han@intel.com> Acked-by: David Woodhouse <David.Woodhouse@intel.com> Cc: iommu@lists.linux-foundation.org Cc: allen.m.kay@intel.com Cc: fenghua.yu@intel.com LKML-Reference: <1239957736-6161-4-git-send-email-weidong.han@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-19x86, intr-remap: fix ack for interrupt remappingWeidong Han
Shouldn't call ack_apic_edge() in ir_ack_apic_edge(), because ack_apic_edge() does more than just ack: it also does irq migration in the non-interrupt-remapping case. But there is no such need for interrupt-remapping case, as irq migration is done in the process context. Similarly, ir_ack_apic_level() shouldn't call ack_apic_level, and instead should do the local cpu's EOI + directed EOI to the io-apic. ack_x2APIC_irq() is not neccessary, because ack_APIC_irq() will use MSR write for x2apic, and uncached write for non-x2apic. [ Impact: simplify/standardize intr-remap IRQ acking, fix on !x2apic ] Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Weidong Han <weidong.han@intel.com> Acked-by: David Woodhouse <David.Woodhouse@intel.com> Cc: iommu@lists.linux-foundation.org Cc: allen.m.kay@intel.com Cc: fenghua.yu@intel.com LKML-Reference: <1239957736-6161-3-git-send-email-weidong.han@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-17Merge branch 'x86-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: fix microcode driver newly spewing warnings x86, PAT: Remove page granularity tracking for vm_insert_pfn maps x86: disable X86_PTRACE_BTS for now x86, documentation: kernel-parameters replace X86-32,X86-64 with X86 x86: pci-swiotlb.c swiotlb_dma_ops should be static x86, PAT: Remove duplicate memtype reserve in devmem mmap x86, PAT: Consolidate code in pat_x_mtrr_type() and reserve_memtype() x86, PAT: Changing memtype to WC ensuring no WB alias x86, PAT: Handle faults cleanly in set_memory_ APIs x86, PAT: Change order of cpa and free in set_memory_wb x86, CPA: Change idmap attribute before ioremap attribute setup