path: root/arch/x86/kernel
Age | Commit message | Author
2008-01-30x86: introduce ldt_write accessorThomas Gleixner
Create a ldt write accessor like the 32 bit one. Preparatory patch for merging ldt.c and anyway necessary for 64bit paravirt ops. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-30x86: clean up arch/x86/kernel/ldt_32/64.cThomas Gleixner
White space and coding style cleanup. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-30x86: clean up arch/x86/kernel/e820_64.cThomas Gleixner
White space and coding style cleanup. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-30x86: code cleanups in arch/x86/kernel/pci-gart_64.cIngo Molnar
code cleanups:

                                   errors   lines of code   errors/KLOC
 arch/x86/kernel/pci-gart_64.c        183             748         244.6
 arch/x86/kernel/pci-gart_64.c          0             790             0

Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30x86: clean up arch/x86/kernel/aperture_64.c printk()sIngo Molnar
clean up arch/x86/kernel/aperture_64.c printk()s. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30x86: clean up arch/x86/kernel/aperture_64.cIngo Molnar
whitespace cleanup. No code changed:

   text    data     bss     dec     hex filename
   2080      76       4    2160     870 aperture_64.o.before
   2080      76       4    2160     870 aperture_64.o.after

                                   errors   lines of code   errors/KLOC
 arch/x86/kernel/aperture_64.c        114             299         381.2
 arch/x86/kernel/aperture_64.c          0             315             0

Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30x86: enable irq in default_idle on 64-bitHiroshi Shimamoto
local_irq_enable() is missing after sched_clock_idle_wakeup_event(). Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-30x86: idle wakeup event in the HLT loopIngo Molnar
do a proper idle-wakeup event on HLT as well - some CPUs stop the TSC in HLT too, not just when going through the ACPI methods. (the ACPI idle code already does this.) [ update the 64-bit side too, as noticed by Jiri Slaby. ] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30x86: scale cyc_2_nsec according to CPU frequencyGuillaume Chazarain
scale the sched_clock() cyc_2_nsec scaling factor according to CPU frequency changes. [ mingo@elte.hu: simplified it and fixed it for SMP. ] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
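To illustrate what rescaling on a frequency change means, here is a minimal sketch of a fixed-point cycles-to-nanoseconds conversion. The shift value and the function names are assumptions made for this example and are not taken from the patch itself.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed fixed-point shift; name and value are illustrative only. */
#define CYC2NS_SHIFT 10

/* Recompute the scale factor whenever the CPU frequency (in kHz) changes. */
static uint64_t cyc2ns_scale_for(uint64_t cpu_khz)
{
    assert(cpu_khz != 0);
    /* ns per cycle = 10^6 / cpu_khz, stored as a fixed-point number. */
    return (UINT64_C(1000000) << CYC2NS_SHIFT) / cpu_khz;
}

/* Convert a cycle count to nanoseconds using the current scale factor. */
static uint64_t cycles_to_ns(uint64_t cycles, uint64_t scale)
{
    return (cycles * scale) >> CYC2NS_SHIFT;
}
```

With a stale scale factor, the same number of cycles would be reported as the wrong amount of time after a frequency change, which is why the factor has to follow the CPU frequency.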
2008-01-30x86: protect against sigaltstack wraparoundRoland McGrath
cf http://lkml.org/lkml/2007/10/3/41

To summarize: on Linux, SA_ONSTACK decides whether you are already on the signal stack based on the value of the SP at the time of a signal. If you are not already inside the range, you are not "on the signal stack" and so the new signal handler frame starts over at the base of the signal stack.

sigaltstack (and sigstack before it) was invented in BSD. There, the SA_ONSTACK behavior has always been different. It uses a kernel state flag to decide, rather than the SP value. When you first take an SA_ONSTACK signal and switch to the alternate signal stack, it sets the SS_ONSTACK flag in the thread's sigaltstack state in the kernel. Thereafter you are "on the signal stack" and don't switch SP before pushing a handler frame no matter what the SP value is. Only when you sigreturn from the original handler context do you clear the SS_ONSTACK flag so that a new handler frame will start over at the base of the alternate signal stack.

The undesirable effect of the Linux behavior is that an overflow of the alternate signal stack can not only go undetected, but lead to a ring buffer effect of clobbering the original handler frame at the base of the signal stack for each successive signal that comes just after the overflow. This is what Shi Weihua's test case demonstrates. Normally this does not come up because of the signal mask, but the test case uses SA_NODEFER for its SIGSEGV handler.

The other subtle part of the existing Linux semantics is that a simple longjmp out of a signal handler serves to take you off the signal stack in a safe and reliable fashion without having used sigreturn (nor having just returned from the handler normally, which means the same). After the longjmp (or even informal stack switching not via any proper libc or kernel interface), the alternate signal stack stands ready to be used again.

A paranoid program would allocate a PROT_NONE red zone around its alternate signal stack. Then a small overflow would trigger a SIGSEGV in handler setup, and be fatal (core dump) whether or not SIGSEGV is blocked. As with thread stack red zones, that cannot catch all overflows (or underflows). e.g., a local array as large as page size allocated in a function called from a handler, but not actually touched before more calls push more stack, could cause an overflow that silently pushes into some unrelated allocated pages.

The BSD behavior does not do anything in particular about overflow. But it does at least avoid the wraparound or "ring buffer effect", so you'll just get a straightforward all-out overflow down your address space past the low end of the alternate signal stack. I don't know what the BSD behavior is for longjmp out of an SA_ONSTACK handler.

The POSIX wording relating to sigaltstack is pretty minimal. I don't think it speaks to this issue one way or another. (The program that overflows its stack is clearly in undefined behavior territory of one sort or another anyhow.)

Given the longjmp issue and the potential for highly subtle complications in existing programs relying on this in arcane ways deep in their code, I am very dubious about changing the behavior to the BSD style persistent flag. I think Shi Weihua's patches have a similar effect by tracking the SP used in the last handler setup.

I think it would be sensible for the signal handler setup code to detect when it would itself be causing a stack overflow. Maybe something like the following patch (untested). This issue exists in the same way on all machines, so ideally they would all do a similar check.

When it's the handler function itself or its callees that cause the overflow, rather than the signal handler frame setup alone crossing the boundary, this still won't help. But I don't see any way to distinguish that from the valid longjmp case.

Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
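The check proposed in this message can be modelled in user space roughly as follows. This is only a sketch: the struct, the function names and the bogus-address convention are assumptions for illustration, not the code of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of an alternate signal stack covering [base, base + size). */
struct altstack {
    uintptr_t base;
    uintptr_t size;
};

/* Linux-style test: being "on the signal stack" is decided from the SP value alone. */
static bool on_alt_stack(const struct altstack *as, uintptr_t sp)
{
    /* Unsigned subtraction wraps for sp < base, so such values are rejected too. */
    return sp - as->base < as->size;
}

/* Address that can never hold a frame; using it is expected to fault. */
#define BOGUS_FRAME ((uintptr_t)-1)

/* Decide where an SA_ONSTACK handler frame of frame_size bytes would go. */
static uintptr_t place_frame(const struct altstack *as, uintptr_t sp,
                             uintptr_t frame_size)
{
    bool was_on = on_alt_stack(as, sp);

    if (!was_on)
        sp = as->base + as->size;   /* start over at the high end of the stack */

    sp -= frame_size;               /* stacks grow down */

    /* Already on the stack, and this frame alone would leave it: refuse. */
    if (was_on && !on_alt_stack(as, sp))
        return BOGUS_FRAME;

    return sp;
}
```

As the message notes, a check of this kind only catches the case where the frame setup itself crosses the boundary; overflow caused by the handler or its callees is not detected.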
2008-01-30x86: add DMI quirk for io-delay hangs on Compaq Presario V6000 laptopsIngo Molnar
add the DMI strings provided by Islam Amer <pharon@gmail.com>, for the Compaq Presario V6000 (Quanta/30B7). Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30x86: various changes and cleanups to in_p/out_p delay detailsIngo Molnar
various changes to the in_p/out_p delay details:

- add the io_delay=none method
- make each method selectable from the kernel config
- simplify the delay code a bit by getting rid of an indirect function call
- add the /proc/sys/kernel/io_delay_type sysctl
- change 'io_delay=standard|alternate' to io_delay=0x80 and io_delay=0xed
- make the io delay config not depend on CONFIG_DEBUG_KERNEL

Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: "David P. Reed" <dpreed@reed.com>
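The io_delay= values named in these messages (0x80, 0xed, udelay, none) could be mapped to a delay method along these lines. This is a hedged sketch: the enum and the parser function are hypothetical names for illustration, and the kernel's own identifiers may differ.

```c
#include <assert.h>
#include <string.h>

/* Illustrative names only; not taken from the kernel source. */
enum io_delay_type {
    IO_DELAY_PORT_0X80,
    IO_DELAY_PORT_0XED,
    IO_DELAY_UDELAY,
    IO_DELAY_NONE,
};

/* Map an io_delay= value to a method. Returns 0 on success, -1 if unknown. */
static int parse_io_delay(const char *arg, enum io_delay_type *type)
{
    assert(type != NULL);

    if (arg == NULL)
        return -1;
    if (strcmp(arg, "0x80") == 0)
        *type = IO_DELAY_PORT_0X80;
    else if (strcmp(arg, "0xed") == 0)
        *type = IO_DELAY_PORT_0XED;
    else if (strcmp(arg, "udelay") == 0)
        *type = IO_DELAY_UDELAY;
    else if (strcmp(arg, "none") == 0)
        *type = IO_DELAY_NONE;
    else
        return -1;                  /* unknown value: leave *type untouched */

    return 0;
}
```

Selecting the method once at parse time is in the spirit of the stated goal of getting rid of an indirect function call on every delay.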
2008-01-30x86: provide a DMI based port 0x80 I/O delay override.Rene Herman
x86: provide a DMI based port 0x80 I/O delay override. Certain (HP) laptops experience trouble from our port 0x80 I/O delay writes. This patch provides for a DMI based switch to the "alternate diagnostic port" 0xed (as used by some BIOSes as well) for these. David P. Reed confirmed that port 0xed works for him and provides a proper delay. The symptoms of _not_ working are a hanging machine, with "hwclock" use being a direct trigger. Earlier versions of this attempted to simply use udelay(2), with the 2 being a value tested to be a nicely conservative upper-bound with help from many on the linux-kernel mailinglist but that approach has two problems. First, pre-loops_per_jiffy calibration (which is post PIT init while some implementations of the PIT are actually one of the historically problematic devices that need the delay) udelay() isn't particularly well-defined. We could initialise loops_per_jiffy conservatively (and based on CPU family so as to not unduly delay old machines) which would sort of work, but... Second, delaying isn't the only effect that a write to port 0x80 has. It's also a PCI posting barrier which some devices may be explicitly or implicitly relying on. Alan Cox did a survey and found evidence that additionally some drivers may be racy on SMP without the bus locking outb. Switching to an inb() makes the timing too unpredictable and as such, this DMI based switch should be the safest approach for now. Any more invasive changes should get more rigid testing first. It's moreover only very few machines with the problem and a DMI based hack seems to fit that situation. This also introduces a command-line parameter "io_delay" to override the DMI based choice again: io_delay=<standard|alternate> where "standard" means using the standard port 0x80 and "alternate" port 0xed. This retains the udelay method as a config (CONFIG_UDELAY_IO_DELAY) and command-line ("io_delay=udelay") choice for testing purposes as well. 
This does not change the io_delay() in the boot code which is using the same port 0x80 I/O delay but those do not appear to be a problem as David P. Reed reported the problem was already gone after using the udelay version. He moreover reported that booting with "acpi=off" also fixed things and seeing as how ACPI isn't touched until after this DMI based I/O port switch I believe it's safe to leave the ones in the boot code be. The DMI strings from David's HP Pavilion dv9000z are in there already and we need to get/verify the DMI info from other machines with the problem, notably the HP Pavilion dv6000z. This patch is partly based on earlier patches from Pavel Machek and David P. Reed. Signed-off-by: Rene Herman <rene.herman@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30x86: fix: s2ram + P4 + tsc = annoyanceMike Galbraith
s2ram recently became useful here, except for the kernel's annoying habit of disabling my P4's perfectly good TSC.

[ 107.894470] CPU 1 is now offline
[ 107.894474] SMP alternatives: switching to UP code
[ 107.895832] CPU0 attaching sched-domain:
[ 107.895836] domain 0: span 1
[ 107.895838] groups: 1
[ 107.896097] CPU1 is down
[ 3.726156] Intel machine check architecture supported.
[ 3.726165] Intel machine check reporting enabled on CPU#0.
[ 3.726167] CPU0: Intel P4/Xeon Extended MCE MSRs (12) available
[ 3.726170] CPU0: Thermal monitoring enabled
[ 3.726175] Back to C!
[ 3.726708] Force enabled HPET at resume
[ 3.726775] Enabling non-boot CPUs ...
[ 3.727049] CPU0 attaching NULL sched-domain.
[ 3.727165] SMP alternatives: switching to SMP code
[ 3.727858] Booting processor 1/1 eip 3000
[ 3.727862] CPU 1 irqstacks, hard=b042f000 soft=b042d000
[ 3.738173] Initializing CPU#1
[ 3.798912] Calibrating delay using timer specific routine.. 5986.12 BogoMIPS (lpj=2993061)
[ 3.798920] CPU: After generic identify, caps: bfebfbff 00000000 00000000 00000000 00004400 00000000 00000000 00000000
[ 3.798931] CPU: Trace cache: 12K uops, L1 D cache: 8K
[ 3.798934] CPU: L2 cache: 512K
[ 3.798936] CPU: Physical Processor ID: 0
[ 3.798938] CPU: After all inits, caps: bfebfbff 00000000 00000000 0000b080 00004400 00000000 00000000 00000000
[ 3.798946] Intel machine check architecture supported.
[ 3.798952] Intel machine check reporting enabled on CPU#1.
[ 3.798955] CPU1: Intel P4/Xeon Extended MCE MSRs (12) available
[ 3.798959] CPU1: Thermal monitoring enabled
[ 3.799161] CPU1: Intel(R) Pentium(R) 4 CPU 3.00GHz stepping 09
[ 3.799187] checking TSC synchronization [CPU#0 -> CPU#1]:
[ 3.819181] Measured 63588552840 cycles TSC warp between CPUs, turning off TSC clock.
[ 3.819184] Marking TSC unstable due to: check_tsc_sync_source failed.
If check_tsc_warp() is called after initial boot, and the TSC has in the meantime been set (BIOS, user, silicon, elves) to a value lower than the last stored/stale value, we blame the TSC. Reset to pristine condition after every test. Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
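The effect of stale state can be shown with a small single-threaded model. The struct and function names below are hypothetical; the real check runs concurrently on two CPUs, which this sketch does not attempt to reproduce.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bookkeeping kept between TSC synchronization tests. */
struct warp_state {
    uint64_t last_tsc;   /* most recent TSC value seen */
    uint64_t max_warp;   /* largest backwards jump observed */
    unsigned nr_warps;   /* number of backwards jumps observed */
};

/* Record one TSC reading and count it as a warp if time went backwards. */
static void observe_tsc(struct warp_state *st, uint64_t now)
{
    assert(st != NULL);

    if (now < st->last_tsc) {
        uint64_t warp = st->last_tsc - now;

        if (warp > st->max_warp)
            st->max_warp = warp;
        st->nr_warps++;
    }
    st->last_tsc = now;
}

/* Return to pristine condition so the next test starts without stale values. */
static void reset_warp_state(struct warp_state *st)
{
    assert(st != NULL);

    st->last_tsc = 0;
    st->max_warp = 0;
    st->nr_warps = 0;
}
```

Without the reset, a TSC that restarted from a low value across suspend is compared against the value left over from the previous test and is blamed for a warp it did not have.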
2008-01-30x86: hibernation: document __save_processor_state() on x86Rafael J. Wysocki
Document the fact that __save_processor_state() has to save all CPU registers referred to by the kernel in case a different kernel is used to load and restore a hibernation image containing it. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30x86: assign IRQs to HPET timers, fixBalaji Rao
Looks like IRQ 31 is assigned to timer 3, even without the patch! I wonder who wrote the number 31. But the manual says that it is zero by default. I think we should check whether the timer has been allocated an IRQ before proceeding to assign one to it. Here is a patch that does this. Signed-off-by: Balaji Rao <balajirrao@gmail.com> Tested-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
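A check of this kind might look like the sketch below. The bit positions follow the HPET specification's Tn_INT_ROUTE_CNF field (bits 13:9 of the timer configuration register); the function names are hypothetical, and treating a zero route field as "no IRQ assigned" is an assumption based on the wording of this message.

```c
#include <assert.h>
#include <stdint.h>

/* Tn_INT_ROUTE_CNF occupies bits 13:9 of the timer configuration register. */
#define TN_INT_ROUTE_CNF_SHIFT 9
#define TN_INT_ROUTE_CNF_MASK  (UINT64_C(0x1f) << TN_INT_ROUTE_CNF_SHIFT)

/* Extract the interrupt route currently programmed for a timer. */
static unsigned int routed_irq(uint64_t timer_cfg)
{
    return (unsigned int)((timer_cfg & TN_INT_ROUTE_CNF_MASK)
                          >> TN_INT_ROUTE_CNF_SHIFT);
}

/* Assumption: a zero route field means that no IRQ has been assigned yet. */
static int needs_irq_assignment(uint64_t timer_cfg)
{
    return routed_irq(timer_cfg) == 0;
}
```

A timer whose configuration already carries a route, such as the IRQ 31 seen on timer 3, would then be left alone.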
2008-01-30x86: assign IRQs to HPET timersBalaji Rao
The userspace API for the HPET (see Documentation/hpet.txt) did not work. The HPET_IE_ON ioctl was failing as there was no IRQ assigned to the timer device. This patch fixes it by allocating IRQs to timer blocks in the HPET.

 arch/x86/kernel/hpet.c | 13 +++++--------
 drivers/char/hpet.c    | 45 ++++++++++++++++++++++++++++++++++++++-------
 include/linux/hpet.h   |  2 +-
 3 files changed, 44 insertions(+), 16 deletions(-)

Signed-off-by: Balaji Rao <balajirrao@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30x86: unregister PIT clocksource when PIT is disabledThomas Gleixner
The following scenario might leave PIT as a dysfunctional clock source:

    PIT is registered as clocksource
    PM_TIMER is registered as clocksource and enables highres/dyntick mode
    PIT is switched to oneshot mode

-> now the readout of PIT is bogus, but the user might select PIT via the sysfs override, which would break the box as the time readout is unusable.

Unregister the PIT clocksource when the PIT clock event device is switched into shutdown / oneshot mode.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-30clocksource: add unregister function to disable unusable clocksourcesThomas Gleixner
On x86 the PIT might become an unusable clocksource. Add an unregister function to provide a possibility to remove the PIT from the list of available clock sources. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-30x86: restrict PIT clocksource usageThomas Gleixner
PIT clocksource is registered unconditionally even when HPET is enabled or when PIT is replaced by the local APIC timer. In both cases PIT can not be used as it is stopped and the readout would be stale. Prevent registering PIT in those cases. patch depends on: x86: offer is_hpet_enabled() on !CONFIG_HPET_TIMER too Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-30time: more timer related cleanupsPavel Machek
I was confused by FSEC = 10^15 NSEC statement, plus small whitespace fixes. When there's copyright, there should be GPL. Signed-off-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30kobj: fix threshold_init_device/kobject_uevent_env oopsGreg KH
the logic in this function is just crazy. It's recursive, but we can circumvent the creation for the kobject and whole creation of the threshold_block if some conditions are met. That's why we see the allocate_threshold_blocks so many times in the callstack, yet only a few kobjects created. Then we blow up in kobject_uevent_env() on the first debug printk. Which means that we are just passing in garbage. Man, this is one time that comments in code would have been very nice to have, and why forward goto's into major code blocks are just evil... Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-28all archs: consolidate init and exit sections in vmlinux.lds.hSam Ravnborg
This patch consolidates all definitions of the .init.text, .init.data, .exit.text and .exit.data sections in the generic vmlinux.lds.h. This is a preparatory patch - alone it does not buy us much good. Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2008-01-25sched: latencytop supportArjan van de Ven
LatencyTOP kernel infrastructure; it measures latencies in the scheduler and tracks them system wide and per process. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25sched: high-res preemption tickPeter Zijlstra
Use HR-timers (when available) to deliver an accurate preemption tick. The regular scheduler tick that runs at 1/HZ can be too coarse when nice levels are used. The fairness system will still keep the cpu utilisation 'fair' by then delaying the task that got an excessive amount of CPU time but try to minimize this by delivering preemption points spot-on. The average frequency of this extra interrupt is sched_latency / nr_latency. Which need not be higher than 1/HZ, it's just that the distribution within the sched_latency period is important. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25cpu-hotplug: replace lock_cpu_hotplug() with get_online_cpus()Gautham R Shenoy
Replace all lock_cpu_hotplug/unlock_cpu_hotplug from the kernel and use get_online_cpus and put_online_cpus instead as it highlights the refcount semantics in these operations. The new API guarantees protection against the cpu-hotplug operation, but it doesn't guarantee serialized access to any of the local data structures. Hence the changes need to be reviewed. In case of pseries_add_processor/pseries_remove_processor, use cpu_maps_update_begin()/cpu_maps_update_done() as we're modifying the cpu_present_map there. Signed-off-by: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
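The refcount semantics mentioned here can be pictured with a simplified, single-threaded model. Everything below is hypothetical: the struct and the gate_* names are invented for this sketch, and the model returns false where the kernel would make the caller wait.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; the kernel version sleeps instead of failing. */
struct hotplug_gate {
    int  readers;       /* outstanding get-style references */
    bool hotplug_busy;  /* a hotplug operation is in progress */
};

/* Reader side: take a reference that holds off hotplug operations. */
static bool gate_get(struct hotplug_gate *g)
{
    if (g->hotplug_busy)
        return false;
    g->readers++;
    return true;
}

/* Reader side: drop the reference taken by gate_get(). */
static void gate_put(struct hotplug_gate *g)
{
    assert(g->readers > 0);
    g->readers--;
}

/* Writer side: hotplug may only start once no references remain. */
static bool gate_begin_hotplug(struct hotplug_gate *g)
{
    if (g->readers > 0 || g->hotplug_busy)
        return false;
    g->hotplug_busy = true;
    return true;
}

static void gate_end_hotplug(struct hotplug_gate *g)
{
    g->hotplug_busy = false;
}
```

Several readers can hold references at once, which is why the message stresses that the API protects against hotplug but does not serialize access to a caller's own data structures.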
2008-01-24Driver core: change sysdev classes to use dynamic kobject namesKay Sievers
All kobjects require a dynamically allocated name now. We no longer need to keep track if the name is statically assigned, we can just unconditionally free() all kobject names on cleanup. Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24Kobject: convert arch/* from kobject_unregister() to kobject_put()Greg Kroah-Hartman
There is no need for kobject_unregister() anymore, thanks to Kay's kobject cleanup changes, so replace all instances of it with kobject_put(). Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24Kobject: change arch/x86/kernel/cpu/mcheck/mce_amd_64.c to use kobject_init_and_addGreg Kroah-Hartman
Stop using kobject_register, as this way we can control the sending of the uevent properly, after everything is properly initialized. Cc: Jacob Shin <jacob.shin@amd.com> Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24Kobject: change arch/x86/kernel/cpu/mcheck/mce_amd_64.c to use kobject_create_and_addGreg Kroah-Hartman
Make this kobject dynamic and convert it to not use kobject_register, which is going away. Cc: Jacob Shin <jacob.shin@amd.com> Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24Kobject: change arch/x86/kernel/cpu/intel_cacheinfo.c to use kobject_init_and_addGreg Kroah-Hartman
Stop using kobject_register, as this way we can control the sending of the uevent properly, after everything is properly initialized. Cc: Ashok Raj <ashok.raj@intel.com> Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24PM: Acquire device locks on suspendRafael J. Wysocki
This patch reorganizes the way suspend and resume notifications are sent to drivers. The major changes are that now the PM core acquires every device semaphore before calling the methods, and calls to device_add() during suspends will fail, while calls to device_del() during suspends will block. It also provides a way to safely remove a suspended device with the help of the PM core, by using the device_pm_schedule_removal() callback introduced specifically for this purpose, and updates two drivers (msr and cpuid) that need to use it. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-22x86: GEODE fix a race condition in the MFGPT timer tickJordan Crouse
When we set the MFGPT timer tick, there is a chance that we'll immediately assert an event. If for some reason the IRQ routing for this clock has been set up for some other purpose, then we could end up firing an interrupt into the SMM handler or worse. This rearranges the timer tick init function to initialize the handler before we set up the MFGPT clock to make sure that even if we get an event, it will go to the handler. Furthermore, in the handler we need to make sure that we clear the event, even if the timer isn't running. Signed-off-by: Jordan Crouse <jordan.crouse@amd.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ingo Molnar <mingo@elte.hu> Tested-by: Arnd Hannemann <hannemann@i4.informatik.rwth-aachen.de>
2008-01-22Revert "x86: fix NMI watchdog & 'stopped time' problem"Thomas Gleixner
This reverts commit d4d25deca49ec2527a634557bf5a6cf449f85deb. It tried to fix long standing bugzilla entries, but the solution was reported to break other systems. The reporter of http://bugzilla.kernel.org/show_bug.cgi?id=9791 tracked it down to this commit and confirmed that reverting the patch restores the correct behaviour. It's too late in the release cycle to find a better solution than reverting the commit to avoid regressions. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ingo Molnar <mingo@elte.hu>
2008-01-16lockdep: more hardirq annotations for notify_die()Peter Zijlstra
On Sat, 2007-12-29 at 18:06 +0100, Marcin Slusarz wrote:
> Hi
> Today I've got this (while i was upgrading my gentoo box):
>
> WARNING: at kernel/lockdep.c:2658 check_flags()
> Pid: 21680, comm: conftest Not tainted 2.6.24-rc6 #63
>
> Call Trace:
> [<ffffffff80253457>] check_flags+0x1c7/0x1d0
> [<ffffffff80257217>] lock_acquire+0x57/0xc0
> [<ffffffff8024d5c0>] __atomic_notifier_call_chain+0x60/0xd0
> [<ffffffff8024d641>] atomic_notifier_call_chain+0x11/0x20
> [<ffffffff8024d67e>] notify_die+0x2e/0x30
> [<ffffffff8020da0a>] do_divide_error+0x5a/0xa0
> [<ffffffff80522bdd>] trace_hardirqs_on_thunk+0x35/0x3a
> [<ffffffff80255b89>] trace_hardirqs_on+0xd9/0x180
> [<ffffffff80522bdd>] trace_hardirqs_on_thunk+0x35/0x3a
> [<ffffffff80523c2d>] error_exit+0x0/0xa9
>
> possible reason: unannotated irqs-off.
> irq event stamp: 4693
> hardirqs last enabled at (4693): [<ffffffff80522bdd>] trace_hardirqs_on_thunk+0x35/0x3a
> hardirqs last disabled at (4692): [<ffffffff80522c17>] trace_hardirqs_off_thunk+0x35/0x37
> softirqs last enabled at (3546): [<ffffffff80238343>] __do_softirq+0xb3/0xd0
> softirqs last disabled at (3521): [<ffffffff8020c97c>] call_softirq+0x1c/0x30

more early fixups for notify_die()..

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-15x86: fix RTC_AIE with CONFIG_HPET_EMULATE_RTCBernhard Walle
In the current code, RTC_AIE doesn't work if the RTC relies on CONFIG_HPET_EMULATE_RTC because the code sets the RTC_AIE flag in hpet_set_rtc_irq_bit(). The interrupt handler accidentally checks for RTC_PIE and not RTC_AIE when comparing the time which was set in hpet_set_alarm_time(). I now verified on a test system here that without the patch applied, the attached test program fails on a system that has HPET with 2.6.24-rc7-default. That's not critical, since I guess the problem has been there for several kernel releases, but the fix is quite obvious. Configuration is CONFIG_RTC=y and CONFIG_HPET_EMULATE_RTC=y. Signed-off-by: Bernhard Walle <bwalle@suse.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-14Kick CPUS that might be sleeping in cpus_idle_waitSteven Rostedt
Sometimes cpu_idle_wait gets stuck because it might miss CPUS that are already in idle, have no tasks waiting to run and have no interrupts going to them. This is common on bootup when switching cpu idle governors. This patch gives those CPUS that don't check in an IPI kick.

Background:
-----------
I noticed this while developing the mcount patches, that every once in a while the system would hang. Looking deeper, the hang was always at boot up when registering init_menu of the cpu_idle menu governor. Talking with Thomas Gleixner, we discovered that one of the CPUS had no timer events scheduled for it and it was in idle (running with NO_HZ). So the CPU would not set the cpu_idle_state bit. Hitting sysrq-t a few times would eventually route the interrupt to the stuck CPU and the system would continue.

Note, I would have used the PDA isidle but that is set after the cpu_idle_state bit is cleared, and would leave a window open where we may miss being kicked.

hmm, looking closer at this, we still have a small race window between clearing the cpu_idle_state and disabling interrupts (hence the RFC).

CPU0:                                   CPU 1:
---------                               ---------
cpu_idle_wait():                        cpu_idle():
    |                                     __cpu_cpu_var(is_idle) = 1;
    |                                     if (__get_cpu_var(cpu_idle_state)) /* == 0 */
per_cpu(cpu_idle_state, 1) = 1;             |
if (per_cpu(is_idle, 1)) /* == 1 */         |
smp_call_function(1)                        |
    |                                     receives ipi and runs do_nothing.
wait on map == empty                    idle();
/* waits forever */

So really we need interrupts off for most of this then. One might think that we could simply clear the cpu_idle_state from do_nothing, but I'm assuming that cpu_idle governors can be removed, and this might cause a race that a governor might be used after the module was removed.

Venki said:

I think your RFC patch is the right solution here. As I see it, there is no race with your RFC patch. As long as you call a dummy smp_call_function on all CPUs, we should be OK.
We can get rid of cpu_idle_state and the current wait forever logic altogether with dummy smp_call_function. And so there won't be any wait forever scenario.

The whole point of cpu_idle_wait() is to make all CPUs come out of idle loop at least once. The caller will use cpu_idle_wait something like this.

// Want to change idle handler
- Switch global idle handler to always present default_idle
- call cpu_idle_wait so that all cpus come out of idle for an instant and stop using old idle pointer and start using default idle
- Change the idle handler to a new handler
- optional cpu_idle_wait if you want all cpus to start using the new handler immediately.

Maybe the below 1s patch is safe bet for .24. But for .25, I would say we just replace all complicated logic by simple dummy smp_call_function and remove cpu_idle_state altogether.

Signed-off-by: Steven Rostedt <srostedt@redhat.com> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andi Kleen <ak@suse.de> Cc: Len Brown <lenb@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-01-11Pull bugzilla-9194 into release branchLen Brown
2008-01-11PM: ACPI and APM must not be enabled at the same timeLen Brown
ACPI and APM used "pm_active" to guarantee that they would not be simultaneously active. But pm_active was recently moved under CONFIG_PM_LEGACY, so that without CONFIG_PM_LEGACY, pm_active became a NOP -- allowing ACPI and APM to both be simultaneously enabled. This caused unpredictable results, including boot hangs. Further, the code under CONFIG_PM_LEGACY is scheduled for removal. So replace pm_active with pm_flags. pm_flags depends only on CONFIG_PM, which is present for both CONFIG_APM and CONFIG_ACPI. http://bugzilla.kernel.org/show_bug.cgi?id=9194 Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
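A flag word of this kind could enforce the mutual exclusion roughly as sketched below. The bit values and the pm_claim()/pm_release() helpers are assumptions for illustration; only the name pm_flags comes from the message.

```c
#include <assert.h>

/* Assumed bit assignments; the values are illustrative. */
#define PM_APM  1u
#define PM_ACPI 2u

/* Model of the shared flag word that records which subsystem is active. */
static unsigned int pm_flags;

/* Hypothetical helper: returns 0 if 'who' may run, -1 if the other one is active. */
static int pm_claim(unsigned int who)
{
    assert(who == PM_APM || who == PM_ACPI);

    if (pm_flags & ~who)
        return -1;          /* the other subsystem already owns power management */
    pm_flags |= who;
    return 0;
}

/* Hypothetical helper: give up ownership again. */
static void pm_release(unsigned int who)
{
    pm_flags &= ~who;
}
```

Because the flag word does not depend on CONFIG_PM_LEGACY, the check cannot silently turn into a no-op the way pm_active did.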
2008-01-08x86: fix do_fork_idle section mismatchThomas Gleixner
With CPU_HOTPLUG=n: WARNING: vmlinux.o(.text+0x104f8): Section mismatch: reference to .init.text:fork_idle (between 'do_fork_idle' and 'lapic_timer_broadcast') do_fork_idle() needs to be __cpuinit. It can be static as well. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-01-01fix lguest rmmod "bad pgd"Rusty Russell
After 17d57a9206b4de6ad082ac9f2d2346985abbd2aa ("x86: fix x86-32 early fixmap initialization.") removing lg.ko caused a printk from vunmap: mm/memory.c:115: bad pgd 004b3027. On the second use after module load, the kernel crashes. This fixes the immediate problem (accessed and dirty bits not set as expected in pmd_none_or_clear_bad). I can't see why this would cause a crash, but I haven't been able to reproduce it once this is applied. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-12-25Revert "x86: fix show cpuinfo cpu number always zero"Linus Torvalds
This reverts commit fbdcf18df73758b2e187ab94678b30cd5f6ff9f9. As pointed out by Yanmin Zhang, the problem was already fixed differently (and correctly), and rather than fix anything, it actually causes us to create a sub-optimal sched-domains hierarchy (not setting up the domain belonging to the core) when CONFIG_X86_HT=y. Requested-by: Yanmin Zhang <yanmin_zhang@linux.intel.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-12-21x86: intel_cacheinfo.c: cpu cache info entry for Intel TolapaiJason Gaston
This patch adds a cpu cache info entry for the Intel Tolapai cpu. Signed-off-by: Jason Gaston <jason.d.gaston@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2007-12-21x86: fix die() to not be preemptibleIngo Molnar
Andrew "Eagle Eye" Morton noticed that we use raw_local_save_flags() instead of raw_local_irq_save(flags) in die(). This allows the preemption of oopsing contexts - which is highly undesirable. It also causes CONFIG_DEBUG_PREEMPT to complain, as reported by Miles Lane.

this bug was introduced via:

  commit 39743c9ef717fd4f2b5583f010115c5f2482b8ae
  Author: Andi Kleen <ak@suse.de>
  Date:   Fri Oct 19 20:35:03 2007 +0200

      x86: use raw locks during oopses

  -       spin_lock_irqsave(&die.lock, flags);
  +       __raw_spin_lock(&die.lock);
  +       raw_local_save_flags(flags);

that is not a correct open-coding of spin_lock_irqsave(): both the ordering is wrong (irqs should be disabled _first_), and the wrong flags-saving API was used.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
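The difference between the two flag APIs, and the ordering the message asks for, can be modelled in user space. All names below (model_save_flags, model_irq_save, die_enter and so on) are hypothetical stand-ins, and a plain boolean stands in for the CPU interrupt flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Model state: the CPU interrupt flag and the oops lock. */
static bool irqs_enabled = true;
static bool die_lock_held;

/* Like a save-flags call: only reads the state, interrupts stay as they were. */
static bool model_save_flags(void)
{
    return irqs_enabled;
}

/* Like an irq-save call: reads the state and disables interrupts. */
static bool model_irq_save(void)
{
    bool flags = irqs_enabled;

    irqs_enabled = false;
    return flags;
}

static void model_irq_restore(bool flags)
{
    irqs_enabled = flags;
}

/* Correct open-coding: interrupts off first, then take the lock. */
static bool die_enter(void)
{
    bool flags = model_irq_save();

    assert(!die_lock_held);
    die_lock_held = true;
    return flags;
}

/* Undo in reverse order: drop the lock, then restore interrupts. */
static void die_exit(bool flags)
{
    die_lock_held = false;
    model_irq_restore(flags);
}
```

In this model, calling model_save_flags() leaves irqs_enabled true, which corresponds to the oopsing context remaining preemptible.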
2007-12-19x86: fix show cpuinfo cpu number always zeroMike Travis
when called by setup_arch) after smp_store_cpu_info() had set it to the correct value. The error shows up in 'cat /proc/cpuinfo' with all cpus = 0. Signed-off-by: Mike Travis <travis@sgi.com> Cc: Andi Kleen <ak@suse.de> Cc: Christoph Lameter <clameter@sgi.com> Cc: Jack Steiner <steiner@sgi.com> Cc: Suresh B Siddha <suresh.b.siddha@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2007-12-19x86_32: disable_pse must be __cpuinitdataAdrian Bunk
CONFIG_HOTPLUG_CPU=y: WARNING: vmlinux.o(.text+0xfa52): Section mismatch: reference to .init.data:disable_pse (between 'identify_cpu' and 'identify_secondary_cpu') [ akpm@linux-foundation.org: initializer fix. ] Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2007-12-19x86_32: select_idle_routine() must be __cpuinitAdrian Bunk
CONFIG_HOTPLUG_CPU=y: WARNING: vmlinux.o(.text+0x1199a): Section mismatch: reference to .init.text.5:select_idle_routine (between 'init_intel' and 'init_nexgen') Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2007-12-19x86 smpboot_32.c section fixesAdrian Bunk
CONFIG_HOTPLUG_CPU=y: WARNING: vmlinux.o(.text+0x22c60): Section mismatch: reference to .init.data:cpu_idle_tasks (between 'do_boot_cpu' and 'do_warm_boot_cpu') WARNING: vmlinux.o(.text+0x22c99): Section mismatch: reference to .init.data:cpu_idle_tasks (between 'do_boot_cpu' and 'do_warm_boot_cpu') WARNING: vmlinux.o(.text+0x2359b): Section mismatch: reference to .init.data:smp_b_stepping (between 'smp_store_cpu_info' and 'cpu_exit_clear') WARNING: vmlinux.o(.text+0x235a0): Section mismatch: reference to .init.data:smp_b_stepping (between 'smp_store_cpu_info' and 'cpu_exit_clear') Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2007-12-19x86 apic_32.c section fixAdrian Bunk
CONFIG_HOTPLUG_CPU=y: WARNING: vmlinux.o(.text+0x2390d): Section mismatch: reference to .init.text.5:setup_local_APIC (between 'start_secondary' and 'check_tsc_warp') Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2007-12-18x86: fix "Kernel panic - not syncing: IO-APIC + timer doesn't work!"Ingo Molnar
this is the tale of a full day spent debugging an ancient but elusive bug.

after booting up thousands of random .config kernels, i finally happened to generate a .config that produced the following rare bootup failure on 32-bit x86:

  | ..TIMER: vector=0x31 apic1=0 pin1=2 apic2=-1 pin2=-1
  | ..MP-BIOS bug: 8254 timer not connected to IO-APIC
  | ...trying to set up timer (IRQ0) through the 8259A ... failed.
  | ...trying to set up timer as Virtual Wire IRQ... failed.
  | ...trying to set up timer as ExtINT IRQ... failed :(.
  | Kernel panic - not syncing: IO-APIC + timer doesn't work! Boot with apic=debug
  | and send a report. Then try booting with the 'noapic' option

this bug has been reported many times over the years, but it was never reproduced nor fixed.

the bug that i hit was extremely sensitive to .config details.

First i did a .config-bisection - suspecting some .config detail. That led to CONFIG_X86_MCE: enabling X86_MCE magically made the bug disappear and the system would boot up just fine.

Debugging my way through the MCE code ended up identifying two unlikely candidates: the thing that made a real difference to the hang was that X86_MCE did two printks:

  Intel machine check architecture supported.
  Intel machine check reporting enabled on CPU#1.

Adding the same printks to a !CONFIG_X86_MCE kernel made the bug go away!

this left timing as the main suspect: i experimented with adding various udelay()s to the arch/x86/kernel/io_apic_32.c:check_timer() function, and the race window turned out to be narrower than 30 microseconds (!).

That made debugging especially funny, debugging without having printk ability before the bug hits is ... interesting ;-)

eventually i started suspecting IRQ activities - those are pretty much the only things that happen this early during bootup and have the timescale of a few dozen microseconds. Also, check_timer() changes the IRQ hardware in various creative ways, so the main candidate became IRQ0 interaction.

i've added a counter to track timer irqs (on which core they arrived, at what exact time, etc.) and found that no timer IRQ would arrive after the bug condition hits - even if we re-enable IRQ0 and re-initialize the i8259A, but that we'd get a small number of timer irqs right around the time when we call the check_timer() function.

Eventually i got the following backtrace triggered from debug code in the timer interrupt:

  ...trying to set up timer as Virtual Wire IRQ... failed.
  ...trying to set up timer as ExtINT IRQ...
  Pid: 1, comm: swapper Not tainted (2.6.24-rc5 #57)
  EIP: 0060:[<c044d57e>] EFLAGS: 00000246 CPU: 0
  EIP is at _spin_unlock_irqrestore+0x5/0x1c
  EAX: c0634178 EBX: 00000000 ECX: c4947d63 EDX: 00000246
  ESI: 00000002 EDI: 00010031 EBP: c04e0f2e ESP: f7c41df4
  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
  CR0: 8005003b CR2: ffe04000 CR3: 00630000 CR4: 000006d0
  DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
  DR6: ffff0ff0 DR7: 00000400
  [<c05f5784>] setup_IO_APIC+0x9c3/0xc5c

the spin_unlock() was called from init_8259A(). Wait ... we have an IRQ0 entry while we are in the middle of setting up the local APIC, the i8259A and the PIT?? That is certainly not how it's supposed to work! check_timer() was supposed to be called with irqs turned off - but this eroded away sometime in the past. This code would still work most of the time because it runs very quickly, but if just the right timing conditions are present and IRQ0 hits in this small, ~30 usecs window, timer irqs stop and the system does not boot up. Also, given how early this is during bootup, the hang is very deterministic - but it would only occur on certain machines (and certain configs).

The fix was quite simple: disable/restore interrupts properly in this function. With that in place the test-system now boots up just fine.

(64-bit x86 io_apic_64.c had the same bug.)

Phew!
One down, only 1500 other kernel bugs are left ;-) Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
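The race and the shape of the fix described in this commit message can be sketched as a toy userspace model. Everything here is an assumption made for illustration: `struct sim_boot_cpu`, `sim_maybe_deliver_irq0()`, `sim_setup_timer_hw()` and `sim_check_timer()` are hypothetical names, and the rule "an IRQ0 taken during reprogramming wedges the timer" is a simplification of the behaviour reported above, not the real check_timer() code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the boot CPU during timer setup. */
struct sim_boot_cpu {
	bool irqs_enabled;   /* does the CPU accept interrupts right now? */
	bool irq0_pending;   /* a timer tick is about to arrive */
	bool reprogramming;  /* inside the sensitive setup window */
	bool timer_wedged;   /* IRQ0 was taken in the middle of setup */
};

/* Deliver a pending IRQ0 if the CPU currently accepts interrupts. */
void sim_maybe_deliver_irq0(struct sim_boot_cpu *cpu)
{
	if (!cpu->irqs_enabled || !cpu->irq0_pending)
		return;
	cpu->irq0_pending = false;
	if (cpu->reprogramming)
		cpu->timer_wedged = true;
}

/* Stand-in for the APIC/i8259A/PIT reprogramming done during setup. */
void sim_setup_timer_hw(struct sim_boot_cpu *cpu)
{
	assert(!cpu->reprogramming);
	cpu->reprogramming = true;
	sim_maybe_deliver_irq0(cpu); /* models the narrow race window */
	cpu->reprogramming = false;
}

/* Returns true if the timer still works after setup. */
bool sim_check_timer(bool protect)
{
	struct sim_boot_cpu cpu = { .irqs_enabled = true, .irq0_pending = true };
	bool saved = cpu.irqs_enabled;

	if (protect)
		cpu.irqs_enabled = false; /* disable irqs before setup */
	sim_setup_timer_hw(&cpu);
	if (protect)
		cpu.irqs_enabled = saved; /* restore the saved state */
	sim_maybe_deliver_irq0(&cpu);     /* tick is taken after setup */
	return !cpu.timer_wedged;
}
```

In this model, `sim_check_timer(false)` lets the pending tick land inside the setup window, while `sim_check_timer(true)` defers it until the saved interrupt state is restored.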