aboutsummaryrefslogtreecommitdiff
path: root/arch/x86/kernel/tsc_32.c
AgeCommit message (Collapse)Author
2008-06-19x86, 32-bit: fix boot failure on TSC-less processorsMikael Pettersson
Booting 2.6.26-rc6 on my 486 DX/4 fails with a "BUG: Int 6" (invalid opcode) and a kernel halt immediately after the kernel has been uncompressed. The BUG shows EIP pointing to an rdtsc instruction in native_read_tsc(), invoked from native_sched_clock(). (This error occurs so early that not even the serial console can capture it.) A bisection showed that this bug first occurs in 2.6.26-rc3-git7, via commit 9ccc906c97e34fd91dc6aaf5b69b52d824386910: >x86: distangle user disabled TSC from unstable > >tsc_enabled is set to 0 from the command line switch "notsc" and from >the mark_tsc_unstable code. Seperate those functionalities and replace >tsc_enable with tsc_disable. This makes also the native_sched_clock() >decision when to use TSC understandable. > >Preparatory patch to solve the sched_clock() issue on 32 bit. > >Signed-off-by: Thomas Gleixner <tglx@linutronix.de> The core reason for this bug is that native_sched_clock() gets called before tsc_init(). Before the commit above, tsc_32.c used a "tsc_enabled" variable which defaulted to 0 == disabled, and which only got enabled late in tsc_init(). Thus early calls to native_sched_clock() would skip the TSC and use jiffies instead. After the commit above, tsc_32.c uses a "tsc_disabled" variable which defaults to 0, meaning that the TSC is Ok to use. Early calls to native_sched_clock() now erroneously try to use the TSC on !cpu_has_tsc processors, leading to invalid opcode exceptions. My proposed fix is to initialise tsc_disabled to a "soft disabled" state distinct from the hard disabled state set up by the "notsc" kernel option. This fixes the native_sched_clock() problem. It also allows tsc_init() to be simplified: instead of setting tsc_disabled = 1 on every error return, we just set tsc_disabled = 0 once when all checks have succeeded. I've verified that this lets my 486 boot again. I've also verified that a Core2 machine still uses the TSC as clocksource after the patch. Signed-off-by: Mikael Pettersson <mikpe@it.uu.se> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-23x86: disable TSC for sched_clock() when calibration failedThomas Gleixner
When the TSC calibration fails then TSC is still used in sched_clock(). Disable it completely in that case. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@kernel.org
2008-05-23x86: distangle user disabled TSC from unstableThomas Gleixner
tsc_enabled is set to 0 from the command line switch "notsc" and from the mark_tsc_unstable code. Seperate those functionalities and replace tsc_enable with tsc_disable. This makes also the native_sched_clock() decision when to use TSC understandable. Preparatory patch to solve the sched_clock() issue on 32 bit. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-19x86: tsc prevent time going backwardsThomas Gleixner
We already catch most of the TSC problems by sanity checks, but there is a subtle bug which has been in the code forever. This can cause time jumps in the range of hours. This was reported in: http://lkml.org/lkml/2007/8/23/96 and http://lkml.org/lkml/2008/3/31/23 I was able to reproduce the problem with a gettimeofday loop test on a dual core and a quad core machine which both have sychronized TSCs. The TSCs seems not to be perfectly in sync though, but the kernel is not able to detect the slight delta in the sync check. Still there exists an extremly small window where this delta can be observed with a real big time jump. So far I was only able to reproduce this with the vsyscall gettimeofday implementation, but in theory this might be observable with the syscall based version as well. CPU 0 updates the clock source variables under xtime/vyscall lock and CPU1, where the TSC is slighty behind CPU0, is reading the time right after the seqlock was unlocked. The clocksource reference data was updated with the TSC from CPU0 and the value which is read from TSC on CPU1 is less than the reference data. This results in a huge delta value due to the unsigned subtraction of the TSC value and the reference value. This algorithm can not be changed due to the support of wrapping clock sources like pm timer. The huge delta is converted to nanoseconds and added to xtime, which is then observable by the caller. The next gettimeofday call on CPU1 will show the correct time again as now the TSC has advanced above the reference value. To prevent this TSC specific wreckage we need to compare the TSC value against the reference value and return the latter when it is larger than the actual TSC value. I pondered to mark the TSC unstable when the readout is smaller than the reference value, but this would render an otherwise good and fast clocksource unusable without a real good reason. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-19x86: clean up =0 initializations in arch/x86/kernel/tsc_32.cPavel Machek
Signed-off-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-17x86: set_cyc2ns_scale() remove prev scaleIngo Molnar
Peter Zijlstra pointed out that it's unused. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-17x86: if we cannot calibrate the TSC, we panic.Rusty Russell
The current tsc_init() clears the TSC feature bit if the TSC khz cannot be calculated, causing us to panic in arch/x86/kernel/cpu/bugs.c check_config(). We should simply mark it unstable. Frankly, someone should take an axe to this code. mark_tsc_unstable() not only marks it unstable, but sets tsc_enabled to 0, which seems redundant but is actually important here because means it won't be used by sched_clock() either. Perhaps a tristate enum "UNUSABLE, UNSTABLE, OK" would be clearer, and separate mark_tsc_unstable() and mark_tsc_broken() functions? Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-07x86: fix call to set_cyc2ns_scale() from time_cpufreq_notifier()Karsten Wiese
In time_cpufreq_notifier() the cpu id to act upon is held in freq->cpu. Use it instead of smp_processor_id() in the call to set_cyc2ns_scale(). This makes the preempt_*able() unnecessary and lets set_cyc2ns_scale() update the intended cpu's cyc2ns. Related mail/thread: http://lkml.org/lkml/2007/12/7/130 Signed-off-by: Karsten Wiese <fzu@wemgehoertderstaat.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-07revert "x86: tsc prevent time going backwards"Ingo Molnar
revert: | commit 47001d603375f857a7fab0e9c095d964a1ea0039 | Author: Thomas Gleixner <tglx@linutronix.de> | Date: Tue Apr 1 19:45:18 2008 +0200 | | x86: tsc prevent time going backwards it has been identified to cause suspend regression - and the commit fixes a longstanding bug that existed before 2.6.25 was opened - so it can wait some more until the effects are better understood. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-04x86: tsc prevent time going backwardsThomas Gleixner
We already catch most of the TSC problems by sanity checks, but there is a subtle bug which has been in the code for ever. This can cause time jumps in the range of hours. This was reported in: http://lkml.org/lkml/2007/8/23/96 and http://lkml.org/lkml/2008/3/31/23 I was able to reproduce the problem with a gettimeofday loop test on a dual core and a quad core machine which both have sychronized TSCs. The TSCs seems not to be perfectly in sync though, but the kernel is not able to detect the slight delta in the sync check. Still there exists an extremly small window where this delta can be observed with a real big time jump. So far I was only able to reproduce this with the vsyscall gettimeofday implementation, but in theory this might be observable with the syscall based version as well. CPU 0 updates the clock source variables under xtime/vyscall lock and CPU1, where the TSC is slighty behind CPU0, is reading the time right after the seqlock was unlocked. The clocksource reference data was updated with the TSC from CPU0 and the value which is read from TSC on CPU1 is less than the reference data. This results in a huge delta value due to the unsigned subtraction of the TSC value and the reference value. This algorithm can not be changed due to the support of wrapping clock sources like pm timer. The huge delta is converted to nanoseconds and added to xtime, which is then observable by the caller. The next gettimeofday call on CPU1 will show the correct time again as now the TSC has advanced above the reference value. To prevent this TSC specific wreckage we need to compare the TSC value against the reference value and return the latter when it is larger than the actual TSC value. I pondered to mark the TSC unstable when the readout is smaller than the reference value, but this would render an otherwise good and fast clocksource unusable without a real good reason. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-26x86: notsc is ignored on common configurationsPavel Machek
notsc is ignored in 32-bit kernels if CONFIG_X86_TSC is on.. which is bad, fix it. Signed-off-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-30x86: convert TSC disabling to generic cpuid disable bitmapAndi Kleen
Fix from: Ian Campbell <ijc@hellion.org.uk> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30x86: allow TSC clock source on AMD Fam10h and some cleanupAndi Kleen
After a lot of discussions with AMD it turns out that TSC on Fam10h CPUs is synchronized when the CONSTANT_TSC cpuid bit is set. Or rather that if there are ever systems where that is not true it would be their BIOS' task to disable the bit. So finally use TSC gettimeofday on Fam10h by default. Or rather it is always used now on CPUs where the AMD specific CONSTANT_TSC bit is set. This gives a nice speed bost for gettimeofday() on these systems which tends to be by far the most common v/syscall. On a Fam10h system here TSC gtod uses about 20% of the CPU time of acpi_pm based gtod(). This was measured on 32bit, on 64bit it is even better because TSC gtod() can use a vsyscall and stay in ring 3, which acpi_pm doesn't. The Intel check simply checks for CONSTANT_TSC too without hardcoding Intel vendor. This is equivalent on 64bit because all 64bit capable Intel CPUs will have CONSTANT_TSC set. On Intel there is no CPU supplied CONSTANT_TSC bit currently, but we synthesize one based on hardcoded knowledge which steppings have p-state invariant TSC. So the new logic is now: On CPUs which have the AMD specific CONSTANT_TSC bit set or on Intel CPUs which are new enough to be known to have p-state invariant TSC always use TSC based gettimeofday() Cc: lenb@kernel.org Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-01-30x86: scale cyc_2_nsec according to CPU frequencyGuillaume Chazarain
scale the sched_clock() cyc_2_nsec scaling factor according to CPU frequency changes. [ mingo@elte.hu: simplified it and fixed it for SMP. ] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-23x86: fix more TSC clock source calibration errorsDave Johnson
The previous patch wasn't correctly handling the 'count' variable. If a CPU gave bad results on the 1st or 2nd run but good results on the 3rd, it wouldn't do the correct thing. No idea if any such CPU exists, but the patch below handles that case by discarding the bad runs. If a bad result (too quick, or too slow) occurs on any of the 3 runs it will be discarded. Also updated some comments to explain what's going on. Signed-off-by: Dave Johnson <djohnson@sw.starentnetworks.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-23x86: fix TSC clock source calibration errorDave Johnson
I ran into this problem on a system that was unable to obtain NTP sync because the clock was running very slow (over 10000ppm slow). ntpd had declared all of its peers 'reject' with 'peer_dist' reason. On investigation, the tsc_khz variable was significantly incorrect causing xtime to run slow. After a reboot tsc_khz was correct so I did a reboot test to see how often the problem occurred: Test was done on a 2000 Mhz Xeon system. Of 689 reboots, 8 of them had unacceptable tsc_khz values (>500ppm): range of tsc_khz # of boots % of boots ---------------- ---------- ---------- < 1999750 0 0.000% 1999750 - 1999800 21 3.048% 1999800 - 1999850 166 24.128% 1999850 - 1999900 241 35.029% 1999900 - 1999950 211 30.669% 1999950 - 2000000 42 6.105% 2000000 - 2000000 0 0.000% 2000050 - 2000100 0 0.000% [...] 2000100 - 2015000 1 0.145% << BAD 2015000 - 2030000 6 0.872% << BAD 2030000 - 2045000 1 0.145% << BAD 2045000 < 0 0.000% The worst boot was 2032.577 Mhz, over 1.5% off! It appears that on rare occasions, mach_countup() is taking longer to complete than necessary. I suspect that this is caused by the CPU taking a periodic SMI interrupt right at the end of the 30ms calibration loop. This would cause the loop to delay while the SMI BIOS hander runs. The resulting TSC value is beyond what it actually should be resulting in a higher tsc_khz. The below patch makes native_calculate_cpu_khz() take the best (shortest duration, lowest khz) run of it's 3 calibration loops. If a SMI goes off causing a bad result (long duration, higher khz) it will be discarded. With the patch applied, 300 boots of the same system produce good results: range of tsc_khz # of boots % of boots ---------------- ---------- ---------- < 1999750 0 0.000% 1999750 - 1999800 30 10.000% 1999800 - 1999850 166 55.333% 1999850 - 1999900 89 29.667% 1999900 - 1999950 15 5.000% 1999950 < 0 0.000% Problem was found and tested against 2.6.18. Patch is against 2.6.22. Signed-off-by: Dave Johnson <djohnson@sw.starentnetworks.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivialLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (74 commits) fix do_sys_open() prototype sysfs: trivial: fix sysfs_create_file kerneldoc spelling mistake Documentation: Fix typo in SubmitChecklist. Typo: depricated -> deprecated Add missing profile=kvm option to Documentation/kernel-parameters.txt fix typo about TBI in e1000 comment proc.txt: Add /proc/stat field small documentation fixes Fix compiler warning in smount example program from sharedsubtree.txt docs/sysfs: add missing word to sysfs attribute explanation documentation/ext3: grammar fixes Documentation/java.txt: typo and grammar fixes Documentation/filesystems/vfs.txt: typo fix include/asm-*/system.h: remove unused set_rmb(), set_wmb() macros trivial copy_data_pages() tidy up Fix typo in arch/x86/kernel/tsc_32.c file link fix for Pegasus USB net driver help remove unused return within void return function Typo fixes retrun -> return x86 hpet.h: remove broken links ...
2007-10-20Fix typo in arch/x86/kernel/tsc_32.cJosh Triplett
Signed-off-by: Josh Triplett <josh@kernel.org> Signed-off-by: Adrian Bunk <bunk@kernel.org>
2007-10-20spelling fixes: arch/i386/Simon Arlott
Spelling fixes in arch/i386/. Signed-off-by: Simon Arlott <simon@fire.lp0.eu> Signed-off-by: Adrian Bunk <bunk@kernel.org>
2007-10-19x86: convert cpuinfo_x86 array to a per_cpu arrayMike Travis
cpu_data is currently an array defined using NR_CPUS. This means that we overallocate since we will rarely really use maximum configured cpus. When NR_CPU count is raised to 4096 the size of cpu_data becomes 3,145,728 bytes. These changes were adopted from the sparc64 (and ia64) code. An additional field was added to cpuinfo_x86 to be a non-ambiguous cpu index. This corresponds to the index into a cpumask_t as well as the per_cpu index. It's used in various places like show_cpuinfo(). cpu_data is defined to be the boot_cpu_data structure for the NON-SMP case. Signed-off-by: Mike Travis <travis@sgi.com> Acked-by: Christoph Lameter <clameter@sgi.com> Cc: Andi Kleen <ak@suse.de> Cc: James Bottomley <James.Bottomley@steeleye.com> Cc: Dmitry Torokhov <dtor@mail.ru> Cc: "Antonino A. Daplas" <adaplas@pol.net> Cc: Mark M. Hoffman <mhoffman@lightlink.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-17x86: do not crash on non-Geode PCs in TSC probeIngo Molnar
with this fix Geode kernels can be booted (and QA-ed) on generic PCs. otherwise it crashes and burns during early bootup: Detected 2160.212 MHz processor. general protection fault: 0000 [#1] PREEMPT SMP Modules linked in: CPU: 0 EIP: 0060:[<c09071f6>] Not tainted VLI EFLAGS: 00010002 (2.6.23-rc9 #90) EIP is at tsc_init+0xa6/0x150 eax: 00000001 ebx: c1dce000 ecx: 00001900 edx: 00000001 esi: 00051000 edi: 00051000 ebp: c08fdfc4 esp: c08fdfa4 ds: 007b es: 007b fs: 00d8 gs: 0000 ss: 0068 Process swapper (pid: 0, ti=c08fc000 task=c082a180 task.ti=c08fc000) Stack: c076b870 00000870 000000d4 0000001d c0831e80 c1dce000 00051000 00051000 c08fdfcc c09053f8 c08fdff8 c09045ff 000001e2 c09040a0 00051000 00000020 0004e500 c0932140 00020800 00099800 c08ed000 01409007 00000000 Call Trace: [<c010517a>] show_trace_log_lvl+0x1a/0x30 [<c0105246>] show_stack_log_lvl+0xb6/0x100 [<c0105732>] show_registers+0x212/0x3a0 [<c0105aa4>] die+0x104/0x220 [<c0105f5f>] do_general_protection+0x1ef/0x2b0 [<c06699f2>] error_code+0x72/0x78 [<c09053f8>] time_init+0x8/0x20 [<c09045ff>] start_kernel+0x1af/0x320 [<00000000>] 0x0 ======================= Code: 31 d2 b8 00 00 09 3d f7 35 2c 70 9b c0 a3 04 95 8f c0 e8 ce 4e 99 ff b8 e0 45 93 c0 e8 94 b1 c5 ff e8 7f 3d 80 ff b9 00 19 00 00 <0f> 32 f6 c4 01 74 07 83 25 24 ce 82 c0 fd 8b 0d 20 ce 82 c0 b8 EIP: [<c09071f6>] tsc_init+0xa6/0x150 SS:ESP 0068:c08fdfa4 Kernel panic - not syncing: Attempted to kill the idle task! Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-13Delete filenames in comments.Dave Jones
Since the x86 merge, lots of files that referenced their own filenames are no longer correct. Rather than keep them up to date, just delete them, as they add no real value. Additionally: - fix up comment formatting in scx200_32.c - Remove a credit from myself in setup_64.c from a time when we had no SCM - remove longwinded history from tsc_32.c which can be figured out from git. Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-11Merge branch 'dmi-const' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/misc-2.6 * 'dmi-const' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/misc-2.6: drivers/firmware: const-ify DMI API and internals
2007-10-11i386: move kernelThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>