aboutsummaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2008-03-25KVM: MMU: Fix memory leak on guest demand faultsAvi Kivity
While backporting 72dc67a69690288538142df73a7e3ac66fea68dc, a gfn_to_page() call was duplicated instead of moved (due to an unrelated patch not being present in mainline). This caused a page reference leak, resulting in a fairly massive memory leak. Fix by removing the extraneous gfn_to_page() call. Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-03-25KVM: VMX: convert init_rmode_tss() to slots_lockMarcelo Tosatti
init_rmode_tss was forgotten during the conversion from mmap_sem to slots_lock. INFO: task qemu-system-x86:3748 blocked for more than 120 seconds. Call Trace: [<ffffffff8053d100>] __down_read+0x86/0x9e [<ffffffff8053fb43>] do_page_fault+0x346/0x78e [<ffffffff8053d235>] trace_hardirqs_on_thunk+0x35/0x3a [<ffffffff8053dcad>] error_exit+0x0/0xa9 [<ffffffff8035a7a7>] copy_user_generic_string+0x17/0x40 [<ffffffff88099a8a>] :kvm:kvm_write_guest_page+0x3e/0x5f [<ffffffff880b661a>] :kvm_intel:init_rmode_tss+0xa7/0xf9 [<ffffffff880b7d7e>] :kvm_intel:vmx_vcpu_reset+0x10/0x38a [<ffffffff8809b9a5>] :kvm:kvm_arch_vcpu_setup+0x20/0x53 [<ffffffff8809a1e4>] :kvm:kvm_vm_ioctl+0xad/0x1cf [<ffffffff80249dea>] __lock_acquire+0x4f7/0xc28 [<ffffffff8028fad9>] vfs_ioctl+0x21/0x6b [<ffffffff8028fd75>] do_vfs_ioctl+0x252/0x26b [<ffffffff8028fdca>] sys_ioctl+0x3c/0x5e [<ffffffff8020b01b>] system_call_after_swapgs+0x7b/0x80 Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-03-25KVM: MMU: handle page removal with shadow mappingMarcelo Tosatti
Do not assume that a shadow mapping will always point to the same host frame number. Fixes crash with madvise(MADV_DONTNEED). [avi: move after first printk(), add another printk()] Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-03-25KVM: MMU: Fix is_rmap_pte() with io ptesAvi Kivity
is_rmap_pte() doesn't take into account io ptes, which have the avail bit set. Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-03-25KVM: VMX: Restore tss even on x86_64Avi Kivity
The vmx hardware state restore restores the tss selector and base address, but not its length. Usually, this does not matter since most of the tss contents is within the default length of 0x67. However, if a process is using ioperm() to grant itself I/O port permissions, an additional bitmap within the tss, but outside the default length is consulted. The effect is that the process will receive a SIGSEGV instead of transparently accessing the port. Fix by restoring the tss length. Note that i386 had this working already. Closes bugzilla 10246. Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-03-24x86-32: Pass the full resource data to ioremap()Linus Torvalds
It appears that 64-bit PCI resources cannot possibly ever have worked on x86-32 even when the RESOURCES_64BIT config option was set, because any driver that tried to [pci_]ioremap() the resource would have been unable to do so because the high 32 bits would have been silently dropped on the floor by the ioremap() routines that only used "unsigned long". Change them to use "resource_size_t" instead, which properly encodes the whole 64-bit resource data if RESOURCES_64BIT is enabled. Acked-by: H. Peter Anvin <hpa@kernel.org> Acked-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-22x86: revert: reserve dma32 early for gartThomas Gleixner
Revert commit f62f1fc9ef94f74fda2b456d935ba2da69fa0a40 Author: Yinghai Lu <yhlu.kernel@gmail.com> Date: Fri Mar 7 15:02:50 2008 -0800 x86: reserve dma32 early for gart The patch has a dependency on bootmem modifications which are not .25 material that late in the -rc cycle. The problem which is addressed by the patch is limited to machines with 256G and more memory booted with NUMA disabled. This is not a .25 regression and the audience which is affected by this problem is very limited, so it's safer to do the revert than pulling in intrusive bootmem changes right now. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-03-21x86_64: free_bootmem should take physYinghai Lu
so use nodedata_phys directly. Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-03-21x86: trim mtrr don't close gap for resource allocation.Yinghai Lu
fix the bug reported here: http://bugzilla.kernel.org/show_bug.cgi?id=10232 use update_memory_range() instead of add_memory_range() directly to avoid closing the gap. ( the new code only affects and runs on systems where the MTRR workaround triggers. ) Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-03-21x86: fix reboot problem with Dell Optiplex 745, 0KW626 boardHeinz-Ado Arnolds
we have seen a little problem in rebooting Dell Optiplex 745 with the 0KW626 board. Here is a small patch enabling reboot with this board, which forces the default reboot path it into the BIOS reboot mode. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-03-21x86: fix fault_msg nul terminationJiri Slaby
The fault_msg text is not explictly nul terminated now in startup assembly. Do so by converting .ascii to .asciz. Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-03-21x86: fix long standing bug with usb after hibernation with 4GB ramPavel Machek
aperture_64.c takes a piece of memory and makes it into iommu window... but such window may not be saved by swsusp -- that leads to oops during hibernation. Signed-off-by: Pavel Machek <pavel@suse.cz> Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-03-21x86: hpet clock enable quirk on nVidia nForce 430Zbigniew Luszpinski
this patch allows hpet=force on nVidia nForce 430 southbridge. This patch was tested by me on my old Asus A8N-VM CSM (where bios does not support hpet and does not advertise it via acpi entry). My nForce430 version: lspci -nn | grep LPC 00:0a.0 ISA bridge [0601]: nVidia Corporation MCP51 LPC Bridge [10de:0260] (rev a2) Kernel 2.6.24.3 after patching and using hpet=force reports this: dmesg | grep -i hpet Kernel command line: root=/dev/sda8 ro vga=773 video=vesafb:mtrr:4,ywrap vt.default_utf8=0 hpet=force Force enabled HPET at base address 0xfed00000 hpet clockevent registered Time: hpet clocksource has been installed. grep -i hpet /proc/timer_list Clock Event Device: hpet set_next_event: hpet_legacy_next_event set_mode: hpet_legacy_set_mode grep Clock /proc/timer_list (before patching) Clock Event Device: pit Clock Event Device: lapic grep Clock /proc/timer_list (after patching) Clock Event Device: hpet Clock Event Device: lapic Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-03-21x86: reserve dma32 early for gartYinghai Lu
a system with 256 GB of RAM, when NUMA is disabled crashes the following way: Your BIOS doesn't leave a aperture memory hole Please enable the IOMMU option in the BIOS setup This costs you 64 MB of RAM Cannot allocate aperture memory hole (ffff8101c0000000,65536K) Kernel panic - not syncing: Not enough memory for aperture Pid: 0, comm: swapper Not tainted 2.6.25-rc4-x86-latest.git #33 Call Trace: [<ffffffff84037c62>] panic+0xb2/0x190 [<ffffffff840381fc>] ? release_console_sem+0x7c/0x250 [<ffffffff847b1628>] ? __alloc_bootmem_nopanic+0x48/0x90 [<ffffffff847b0ac9>] ? free_bootmem+0x29/0x50 [<ffffffff847ac1f7>] gart_iommu_hole_init+0x5e7/0x680 [<ffffffff847b255b>] ? alloc_large_system_hash+0x16b/0x310 [<ffffffff84506a2f>] ? _etext+0x0/0x1 [<ffffffff847a2e8c>] pci_iommu_alloc+0x1c/0x40 [<ffffffff847ac795>] mem_init+0x45/0x1a0 [<ffffffff8479ff35>] start_kernel+0x295/0x380 [<ffffffff8479f1c2>] _sinittext+0x1c2/0x230 the root cause is : memmap PMD is too big, [ffffe200e0600000-ffffe200e07fffff] PMD ->ffff81383c000000 on node 0 almost near 4G..., and vmemmap_alloc_block will use up the ram under 4G. solution will be: 1. make memmap allocation get memory above 4G... 2. reserve some dma32 range early before we try to set up memmap for all. and release that before pci_iommu_alloc, so gart or swiotlb could get some range under 4g limit for sure. the patch is using method 2. because method1 may need more code to handle SPARSEMEM and SPASEMEM_VMEMMAP will get Your BIOS doesn't leave a aperture memory hole Please enable the IOMMU option in the BIOS setup This costs you 64 MB of RAM Mapping aperture over 65536 KB of RAM @ 4000000 Memory: 264245736k/268959744k available (8484k kernel code, 4187464k reserved, 4004k data, 724k init) Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-03-21x86: add the DFF (Desktop Form Factor) Dell Optiplex 745 to the reboot ↵Coleman Kane
errata list We recently got some of the "Desktop Form Factor" Optiplex 745's in. I noticed that there's an entry for the SFF one's, but the BIOS model number of the DFF differs from that of the SFF. We have been reliably experiencing the same (as far as I can tell) reboot bug as the SFF boxes. Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-03-21x86/visws: fix printk format warningsRandy Dunlap
Fix visws printk format warnings: /local/linsrc/linux-2.6.24-git15/arch/x86/mach-visws/traps.c:50: warning: format '%#lx' expects type 'long unsigned int', but argument 2 has type 'u32' /local/linsrc/linux-2.6.24-git15/arch/x86/mach-visws/traps.c:50: warning: format '%#lx' expects type 'long unsigned int', but argument 3 has type 'u32' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-03-21x86: tight online check in setup_per_cpu_areasYinghai Lu
when numa disabled I got this compile warning: arch/x86/kernel/setup64.c: In function setup_per_cpu_areas: arch/x86/kernel/setup64.c:147: warning: the address of contig_page_data will always evaluate as true it seems we missed checking if the node is online before we try to refer NODE_DATA. Fix it. Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-03-21x86: fix dma_alloc_pagesYinghai Lu
memory-less node support: this patch uses updated dev_to_node, because dev_to_node already makes sure it returns an online node. Signed-off-by: Yinghai Lu <yinghai.lu@sun.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-03-12documentation: Move power-related files to Documentation/power/Randy Dunlap
Move 00-INDEX entries to power/00-INDEX (and add entry for pm_qos_interface.txt). Update references to moved filenames. Fix some trailing whitespace. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Len Brown <len.brown@intel.com>
2008-03-11x86: remove quicklistsThomas Gleixner
quicklists cause a serious memory leak on 32-bit x86, as documented at: http://bugzilla.kernel.org/show_bug.cgi?id=9991 the reason is that the quicklist pool is a special-purpose cache that grows out of proportion. It is not accounted for anywhere and users have no way to even realize that it's the quicklists that are causing RAM usage spikes. It was supposed to be a relatively small pool, but as demonstrated by KOSAKI Motohiro, they can grow as large as: Quicklists: 1194304 kB given how much trouble this code has caused historically, and given that Andrew objected to its introduction on x86 (years ago), the best option at this point is to remove them. [ any performance benefits of caching constructed pgds should be implemented in a more generic way (possibly within the page allocator), while still allowing constructed pages to be allocated by other workloads. ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-03-11x86: ia32 syscall restart fixRoland McGrath
The code to restart syscalls after signals depends on checking for a negative orig_ax, and for particular negative -ERESTART* values in ax. These fields are 64 bits and for a 32-bit task they get zero-extended. The syscall restart behavior is lost, a regression from a native 32-bit kernel and from 64-bit tasks' behavior. This patch fixes the problem by doing sign-extension where it matters. For orig_ax, the only time the value should be -1 but winds up as 0x0ffffffff is via a 32-bit ptrace call. So the patch changes ptrace to sign-extend the 32-bit orig_eax value when it's stored; it doesn't change the checks on orig_ax, though it uses the new current_syscall() inline to better document the subtle importance of the used of signedness there. The ax value is stored a lot of ways and it seems hard to get them all sign-extended at their origins. So for that, we use the current_syscall_ret() to sign-extend it only for 32-bit tasks at the time of the -ERESTART* comparisons. Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-03-11x86: ioremap, remove WARN_ON()Ingo Molnar
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-03-10fix BIOS PCI config cycle buglet causing ACPI boot regressionIngo Molnar
I figured out another ACPI related regression today. randconfig testing triggered an early boot-time hang on a laptop of mine (32-bit x86, config attached) - the screen was scrolling ACPI AML exceptions [with no serial port and no early debugging available]. v2.6.24 works fine on that laptop with the same .config, so after a few hours of bisection (had to restart it 3 times - other regressions interacted), it honed in on this commit: | 10270d4838bdc493781f5a1cf2e90e9c34c9142f is first bad commit | | Author: Linus Torvalds <torvalds@woody.linux-foundation.org> | Date: Wed Feb 13 09:56:14 2008 -0800 | | acpi: fix acpi_os_read_pci_configuration() misuse of raw_pci_read() reverting this commit ontop of -rc5 gave a correctly booting kernel. But this commit fixes a real bug so the real question is, why did it break the bootup? After quite some head-scratching, the following change stood out: - pci_id->bus = tu8; + pci_id->bus = val; pci_id->bus is defined as u16: struct acpi_pci_id { u16 segment; u16 bus; ... and 'tu8' changed from u8 to u32. So previously we'd unconditionally mask the return value of acpi_os_read_pci_configuration() (raw_pci_read()) to 8 bits, but now we just trust whatever comes back from the PCI access routines and only crop it to 16 bits. But if the high 8 bits of that result contains any noise then we'll write that into ACPI's PCI ID descriptor and confuse the heck out of the rest of ACPI. So lets check the PCI-BIOS code on that theory. We have this codepath for 8-bit accesses (arch/x86/pci/pcbios.c:pci_bios_read()): switch (len) { case 1: __asm__("lcall *(%%esi); cld\n\t" "jc 1f\n\t" "xor %%ah, %%ah\n" "1:" : "=c" (*value), "=a" (result) : "1" (PCIBIOS_READ_CONFIG_BYTE), "b" (bx), "D" ((long)reg), "S" (&pci_indirect)); Aha! The "=a" output constraint puts the full 32 bits of EAX into *value. But if the BIOS's routines set any of the high bits to nonzero, we'll return a value with more set in it than intended. The other, more common PCI access methods (v1 and v2 PCI reads) clear out the high bits already, for example pci_conf1_read() does: switch (len) { case 1: *value = inb(0xCFC + (reg & 3)); which explicitly converts the return byte up to 32 bits and zero-extends it. So zero-extending the result in the PCI-BIOS read routine fixes the regression on my laptop. ( It might fix some other long-standing issues we had with PCI-BIOS during the past decade ... ) Both 8-bit and 16-bit accesses were buggy. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-11lguest: Revert 1ce70c4fac3c3954bd48c035f448793867592bc0, fix real problem.Rusty Russell
Ahmed managed to crash the Host in release_pgd(), which cannot be a Guest bug, and indeed it wasn't. The bug was that handing a 0 as the address of the toplevel page table being manipulated can cause the lookup code in find_pgdir() to return an uninitialized cache entry (we shadow up to 4 top level page tables for each Guest). Commit 37cc8d7f963ba2deec29c9b68716944516a3244f introduced this behaviour in the Guest, uncovering the bug. The patch which he submitted (which removed the /4 from the index calculation) simply ensured that these high-indexed entries hit the early exit path of guest_set_pmd(). But you get lots of segfaults in guest userspace as the PMDs aren't being updated. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-03-11lguest: Sanitize the lguest clock.Rusty Russell
Now the TSC code handles a zero return from calculate_cpu_khz(), lguest can simply pass through the value it gets from the Host: if non-zero, all the normal TSC code applies. Otherwise (or if the Host really doesn't support TSC), the clocksource code will fall back to the slower but reasonable lguest clock. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-03-07x86_64: make ptrace always sign-extend orig_ax to 64 bitsRoland McGrath
This makes 64-bit ptrace calls setting the 64-bit orig_ax field for a 32-bit task sign-extend the low 32 bits up to 64. This matches what a 64-bit debugger expects when tracing a 32-bit task. This follows on my "x86_64 ia32 syscall restart fix". This didn't matter until that was fixed. The debugger ignores or zeros the high half of every register slot it sets (including the orig_rax pseudo-register) uniformly. It expects that the setting of the low 32 bits always has the same meaning as a 32-bit debugger setting those same 32 bits with native 32-bit facilities. This never arose before because the syscall restart check never matched any -ERESTART* values due to lack of sign extension. Before that fix, even 32-bit ptrace setting orig_eax to -1 failed to trigger the restart check anyway. So this was never noticed as a regression of 64-bit debuggers vs 32-bit debuggers on the same 64-bit kernel. Signed-off-by: Roland McGrath <roland@redhat.com> [ Changed to just do the sign-extension unconditionally on x86-64, since orig_ax is always just a small integer and doesn't need the full 64-bit range ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-07x86-boot: don't request VBE2 informationPeter Korsgaard
The new x86 setup code (4fd06960f120) broke booting on an old P3/500MHz with an onboard Voodoo3 of mine. After debugging it, it turned out to be caused by the fact that the vesa probing now asks for VBE2 data. Disassembing the video BIOS shows that it overflows the vesa_general_info structure when VBE2 data is requested because the source addresses for the information strings which get strcpy'ed to the buffer lie outside the 32K BIOS code (and hence contain long sequences of 0xff's). E.G.: get_vbe_controller_info: 00002A9C 60 pushaw 00002A9D 1E push ds 00002A9E 0E push cs 00002A9F 1F pop ds 00002AA0 2BC9 sub cx,cx 00002AA2 6626813D56424532 cmp dword [es:di],0x32454256 ; "VBE2" 00002AAA 7501 jnz .1 00002AAC 41 inc cx .1: 00002AAD 51 push cx 00002AAE B91400 mov cx,0x14 00002AB1 BED47F mov si, controller_header 00002AB4 57 push di 00002AB5 F3A4 rep movsb ; copy vbe1.2 header 00002AB7 B9EC00 mov cx,0xec 00002ABA 2AC0 sub al,al 00002ABC F3AA rep stosb ; zero pad remainder 00002ABE 5F pop di 00002ABF E8EB0D call word get_memory 00002AC2 C1E002 shl ax,0x2 00002AC5 26894512 mov [es:di+0x12],ax ; total memory 00002AC9 26C745040003 mov word [es:di+0x4],0x300 ; VBE version 00002ACF 268C4D08 mov [es:di+0x8],cs 00002AD3 268C4D10 mov [es:di+0x10],cs 00002AD7 59 pop cx 00002AD8 E361 jcxz .done ; VBE2 requested? 00002ADA 8D9D0001 lea bx,[di+0x100] 00002ADE 53 push bx 00002ADF 87DF xchg bx,di ; di now points to 2nd half 00002AE1 26C747140001 mov word [es:bx+0x14],0x100 ; sw rev 00002AE7 26897F06 mov [es:bx+0x6],di ; oem string 00002AEB 268C4708 mov [es:bx+0x8],es 00002AEF BE5280 mov si,0x8052 ; oem string 00002AF2 E87A1B call word strcpy 00002AF5 26897F0E mov [es:bx+0xe],di ; video mode list 00002AF9 268C4710 mov [es:bx+0x10],es 00002AFD B91E00 mov cx,0x1e 00002B00 BEE87F mov si,vidmodes 00002B03 F3A5 rep movsw 00002B05 26897F16 mov [es:bx+0x16],di ; oem vendor 00002B09 268C4718 mov [es:bx+0x18],es 00002B0D BE2480 mov si,0x8024 ; oem vendor 00002B10 E85C1B call word strcpy 00002B13 26897F1A mov [es:bx+0x1a],di ; oem product 00002B17 268C471C mov [es:bx+0x1c],es 00002B1B BE3880 mov si,0x8038 ; oem product 00002B1E E84E1B call word strcpy 00002B21 26897F1E mov [es:bx+0x1e],di ; oem product rev 00002B25 268C4720 mov [es:bx+0x20],es 00002B29 BE4580 mov si,0x8045 ; oem product rev 00002B2C E8401B call word strcpy 00002B2F 58 pop ax 00002B30 B90001 mov cx,0x100 00002B33 2BCF sub cx,di 00002B35 03C8 add cx,ax 00002B37 2AC0 sub al,al 00002B39 F3AA rep stosb ; zero pad .done: 00002B3B 1F pop ds 00002B3C 61 popaw 00002B3D B84F00 mov ax,0x4f 00002B40 C3 ret (The full BIOS can be found at http://peter.korsgaard.com/vgabios.bin if interested). The old setup code didn't ask for VBE2 info, and the new code doesn't actually do anything with the extra information, so the fix is to simply not request it. Other BIOS'es might have the same problem. Signed-off-by: Peter Korsgaard <jacmet@sunsite.dk> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-03-07x86: re-add reboot fixupsIngo Molnar
Jan Beulich noticed that the reboot fixups went missing during reboot.c unification. (commit 4d022e35fd7e07c522c7863fee6f07e53cf3fc14) Geode and a few other rare boards with special reboot quirks are affected. Reported-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-03-07x86: fix typo in step.cJan Beulich
TIF_DEBUGCTLMSR has no meaning in the actual MSR... Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-03-07x86: fix merge mistake in i387.cJan Beulich
convert_fxsr_to_user() in 2.6.24's i387_32.c did this, and convert_to_fxsr() also does the inverse, so I assume it's an oversight that it is no longer being done. [ mingo@elte.hu: we encode it this way because there's no space for the 'FPU Last Instruction Opcode' (->fop) field in the legacy user_i387_ia32_struct that PTRACE_GETFPREGS/PTRACE_SETFPREGS uses. it's probably pure legacy - i'd be surprised if any user-space relied on the FPU Last Opcode in any way. But indeed we used to do it previously so the most conservative thing is to preserve that piece of information. ] Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-03-07x86: clear DF before calling signal handlerAurelien Jarno
The Linux kernel currently does not clear the direction flag before calling a signal handler, whereas the x86/x86-64 ABI requires that. Linux had this behavior/bug forever, but this becomes a real problem with gcc version 4.3, which assumes that the direction flag is correctly cleared at the entry of a function. This patches changes the setup_frame() functions to clear the direction before entering the signal handler. Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: H. Peter Anvin <hpa@zytor.com>
2008-03-05[CPUFREQ] Remove debugging message from e_powersaverDave Jones
We don't need to printk a message every time we transition. Leave the code there, but ifdef'd out, as it's useful when adding support for new processors. Reported-by: Petr Titěra <P.Titera@century.cz> Signed-off-by: Dave Jones <davej@redhat.com>
2008-03-04Kprobes: indicate kretprobe support in KconfigAnanth N Mavinakayanahalli
Add CONFIG_HAVE_KRETPROBES to the arch/<arch>/Kconfig file for relevant architectures with kprobes support. This facilitates easy handling of in-kernel modules (like samples/kprobes/kretprobe_example.c) that depend on kretprobes being present in the kernel. Thanks to Sam Ravnborg for helping make the patch more lean. Per Mathieu's suggestion, added CONFIG_KRETPROBES and fixed up dependencies. Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-04x86: a P4 is a P6 not an i486Hugh Dickins
P4 has been coming out as CPU_FAMILY=4 instead of 6: fix MPENTIUM4 typo. Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-04Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86: x86/xen: fix DomU boot problem x86: not set node to cpu_to_node if the node is not online x86, i387: fix ptrace leakage using init_fpu()
2008-03-04Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm: x86: disable KVM for Voyager and friends KVM: VMX: Avoid rearranging switched guest msrs while they are loaded KVM: MMU: Fix race when instantiating a shadow pte KVM: Route irq 0 to vcpu 0 exclusively KVM: Avoid infinite-frequency local apic timer KVM: make MMU_DEBUG compile again KVM: move alloc_apic_access_page() outside of non-preemptable region KVM: SVM: fix Windows XP 64 bit installation crash KVM: remove the usage of the mmap_sem for the protection of the memory slots. KVM: emulate access to MSR_IA32_MCG_CTL KVM: Make the supported cpuid list a host property rather than a vm property KVM: Fix kvm_arch_vcpu_ioctl_set_sregs so that set_cr0 works properly KVM: SVM: set NM intercept when enabling CR0.TS in the guest KVM: SVM: Fix lazy FPU switching
2008-03-04x86/xen: fix DomU boot problemIan Campbell
Construct Xen guest e820 map with a hole between 640K-1M. It's pure luck that Xen kernels have gotten away with it in the past. The patch below seems like the right thing to do. It certainly boots in a domU without the DMI problem (without any of the other related patches such as Alexander's). Signed-off-by: Ian Campbell <ijc@hellion.org.uk> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Tested-by: Mark McLoughlin <markmc@redhat.com> Acked-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Thomas Gleixner <tglx@linutronix.de>
2008-03-04x86: not set node to cpu_to_node if the node is not onlineYinghai Lu
resolve boot problem reported by Mel Gorman: http://lkml.org/lkml/2008/2/13/404 init_cpu_to_node will use cpu->apic (from MADT or mptable) and apic->node(from SRAT or AMD config space with k8_bus_64.c) to have cpu->node mapping, and later identify_cpu will overwrite them again...(with nearby_node...) this patch checks if the node is online, otherwise it will not update cpu_node map. so keep cpu_node map to online node before identify_cpu..., to prevent possible error. Signed-off-by: Yinghai Lu <yinghai.lu@sun.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Thomas Gleixner <tglx@linutronix.de>
2008-03-04x86, i387: fix ptrace leakage using init_fpu()Suresh Siddha
This bug got introduced by the recent i387 merge: commit 4421011120b2304e5c248ae4165a2704588aedf1 Author: Roland McGrath <roland@redhat.com> Date: Wed Jan 30 13:31:50 2008 +0100 x86: x86 i387 user_regset Current usage of unlazy_fpu() in ptrace specific routines is wrong. unlazy_fpu() will not init fpu if the task never used math. So the ptrace calls can expose the parent tasks FPU data in some cases. Replace it with the init_fpu() which will init the math state, if the task never used math before. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Thomas Gleixner <tglx@linutronix.de>
2008-03-04x86: disable KVM for Voyager and friendsRandy Dunlap
Most classic Pentiums don't have hardware virtualization extension, and building kvm with Voyager, Visual Workstation, or NUMAQ generates spurious failures. Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
2008-03-04KVM: VMX: Avoid rearranging switched guest msrs while they are loadedAvi Kivity
KVM tries to run as much as possible with the guest msrs loaded instead of host msrs, since switching msrs is very expensive. It also tries to minimize the number of msrs switched according to the guest mode; for example, MSR_LSTAR is needed only by long mode guests. This optimization is done by setup_msrs(). However, we must not change which msrs are switched while we are running with guest msr state: - switch to guest msr state - call setup_msrs(), removing some msrs from the list - switch to host msr state, leaving a few guest msrs loaded An easy way to trigger this is to kexec an x86_64 linux guest. Early during setup, the guest will switch EFER to not include SCE. KVM will stop saving MSR_LSTAR, and on the next msr switch it will leave the guest LSTAR loaded. The next host syscall will end up in a random location in the kernel. Fix by reloading the host msrs before changing the msr list. Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-03-04KVM: MMU: Fix race when instantiating a shadow pteAvi Kivity
For improved concurrency, the guest walk is performed concurrently with other vcpus. This means that we need to revalidate the guest ptes once we have write-protected the guest page tables, at which point they can no longer be modified. The current code attempts to avoid this check if the shadow page table is not new, on the assumption that if it has existed before, the guest could not have modified the pte without the shadow lock. However the assumption is incorrect, as the racing vcpu could have modified the pte, then instantiated the shadow page, before our vcpu regains control: vcpu0 vcpu1 fault walk pte modify pte fault in same pagetable instantiate shadow page lookup shadow page conclude it is old instantiate spte based on stale guest pte We could do something clever with generation counters, but a test run by Marcelo suggests this is unnecessary and we can just do the revalidation unconditionally. The pte will be in the processor cache and the check can be quite fast. Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-03-04KVM: Avoid infinite-frequency local apic timerAvi Kivity
If the local apic initial count is zero, don't start a an hrtimer with infinite frequency, locking up the host. Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-03-04KVM: make MMU_DEBUG compile againMarcelo Tosatti
the cr3 variable is now inside the vcpu->arch structure. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-03-04KVM: move alloc_apic_access_page() outside of non-preemptable regionMarcelo Tosatti
alloc_apic_access_page() can sleep, while vmx_vcpu_setup is called inside a non preemptable region. Move it after put_cpu(). Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-03-04KVM: SVM: fix Windows XP 64 bit installation crashJoerg Roedel
While installing Windows XP 64 bit wants to access the DEBUGCTL and the last branch record (LBR) MSRs. Don't allowing this in KVM causes the installation to crash. This patch allow the access to these MSRs and fixes the issue. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Markus Rechberger <markus.rechberger@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-03-04KVM: remove the usage of the mmap_sem for the protection of the memory slots.Izik Eidus
This patch replaces the mmap_sem lock for the memory slots with a new kvm private lock, it is needed beacuse untill now there were cases where kvm accesses user memory while holding the mmap semaphore. Signed-off-by: Izik Eidus <izike@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-03-03x86: revert "x86: CPA: avoid split of alias mappings"Rafael J. Wysocki
Revert: commit 8be8f54bae3453588011cad06363813a5293af53 Author: Thomas Gleixner <tglx@linutronix.de> Date: Sat Feb 23 20:43:21 2008 +0100 x86: CPA: avoid split of alias mappings because it clearly mishandles the case when __change_page_attr(), called from __change_page_attr_set_clr(), changes cpa->processed to 1 and cpa_process_alias(cpa) is executed right after that. This crashes my x86-64 test box early in the boot process (ref. http://bugzilla.kernel.org/show_bug.cgi?id=10140#c4). Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-03-03KVM: emulate access to MSR_IA32_MCG_CTLJoerg Roedel
Injecting an GP when accessing this MSR lets Windows crash when running some stress test tools in KVM. So this patch emulates access to this MSR. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Markus Rechberger <markus.rechberger@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-03-03KVM: Make the supported cpuid list a host property rather than a vm propertyAvi Kivity
One of the use cases for the supported cpuid list is to create a "greatest common denominator" of cpu capabilities in a server farm. As such, it is useful to be able to get the list without creating a virtual machine first. Since the code does not depend on the vm in any way, all that is needed is to move it to the device ioctl handler. The capability identifier is also changed so that binaries made against -rc1 will fail gracefully. Signed-off-by: Avi Kivity <avi@qumranet.com>