Age | Commit message (Collapse) | Author |
|
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
Effectively reverting to the pre walk_shadow() version -- but now
with the reusable for_each().
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
Eliminating a callback and a useless structure.
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
Using a for_each loop style removes the need to write callback and nasty
casts.
Implement the walk_shadow() using the for_each_shadow_entry().
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
The AMD SVM instruction family all overload the 0f 01 /3 opcode, further
multiplexing on the three r/m bits. But the code decided that anything that
isn't a vmmcall must be an lidt (which shares the 0f 01 /3 opcode, for the
case that mod = 3).
Fix by aborting emulation if this isn't a vmmcall.
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
If cr4.pge is cleared, we ought to treat any ptes in the page as non-global.
This allows us to remove the check from set_spte().
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
Don't allow a vcpu with cr4.pge cleared to use a shadow page created with
cr4.pge set; this might cause a cr3 switch not to sync ptes that have the
global bit set (the global bit has no effect if !cr4.pge).
This can only occur on smp with different cr4.pge settings for different
vcpus (since a cr4 change will resync the shadow ptes), but there's no
cost to being correct here.
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
Instead of "calculating" it on every shadow page allocation, set it once
when switching modes, and copy it when allocating pages.
This doesn't buy us much, but sets up the stage for inheriting more
information related to the mmu setup.
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
Add the remaining bits to make use of debug registers also for guest
debugging, thus enabling the use of hardware breakpoints and
watchpoints.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
So far KVM only had basic x86 debug register support, once introduced to
realize guest debugging that way. The guest itself was not able to use
those registers.
This patch now adds (almost) full support for guest self-debugging via
hardware registers. It refactors the code, moving generic parts out of
SVM (VMX was already cleaned up by the KVM_SET_GUEST_DEBUG patches), and
it ensures that the registers are properly switched between host and
guest.
This patch also prepares debug register usage by the host. The latter
will (once wired-up by the following patch) allow for hardware
breakpoints/watchpoints in guest code. If this is enabled, the guest
will only see faked debug registers without functionality, but with
content reflecting the guest's modifications.
Tested on Intel only, but SVM /should/ work as well, but who knows...
Known limitations: Trapping on tss switch won't work - most probably on
Intel.
Credits also go to Joerg Roedel - I used his once posted debugging
series as platform for this patch.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
When single-stepping over STI and MOV SS, we must clear the
corresponding interruptibility bits in the guest state. Otherwise
vmentry fails as it then expects bit 14 (BS) in pending debug exceptions
being set, but that's not correct for the guest debugging case.
Note that clearing those bits is safe as we check for interruptibility
based on the original state and do not inject interrupts or NMIs if
guest interruptibility was blocked.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
This rips out the support for KVM_DEBUG_GUEST and introduces a new IOCTL
instead: KVM_SET_GUEST_DEBUG. The IOCTL payload consists of a generic
part, controlling the "main switch" and the single-step feature. The
arch specific part adds an x86 interface for intercepting both types of
debug exceptions separately and re-injecting them when the host was not
interested. Moveover, the foundation for guest debugging via debug
registers is layed.
To signal breakpoint events properly back to userland, an arch-specific
data block is now returned along KVM_EXIT_DEBUG. For x86, the arch block
contains the PC, the debug exception, and relevant debug registers to
tell debug events properly apart.
The availability of this new interface is signaled by
KVM_CAP_SET_GUEST_DEBUG. Empty stubs for not yet supported archs are
provided.
Note that both SVM and VTX are supported, but only the latter was tested
yet. Based on the experience with all those VTX corner case, I would be
fairly surprised if SVM will work out of the box.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
VMX differentiates between processor and software generated exceptions
when injecting them into the guest. Extend vmx_queue_exception
accordingly (and refactor related constants) so that we can use this
service reliably for the new guest debugging framework.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
Userspace has to tell the kernel module somehow that nested SVM should be used.
The easiest way that doesn't break anything I could think of is to implement
if (cpuid & svm)
allow write to efer
else
deny write to efer
Old userspaces mask the SVM capability bit, so they don't break.
In order to find out that the SVM capability is set, I had to split the
kvm_emulate_cpuid into a finding and an emulating part.
(introduced in v6)
Acked-by: Joerg Roedel <joro@8bytes.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
Normally setting the SVME bit in EFER is not allowed, as we did
not support SVM. Not since we do, we should also allow enabling
SVM mode.
v2 comes as last patch, so we don't enable half-ready code
v4 introduces a module option to enable SVM
v6 warns that nesting is enabled
Acked-by: Joerg Roedel <joro@8bytes.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
KVM tries to read the VM_CR MSR to find out if SVM was disabled by
the BIOS. So implement read support for this MSR to make nested
SVM running.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
This adds the #VMEXIT intercept, so we return to the level 1 guest
when something happens in the level 2 guest that should return to
the level 1 guest.
v2 implements HIF handling and cleans up exception interception
v3 adds support for V_INTR_MASKING_MASK
v4 uses the host page hsave
v5 removes IOPM merging code
v6 moves mmu code out of the atomic section
Acked-by: Joerg Roedel <joro@8bytes.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
This patch implements VMRUN. VMRUN enters a virtual CPU and runs that
in the same context as the normal guest CPU would run.
So basically it is implemented the same way, a normal CPU would do it.
We also prepare all intercepts that get OR'ed with the original
intercepts, as we do not allow a level 2 guest to be intercepted less
than the first level guest.
v2 implements the following improvements:
- fixes the CPL check
- does not allocate iopm when not used
- remembers the host's IF in the HIF bit in the hflags
v3:
- make use of the new permission checking
- add support for V_INTR_MASKING_MASK
v4:
- use host page backed hsave
v5:
- remove IOPM merging code
v6:
- save cr4 so PAE l1 guests work
v7:
- return 0 on vmrun so we check the MSRs too
- fix MSR check to use the correct variable
Acked-by: Joerg Roedel <joro@8bytes.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
This implements the VMLOAD and VMSAVE instructions, that usually surround
the VMRUN instructions. Both instructions load / restore the same elements,
so we only need to implement them once.
v2 fixes CPL checking and replaces memcpy by assignments
v3 makes use of the new permission checking
Acked-by: Joerg Roedel <joro@8bytes.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
Implement the hsave MSR, that gives the VCPU a GPA to save the
old guest state in.
v2 allows userspace to save/restore hsave
v4 dummys out the hsave MSR, so we use a host page
v6 remembers the guest's hsave and exports the MSR
Acked-by: Joerg Roedel <joro@8bytes.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
This patch implements the GIF flag and the clgi and stgi instructions that
set this flag. Only if the flag is set (default), interrupts can be received by
the CPU.
To keep the information about that somewhere, this patch adds a new hidden
flags vector. that is used to store information that does not go into the
vmcb, but is SVM specific.
I tried to write some code to make -no-kvm-irqchip work too, but the first
level guest won't even boot with that atm, so I ditched it.
v2 moves the hflags to x86 generic code
v3 makes use of the new permission helper
v6 only enables interrupt_window if GIF=1
Acked-by: Joerg Roedel <joro@8bytes.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
These are helpers for the nested SVM implementation.
- nsvm_printk implements a debug printk variant
- nested_svm_do calls a handler that can accesses gpa-based memory
v3 makes use of the new permission checker
v6 changes:
- streamline nsvm_debug()
- remove printk(KERN_ERR)
- SVME check before CPL check
- give GP error code
- use new EFER constant
Acked-by: Joerg Roedel <joro@8bytes.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
MSR_EFER_SVME_MASK, MSR_VM_CR and MSR_VM_HSAVE_PA are set in KVM
specific headers. Linux does have nice header files to collect
EFER bits and MSR IDs, so IMHO we should put them there.
While at it, I also changed the naming scheme to match that
of the other defines.
(introduced in v6)
Acked-by: Joerg Roedel <joro@8bytes.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
The current VINTR intercept setters don't look clean to me. To make
the code easier to read and enable the possibilty to trap on a VINTR
set, this uses a helper function to set the VINTR intercept.
v2 uses two distinct functions for setting and clearing the bit
Acked-by: Joerg Roedel <joro@8bytes.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
|
|
Impact: section mismatch fix
Ingo reports these warnings:
> WARNING: vmlinux.o(.text+0x6a288e): Section mismatch in reference from
> the function dmi_alloc() to the function .init.text:extend_brk()
> The function dmi_alloc() references
> the function __init extend_brk().
> This is often because dmi_alloc lacks a __init annotation or the
> annotation of extend_brk is wrong.
dmi_alloc() is a static inline, and so should be immune to this
kind of error. But force it to be inlined and make it __init
anyway, just to be extra sure.
All of dmi_alloc()'s callers are already __init.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <49C6B23C.2040308@goop.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Impact: cleanup
This fixed various signedness issues in setup.c and e820.c:
arch/x86/kernel/setup.c:455:53: warning: incorrect type in argument 3 (different signedness)
arch/x86/kernel/setup.c:455:53: expected int *pnr_map
arch/x86/kernel/setup.c:455:53: got unsigned int extern [toplevel] *<noident>
arch/x86/kernel/setup.c:639:53: warning: incorrect type in argument 3 (different signedness)
arch/x86/kernel/setup.c:639:53: expected int *pnr_map
arch/x86/kernel/setup.c:639:53: got unsigned int extern [toplevel] *<noident>
arch/x86/kernel/setup.c:820:54: warning: incorrect type in argument 3 (different signedness)
arch/x86/kernel/setup.c:820:54: expected int *pnr_map
arch/x86/kernel/setup.c:820:54: got unsigned int extern [toplevel] *<noident>
arch/x86/kernel/e820.c:670:53: warning: incorrect type in argument 3 (different signedness)
arch/x86/kernel/e820.c:670:53: expected int *pnr_map
arch/x86/kernel/e820.c:670:53: got unsigned int [toplevel] *<noident>
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
|
|
move out msi_ir_chip and ir_ioapic_chip from CONFIG_INTR_REMAP shadow
Fix:
arch/x86/kernel/apic/io_apic.c:1431: warning: ‘msi_ir_chip’ defined but not used
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
|
|
Impact: cleanup
keep CONFIG_X86_LOCAL_APIC interrupts together to avoid extra ifdef
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
|
|
Impact: cleanup
SMP and !SMP will use same path for show_interrupts
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
|
|
Impact: cleanup
- Fix various style issues
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
|
|
Impact: cleanup
This patch fixes the following sparse warnings:
arch/x86/kernel/apic/io_apic.c:3602:17: warning: symbol 'hpet_msi_type'
was not declared. Should it be static?
arch/x86/kernel/apic/io_apic.c:3467:30: warning: Using plain integer as
NULL pointer
Signed-off-by: Dmitri Vorobiev <dmitri.vorobiev@movial.com>
LKML-Reference: <1237741871-5827-2-git-send-email-dmitri.vorobiev@movial.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jaswinder/linux-2.6-tip into x86/cleanups
|
|
This reverts commit 698609bdcd35d0641f4c6622c83680ab1a6d67cb.
69860 breaks Xen booting, as it relies on head*.S to set up the fixmap
pagetables (as a side-effect of initializing the USB debug port).
Xen, however, does not boot via head*.S, and so the fixmap area is
not initialized.
The specific symptom of the crash is a fault in dmi_scan(), because
the pointer that early_ioremap returns is not actually present.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <49C43A8E.5090203@goop.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Impact: cleanup
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
|
|
Impact: cleanup
- fix header file issues
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
|
|
Impact: cleanup
- fix various style problems
- fix header file issues
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
|
|
Impact: cleanup
- fix various style problems
- fix header file issues
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
|
|
Impact: cleanup
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
|
|
Impact: cleanup
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
|
|
To reduce the size of the oversized function __get_smp_config()
There should be no impact to functionality.
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
|
|
smp_read_mpc() and replace_intsrc_all() can use same smp_dump_mptable()
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
|
|
Impact: fix incorrect error message
- IO APIC resource allocation error message contains one too many "be".
- Print the error message iff there are IO APICs in the system.
I've seen this error message for some time on my x86-32 laptop...
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Cc: Alan Bartlett <ajb.stxsl@googlemail.com>
LKML-Reference: <200903202100.30789.bzolnier@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Fix mmconfig detection to not assume a single mmconfig space in the
northbridge, paving the way for AMD fam10h + mcp55 CPUs. On those, the
MSR has some range, but the mcp55 pci config will have another one.
Also helps the mcp55 + io55 case, where every one will have one range.
If it is mcp55, exclude the range that is used by CPU MSR, in other
words , if the CPU claims busses 0-255, the range in mcp55 is dropped,
because CPU HW will not route those ranges to mcp55 mmconfig to handle
it.
Signed-off-by: Yinghai Lu <yinghai.lu@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
|
|
Detect and enable memory-mapped PCI configuration space on the nVidia
MCP55 southbridge. Tested against 2.6.27.4 on an Arista Networks
development board with one MCP55, Coreboot firmware, no ACPI.
Signed-off-by: Ed Swierk <eswierk@aristanetworks.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
|
|
Impact: cleanup
Check alternate signal stack overflow with proper stack pointer.
The stack pointer of the next signal frame is different if that
task has i387 state.
On x86_64, redzone would be included.
No need to check SA_ONSTACK if we're already using alternate signal stack.
Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Cc: Roland McGrath <roland@redhat.com>
LKML-Reference: <49C2874D.3080002@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Many host bridges support a 4k config space, so check them directy
instead of using quirks to add them.
We only need to do this extra check for host bridges at this point,
because only host bridges are known to have extended address space
without also having a PCI-X/PCI-E caps. Other devices with this
property could be done with quirks (if there are any).
As a bonus, we can remove the quirks for AMD host bridges with family
10h and 11h since they're not needed any more.
With this patch, we can get correct pci cfg size of new Intel CPUs/IOHs
with host bridges.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
Cc: <stable@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
|
|
Add the new API pci_enable_msi_block() to allow drivers to
request multiple MSI and reimplement pci_enable_msi in terms of
pci_enable_msi_block. Ensure that the architecture back ends don't
have to know about multiple MSI.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
|
|
This patch changes a VIA PCI quirk to use dev_info() rather than printk().
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgek.org>
|
|
Add new interfaces:
set_pages_array_uc()
set_pages_array_wb()
that can be used change the page attribute for a bunch of pages with
flush etc done once at the end of all the changes. These interfaces
are similar to existing set_memory_array_uc() and set_memory_array_wc().
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: arjan@infradead.org
Cc: eric@anholt.net
Cc: airlied@redhat.com
LKML-Reference: <20090319215358.901545000@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|