aboutsummaryrefslogtreecommitdiff
path: root/arch/x86/kvm/x86.c
AgeCommit message (Collapse)Author
2009-01-03KVM: change KVM to use IOMMU APIJoerg Roedel
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2008-12-31KVM: x86: Rework user space NMI injection as KVM_CAP_USER_NMIJan Kiszka
There is no point in doing the ready_for_nmi_injection/ request_nmi_window dance with user space. First, we don't do this for in-kernel irqchip anyway, while the code path is the same as for user space irqchip mode. And second, there is nothing to loose if a pending NMI is overwritten by another one (in contrast to IRQs where we have to save the number). Actually, there is even the risk of raising spurious NMIs this way because the reason for the held-back NMI might already be handled while processing the first one. Therefore this patch creates a simplified user space NMI injection interface, exporting it under KVM_CAP_USER_NMI and dropping the old KVM_CAP_NMI capability. And this time we also take care to provide the interface only on archs supporting NMIs via KVM (right now only x86). Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31KVM: Consolidate userspace memory capability reporting into common codeAvi Kivity
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31KVM: MMU: prepopulate the shadow on invlpgMarcelo Tosatti
If the guest executes invlpg, peek into the pagetable and attempt to prepopulate the shadow entry. Also stop dirty fault updates from interfering with the fork detector. 2% improvement on RHEL3/AIM7. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31KVM: MMU: skip global pgtables on sync due to cr3 switchMarcelo Tosatti
Skip syncing global pages on cr3 switch (but not on cr4/cr0). This is important for Linux 32-bit guests with PAE, where the kmap page is marked as global. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31KVM: Fix cpuid iteration on multiple leaves per eacNitin A Kamble
The code to traverse the cpuid data array list for counting type of leaves is currently broken. This patches fixes the 2 things in it. 1. Set the 1st counting entry's flag KVM_CPUID_FLAG_STATE_READ_NEXT. Without it the code will never find a valid entry. 2. Also the stop condition in the for loop while looking for the next unflaged entry is broken. It needs to stop when it find one matching entry; and in the case of count of 1, it will be the same entry found in this iteration. Signed-Off-By: Nitin A Kamble <nitin.a.kamble@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31KVM: Fix cpuid leaf 0xb loop terminationNitin A Kamble
For cpuid leaf 0xb the bits 8-15 in ECX register define the end of counting leaf. The previous code was using bits 0-7 for this purpose, which is a bug. Signed-off-by: Nitin A Kamble <nitin.a.kamble@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31KVM: Enable Function Level Reset for assigned deviceSheng Yang
Ideally, every assigned device should in a clear condition before and after assignment, so that the former state of device won't affect later work. Some devices provide a mechanism named Function Level Reset, which is defined in PCI/PCI-e document. We should execute it before and after device assignment. (But sadly, the feature is new, and most device on the market now don't support it. We are considering using D0/D3hot transmit to emulate it later, but not that elegant and reliable as FLR itself.) [Update: Reminded by Xiantao, execute FLR after we ensure that the device can be assigned to the guest.] Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31KVM: allow emulator to adjust rip for emulated pio instructionsGuillaume Thouvenin
If we call the emulator we shouldn't call skip_emulated_instruction() in the first place, since the emulator already computes the next rip for us. Thus we move ->skip_emulated_instruction() out of kvm_emulate_pio() and into handle_io() (and the svm equivalent). We also replaced "return 0" by "break" in the "do_io:" case because now the shadow register state needs to be committed. Otherwise eip will never be updated. Signed-off-by: Guillaume Thouvenin <guillaume.thouvenin@ext.bull.net> Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31KVM: x86: Fix typo in function nameAmit Shah
get_segment_descritptor_dtable() contains an obvious type. Signed-off-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31KVM: Enable MTRR for EPTSheng Yang
The effective memory type of EPT is the mixture of MSR_IA32_CR_PAT and memory type field of EPT entry. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31KVM: VMX: Add PAT support for EPTSheng Yang
GUEST_PAT support is a new feature introduced by Intel Core i7 architecture. With this, cpu would save/load guest and host PAT automatically, for EPT memory type in guest depends on MSR_IA32_CR_PAT. Also add save/restore for MSR_IA32_CR_PAT. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31KVM: Improve MTRR structureSheng Yang
As well as reset mmu context when set MTRR. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31KVM: call kvm_arch_vcpu_reset() instead of the kvm_x86_ops callbackGleb Natapov
Call kvm_arch_vcpu_reset() instead of directly using arch callback. The function does additional things. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31KVM: x86: Support for user space injected NMIsJan Kiszka
Introduces the KVM_NMI IOCTL to the generic x86 part of KVM for injecting NMIs from user space and also extends the statistic report accordingly. Based on the original patch by Sheng Yang. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Sheng Yang <sheng.yang@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31KVM: x86: VCPU with pending NMI is runnabledJan Kiszka
Ensure that a VCPU with pending NMIs is considered runnable. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31KVM: x86: Reset pending/inject NMI state on CPU resetJan Kiszka
CPU reset invalidates pending or already injected NMIs, therefore reset the related state variables. Based on original patch by Gleb Natapov. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2008-10-28KVM: Fix guest shared interrupt with in-kernel irqchipSheng Yang
Every call of kvm_set_irq() should offer an irq_source_id, which is allocated by kvm_request_irq_source_id(). Based on irq_source_id, we identify the irq source and implement logical OR for shared level interrupts. The allocated irq_source_id can be freed by kvm_free_irq_source_id(). Currently, we support at most sizeof(unsigned long) different irq sources. [Amit: - rebase to kvm.git HEAD - move definition of KVM_USERSPACE_IRQ_SOURCE_ID to common file - move kvm_request_irq_source_id to the update_irq ioctl] [Xiantao: - Add kvm/ia64 stuff and make it work for kvm/ia64 guests] Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2008-10-16Merge branch 'kvm-updates/2.6.28' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm * 'kvm-updates/2.6.28' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm: (134 commits) KVM: ia64: Add intel iommu support for guests. KVM: ia64: add directed mmio range support for kvm guests KVM: ia64: Make pmt table be able to hold physical mmio entries. KVM: Move irqchip_in_kernel() from ioapic.h to irq.h KVM: Separate irq ack notification out of arch/x86/kvm/irq.c KVM: Change is_mmio_pfn to kvm_is_mmio_pfn, and make it common for all archs KVM: Move device assignment logic to common code KVM: Device Assignment: Move vtd.c from arch/x86/kvm/ to virt/kvm/ KVM: VMX: enable invlpg exiting if EPT is disabled KVM: x86: Silence various LAPIC-related host kernel messages KVM: Device Assignment: Map mmio pages into VT-d page table KVM: PIC: enhance IPI avoidance KVM: MMU: add "oos_shadow" parameter to disable oos KVM: MMU: speed up mmu_unsync_walk KVM: MMU: out of sync shadow core KVM: MMU: mmu_convert_notrap helper KVM: MMU: awareness of new kvm_mmu_zap_page behaviour KVM: MMU: mmu_parent_walk KVM: x86: trap invlpg KVM: MMU: sync roots on mmu reload ...
2008-10-16misc: replace __FUNCTION__ with __func__Harvey Harrison
__FUNCTION__ is gcc-specific, use __func__ Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-15KVM: Move device assignment logic to common codeXiantao Zhang
To share with other archs, this patch moves device assignment logic to common parts. Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15KVM: x86: Silence various LAPIC-related host kernel messagesJan Kiszka
KVM-x86 dumps a lot of debug messages that have no meaning for normal operation: - INIT de-assertion is ignored - SIPIs are sent and received - APIC writes are unaligned or < 4 byte long (Windows Server 2003 triggers this on SMP) Degrade them to true debug messages, keeping the host kernel log clean for real problems. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2008-10-15KVM: PIC: enhance IPI avoidanceMarcelo Tosatti
The PIC code makes little effort to avoid kvm_vcpu_kick(), resulting in unnecessary guest exits in some conditions. For example, if the timer interrupt is routed through the IOAPIC, IRR for IRQ 0 will get set but not cleared, since the APIC is handling the acks. This means that everytime an interrupt < 16 is triggered, the priority logic will find IRQ0 pending and send an IPI to vcpu0 (in case IRQ0 is not masked, which is Linux's case). Introduce a new variable isr_ack to represent the IRQ's for which the guest has been signalled / cleared the ISR. Use it to avoid more than one IPI per trigger-ack cycle, in addition to the avoidance when ISR is set in get_priority(). Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2008-10-15KVM: MMU: out of sync shadow coreMarcelo Tosatti
Allow guest pagetables to go out of sync. Instead of emulating write accesses to guest pagetables, or unshadowing them, we un-write-protect the page table and allow the guest to modify it at will. We rely on invlpg executions to synchronize individual ptes, and will synchronize the entire pagetable on tlb flushes. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2008-10-15KVM: x86: trap invlpgMarcelo Tosatti
With pages out of sync invlpg needs to be trapped. For now simply nuke the entry. Untested on AMD. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2008-10-15KVM: MMU: sync roots on mmu reloadMarcelo Tosatti
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2008-10-15KVM: don't enter guest after SIPI was received by a CPUGleb Natapov
The vcpu should process pending SIPI message before entering guest mode again. kvm_arch_vcpu_runnable() returns true if the vcpu is in SIPI state, so we can't call it here. Signed-off-by: Gleb Natapov <gleb@qumranet.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2008-10-15KVM: x86.c make kvm_load_realmode_segment staticHarvey Harrison
Noticed by sparse: arch/x86/kvm/x86.c:3591:5: warning: symbol 'kvm_load_realmode_segment' was not declared. Should it be static? Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2008-10-15KVM: switch to get_user_pages_fastMarcelo Tosatti
Convert gfn_to_pfn to use get_user_pages_fast, which can do lockless pagetable lookups on x86. Kernel compilation on 4-way guest is 3.7% faster on VMX. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2008-10-15KVM: Device Assignment: Free device structures if IRQ allocation failsAmit Shah
When an IRQ allocation fails, we free up the device structures and disable the device so that we can unregister the device in the userspace and not expose it to the guest at all. Signed-off-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2008-10-15KVM: Device Assignment with VT-dBen-Ami Yassour
Based on a patch by: Kay, Allen M <allen.m.kay@intel.com> This patch enables PCI device assignment based on VT-d support. When a device is assigned to the guest, the guest memory is pinned and the mapping is updated in the VT-d IOMMU. [Amit: Expose KVM_CAP_IOMMU so we can check if an IOMMU is present and also control enable/disable from userspace] Signed-off-by: Kay, Allen M <allen.m.kay@intel.com> Signed-off-by: Weidong Han <weidong.han@intel.com> Signed-off-by: Ben-Ami Yassour <benami@il.ibm.com> Signed-off-by: Amit Shah <amit.shah@qumranet.com> Acked-by: Mark Gross <mgross@linux.intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15KVM: x86: unhalt vcpu0 on resetMarcelo Tosatti
Since "KVM: x86: do not execute halted vcpus", HLT by vcpu0 before system reset by the IO thread will hang the guest. Mark vcpu as runnable in such case. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15KVM: x86: do not execute halted vcpusMarcelo Tosatti
Offline or uninitialized vcpu's can be executed if requested to perform userspace work. Follow Avi's suggestion to handle halted vcpu's in the main loop, simplifying kvm_emulate_halt(). Introduce a new vcpu->requests bit to indicate events that promote state from halted to running. Also standardize vcpu wake sites. Signed-off-by: Marcelo Tosatti <mtosatti <at> redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15KVM: Add statistics for guest irq injectionsAvi Kivity
These can help show whether a guest is making progress or not. Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15KVM: add MC5_MISC msr read supportJoerg Roedel
Currently KVM implements MC0-MC4_MISC read support. When booting Linux this results in KVM warnings in the kernel log when the guest tries to read MC5_MISC. Fix this warnings with this patch. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15KVM: Allocate guest memory as MAP_PRIVATE, not MAP_SHAREDAvi Kivity
There is no reason to share internal memory slots with fork()ed instances. Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15KVM: Load real mode segments correctlyAvi Kivity
Real mode segments to not reference the GDT or LDT; they simply compute base = selector * 16. Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15KVM: make irq ack notifier functions staticHarvey Harrison
sparse says: arch/x86/kvm/x86.c:107:32: warning: symbol 'kvm_find_assigned_dev' was not declared. Should it be static? arch/x86/kvm/i8254.c:225:6: warning: symbol 'kvm_pit_ack_irq' was not declared. Should it be static? Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15KVM: Use kvm_set_irq to inject interruptsAmit Shah
... instead of using the pic and ioapic variants Signed-off-by: Amit Shah <amit.shah@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15KVM: Device assignment: Check for privileges before assigning irqAmit Shah
Even though we don't share irqs at the moment, we should ensure regular user processes don't try to allocate system resources. We check for capability to access IO devices (CAP_SYS_RAWIO) before we request_irq on behalf of the guest. Noticed by Avi. Signed-off-by: Amit Shah <amit.shah@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15KVM: set debug registers after "schedulable" sectionMarcelo Tosatti
The vcpu thread can be preempted after the guest_debug_pre() callback, resulting in invalid debug registers on the new vcpu. Move it inside the non-preemptable section. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15KVM: Reduce stack usage in kvm_arch_vcpu_ioctl()Dave Hansen
[sheng: fix KVM_GET_LAPIC using wrong size] Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: Sheng Yang <sheng.yang@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15KVM: Reduce kvm stack usage in kvm_arch_vm_ioctl()Dave Hansen
On my machine with gcc 3.4, kvm uses ~2k of stack in a few select functions. This is mostly because gcc fails to notice that the different case: statements could have their stack usage combined. It overflows very nicely if interrupts happen during one of these large uses. This patch uses two methods for reducing stack usage. 1. dynamically allocate large objects instead of putting on the stack. 2. Use a union{} member for all of the case variables. This tricks gcc into combining them all into a single stack allocation. (There's also a comment on this) Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15KVM: pci device assignmentBen-Ami Yassour
Based on a patch from: Amit Shah <amit.shah@qumranet.com> This patch adds support for handling PCI devices that are assigned to the guest. The device to be assigned to the guest is registered in the host kernel and interrupt delivery is handled. If a device is already assigned, or the device driver for it is still loaded on the host, the device assignment is failed by conveying a -EBUSY reply to the userspace. Devices that share their interrupt line are not supported at the moment. By itself, this patch will not make devices work within the guest. The VT-d extension is required to enable the device to perform DMA. Another alternative is PVDMA. Signed-off-by: Amit Shah <amit.shah@qumranet.com> Signed-off-by: Ben-Ami Yassour <benami@il.ibm.com> Signed-off-by: Weidong Han <weidong.han@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15KVM: Ignore DEBUGCTL MSRs with no effectAlexander Graf
Netware writes to DEBUGCTL and reads from the DEBUGCTL and LAST*IP MSRs without further checks and is really confused to receive a #GP during that. To make it happy we should just make them stubs, which is exactly what SVM already does. Writes to DEBUGCTL that are vendor-specific are resembled to behave as if the virtual CPU does not know them. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15KVM: Clear exception queue before emulating an instructionAvi Kivity
If we're emulating an instruction, either it will succeed, in which case any previously queued exception will be spurious, or we will requeue the same exception. Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15KVM: x86: accessors for guest registersMarcelo Tosatti
As suggested by Avi, introduce accessors to read/write guest registers. This simplifies the ->cache_regs/->decache_regs interface, and improves register caching which is important for VMX, where the cost of vmcs_read/vmcs_write is significant. [avi: fix warnings] Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-29KVM: Advertise synchronized mmu support to userspaceAvi Kivity
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-29KVM: Allow browsing memslots with mmu_lockAndrea Arcangeli
This allows reading memslots with only the mmu_lock hold for mmu notifiers that runs in atomic context and with mmu_lock held. Signed-off-by: Andrea Arcangeli <andrea@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-29KVM: Allow reading aliases with mmu_lockAndrea Arcangeli
This allows the mmu notifier code to run unalias_gfn with only the mmu_lock held. Only alias writes need the mmu_lock held. Readers will either take the slots_lock in read mode or the mmu_lock. Signed-off-by: Andrea Arcangeli <andrea@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>