aboutsummaryrefslogtreecommitdiff
path: root/drivers/kvm/vmx.c
AgeCommit message (Collapse)Author
2008-01-30KVM: Move arch dependent files to new directory arch/x86/kvm/Avi Kivity
This paves the way for multiple architecture support. Note that while ioapic.c could potentially be shared with ia64, it is also moved. Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30KVM: VMX: Add printk_ratelimit in vmx_intr_assistRyan Harper
Add printk_ratelimit check in front of printk. This prevents spamming of the message during 32-bit ubuntu 6.06server install. Previously, it would hang during the partition formatting stage. Signed-off-by: Ryan Harper <ryanh@us.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30KVM: Portability: Move round_robin_prev_vcpu and tss_addr to kvm_archZhang Xiantao
This patches moves two fields round_robin_prev_vcpu and tss to kvm_arch. Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com> Acked-by: Carsten Otte <cotte@de.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30KVM: Portability: Split mmu-related static inline functions to mmu.hZhang Xiantao
Since these functions need to know the details of kvm or kvm_vcpu structure, it can't be put in x86.h. Create mmu.h to hold them. Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com> Acked-by: Carsten Otte <cotte@de.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30KVM: Portability: Introduce kvm_vcpu_archZhang Xiantao
Move all the architecture-specific fields in kvm_vcpu into a new struct kvm_vcpu_arch. Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com> Acked-by: Carsten Otte <cotte@de.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30KVM: VMX: Avoid exit when setting cr8 if the local apic is in the kernelAvi Kivity
With apic in userspace, we must exit to userspace after a cr8 write in order to update the tpr. But if the apic is in the kernel, the exit is unnecessary. Noticed by Joerg Roedel. Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30KVM: Use generalized exception queue for injecting #UDAvi Kivity
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30KVM: Replace #GP injection by the generalized exception queueAvi Kivity
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30KVM: Replace page fault injection by the generalized exception queueAvi Kivity
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30KVM: Generalize exception injection mechanismAvi Kivity
Instead of each subarch doing its own thing, add an API for queuing an injection, and manage failed exception injection centerally (i.e., if an inject failed due to a shadow page fault, we need to requeue it). Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30KVM: VMX: Remove the secondary execute control dependency on irqchipSheng Yang
The state of SECONDARY_VM_EXEC_CONTROL shouldn't depend on in-kernel IRQ chip, this patch fix this. Signed-off-by: Sheng Yang <sheng.yang@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30KVM: VMX: Force seg.base == (seg.sel << 4) in real modeJan Kiszka
Ensure that segment.base == segment.selector << 4 when entering the real mode on Intel so that the CPU will not bark at us. This fixes some old protected mode demo from http://www.x86.org/articles/pmbasics/tspec_a1_doc.htm. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30KVM: Make unloading of FPU state when putting vcpu arch-independentAmit Shah
Instead of having each architecture do it individually, we do this in the arch-independent code (just x86 as of now). [avi: add svm to the mix, which was added to mainline during the 2.6.24-rc process] Signed-off-by: Amit Shah <amit.shah@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30KVM: Replace 'light_exits' stat with 'host_state_reload'Avi Kivity
This is a little more accurate (since it counts actual reloads, not potential reloads), and reverses the sense of the statistic to measure a bad event like most of the other stats (e.g. we want to minimize all counters). Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30KVM: VMX: Consolidate register usage in vmx_vcpu_run()Avi Kivity
We pass vcpu, vmx->fail, and vmx->launched to assembly code, but all three are fields within vmx. Consolidate by only passing in vmx and offsets for the rest. Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30KVM: Portability: Combine kvm_init and kvm_init_x86Zhang Xiantao
Will be called once arch module registers itself. Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com> Acked-by: Carsten Otte <cotte@de.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30KVM: VMX: wbinvd exitingEddie Dong
Add wbinvd VM Exit support to prepare for pass-through device cache emulation and also enhance real time responsiveness. Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30KVM: Fix faults during injection of real-mode interruptsAvi Kivity
If vmx fails to inject a real-mode interrupt while fetching the interrupt redirection table, it fails to record this in the vectoring information field. So we detect this condition and do it ourselves. Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30KVM: VMX: Read & store IDT_VECTORING_INFO_FIELDAvi Kivity
We'll want to write to it in order to fix real-mode irq injection problems, but it is a read-only field. Storing it in a variable solves that issue. Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30KVM: VMX: Use vmx to inject real-mode interruptsAvi Kivity
Instead of injecting real-mode interrupts by writing the interrupt frame into guest memory, abuse vmx by injecting a software interrupt. We need to pretend the software interrupt instruction had a length > 0, so we have to adjust rip backward. This lets us not to mess with writing guest memory, which is complex and also sleeps. Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30KVM: VMX: Enable memory mapped TPR shadow (FlexPriority)Sheng Yang
This patch based on CR8/TPR patch, and enable the TPR shadow (FlexPriority) for 32bit Windows. Since TPR is accessed very frequently by 32bit Windows, especially SMP guest, with FlexPriority enabled, we saw significant performance gain. Signed-off-by: Sheng Yang <sheng.yang@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30KVM: Move page fault processing to common codeAvi Kivity
The code that dispatches the page fault and emulates if we failed to map is duplicated across vmx and svm. Merge it to simplify further bugfixing. Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30KVM: VMX: Let gcc to choose which registers to save (i386)Laurent Vivier
This patch lets GCC to determine which registers to save when we switch to/from a VCPU in the case of intel i386. * Original code saves following registers: eax, ebx, ecx, edx, edi, esi, ebp (using popa) * Patched code: - informs GCC that we modify following registers using the clobber description: ebx, edi, rsi - doesn't save eax because it is an output operand (vmx->fail) - cannot put ecx in clobber description because it is an input operand, but as we modify it and we want to keep its value (vcpu), we must save it (pop/push) - ebp is saved (pop/push) because GCC seems to ignore its use the clobber description. - edx is saved (pop/push) because it is reserved by GCC (REGPARM) and cannot be put in the clobber description. - line "mov (%%esp), %3 \n\t" has been removed because %3 is ecx and ecx is restored just after. Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30KVM: VMX: Let gcc to choose which registers to save (x86_64)Laurent Vivier
This patch lets GCC to determine which registers to save when we switch to/from a VCPU in the case of intel x86_64. * Original code saves following registers: rax, rbx, rcx, rdx, rsi, rdi, rbp, r8, r9, r10, r11, r12, r13, r14, r15 * Patched code: - informs GCC that we modify following registers using the clobber description: rbx, rdi, rsi, r8, r9, r10, r11, r12, r13, r14, r15 - doesn't save rax because it is an output operand (vmx->fail) - cannot put rcx in clobber description because it is an input operand, but as we modify it and we want to keep its value (vcpu), we must save it (pop/push) - rbp is saved (pop/push) because GCC seems to ignore its use in the clobber description. - rdx is saved (pop/push) because it is reserved by GCC (REGPARM) and cannot be put in the clobber description. - line "mov (%%rsp), %3 \n\t" has been removed because %3 is rcx and rcx is restored just after. - line ASM_VMX_VMWRITE_RSP_RDX() is moved out of the ifdef/else/endif Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30KVM: Add ioctl to tss address from userspace,Izik Eidus
Currently kvm has a wart in that it requires three extra pages for use as a tss when emulating real mode on Intel. This patch moves the allocation internally, only requiring userspace to tell us where in the physical address space we can place the tss. Signed-off-by: Izik Eidus <izike@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30KVM: Move vmx_vcpu_reset() out of vmx_vcpu_setup()Avi Kivity
Split guest reset code out of vmx_vcpu_setup(). Besides being cleaner, this moves the realmode tss setup (which can sleep) outside vmx_vcpu_setup() (which is executed with preemption enabled). [izik: remove unused variable] Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30KVM: Portability: Split kvm_vcpu into arch dependent and independent parts ↵Zhang Xiantao
(part 1) First step to split kvm_vcpu. Currently, we just use an macro to define the common fields in kvm_vcpu for all archs, and all archs need to define its own kvm_vcpu struct. Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30KVM: Move apic timer interrupt backlog processing to common codeAvi Kivity
Beside the obvious goodness of making code more common, this prevents a livelock with the next patch which moves interrupt injection out of the critical section. Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30KVM: CodingStyle cleanupMike Day
Signed-off-by: Mike D. Day <ncmike@ncultra.org> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30KVM: Hoist kvm_create_lapic() into kvm_vcpu_init()Rusty Russell
Move kvm_create_lapic() into kvm_vcpu_init(), rather than having svm and vmx do it. And make it return the error rather than a fairly random -ENOMEM. This also solves the problem that neither svm.c nor vmx.c actually handles the error path properly. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30KVM: Add general accessors to read and write guest memoryIzik Eidus
Signed-off-by: Izik Eidus <izike@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30KVM: VMX: Simplify vcpu_clear()Avi Kivity
Now that smp_call_function_single() knows how to call a function on the current cpu, there's no need to check explicitly. Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30KVM: VMX: Don't clear the vmcs if the vcpu is not loaded on any processorAvi Kivity
Noted by Eddie Dong. Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30KVM: Allow not-present guest page faults to bypass kvmAvi Kivity
There are two classes of page faults trapped by kvm: - host page faults, where the fault is needed to allow kvm to install the shadow pte or update the guest accessed and dirty bits - guest page faults, where the guest has faulted and kvm simply injects the fault back into the guest to handle The second class, guest page faults, is pure overhead. We can eliminate some of it on vmx using the following evil trick: - when we set up a shadow page table entry, if the corresponding guest pte is not present, set up the shadow pte as not present - if the guest pte _is_ present, mark the shadow pte as present but also set one of the reserved bits in the shadow pte - tell the vmx hardware not to trap faults which have the present bit clear With this, normal page-not-present faults go directly to the guest, bypassing kvm entirely. Unfortunately, this trick only works on Intel hardware, as AMD lacks a way to discriminate among page faults based on error code. It is also a little risky since it uses reserved bits which might become unreserved in the future, so a module parameter is provided to disable it. Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30KVM: VMX: Further reduce efer reloadsAvi Kivity
KVM avoids reloading the efer msr when the difference between the guest and host values consist of the long mode bits (which are switched by hardware) and the NX bit (which is emulated by the KVM MMU). This patch also allows KVM to ignore SCE (syscall enable) when the guest is running in 32-bit mode. This is because the syscall instruction is not available in 32-bit mode on Intel processors, so the SCE bit is effectively meaningless. Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30KVM: Call x86_decode_insn() only when neededLaurent Vivier
Move emulate_ctxt to kvm_vcpu to keep emulate context when we exit from kvm module. Call x86_decode_insn() only when needed. Modify x86_emulate_insn() to not modify the context if it must be re-entered. Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30KVM: Refactor hypercall infrastructure (v3)Anthony Liguori
This patch refactors the current hypercall infrastructure to better support live migration and SMP. It eliminates the hypercall page by trapping the UD exception that would occur if you used the wrong hypercall instruction for the underlying architecture and replacing it with the right one lazily. A fall-out of this patch is that the unhandled hypercalls no longer trap to userspace. There is very little reason though to use a hypercall to communicate with userspace as PIO or MMIO can be used. There is no code in tree that uses userspace hypercalls. [avi: fix #ud injection on vmx] Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30x86: get rid of _MASK flagsGlauber de Oliveira Costa
There's no need for the *_MASK flags (TF_MASK, IF_MASK, etc), found in processor.h (both _32 and _64). They have a one-to-one mapping with the EFLAGS value. This patch removes the definitions, and use the already existent X86_EFLAGS_ version when applicable. [ roland@redhat.com: KVM build fixes. ] Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-22KVM: VMX: Force vm86 mode if setting flags during real modeAvi Kivity
When resetting from userspace, we need to handle the flags being cleared even after we are in real mode. Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-22KVM: VMX: Reset mmu context when entering real modeEddie Dong
Resetting an SMP guest will force AP enter real mode (RESET) with paging enabled in protected mode. While current enter_rmode() can only handle mode switch from nonpaging mode to real mode which leads to SMP reboot failure. Fix by reloading the mmu context on entering real mode. Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com> Signed-off-by: Qing He <qing.he@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-22KVM: VMX: Handle NMIs before enabling interrupts and preemptionAvi Kivity
This makes sure we handle NMI on the current cpu, and that we don't service maskable interrupts before non-maskable ones. Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13KVM: Improve emulation failure reportingAvi Kivity
Report failed opcodes from all locations. Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13KVM: VMX: Fix exit qualification width on i386He, Qing
According to Intel Software Developer's Manual, Vol. 3B, Appendix H.4.2, exit qualification should be of natural width. However, current code uses u64 as the data type for this register, which occasionally introduces invalid value to VMExit handling logics. This patch fixes this bug. I have tested Windows and Linux guest on i386 host, and they can boot successfully with this patch. Signed-off-by: Qing He <qing.he@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13KVM: Move main vcpu loop into subarch independent codeAvi Kivity
This simplifies adding new code as well as reducing overall code size. Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13KVM: VMX: Move vm entry failure handling to the exit handlerAvi Kivity
This will help moving the main loop to subarch independent code. Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13KVM: Rename kvm_arch_ops to kvm_x86_opsChristian Ehrhardt
This patch just renames the current (misnamed) _arch namings to _x86 to ensure better readability when a real arch layer takes place. Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13KVM: enable in-kernel APIC INIT/SIPI handlingHe, Qing
This patch enables INIT/SIPI handling using in-kernel APIC by introducing a ->mp_state field to emulate the SMP state transition. [avi: remove smp_processor_id() warning] Signed-off-by: Qing He <qing.he@intel.com> Signed-off-by: Xin Li <xin.b.li@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13KVM: Migrate lapic hrtimer when vcpu moves to another cpuEddie Dong
This reduces overhead by accessing cachelines from the wrong node, as well as simplifying locking. [Qing: fix for inactive or expired one-shot timer] Signed-off-by: Yaozu (Eddie) Dong <Eddie.Dong@intel.com> Signed-off-by: Qing He <qing.he@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13KVM: Keep track of missed timer irq injectionsEddie Dong
APIC timer IRQ is set every time when a certain period expires at host time, but the guest may be descheduled at that time and thus the irq be overwritten by later fire. This patch keep track of firing irq numbers and decrease only when the IRQ is injected to guest or buffered in APIC. Signed-off-by: Yaozu (Eddie) Dong <Eddie.Dong@intel.com> Signed-off-by: Qing He <qing.he@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-13KVM: VMX: Use shadow TPR/cr8 for 64-bits guestsYang, Sheng
This patch enables TPR shadow of VMX on CR8 access. 64bit Windows using CR8 access TPR frequently. The TPR shadow can improve the performance of access TPR by not causing vmexit. Signed-off-by: Sheng Yang <sheng.yang@intel.com> Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com> Signed-off-by: Qing He <qing.he@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>