Age | Commit message (Collapse) | Author |
|
Partial revert of commit 129d8bc828e011bda0b7110a097bf3a0167f966e
titled 'x86: don't compile vsmp_64 for 32bit'
Commit reverted to compile vsmp_64.c if CONFIG_X86_64 is defined,
since is_vsmp_box() needs to indicate that TSCs are not synchronized, and
hence, not a valid time source, even when CONFIG_X86_VSMP is not defined.
Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: shai@scalex86.org
LKML-Reference: <20090324061429.GH7278@localdomain>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Impact: get correct smp_affinity as user requested
The effect of setting desc->affinity (ie. from userspace via sysfs) has
varied over time. In 2.6.27, the 32-bit code anded the value with
cpu_online_map, and both 32 and 64-bit did that anding whenever a cpu
was unplugged.
2.6.29 consolidated this into one routine (and fixed hotplug) but
introduced another variation: anding the affinity with cfg->domain.
We should just set it to what the user said - if possible.
(cpu_mask_to_apicid_and already takes cpu_online_mask into account)
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <49C94DDF.2010703@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Tetsuo Handa reported this link bug:
| arch/x86/mm/built-in.o(.init.text+0x1831): In function `early_ioremap_init':
| : undefined reference to `__this_fixmap_does_not_exist'
| make: *** [.tmp_vmlinux1] Error 1
Commit:8827247ffcc9e880cbe4705655065cf011265157 used a variable (which
would be optimized to constant) as fix_to_virt()'s parameter.
It's depended on gcc's optimization and fails on old gcc. (Tetsuo used gcc 3.3)
We can use __fix_to_vir() instead, because we know it's safe and
don't need link time error reporting.
Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Cc: sfr@canb.auug.org.au
LKML-Reference: <49C9FFEA.7060908@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Impact: cleanup
Use online_mask directly on 64bit too.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
LKML-Reference: <49C94DAE.9070300@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Impact: fix bug with irq-descriptor moving when logical flat
Rusty observed:
> The effect of setting desc->affinity (ie. from userspace via sysfs) has varied
> over time. In 2.6.27, the 32-bit code anded the value with cpu_online_map,
> and both 32 and 64-bit did that anding whenever a cpu was unplugged.
>
> 2.6.29 consolidated this into one routine (and fixed hotplug) but introduced
> another variation: anding the affinity with cfg->domain. Is this right, or
> should we just set it to what the user said? Or as now, indicate that we're
> restricting it.
Eric pointed out that desc->affinity should be what the user requested,
if it is at all possible to honor the user space request.
This bug got introduced by commit 22f65d31b "x86: Update io_apic.c to use
new cpumask API".
Fix it by moving the masking to before the descriptor moving ...
Reported-by: Rusty Russell <rusty@rustcorp.com.au>
Reported-by: Eric W. Biederman <ebiederm@xmission.com>
LKML-Reference: <49C94134.4000408@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jaswinder/linux-2.6-tiptop into x86/cleanups
|
|
and 'x86/signal'; commit 'v2.6.29' into x86/core
|
|
While looking at the issue in the thread:
http://marc.info/?l=dri-devel&m=123606627824556&w=2
noticed a bug in pci PAT code and memory type setting.
PCI mmap code did not set the proper protection in vma, when it
inherited protection in reserve_memtype. This bug only affects
the case where there exists a WC mapping before X does an mmap
with /proc or /sys pci interface. This will cause X userlevel
mmap from /proc or /sysfs to fail on fork.
Reported-by: Kevin Winchester <kjwinchester@gmail.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Dave Airlie <airlied@redhat.com>
Cc: <stable@kernel.org>
LKML-Reference: <20090323190720.GA16831@linux-os.sc.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
vmx_set_msr() does not allow i386 guests to touch EFER, but they can still
do so through the default: label in the switch. If they set EFER_LME, they
can oops the host.
Fix by having EFER access through the normal channel (which will check for
EFER_LME) even on i386.
Reported-and-tested-by: Benjamin Gilbert <bgilbert@cs.cmu.edu>
Cc: stable@kernel.org
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
When kvm emulates an invlpg instruction, it can drop a shadow pte, but
leaves the guest tlbs intact. This can cause memory corruption when
swapping out.
Without this the other cpu can still write to a freed host physical page.
tlb smp flush must happen if rmap_remove is called always before mmu_lock
is released because the VM will take the mmu_lock before it can finally add
the page to the freelist after swapout. mmu notifier makes it safe to flush
the tlb after freeing the page (otherwise it would never be safe) so we can do
a single flush for multiple sptes invalidated.
Cc: stable@kernel.org
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
Impact: Make symbols static.
Fix this sparse warnings:
arch/x86/kvm/mmu.c:992:5: warning: symbol 'mmu_pages_add' was not declared. Should it be static?
arch/x86/kvm/mmu.c:1124:5: warning: symbol 'mmu_pages_next' was not declared. Should it be static?
arch/x86/kvm/mmu.c:1144:6: warning: symbol 'mmu_pages_clear_parents' was not declared. Should it be static?
arch/x86/kvm/x86.c:2037:5: warning: symbol 'kvm_read_guest_virt' was not declared. Should it be static?
arch/x86/kvm/x86.c:2067:5: warning: symbol 'kvm_write_guest_virt' was not declared. Should it be static?
virt/kvm/irq_comm.c:220:5: warning: symbol 'setup_routing_entry' was not declared. Should it be static?
Signed-off-by: Hannes Eder <hannes@hanneseder.net>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
Impact: Attribute function with __acquires(...) resp. __releases(...).
Fix this sparse warnings:
arch/x86/kvm/i8259.c:34:13: warning: context imbalance in 'pic_lock' - wrong count at exit
arch/x86/kvm/i8259.c:39:13: warning: context imbalance in 'pic_unlock' - unexpected unlock
Signed-off-by: Hannes Eder <hannes@hanneseder.net>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
is_long_mode currently checks the LongModeEnable bit in
EFER instead of the LongModeActive bit. This is wrong, but
we survived this till now since it wasn't triggered. This
breaks guests that go from long mode to compatibility mode.
This is noticed on a solaris guest and fixes bug #1842160
Signed-off-by: Amit Shah <amit.shah@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
setup_msrs() should be called when entering long mode to save the
shadow state for the 64-bit guest state.
Using vmx_set_efer() in enter_lmode() removes some duplicated code
and also ensures we call setup_msrs(). We can safely pass the value
of shadow_efer to vmx_set_efer() as no other bits in the efer change
while enabling long mode (guest first sets EFER.LME, then sets CR0.PG
which causes a vmexit where we activate long mode).
With this fix, is_long_mode() can check for EFER.LMA set instead of
EFER.LME and 5e23049e86dd298b72e206b420513dbc3a240cd9 can be reverted.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
In the paging_fetch function rmap_remove is called after setting a large
pte to non-present. This causes rmap_remove to not drop the reference to
the large page. The result is a memory leak of that page.
Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Acked-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
In the segment descriptor _cache_ the accessed bit is always set
(although it can be cleared in the descriptor itself). Since Intel
checks for this condition on a VMENTRY, set this bit in the AMD path
to enable cross vendor migration.
Cc: stable@kernel.org
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Acked-By: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
IRQ injection status is either -1 (if there was no CPU found
that should except the interrupt because IRQ was masked or
ioapic was misconfigured or ...) or >= 0 in that case the
number indicates to how many CPUs interrupt was injected.
If the value is 0 it means that the interrupt was coalesced
and probably should be reinjected.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
The assertion no longer makes sense since we don't clear page tables on
allocation; instead we clear them during prefetch.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
The following code flow is unnecessary:
if (largepage)
was_rmapped = is_large_pte(*shadow_pte);
else
was_rmapped = 1;
The is_large_pte() function will always evaluate to one here because the
(largepage && !is_large_pte) case is already handled in the first
if-clause. So we can remove this check and set was_rmapped to one always
here.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Acked-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
kvmclock currently falls apart on machines without constant tsc.
This patch fixes it. Changes:
* keep tsc frequency in a per-cpu variable.
* handle kvmclock update using a new request flag, thus checking
whenever we need an update each time we enter guest context.
* use a cpufreq notifier to track frequency changes and force
kvmclock updates.
* send ipis to kick cpu out of guest context if needed to make
sure the guest doesn't see stale values.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
Removed duplicated code.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
Looks like neither the direction nor the rep prefix are used anymore.
Drop related evaluations from SVM's and VMX's I/O exit handlers.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
AMD K10 CPUs implement the FFXSR feature that gets enabled using
EFER. Let's check if the virtual CPU description includes that
CPUID feature bit and allow enabling it then.
This is required for Windows Server 2008 in Hyper-V mode.
v2 adds CPUID capability exposure
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
AMD k10 includes support for the FFXSR feature, which leaves out
XMM registers on FXSAVE/FXSAVE when the EFER_FFXSR bit is set in
EFER.
The CPUID feature bit exists already, but the EFER bit is missing
currently, so this patch adds it to the list of known EFER bits.
Signed-off-by: Alexander Graf <agraf@suse.de>
CC: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
IRQ ack notifications assume an identity mapping between pin->gsi,
which might not be the case with, for example, HPET.
Translate before acking.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Acked-by: Gleb Natapov <gleb@redhat.com>
|
|
Kconfig symbols are not available in userspace, and are not stripped by
headers-install. Avoid their use by adding #defines in <asm/kvm.h> to
suit each architecture.
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
Currently KVM has a static routing from GSI numbers to interrupts (namely,
0-15 are mapped 1:1 to both PIC and IOAPIC, and 16:23 are mapped 1:1 to
the IOAPIC). This is insufficient for several reasons:
- HPET requires non 1:1 mapping for the timer interrupt
- MSIs need a new method to assign interrupt numbers and dispatch them
- ACPI APIC mode needs to be able to reassign the PCI LINK interrupts to the
ioapics
This patch implements an interrupt routing table (as a linked list, but this
can be easily changed) and a userspace interface to replace the table. The
routing table is initialized according to the current hardwired mapping.
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
Some typos, comments, whitespace errors corrected in the cpuid code
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
Two dimensional paging is only confused by it.
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
This actually describes what is going on, rather than alerting the reader
that something strange is going on.
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
Zeroing on mmu_memory_cache_alloc is unnecessary since:
- Smaller areas are pre-allocated with kmem_cache_zalloc.
- Page pointed by ->spt is overwritten with prefetch_page
and entries in page pointed by ->gfns are initialized
before reading.
[avi: zeroing pages is unnecessary]
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
While the PIT is masked the guest cannot ack the irq, so the reinject logic
will never allow the interrupt to be injected.
Fix by resetting the reinjection counters on unmask.
Unbreaks Xen.
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
Two KVM archs support irqchips and two don't. Add a Kconfig item to
make selecting between the two models easier.
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
Using kvm_mmu_lookup_page() will result in multiple scans of the hash chains;
use hlist_for_each_entry_safe() to achieve a single scan instead.
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
VMware ESX checks if the microcode level is correct when using a barcelona
CPU, in order to see if it actually can use SVM. Let's tell it we're on the
safe side...
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
Otherwise, two threads can create a PIT in parallel and cause a memory leak.
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
unnecessarily
If we aren't doing mmio there's no need to exit to userspace (which will
just be confused).
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
Allow emulate_pop() to read into arbitrary memory rather than just the
source operand. Needed for complicated instructions like far returns.
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
If we've just emulated an instruction, we won't have any valid exit
reason and associated information.
Fix by moving the clearing of the emulation_required flag to the exit handler.
This way the exit handler can notice that we've been emulating and abort
early.
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
The ususable bit is important for determining state validity; don't
clobber it.
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
The vmx guest state validity checks are full of bugs. Make them
conform to the manual.
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
This is an x86 specific stucture and has no business living in common code.
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
Certain clocks (such as TSC) in older 2.6 guests overaccount for lost
ticks, causing severe time drift. Interrupt reinjection magnifies the
problem.
Provide an option to disable it.
[avi: allow room for expansion in case we want to disable reinjection
of other timers]
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
Since we advertise MSR_VM_HSAVE_PA, userspace will attempt to read it
even on Intel. Implement fake support for this MSR to avoid the
warnings.
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
vmap() on guest pages hides those pages from the Linux mm for an extended
(userspace determined) amount of time. Get rid of it.
Signed-off-by: Izik Eidus <ieidus@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
This commit change the name of emulator_read_std into kvm_read_guest_virt,
and add new function name kvm_write_guest_virt that allow writing into a
guest virtual address.
Signed-off-by: Izik Eidus <ieidus@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
VMX initializes the TSC offset for each vcpu at different times, and
also reinitializes it for vcpus other than 0 on APIC SIPI message.
This bug causes the TSC's to appear unsynchronized in the guest, even if
the host is good.
Older Linux kernels don't handle the situation very well, so
gettimeofday is likely to go backwards in time:
http://www.mail-archive.com/kvm@vger.kernel.org/msg02955.html
http://sourceforge.net/tracker/index.php?func=detail&aid=2025534&group_id=180599&atid=893831
Fix it by initializating the offset of each vcpu relative to vm creation
time, and moving it from vmx_vcpu_reset to vmx_vcpu_setup, out of the
APIC MP init path.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
|
|
No longer used.
Signed-off-by: Avi Kivity <avi@redhat.com>
|