Age | Commit message (Collapse) | Author |
|
Converting a frame number to an address is tricky since the data type changes
size. Introduce a function to do it. This fixes an actual bug when
accessing guest ptes.
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
Since set_pte() is now the only caller of set_pte_common(), merge the two
functions.
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
It is now identical to set_pte().
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
Similar information is available in the gfn parameter, so use that.
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
Instead of repretitively open-coding this.
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
I spent an hour worrying why I see so many guest page faults on FC6 i386.
Turns out bypass wasn't implemented for nonpae. Implement it so it doesn't
happen again.
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
Improve dirty bit setting for pages that kvm release, until now every page
that we released we marked dirty, from now only pages that have potential
to get dirty we mark dirty.
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
Things are simpler and more regular this way.
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
This allows guest memory to be swapped. Pages which are currently mapped
via shadow page tables are pinned into memory, but all other pages can
be freely swapped.
The patch makes gfn_to_page() elevate the page's reference count, and
introduces kvm_release_page() that pairs with it.
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
In case the page is not present in the guest memory map, return a dummy
page the guest can scribble on.
This simplifies error checking in its users.
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
Simplify the walker level loop not to carry so much information from one
loop to the next. In addition to being complex, this made kmap_atomic()
critical sections difficult to manage.
As a result of this change, kmap_atomic() sections are limited to actually
touching the guest pte, which allows the other functions called from the
walker to do sleepy operations. This will happen when we enable swapping.
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
Since the mmu uses different shadow pages for dirty large pages and clean
large pages, this allows the mmu to drop ptes that are now invalid.
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
By forcing clean huge pages to be read-only, we have separate roles
for the shadow of a clean large page and the shadow of a dirty large
page. This is necessary because different ptes will be instantiated
for the two cases, even for read faults.
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
We must set the bit before the shift, otherwise the wrong bit gets set.
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
This is more consistent with the accessed bit management, and makes the dirty
bit available earlier for other purposes.
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
This time, the biggest change is gpa_to_hpa. The translation of GPA to HPA does
not depend on the VCPU state unlike GVA to GPA so there's no need to pass in
the kvm_vcpu.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
Some of the MMU functions take a struct kvm_vcpu even though they affect all
VCPUs. This patch cleans up some of them to instead take a struct kvm. This
makes things a bit more clear.
The main thing that was confusing me was whether certain functions need to be
called on all VCPUs.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
Signed-off-by: Mike D. Day <ncmike@ncultra.org>
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
When kvm uses user-allocated pages in the future for the guest, we won't
be able to use page->private for rmap, since page->rmap is reserved for
the filesystem. So we move the rmap base pointers to the memory slot.
A side effect of this is that we need to store the gfn of each gpte in
the shadow pages, since the memory slot is addressed by gfn, instead of
hfn like struct page.
Signed-off-by: Izik Eidus <izik@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
When we allow guest page faults to reach the guests directly, we lose
the fault tracking which allows us to detect demand paging. So we provide
an alternate mechnism by clearing the accessed bit when we set a pte, and
checking it later to see if the guest actually used it.
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
There are two classes of page faults trapped by kvm:
- host page faults, where the fault is needed to allow kvm to install
the shadow pte or update the guest accessed and dirty bits
- guest page faults, where the guest has faulted and kvm simply injects
the fault back into the guest to handle
The second class, guest page faults, is pure overhead. We can eliminate
some of it on vmx using the following evil trick:
- when we set up a shadow page table entry, if the corresponding guest pte
is not present, set up the shadow pte as not present
- if the guest pte _is_ present, mark the shadow pte as present but also
set one of the reserved bits in the shadow pte
- tell the vmx hardware not to trap faults which have the present bit clear
With this, normal page-not-present faults go directly to the guest,
bypassing kvm entirely.
Unfortunately, this trick only works on Intel hardware, as AMD lacks a
way to discriminate among page faults based on error code. It is also
a little risky since it uses reserved bits which might become unreserved
in the future, so a module parameter is provided to disable it.
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
This patch just renames the current (misnamed) _arch namings to _x86 to
ensure better readability when a real arch layer takes place.
Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
gfn_to_page might sleep with swap support. Move it out of the kmap calls.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
The kernel now has asm/cpu-features.h: use those macros instead of inventing
our own.
Also spell out definition of CR3_RESEVED_BITS, fix spelling and
tighten it for the non-PAE case.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
We need to distinguish between large page shadows which have the nx bit set
and those which don't. The problem shows up when booting a newer smp Linux
kernel, where the trampoline page (which is in real mode, which uses the
same shadow pages as large pages) is using the same mapping as a kernel data
page, which is mapped using nx, causing kvm to spin on that page.
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
This has not been used for some time, as the same information is available
in the page header.
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
This was once used to avoid accessing the guest pte when upgrading
the shadow pte from read-only to read-write. But usually we need
to set the guest pte dirty or accessed bits anyway, so this wasn't
really exploited.
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
Always set the accessed and dirty bit (since having them cleared causes
a read-modify-write cycle), always set the present bit, and copy the
nx bit from the guest.
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
No longer needed as we do everything in one place.
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
With guest smp, a second vcpu might see partial updates when the first
vcpu services a page fault. So delay all updates until we have figured
out what the pte should look like.
Note that on i386, this is still not completely atomic as a 64-bit write
will be split into two on a 32-bit machine.
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
We want all shadow pte modifications in one place.
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
This prevents some work from being performed twice, and, more importantly,
reduces the number of places where we modify shadow ptes.
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
We will need the accessed bit (in addition to the dirty bit) and
also write access (for setting the dirty bit) in a future patch.
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
In preparation of some modifications.
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
Simpifies things a bit.
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
A typical demand page/copy on write pattern is:
- page fault on vaddr
- kvm propagates fault to guest
- guest handles fault, updates pte
- kvm traps write, clears shadow pte, resumes guest
- guest returns to userspace, re-faults on same vaddr
- kvm installs shadow pte, resumes guest
- guest continues
So, three vmexits for a single guest page fault. But if instead of clearing
the page table entry, we update to correspond to the value that the guest
has just written, we eliminate the third vmexit.
This patch does exactly that, reducing kbuild time by about 10%.
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
The kvm mmu tries to detects forks by looking for repeated writes to a
page table. If it sees a fork, it unshadows the page table so the page
table copying can proceed at native speed instead of being emulated.
However, the detector also triggered on simple demand paging access patterns:
a linear walk of memory would of course cause repeated writes to the same
pagetable page, causing it to unshadow prematurely.
Fix by resetting the fork detector if we detect a demand fault.
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
Make the exit statistics per-vcpu instead of global. This gives a 3.5%
boost when running one virtual machine per core on my two socket dual core
(4 cores total) machine.
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
The kvm mmu keeps a shadow page for hugepage pdes; if several such pdes map
the same physical address, they share the same shadow page. This is a fairly
common case (kernel mappings on i386 nonpae Linux, for example).
However, if the two pdes map the same memory but with different permissions, kvm
will happily use the cached shadow page. If the access through the more
permissive pde will occur after the access to the strict pde, an endless pagefault
loop will be generated and the guest will make no progress.
Fix by making the access permissions part of the cache lookup key.
The fix allows Xen pae to boot on kvm and run guest domains.
Thanks to Jeremy Fitzhardinge for reporting the bug and testing the fix.
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
We already special case the pdptr access, so no need to check it again.
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
Signed-off-by: Avi Kivity <avi@qumranet.com>
|