Age | Commit message (Collapse) | Author |
|
With the sg table code, every SCSI driver is now either chain capable
or broken (or has sg_tablesize set so chaining is never activated), so
there's no need to have a check in the host template.
Also tidy up the code by moving the scatterlist size defines into the
SCSI includes and permit the last entry of the scatterlist pools not
to be a power of two.
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
|
|
At the block level bidi request uses req->next_rq pointer for a second
bidi_read request.
At Scsi-midlayer a second scsi_data_buffer structure is used for the
bidi_read part. This bidi scsi_data_buffer is put on
request->next_rq->special. Struct scsi_cmnd is not changed.
- Define scsi_bidi_cmnd() to return true if it is a bidi request and a
second sgtable was allocated.
- Define scsi_in()/scsi_out() to return the in or out scsi_data_buffer
from this command This API is to isolate users from the mechanics of
bidi.
- Define scsi_end_bidi_request() to do what scsi_end_request() does but
for a bidi request. This is necessary because bidi commands are a bit
tricky here. (See comments in body)
- scsi_release_buffers() will also release the bidi_read scsi_data_buffer
- scsi_io_completion() on bidi commands will now call
scsi_end_bidi_request() and return.
- The previous work done in scsi_init_io() is now done in a new
scsi_init_sgtable() (which is 99% identical to old scsi_init_io())
The new scsi_init_io() will call the above twice if needed also for
the bidi_read command. Only at this point is a command bidi.
- In scsi_error.c at scsi_eh_prep/restore_cmnd() make sure bidi-lld is not
confused by a get-sense command that looks like bidi. This is done
by puting NULL at request->next_rq, and restoring.
[jejb: update to sg_table and resolve conflicts
also update to blk-end-request and resolve conflicts]
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
|
|
In preparation for bidi we abstract all IO members of scsi_cmnd,
that will need to duplicate, into a substructure.
- Group all IO members of scsi_cmnd into a scsi_data_buffer
structure.
- Adjust accessors to new members.
- scsi_{alloc,free}_sgtable receive a scsi_data_buffer instead of
scsi_cmnd. And work on it.
- Adjust scsi_init_io() and scsi_release_buffers() for above
change.
- Fix other parts of scsi_lib/scsi.c to members migration. Use
accessors where appropriate.
- fix Documentation about scsi_cmnd in scsi_host.h
- scsi_error.c
* Changed needed members of struct scsi_eh_save.
* Careful considerations in scsi_eh_prep/restore_cmnd.
- sd.c and sr.c
* sd and sr would adjust IO size to align on device's block
size so code needs to change once we move to scsi_data_buff
implementation.
* Convert code to use scsi_for_each_sg
* Use data accessors where appropriate.
- tgt: convert libsrp to use scsi_data_buffer
- isd200: This driver still bangs on scsi_cmnd IO members,
so need changing
[jejb: rebased on top of sg_table patches fixed up conflicts
and used the synergy to eliminate use_sg and sg_count]
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
|
|
If we export scsi_init_io()/scsi_release_buffers() instead of
scsi_{alloc,free}_sgtable() from scsi_lib than tgt code is much more
insulated from scsi_lib changes. As a bonus it will also gain bidi
capability when it comes.
[jejb: rebase on to sg_table and fix up rejections]
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
|
|
Migrating the apic timer in the critical section is not very nice, and is
absolutely horrible with the real-time port. Move migration to the regular
vcpu execution path, triggered by a new bitflag.
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
kvm_para.h potentially contains definitions that are to be used by userspace,
so it should not be included inside the __KERNEL__ block. To protect its own
data structures, kvm_para.h already includes its own __KERNEL__ block.
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Acked-by: Amit Shah <amit.shah@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
This patch moves kvm_fpu asm-x86/kvm.h to allow every architecture to
define an own representation used for KVM_GET_FPU/KVM_SET_FPU.
Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Acked-by: Carsten Otte <cotte@de.ibm.com>
Acked-by: Zhang Xiantao <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
When executing a test program called "crashme", we found the KVM guest cannot
survive more than ten seconds, then encounterd kernel panic. The basic concept
of "crashme" is generating random assembly code and trying to execute it.
After some fixes on emulator insn validity judgment, we found it's hard to
get the current emulator handle the invalid instructions correctly, for the
#UD trap for hypercall patching caused troubles. The problem is, if the opcode
itself was OK, but combination of opcode and modrm_reg was invalid, and one
operand of the opcode was memory (SrcMem or DstMem), the emulator will fetch
the memory operand first rather than checking the validity, and may encounter
an error there. For example, ".byte 0xfe, 0x34, 0xcd" has this problem.
In the patch, we simply check that if the invalid opcode wasn't vmcall/vmmcall,
then return from emulate_instruction() and inject a #UD to guest. With the
patch, the guest had been running for more than 12 hours.
Signed-off-by: Feng (Eric) Liu <eric.e.liu@intel.com>
Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
Convert the synchronization of the shadow handling to a separate mmu_lock
spinlock.
Also guard fetch() by mmap_sem in read-mode to protect against alias
and memslot changes.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
Since gfn_to_page() is a sleeping function, and we want to make the core mmu
spinlocked, we need to pass the page from the walker context (which can sleep)
to the shadow context (which cannot).
[marcelo: avoid recursive locking of mmap_sem]
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
In preparation for a mmu spinlock, add kvm_read_guest_atomic()
and use it in fetch() and prefetch_page().
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
FlexPriority accelerates the tpr without any patching.
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
This adds a mechanism for exposing the virtual apic tpr to the guest, and a
protocol for letting the guest update the tpr without causing a vmexit if
conditions allow (e.g. there is no interrupt pending with a higher priority
than the new tpr).
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
Add a facility to report on accesses to the local apic tpr even if the
local apic is emulated in the kernel. This is basically a hack that
allows userspace to patch Windows which tends to bang on the tpr a lot.
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
IA64 also needs to see ioapic structure in irqchip.
Signed-off-by: xiantao.zhang@intel.com <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
Moving kvm_vcpu_kick() to x86.c. Since it should be
common for all archs, put its declarations in <linux/kvm_host.h>
Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
This paves the way for multiple architecture support. Note that while
ioapic.c could potentially be shared with ia64, it is also moved.
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
Currently, make headers_check barfs due to <asm/kvm.h>, which <linux/kvm.h>
includes, not existing. Rather than add a zillion <asm/kvm.h>s, export kvm.h
only if the arch actually supports it.
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
This patch fixes a small issue where sturctures:
kvm_pic_state
kvm_ioapic_state
are defined inside x86 specific code and may or may not
be defined in anyway for other architectures. The problem
caused is one cannot compile userspace apps (ex. libkvm)
for other archs since a size cannot be determined for these
structures.
Signed-off-by: Jerone Young <jyoung5@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
The current cpuid management suffers from several problems, which inhibit
passing through the host feature set to the guest:
- No way to tell which features the host supports
While some features can be supported with no changes to kvm, others
need explicit support. That means kvm needs to vet the feature set
before it is passed to the guest.
- No support for indexed or stateful cpuid entries
Some cpuid entries depend on ecx as well as on eax, or on internal
state in the processor (running cpuid multiple times with the same
input returns different output). The current cpuid machinery only
supports keying on eax.
- No support for save/restore/migrate
The internal state above needs to be exposed to userspace so it can
be saved or migrated.
This patch adds extended cpuid support by means of three new ioctls:
- KVM_GET_SUPPORTED_CPUID: get all cpuid entries the host (and kvm)
supports
- KVM_SET_CPUID2: sets the vcpu's cpuid table
- KVM_GET_CPUID2: gets the vcpu's cpuid table, including hidden state
[avi: fix original KVM_SET_CPUID not removing nx on non-nx hosts as it did
before]
Signed-off-by: Dan Kenigsberg <danken@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
This patch moves structures:
kvm_cpuid_entry
kvm_cpuid
from include/linux/kvm.h to include/asm-x86/kvm.h
Signed-off-by: Jerone Young <jyoung5@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
Move structures:
kvm_sregs
kvm_msr_entry
kvm_msrs
kvm_msr_list
from include/linux/kvm.h to include/asm-x86/kvm.h
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
This patch moves structures:
kvm_segment
kvm_dtable
from include/linux/kvm.h to include/asm-x86/kvm.h
Signed-off-by: Jerone Young <jyoung5@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
This patch moves structure lapic_state from include/linux/kvm.h
to include/asm-x86/kvm.h
Signed-off-by: Jerone Young <jyoung5@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
This patch moves structure kvm_regs to include/asm-x86/kvm.h.
Each architecture will need to create there own version of this
structure.
Signed-off-by: Jerone Young <jyoung5@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
This patch moves structures:
kvm_pic_state
kvm_ioapic_state
to inclue/asm-x86/kvm.h.
Signed-off-by: Jerone Young <jyoung5@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
This patch moves sturct kvm_memory_alias from include/linux/kvm.h
to include/asm-x86/kvm.h. Also have include/linux/kvm.h include
include/asm/kvm.h.
Signed-off-by: Jerone Young <jyoung5@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
Currently kvm has a wart in that it requires three extra pages for use
as a tss when emulating real mode on Intel. This patch moves the allocation
internally, only requiring userspace to tell us where in the physical address
space we can place the tss.
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
Currently kvm provides hypercalls only for x86* architectures. To
provide hypercall infrastructure for other kvm architectures I split
kvm_para.h into a generic header file and architecture specific
definitions.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
Instead of having the kernel allocate memory to the guest, let userspace
allocate it and pass the address to the kernel.
This is required for s390 support, but also enables features like memory
sharing and using hugetlbfs backed memory.
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
The user is now able to set how many mmu pages will be allocated to the guest.
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
This patch refactors the current hypercall infrastructure to better
support live migration and SMP. It eliminates the hypercall page by
trapping the UD exception that would occur if you used the wrong hypercall
instruction for the underlying architecture and replacing it with the right
one lazily.
A fall-out of this patch is that the unhandled hypercalls no longer trap to
userspace. There is very little reason though to use a hypercall to
communicate with userspace as PIO or MMIO can be used. There is no code
in tree that uses userspace hypercalls.
[avi: fix #ud injection on vmx]
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
|
|
Use a standard list threaded through page->lru for maintaining the pgd
list on PAE. This is the same as 64-bit, and seems saner than using a
non-standard list via page->index.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
PAE mode requires that we reload cr3 in order to guarantee that
changes to the pgd will be noticed by the processor. This means that
in principle pud_clear needs to reload cr3 every time. However,
because reloading cr3 implies a tlb flush, we want to avoid it where
possible.
pud_clear() is only used in a couple of places:
- in free_pmd_range(), when pulling down a range of process address space, and
- huge_pmd_unshare()
In both cases, the calling code will do a a tlb flush anyway, so
there's no need to do it within pud_clear().
In free_pmd_range(), the pud_clear is immediately followed by
pmd_free_tlb(); we can hook that to make the mmu_gather do an
unconditional full flush to make sure cr3 gets reloaded.
In huge_pmd_unshare, it is followed by flush_tlb_range, which always
results in a full cr3-reload tlb flush.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: William Irwin <wli@holomorphy.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
This patch adds a new configuration option, which adds support for a new
early_param which gets checked in arch/x86/kernel/setup_{32,64}.c:setup_arch()
to decide wether OHCI-1394 FireWire controllers should be initialized and
enabled for physical DMA access to allow remote debugging of early problems
like issues ACPI or other subsystems which are executed very early.
If the config option is not enabled, no code is changed, and if the boot
paramenter is not given, no new code is executed, and independent of that,
all new code is freed after boot, so the config option can be even enabled
in standard, non-debug kernels.
With specialized tools, it is then possible to get debugging information
from machines which have no serial ports (notebooks) such as the printk
buffer contents, or any data which can be referenced from global pointers,
if it is stored below the 4GB limit and even memory dumps of of the physical
RAM region below the 4GB limit can be taken without any cooperation from the
CPU of the host, so the machine can be crashed early, it does not matter.
In the extreme, even kernel debuggers can be accessed in this way. I wrote
a small kgdb module and an accompanying gdb stub for FireWire which allows
to gdb to talk to kgdb using remote remory reads and writes over FireWire.
An version of the gdb stub fore FireWire is able to read all global data
from a system which is running a a normal kernel without any kernel debugger,
without any interruption or support of the system's CPU. That way, e.g. the
task struct and so on can be read and even manipulated when the physical DMA
access is granted.
A HOWTO is included in this patch, in Documentation/debugging-via-ohci1394.txt
and I've put a copy online at
ftp://ftp.suse.de/private/bk/firewire/docs/debugging-via-ohci1394.txt
It also has links to all the tools which are available to make use of it
another copy of it is online at:
ftp://ftp.suse.de/private/bk/firewire/kernel/ohci1394_dma_early-v2.diff
Signed-Off-By: Bernhard Kaindl <bk@suse.de>
Tested-By: Thomas Renninger <trenn@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
In x86 PAE mode, stop treating pmds as a special case. Previously
they were always allocated and freed with the pgd. The modifies the
code to be the same as 64-bit mode, where they are allocated on
demand.
This is a step on the way to unifying 32/64-bit pagetable allocation
as much as possible.
There is a complicating wart, however. When you install a new
reference to a pmd in the pgd, the processor isn't guaranteed to see
it unless you reload cr3. Since reloading cr3 also has the
side-effect of flushing the tlb, this is an expense that we want to
avoid whereever possible.
This patch simply avoids reloading cr3 unless the update is to the
current pagetable. Later patches will optimise this further.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: William Irwin <wli@holomorphy.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Yes! A mere 120 c_p_a() fixing and rewriting patches later,
we are now confident that we can enable UC by default for
ioremap(), on x86 too.
Every other architectures was doing this already. Doing so
makes Linux more robust against MTRR mixups (which might go
unnoticed if BIOS writers test other OSs only - where PAT
might override bad MTRRs defaults).
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
only present ptes must be flushed.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Latest update; I now have 4 NX tests, but 2 fail so they're #if 0'd.
I also cleaned up the NX test code quite a bit, and got rid of the ugly
exception table sorting stuff.
From: Arjan van de Ven <arjan@linux.intel.com>
This patch adds testcases for the CONFIG_DEBUG_RODATA configuration option
as well as the NX CPU feature/mappings. Both testcases can move to tests/
once that patch gets merged into mainline.
(I'm half considering moving the rodata test into mm/init.c but I'll
wait with that until init.c is unified)
As part of this I had to fix a not-quite-right alignment in the vmlinux.lds.h
for the RODATA sections, which lead to 1 page less being marked read only.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
The set_memory_* and set_pages_* family of API's currently requires the
callers to do a global tlb flush after the function call; forgetting this is
a very nasty deathtrap. This patch moves the global tlb flush into
each of the callers
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
change_page_attr_add is only used in pageattr.c now, so we can
make this function static.
change_page_attr() isn't used anywere at all anymore; this function
is a really bad API anyway so just remove the bloat entirely.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
page_is_ram has a FIXME since ages, which reminds to sanity check the
BIOS area between 640k and 1M, which is sometimes falsely reported as
RAM in the e820 tables.
Implement the sanity check. Move the BIOS range defines from
pageattr.c into e820.h to avoid duplicate defines.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|