aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2009-06-12Merge git://git.infradead.org/~dwmw2/firmware-2.6Linus Torvalds
* git://git.infradead.org/~dwmw2/firmware-2.6: firmware: speed up request_firmware(), v3
2009-06-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmwLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: GFS2: Remove lock_kernel from gfs2_put_super() GFS2: Add tracepoints
2009-06-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-lguestLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-lguest: (31 commits) lguest: add support for indirect ring entries lguest: suppress notifications in example Launcher lguest: try to batch interrupts on network receive lguest: avoid sending interrupts to Guest when no activity occurs. lguest: implement deferred interrupts in example Launcher lguest: remove obsolete LHREQ_BREAK call lguest: have example Launcher service all devices in separate threads lguest: use eventfds for device notification eventfd: export eventfd_signal and eventfd_fget for lguest lguest: allow any process to send interrupts lguest: PAE fixes lguest: PAE support lguest: Add support for kvm_hypercall4() lguest: replace hypercall name LHCALL_SET_PMD with LHCALL_SET_PGD lguest: use native_set_* macros, which properly handle 64-bit entries when PAE is activated lguest: map switcher with executable page table entries lguest: fix writev returning short on console output lguest: clean up length-used value in example launcher lguest: Segment selectors are 16-bit long. Fix lg_cpu.ss1 definition. lguest: beyond ARRAY_SIZE of cpu->arch.gdt ...
2009-06-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-virtioLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-virtio: virtio: enhance id_matching for virtio drivers virtio: fix id_matching for virtio drivers virtio: handle short buffers in virtio_rng. virtio_blk: add missing __dev{init,exit} markings virtio: indirect ring entries (VIRTIO_RING_F_INDIRECT_DESC) virtio: teach virtio_has_feature() about transport features virtio: expose features in sysfs virtio_pci: optional MSI-X support virtio_pci: split up vp_interrupt virtio: find_vqs/del_vqs virtio operations virtio: add names to virtqueue struct, mapping from devices to queues. virtio: meet virtio spec by finalizing features before using device virtio: fix obsolete documentation on probe function
2009-06-12Merge branch 'cuse' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse * 'cuse' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: CUSE: implement CUSE - Character device in Userspace fuse: export symbols to be used by CUSE fuse: update fuse_conn_init() and separate out fuse_conn_kill() fuse: don't use inode in fuse_file_poll fuse: don't use inode in fuse_do_ioctl() helper fuse: don't use inode in fuse_sync_release() fuse: create fuse_do_open() helper for CUSE fuse: clean up args in fuse_finish_open() and fuse_release_fill() fuse: don't use inode in helpers called by fuse_direct_io() fuse: add members to struct fuse_file fuse: prepare fuse_direct_io() for CUSE fuse: clean up fuse_write_fill() fuse: use struct path in release structure fuse: misc cleanups
2009-06-12Merge ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-module-and-param * git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-module-and-param: module: cleanup FIXME comments about trimming exception table entries. module: trim exception table on init free. module: merge module_alloc() finally uml module: fix uml build process due to this merge x86 module: merge the rest functions with macros x86 module: merge the same functions in module_32.c and module_64.c uvesafb: improve parameter handling. module_param: allow 'bool' module_params to be bool, not just int. module_param: add __same_type convenience wrapper for __builtin_types_compatible_p module_param: split perm field into flags and perm module_param: invbool should take a 'bool', not an 'int' cyber2000fb.c: use proper method for stopping unload if CONFIG_ARCH_SHARK
2009-06-12Merge branch 'for-2.6.31' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6 * 'for-2.6.31' of git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6: (29 commits) ide: re-implement ide_pci_init_one() on top of ide_pci_init_two() ide: unexport ide_find_dma_mode() ide: fix PowerMac bootup oops ide: skip probe if there are no devices on the port (v2) sl82c105: add printk() logging facility ide-tape: fix proc warning ide: add IDE_DFLAG_NIEN_QUIRK device flag ide: respect quirk_drives[] list on all controllers hpt366: enable all quirks for devices on quirk_drives[] list hpt366: sync quirk_drives[] list with pdc202xx_{new,old}.c ide: remove superfluous SELECT_MASK() call from do_rw_taskfile() ide: remove superfluous SELECT_MASK() call from ide_driveid_update() icside: remove superfluous ->maskproc method ide-tape: fix IDE_AFLAG_* atomic accesses ide-tape: change IDE_AFLAG_IGNORE_DSC non-atomically pdc202xx_old: kill resetproc() method pdc202xx_old: don't call pdc202xx_reset() on IRQ timeout pdc202xx_old: use ide_dma_test_irq() ide: preserve Host Protected Area by default (v2) ide-gd: implement block device ->set_capacity method (v2) ...
2009-06-12Merge branch 'x86-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Provide _sdata in the vmlinux.lds.S file x86: handle initrd that extends into unusable memory
2009-06-12slab: setup cpu caches later on when interrupts are enabledPekka Enberg
Fixes the following boot-time warning: [ 0.000000] ------------[ cut here ]------------ [ 0.000000] WARNING: at kernel/smp.c:369 smp_call_function_many+0x56/0x1bc() [ 0.000000] Hardware name: [ 0.000000] Modules linked in: [ 0.000000] Pid: 0, comm: swapper Not tainted 2.6.30 #492 [ 0.000000] Call Trace: [ 0.000000] [<ffffffff8149e021>] ? _spin_unlock+0x4f/0x5c [ 0.000000] [<ffffffff8108f11b>] ? smp_call_function_many+0x56/0x1bc [ 0.000000] [<ffffffff81061764>] warn_slowpath_common+0x7c/0xa9 [ 0.000000] [<ffffffff810617a5>] warn_slowpath_null+0x14/0x16 [ 0.000000] [<ffffffff8108f11b>] smp_call_function_many+0x56/0x1bc [ 0.000000] [<ffffffff810f3e00>] ? do_ccupdate_local+0x0/0x54 [ 0.000000] [<ffffffff810f3e00>] ? do_ccupdate_local+0x0/0x54 [ 0.000000] [<ffffffff8108f2be>] smp_call_function+0x3d/0x68 [ 0.000000] [<ffffffff810f3e00>] ? do_ccupdate_local+0x0/0x54 [ 0.000000] [<ffffffff81066fd8>] on_each_cpu+0x31/0x7c [ 0.000000] [<ffffffff810f64f5>] do_tune_cpucache+0x119/0x454 [ 0.000000] [<ffffffff81087080>] ? lockdep_init_map+0x94/0x10b [ 0.000000] [<ffffffff818133b0>] ? kmem_cache_init+0x421/0x593 [ 0.000000] [<ffffffff810f69cf>] enable_cpucache+0x68/0xad [ 0.000000] [<ffffffff818133c3>] kmem_cache_init+0x434/0x593 [ 0.000000] [<ffffffff8180987c>] ? mem_init+0x156/0x161 [ 0.000000] [<ffffffff817f8aae>] start_kernel+0x1cc/0x3b9 [ 0.000000] [<ffffffff817f829a>] x86_64_start_reservations+0xaa/0xae [ 0.000000] [<ffffffff817f837f>] x86_64_start_kernel+0xe1/0xe8 [ 0.000000] ---[ end trace 4eaa2a86a8e2da22 ]--- Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Nick Piggin <npiggin@suse.de> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2009-06-12slab,slub: don't enable interrupts during early bootPekka Enberg
As explained by Benjamin Herrenschmidt: Oh and btw, your patch alone doesn't fix powerpc, because it's missing a whole bunch of GFP_KERNEL's in the arch code... You would have to grep the entire kernel for things that check slab_is_available() and even then you'll be missing some. For example, slab_is_available() didn't always exist, and so in the early days on powerpc, we used a mem_init_done global that is set form mem_init() (not perfect but works in practice). And we still have code using that to do the test. Therefore, mask out __GFP_WAIT, __GFP_IO, and __GFP_FS in the slab allocators in early boot code to avoid enabling interrupts. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2009-06-12slab: fix gfp flag in setup_cpu_cache()Pekka Enberg
Fixes the following warning during bootup when compiling with CONFIG_SLAB: [ 0.000000] ------------[ cut here ]------------ [ 0.000000] WARNING: at kernel/lockdep.c:2282 lockdep_trace_alloc+0x91/0xb9() [ 0.000000] Hardware name: [ 0.000000] Modules linked in: [ 0.000000] Pid: 0, comm: swapper Not tainted 2.6.30 #491 [ 0.000000] Call Trace: [ 0.000000] [<ffffffff81087d84>] ? lockdep_trace_alloc+0x91/0xb9 [ 0.000000] [<ffffffff81061764>] warn_slowpath_common+0x7c/0xa9 [ 0.000000] [<ffffffff810617a5>] warn_slowpath_null+0x14/0x16 [ 0.000000] [<ffffffff81087d84>] lockdep_trace_alloc+0x91/0xb9 [ 0.000000] [<ffffffff810f5b03>] kmem_cache_alloc_node_notrace+0x26/0xdf [ 0.000000] [<ffffffff81487f4e>] ? setup_cpu_cache+0x7e/0x210 [ 0.000000] [<ffffffff81487fe3>] setup_cpu_cache+0x113/0x210 [ 0.000000] [<ffffffff810f73ff>] kmem_cache_create+0x409/0x486 [ 0.000000] [<ffffffff818131c1>] kmem_cache_init+0x232/0x593 [ 0.000000] [<ffffffff8180987c>] ? mem_init+0x156/0x161 [ 0.000000] [<ffffffff817f8aae>] start_kernel+0x1cc/0x3b9 [ 0.000000] [<ffffffff817f829a>] x86_64_start_reservations+0xaa/0xae [ 0.000000] [<ffffffff817f837f>] x86_64_start_kernel+0xe1/0xe8 [ 0.000000] ---[ end trace 4eaa2a86a8e2da22 ]--- Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2009-06-12[SCSI] Merge branch 'linus'James Bottomley
Conflicts: drivers/message/fusion/mptsas.c fixed up conflict between req->data_len accessors and mptsas driver updates. Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-06-12lguest: add support for indirect ring entriesMark McLoughlin
Support the VIRTIO_RING_F_INDIRECT_DESC feature. This is a simple matter of changing the descriptor walking code to operate on a struct vring_desc* and supplying it with an indirect table if detected. Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12lguest: suppress notifications in example LauncherRusty Russell
The Guest only really needs to tell us about activity when we're going to listen to the eventfd: normally, we don't want to know. So if there are no available buffers, turn on notifications, re-check, then wait for the Guest to notify us via the eventfd, then turn notifications off again. There's enough else going on that the differences are in the noise. Before: Secs RxKicks TxKicks 1G TCP Guest->Host: 3.94 4686 32815 1M normal pings: 104 142862 1000010 1M 1k pings (-l 120): 57 142026 1000007 After: 1G TCP Guest->Host: 3.76 4691 32811 1M normal pings: 111 142859 997467 1M 1k pings (-l 120): 55 19648 501549 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12lguest: try to batch interrupts on network receiveRusty Russell
Rather than triggering an interrupt every time, we only trigger an interrupt when there are no more incoming packets (or the recv queue is full). However, the overhead of doing the select to figure this out is measurable: 1M pings goes from 98 to 104 seconds, and 1G Guest->Host TCP goes from 3.69 to 3.94 seconds. It's close to the noise though. I tested various timeouts, including reducing it as the number of pending packets increased, timing a 1 gigabyte TCP send from Guest -> Host and Host -> Guest (GSO disabled, to increase packet rate). // time tcpblast -o -s 65536 -c 16k 192.168.2.1:9999 > /dev/null Timeout Guest->Host Pkts/irq Host->Guest Pkts/irq Before 11.3s 1.0 6.3s 1.0 0 11.7s 1.0 6.6s 23.5 1 17.1s 8.8 8.6s 26.0 1/pending 13.4s 1.9 6.6s 23.8 2/pending 13.6s 2.8 6.6s 24.1 5/pending 14.1s 5.0 6.6s 24.4 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12lguest: avoid sending interrupts to Guest when no activity occurs.Rusty Russell
If we track how many buffers we've used, we can tell whether we really need to interrupt the Guest. This happens as a side effect of spurious notifications. Spurious notifications happen because it can take a while before the Host thread wakes up and sets the VRING_USED_F_NO_NOTIFY flag, and meanwhile the Guest can more notifications. A real fix would be to use wake counts, rather than a suppression flag, but the practical difference is generally in the noise: the interrupt is usually coalesced into a pending one anyway so we just save a system call which isn't clearly measurable. Secs Spurious IRQS 1G TCP Guest->Host: 3.93 58 1M normal pings: 100 72 1M 1k pings (-l 120): 57 492904 Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12lguest: implement deferred interrupts in example LauncherRusty Russell
Rather than sending an interrupt on every buffer, we only send an interrupt when we're about to wait for the Guest to send us a new one. The console input and network input still send interrupts manually, but the block device, network and console output queues can simply rely on this logic to send interrupts to the Guest at the right time. The patch is cluttered by moving trigger_irq() higher in the code. In practice, two factors make this optimization less interesting: (1) we often only get one input at a time, even for networking, (2) triggering an interrupt rapidly tends to get coalesced anyway. Before: Secs RxIRQS TxIRQs 1G TCP Guest->Host: 3.72 32784 32771 1M normal pings: 99 1000004 995541 100,000 1k pings (-l 120): 5 49510 49058 After: 1G TCP Guest->Host: 3.69 32809 32769 1M normal pings: 99 1000004 996196 100,000 1k pings (-l 120): 5 52435 52361 (Note the interrupt count on 100k pings goes *up*: see next patch). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12lguest: remove obsolete LHREQ_BREAK callRusty Russell
We no longer need an efficient mechanism to force the Guest back into host userspace, as each device is serviced without bothering the main Guest process (aka. the Launcher). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12lguest: have example Launcher service all devices in separate threadsRusty Russell
Currently lguest has three threads: the main Launcher thread, a Waker thread, and a thread for the block device (because synchronous block was simply too painful to bear). The Waker selects() on all the input file descriptors (eg. stdin, net devices, pipe to the block thread) and when one becomes readable it calls into the kernel to kick the Launcher thread out into userspace, which repeats the poll, services the device(s), and then tells the kernel to release the Waker before re-entering the kernel to run the Guest. Also, to make a slightly-decent network transmit routine, the Launcher would suppress further network interrupts while it set a timer: that signal handler would write to a pipe, which would rouse the Waker which would prod the Launcher out of the kernel to check the network device again. Now we can convert all our virtqueues to separate threads: each one has a separate eventfd for when the Guest pokes the device, and can trigger interrupts in the Guest directly. The linecount shows how much this simplifies, but to really bring it home, here's an strace analysis of single Guest->Host ping before: * Guest sends packet, notifies xmit vq, return control to Launcher * Launcher clears notification flag on xmit ring * Launcher writes packet to TUN device writev(4, [{"\0\0\0\0\0\0\0\0\0\0", 10}, {"\366\r\224`\2058\272m\224vf\274\10\0E\0\0T\0\0@\0@\1\265"..., 98}], 2) = 108 * Launcher sets up interrupt for Guest (xmit ring is empty) write(10, "\2\0\0\0\3\0\0\0", 8) = 0 * Launcher sets up timer for interrupt mitigation setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={0, 505}}, NULL) = 0 * Launcher re-runs guest pread64(10, 0xbfa5f4d4, 4, 0) ... * Waker notices reply packet in tun device (it was in select) select(12, [0 3 4 6 11], NULL, NULL, NULL) = 1 (in [4]) * Waker kicks Launcher out of guest: pwrite64(10, "\3\0\0\0\1\0\0\0", 8, 0) = 0 * Launcher returns from running guest: ... = -1 EAGAIN (Resource temporarily unavailable) * Launcher looks at input fds: select(7, [0 3 4 6], NULL, NULL, {0, 0}) = 1 (in [4], left {0, 0}) * Launcher reads pong from tun device: readv(4, [{"\0\0\0\0\0\0\0\0\0\0", 10}, {"\272m\224vf\274\366\r\224`\2058\10\0E\0\0T\364\26\0\0@"..., 1518}], 2) = 108 * Launcher injects guest notification: write(10, "\2\0\0\0\2\0\0\0", 8) = 0 * Launcher rechecks fds: select(7, [0 3 4 6], NULL, NULL, {0, 0}) = 0 (Timeout) * Launcher clears Waker: pwrite64(10, "\3\0\0\0\0\0\0\0", 8, 0) = 0 * Launcher reruns Guest: pread64(10, 0xbfa5f4d4, 4, 0) = ? ERESTARTSYS (To be restarted) * Signal comes in, uses pipe to wake up Launcher: --- SIGALRM (Alarm clock) @ 0 (0) --- write(8, "\0", 1) = 1 sigreturn() = ? (mask now []) * Waker sees write on pipe: select(12, [0 3 4 6 11], NULL, NULL, NULL) = 1 (in [6]) * Waker kicks Launcher out of Guest: pwrite64(10, "\3\0\0\0\1\0\0\0", 8, 0) = 0 * Launcher exits from kernel: pread64(10, 0xbfa5f4d4, 4, 0) = -1 EAGAIN (Resource temporarily unavailable) * Launcher looks to see what fd woke it: select(7, [0 3 4 6], NULL, NULL, {0, 0}) = 1 (in [6], left {0, 0}) * Launcher reads timeout fd, sets notification flag on xmit ring read(6, "\0", 32) = 1 * Launcher rechecks fds: select(7, [0 3 4 6], NULL, NULL, {0, 0}) = 0 (Timeout) * Launcher clears Waker: pwrite64(10, "\3\0\0\0\0\0\0\0", 8, 0) = 0 * Launcher resumes Guest: pread64(10, "\0p\0\4", 4, 0) .... strace analysis of single Guest->Host ping after: * Guest sends packet, notifies xmit vq, creates event on eventfd. * Network xmit thread wakes from read on eventfd: read(7, "\1\0\0\0\0\0\0\0", 8) = 8 * Network xmit thread writes packet to TUN device writev(4, [{"\0\0\0\0\0\0\0\0\0\0", 10}, {"J\217\232FI\37j\27\375\276\0\304\10\0E\0\0T\0\0@\0@\1\265"..., 98}], 2) = 108 * Network recv thread wakes up from read on tunfd: readv(4, [{"\0\0\0\0\0\0\0\0\0\0", 10}, {"j\27\375\276\0\304J\217\232FI\37\10\0E\0\0TiO\0\0@\1\214"..., 1518}], 2) = 108 * Network recv thread sets up interrupt for the Guest write(6, "\2\0\0\0\2\0\0\0", 8) = 0 * Network recv thread goes back to reading tunfd 13:39:42.460285 readv(4, <unfinished ...> * Network xmit thread sets up interrupt for Guest (xmit ring is empty) write(6, "\2\0\0\0\3\0\0\0", 8) = 0 * Network xmit thread goes back to reading from eventfd read(7, <unfinished ...> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12lguest: use eventfds for device notificationRusty Russell
Currently, when a Guest wants to perform I/O it calls LHCALL_NOTIFY with an address: the main Launcher process returns with this address, and figures out what device to run. A far nicer model is to let processes bind an eventfd to an address: if we find one, we simply signal the eventfd. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Davide Libenzi <davidel@xmailserver.org>
2009-06-12eventfd: export eventfd_signal and eventfd_fget for lguestRusty Russell
lguest wants to attach eventfds to guest notifications, and lguest is usually a module. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> To: Davide Libenzi <davidel@xmailserver.org>
2009-06-12lguest: allow any process to send interruptsRusty Russell
We currently only allow the Launcher process to send interrupts, but it as we already send interrupts from the hrtimer, it's a simple matter of extracting that code into a common set_interrupt routine. As we switch to a thread per virtqueue, this avoids a bottleneck through the main Launcher process. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12lguest: PAE fixesRusty Russell
1) j wasn't initialized in setup_pagetables, so they weren't set up for me causing immediate guest crashes. 2) gpte_addr should not re-read the pmd from the Guest. Especially not BUG_ON() based on the value. If we ever supported SMP guests, they could trigger that. And the Launcher could also trigger it (tho currently root-only). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12lguest: PAE supportMatias Zabaljauregui
This version requires that host and guest have the same PAE status. NX cap is not offered to the guest, yet. Signed-off-by: Matias Zabaljauregui <zabaljauregui@gmail.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12lguest: Add support for kvm_hypercall4()Matias Zabaljauregui
Add support for kvm_hypercall4(); PAE wants it. Signed-off-by: Matias Zabaljauregui <zabaljauregui at gmail.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12lguest: replace hypercall name LHCALL_SET_PMD with LHCALL_SET_PGDMatias Zabaljauregui
replace LHCALL_SET_PMD with LHCALL_SET_PGD hypercall name (That's really what it is, and the confusion gets worse with PAE support) Signed-off-by: Matias Zabaljauregui <zabaljauregui@gmail.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Reported-by: Jeremy Fitzhardinge <jeremy@goop.org>
2009-06-12lguest: use native_set_* macros, which properly handle 64-bit entries when ↵Matias Zabaljauregui
PAE is activated Some cleanups and replace direct assignment with native_set_* macros which properly handle 64-bit entries when PAE is activated Signed-off-by: Matias Zabaljauregui <zabaljauregui@gmail.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12lguest: map switcher with executable page table entriesMatias Zabaljauregui
Map switcher with executable page table entries. (This bug didn't matter before PAE and hence NX support -- RR) Signed-off-by: Matias Zabaljauregui <zabaljauregui@gmail.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12lguest: fix writev returning short on console outputRusty Russell
I've never seen it here, but I can't find anywhere that says writev will write everything. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12lguest: clean up length-used value in example launcherRusty Russell
The "len" field in the used ring for virtio indicates the number of bytes *written* to the buffer. This means the guest doesn't have to zero the buffers in advance as it always knows the used length. Erroneously, the console and network example code puts the length *read* into that field. The guest ignores it, but it's wrong. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12lguest: Segment selectors are 16-bit long. Fix lg_cpu.ss1 definition.Matias Zabaljauregui
If GDT_ENTRIES were every > 256, this could become a problem. Signed-off-by: Matias Zabaljauregui <zabaljauregui at gmail.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12lguest: beyond ARRAY_SIZE of cpu->arch.gdtRoel Kluin
Do not go beyond ARRAY_SIZE of cpu->arch.gdt Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12lguest: clean up example launcher compile flags.Rusty Russell
18 months ago 5bbf89fc260830f3f58b331d946a16b39ad1ca2d changed to loading bzImages directly, and no longer manually ungzipping them, so we no longer need libz. Also, -m32 is useful for those on 64-bit platforms (and harmless on 32-bit). Reported-by: Ron Minnich <rminnich@gmail.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12lguest: optimize by coding restore_flags and irq_enable in assembler.Rusty Russell
The downside of the last patch which made restore_flags and irq_enable check interrupts is that they are now too big to be patched directly into the callsites, so the C versions are always used. But the C versions go via PV_CALLEE_SAVE_REGS_THUNK which saves all the registers. In fact, we don't need any registers in the fast path, so we can do better than this if we actually code them in assembler. The results are in the noise, but since it's about the same amount of code, it's worth applying. 1GB Guest->Host: input(suppressed),output(suppressed) Before: Seconds: 0:16.53 Packets: 377268,753673 Interrupts: 22461,24297 Notifications: 1(5245),21303(732370) Net IRQs triggered: 377023(245),42578(711095) After: Seconds: 0:16.48 Packets: 377289,753673 Interrupts: 22281,24465 Notifications: 1(5245),21296(732377) Net IRQs triggered: 377060(229),42564(711109) Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12lguest: improve interrupt handling, speed up stream networkingRusty Russell
lguest never checked for pending interrupts when enabling interrupts, and things still worked. However, it makes a significant difference to TCP performance, so it's time we fixed it by introducing a pending_irq flag and checking it on irq_restore and irq_enable. These two routines are now too big to patch into the 8/10 bytes patch space, so we drop that code. Note: The high latency on interrupt delivery had a very curious effect: once everything else was optimized, networking without GSO was faster than networking with GSO, since more interrupts were sent and hence a greater chance of one getting through to the Guest! Note2: (Almost) Closing the same loophole for iret doesn't have any measurable effect, so I'm leaving that patch for the moment. Before: 1GB tcpblast Guest->Host: 30.7 seconds 1GB tcpblast Guest->Host (no GSO): 76.0 seconds After: 1GB tcpblast Guest->Host: 6.8 seconds 1GB tcpblast Guest->Host (no GSO): 27.8 seconds Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12lguest: fix race in halt codeRusty Russell
When the Guest does the LHCALL_HALT hypercall, we go to sleep, expecting that a timer or the Waker will wake_up_process() us. But we do it in a stupid way, leaving a classic missing wakeup race. So split maybe_do_interrupt() into interrupt_pending() and try_deliver_interrupt(), and check maybe_do_interrupt() and the "break_out" flag before calling schedule. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12lguest: remove invalid interrupt forcing logic.Rusty Russell
20887611523e749d99cc7d64ff6c97d27529fbae (lguest: notify on empty) introduced lguest support for the VIRTIO_F_NOTIFY_ON_EMPTY flag, but in fact it turned on interrupts all the time. Because we always process one buffer at a time, the inflight count is always 0 when call trigger_irq and so we always ignore VRING_AVAIL_F_NO_INTERRUPT from the Guest. It should be looking to see if there are more buffers in the Guest's queue: if it's empty, then we force an interrupt. This makes little difference, since we usually have an empty queue; but that's the subject of another patch. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12lguest: fix lguest wake on guest clock tick, or fd activityRusty Russell
The Launcher could be inside the Guest on another CPU; wake_up_process will do nothing because it is "running". kick_process will knock it back into our kernel in this case, otherwise we'll miss it until the next guest exit. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12sched: export kick_processRusty Russell
lguest needs kick_process: wake_up_process() does nothing if a process is running, which isn't sufficient (we need it in the kernel). And lguest support is usually modular. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Ingo Molnar <mingo@elte.hu>
2009-06-12lguest: get more serious about wmb() in example Launcher codeRusty Russell
Since the Launcher process runs the Guest, it doesn't have to be very serious about its barriers: the Guest isn't running while we are (Guest is UP). Before we change to use threads to service devices, we need to fix this. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12lguest: clean up lguest_init_IRQRusty Russell
Copy from arch/x86/kernel/irqinit_32.c: we don't use the vectors beyond LGUEST_IRQS (if any), but we might as well set them all. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12lguest: cleanup passing of /dev/lguest fd around example launcher.Rusty Russell
We hand the /dev/lguest fd everywhere; it's far neater to just make it a global (it already is, in fact, hidden in the waker_fds struct). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12lguest: be paranoid about guest playing with device descriptors.Rusty Russell
We can't trust the values in the device descriptor table once the guest has booted, so keep local copies. They could set them to strange values then cause us to segv (they're 8 bit values, so they can't make our pointers go too wild). This becomes more important with the following patches which read them. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12virtio: enhance id_matching for virtio driversChristian Borntraeger
This patch allows a virtio driver to use VIRTIO_DEV_ANY_ID for the device id. This will be used by a test module that can be bound to any virtio device. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12virtio: fix id_matching for virtio driversChristian Borntraeger
This bug never appeared, since all current virtio drivers use VIRTIO_DEV_ANY_ID for the vendor field. If a real vendor would be used, the check in virtio_id_match is wrong - it returns 0 if id->vendor == dev->id.vendor. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12virtio: handle short buffers in virtio_rng.Rusty Russell
If the device fills less than 4 bytes of our random buffer, we'll BUG_ON. It's nicer to handle the case where it partially fills the buffer (the protocol doesn't explicitly bad that). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12virtio_blk: add missing __dev{init,exit} markingsMike Frysinger
The remove member of the virtio_driver structure uses __devexit_p(), so the remove function itself should be marked with __devexit. And where there be __devexit on the remove, so is there __devinit on the probe. Signed-off-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12virtio: indirect ring entries (VIRTIO_RING_F_INDIRECT_DESC)Mark McLoughlin
Add a new feature flag for indirect ring entries. These are ring entries which point to a table of buffer descriptors. The idea here is to increase the ring capacity by allowing a larger effective ring size whereby the ring size dictates the number of requests that may be outstanding, rather than the size of those requests. This should be most effective in the case of block I/O where we can potentially benefit by concurrently dispatching a large number of large requests. Even in the simple case of single segment block requests, this results in a threefold increase in ring capacity. Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12virtio: teach virtio_has_feature() about transport featuresMark McLoughlin
Drivers don't add transport features to their table, so we shouldn't check these with virtio_check_driver_offered_feature(). We could perhaps add an ->offered_feature() virtio_config_op, but that perhaps that would be overkill for a consitency check like this. Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-12virtio: expose features in sysfsRusty Russell
Each device negotiates feature bits; expose these in sysfs to help diagnostics and debugging. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>