aboutsummaryrefslogtreecommitdiff
path: root/arch/x86/include/asm
AgeCommit message (Collapse)Author
2009-05-15x86: Fix performance regression caused by paravirt_ops on native kernelsJeremy Fitzhardinge
Xiaohui Xin and some other folks at Intel have been looking into what's behind the performance hit of paravirt_ops when running native. It appears that the hit is entirely due to the paravirtualized spinlocks introduced by: | commit 8efcbab674de2bee45a2e4cdf97de16b8e609ac8 | Date: Mon Jul 7 12:07:51 2008 -0700 | | paravirt: introduce a "lock-byte" spinlock implementation The extra call/return in the spinlock path is somehow causing an increase in the cycles/instruction of somewhere around 2-7% (seems to vary quite a lot from test to test). The working theory is that the CPU's pipeline is getting upset about the call->call->locked-op->return->return, and seems to be failing to speculate (though I haven't seen anything definitive about the precise reasons). This doesn't entirely make sense, because the performance hit is also visible on unlock and other operations which don't involve locked instructions. But spinlock operations clearly swamp all the other pvops operations, even though I can't imagine that they're nearly as common (there's only a .05% increase in instructions executed). If I disable just the pv-spinlock calls, my tests show that pvops is identical to non-pvops performance on native (my measurements show that it is actually about .1% faster, but Xiaohui shows a .05% slowdown). Summary of results, averaging 10 runs of the "mmperf" test, using a no-pvops build as baseline: nopv Pv-nospin Pv-spin CPU cycles 100.00% 99.89% 102.18% instructions 100.00% 100.10% 100.15% CPI 100.00% 99.79% 102.03% cache ref 100.00% 100.84% 100.28% cache miss 100.00% 90.47% 88.56% cache miss rate 100.00% 89.72% 88.31% branches 100.00% 99.93% 100.04% branch miss 100.00% 103.66% 107.72% branch miss rt 100.00% 103.73% 107.67% wallclock 100.00% 99.90% 102.20% The clear effect here is that the 2% increase in CPI is directly reflected in the final wallclock time. (The other interesting effect is that the more ops are out of line calls via pvops, the lower the cache access and miss rates. Not too surprising, but it suggests that the non-pvops kernel is over-inlined. On the flipside, the branch misses go up correspondingly...) So, what's the fix? Paravirt patching turns all the pvops calls into direct calls, so _spin_lock etc do end up having direct calls. For example, the compiler generated code for paravirtualized _spin_lock is: <_spin_lock+0>: mov %gs:0xb4c8,%rax <_spin_lock+9>: incl 0xffffffffffffe044(%rax) <_spin_lock+15>: callq *0xffffffff805a5b30 <_spin_lock+22>: retq The indirect call will get patched to: <_spin_lock+0>: mov %gs:0xb4c8,%rax <_spin_lock+9>: incl 0xffffffffffffe044(%rax) <_spin_lock+15>: callq <__ticket_spin_lock> <_spin_lock+20>: nop; nop /* or whatever 2-byte nop */ <_spin_lock+22>: retq One possibility is to inline _spin_lock, etc, when building an optimised kernel (ie, when there's no spinlock/preempt instrumentation/debugging enabled). That will remove the outer call/return pair, returning the instruction stream to a single call/return, which will presumably execute the same as the non-pvops case. The downsides arel 1) it will replicate the preempt_disable/enable code at eack lock/unlock callsite; this code is fairly small, but not nothing; and 2) the spinlock definitions are already a very heavily tangled mass of #ifdefs and other preprocessor magic, and making any changes will be non-trivial. The other obvious answer is to disable pv-spinlocks. Making them a separate config option is fairly easy, and it would be trivial to enable them only when Xen is enabled (as the only non-default user). But it doesn't really address the common case of a distro build which is going to have Xen support enabled, and leaves the open question of whether the native performance cost of pv-spinlocks is worth the performance improvement on a loaded Xen system (10% saving of overall system CPU when guests block rather than spin). Still it is a reasonable short-term workaround. [ Impact: fix pvops performance regression when running native ] Analysed-by: "Xin Xiaohui" <xiaohui.xin@intel.com> Analysed-by: "Li Xin" <xin.li@intel.com> Analysed-by: "Nakajima Jun" <jun.nakajima@intel.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Acked-by: H. Peter Anvin <hpa@zytor.com> Cc: Nick Piggin <npiggin@suse.de> Cc: Xen-devel <xen-devel@lists.xensource.com> LKML-Reference: <4A0B62F7.5030802@goop.org> [ fixed the help text ] Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-12x86, 32-bit: fix kernel_trap_sp()Masami Hiramatsu
Use &regs->sp instead of regs for getting the top of stack in kernel mode. (on x86-64, regs->sp always points the top of stack) [ Impact: Oprofile decodes only stack for backtracing on i386 ] Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> [ v2: rename the API to kernel_stack_pointer(), move variable inside ] Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: systemtap@sources.redhat.com Cc: Harvey Harrison <harvey.harrison@gmail.com> Cc: Jan Blunck <jblunck@suse.de> Cc: Christoph Hellwig <hch@infradead.org> LKML-Reference: <20090511210300.17332.67549.stgit@localhost.localdomain> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-11x86: fix percpu_{to,from}_op()Jan Beulich
- the byte operand constraints were wrong for 32-bit - the to-op's input operands weren't properly parenthesized [ Impact: fix possible miscompilation or build failure ] Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-05-02Merge branch 'x86-mce-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip * 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip: x86, mce: fix boot logging logic x86, mce: make polling timer interval per CPU
2009-04-22x86/PCI: set_pci_bus_resources_arch_default cleanupsYinghai Lu
Rename set_pci_bus_resources_arch_default to x86_pci_root_bus_res_quirks, move the weak version from common.c to i386.c, and before calling, make sure it's a root bus. Reviewed-by: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-04-22x86, mce: fix boot logging logicAndi Kleen
The earlier patch to change the poller to a separate function subtly broke the boot logging logic. This could lead to machine checks getting logged at boot even when disabled or defaulting to off on some systems. Fix that. [ Impact: bug fix - avoid spurious MCE in log ] Signed-off-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2009-04-21FRV: Fix the section attribute on UP DECLARE_PER_CPU()David Howells
In non-SMP mode, the variable section attribute specified by DECLARE_PER_CPU() does not agree with that specified by DEFINE_PER_CPU(). This means that architectures that have a small data section references relative to a base register may throw up linkage errors due to too great a displacement between where the base register points and the per-CPU variable. On FRV, the .h declaration says that the variable is in the .sdata section, but the .c definition says it's actually in the .data section. The linker throws up the following errors: kernel/built-in.o: In function `release_task': kernel/exit.c:78: relocation truncated to fit: R_FRV_GPREL12 against symbol `per_cpu__process_counts' defined in .data section in kernel/built-in.o kernel/exit.c:78: relocation truncated to fit: R_FRV_GPREL12 against symbol `per_cpu__process_counts' defined in .data section in kernel/built-in.o To fix this, DECLARE_PER_CPU() should simply apply the same section attribute as does DEFINE_PER_CPU(). However, this is made slightly more complex by virtue of the fact that there are several variants on DEFINE, so these need to be matched by variants on DECLARE. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-19lguest: fix guest crash on non-linear addresses in gdt pvopsRusty Russell
Fixes guest crash 'lguest: bad read address 0x4800000 len 256' The new per-cpu allocator ends up handing a non-linear address to write_gdt_entry. We do __pa() on it, and hand it to the host, which kills us. I've long wanted to make the hypercall "LOAD_GDT_ENTRY" to match the IDT code, but had no pressing reason until now. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: lguest@ozlabs.org
2009-04-17Merge branch 'x86-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: fix microcode driver newly spewing warnings x86, PAT: Remove page granularity tracking for vm_insert_pfn maps x86: disable X86_PTRACE_BTS for now x86, documentation: kernel-parameters replace X86-32,X86-64 with X86 x86: pci-swiotlb.c swiotlb_dma_ops should be static x86, PAT: Remove duplicate memtype reserve in devmem mmap x86, PAT: Consolidate code in pat_x_mtrr_type() and reserve_memtype() x86, PAT: Changing memtype to WC ensuring no WB alias x86, PAT: Handle faults cleanly in set_memory_ APIs x86, PAT: Change order of cpa and free in set_memory_wb x86, CPA: Change idmap attribute before ioremap attribute setup
2009-04-16Merge branch 'x86/uv' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86/uv' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: UV BAU distribution and payload MMRs x86: UV: BAU partition-relative distribution map x86, uv: add Kconfig dependency on NUMA for UV systems x86: prevent /sys/firmware/sgi_uv from being created on non-uv systems x86, UV: Fix for nodes with memory and no cpus x86, UV: system table in bios accessed after unmap x86: UV BAU messaging timeouts x86: UV BAU and nodes with no memory
2009-04-13Merge branch 'for-rc1/xen/core' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen * 'for-rc1/xen/core' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen: xen: add FIX_TEXT_POKE to fixmap xen: honour VCPU availability on boot xen: clean up gate trap/interrupt constants xen: set _PAGE_NX in __supported_pte_mask before pagetable construction xen: resume interrupts before system devices. xen/mmu: weaken flush_tlb_other test xen/mmu: some early pagetable cleanups Xen: Add virt_to_pfn helper function x86-64: remove PGE from must-have feature list xen: mask XSAVE from cpuid NULL noise: arch/x86/xen/smp.c xen: remove xen_load_gdt debug xen: make xen_load_gdt simpler xen: clean up xen_load_gdt xen: split construction of p2m mfn tables from registration xen: separate p2m allocation from setting xen: disable preempt for leave_lazy_mmu
2009-04-13Merge branch 'x86-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: add linux kernel support for YMM state x86: fix wrong section of pat_disable & make it static x86: Fix section mismatches in mpparse x86: fix set_fixmap to use phys_addr_t x86: Document get_user_pages_fast() x86, intr-remap: fix eoi for interrupt remapping without x2apic
2009-04-12x86: add linux kernel support for YMM stateSuresh Siddha
Impact: save/restore Intel-AVX state properly between tasks Intel Advanced Vector Extensions (AVX) introduce 256-bit vector processing capability. More about AVX at http://software.intel.com/sites/avx Add OS support for YMM state management using xsave/xrstor infrastructure to support AVX. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <1239402084.27006.8057.camel@localhost.localdomain> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-10x86: fix set_fixmap to use phys_addr_tMasami Hiramatsu
Impact: fix kprobes crash on 32-bit with RAM above 4G Use phys_addr_t for receiving a physical address argument instead of unsigned long. This allows fixmap to handle pages higher than 4GB on x86-32. Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: systemtap-ml <systemtap@sources.redhat.com> Cc: Gary Hade <garyhade@us.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> LKML-Reference: <49DE3695.6040800@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-10x86, PAT: Remove duplicate memtype reserve in devmem mmapSuresh Siddha
/dev/mem mmap code was doing memtype reserve/free for a while now. Recently we added memtype tracking in remap_pfn_range, and /dev/mem mmap uses it indirectly. So, we don't need seperate tracking in /dev/mem code any more. That means another ~100 lines of code removed :-). Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> LKML-Reference: <20090409212709.085210000@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-09x86: fix set_fixmap to use phys_addr_tMasami Hiramatsu
Use phys_addr_t for receiving a physical address argument instead of unsigned long. This allows fixmap to handle pages higher than 4GB on x86-32. Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-09Merge branch 'x86-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: cpu_debug remove execute permission x86: smarten /proc/interrupts output for new counters x86: DMI match for the Dell DXP061 as it needs BIOS reboot x86: make 64 bit to use default_inquire_remote_apic x86, setup: un-resequence mode setting for VGA 80x34 and 80x60 modes x86, intel-iommu: fix X2APIC && !ACPI build failure
2009-04-09x86: cpu_debug remove execute permissionJaswinder Singh Rajput
It seems by mistake these files got execute permissions so removing it. Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> LKML-Reference: <1239211186.9037.2.camel@ht.satnam> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-08Xen: Add virt_to_pfn helper functionAlex Nixon
Signed-off-by: Alex Nixon <alex.nixon@citrix.com>
2009-04-08x86-64: remove PGE from must-have feature listJeremy Fitzhardinge
PGE may not be available when running paravirtualized, so test the cpuid bit before using it. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2009-04-07x86 ACPI: Add support for Always Running APIC timerVenkatesh Pallipadi
Add support for Always Running APIC timer, CPUID_0x6_EAX_Bit2. This bit means the APIC timer continues to run even when CPU is in deep C-states. The advantage is that we can use LAPIC timer on these CPUs always, and there is no need for "slow to read and program" external timers (HPET/PIT) and the timer broadcast logic and related code in C-state entry and exit. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Len Brown <len.brown@intel.com>
2009-04-07dma-mapping: replace all DMA_24BIT_MASK macro with DMA_BIT_MASK(24)Yang Hongyang
Replace all DMA_24BIT_MASK macro with DMA_BIT_MASK(24) Signed-off-by: Yang Hongyang<yanghy@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-07dma-mapping: replace all DMA_32BIT_MASK macro with DMA_BIT_MASK(32)Yang Hongyang
Replace all DMA_32BIT_MASK macro with DMA_BIT_MASK(32) Signed-off-by: Yang Hongyang<yanghy@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-06Merge git://git.infradead.org/iommu-2.6Linus Torvalds
* git://git.infradead.org/iommu-2.6: drivers/pci/intr_remapping.c: include acpi.h intel-iommu: Fix oops in device_to_iommu() when devices not found. intel-iommu: Handle PCI domains appropriately. intel-iommu: Fix device-to-iommu mapping for PCI-PCI bridges. x2apic/intr-remap: decouple interrupt remapping from x2apic x86, dmar: check if it's initialized before disable queue invalidation intel-iommu: set compatibility format interrupt Intel IOMMU Suspend/Resume Support - Interrupt Remapping Intel IOMMU Suspend/Resume Support - Queued Invalidation Intel IOMMU Suspend/Resume Support - DMAR intel-iommu: Add for_each_iommu() and for_each_active_iommu() macros
2009-04-05Merge branch 'tracing-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (413 commits) tracing, net: fix net tree and tracing tree merge interaction tracing, powerpc: fix powerpc tree and tracing tree interaction ring-buffer: do not remove reader page from list on ring buffer free function-graph: allow unregistering twice trace: make argument 'mem' of trace_seq_putmem() const tracing: add missing 'extern' keywords to trace_output.h tracing: provide trace_seq_reserve() blktrace: print out BLK_TN_MESSAGE properly blktrace: extract duplidate code blktrace: fix memory leak when freeing struct blk_io_trace blktrace: fix blk_probes_ref chaos blktrace: make classic output more classic blktrace: fix off-by-one bug blktrace: fix the original blktrace blktrace: fix a race when creating blk_tree_root in debugfs blktrace: fix timestamp in binary output tracing, Text Edit Lock: cleanup tracing: filter fix for TRACE_EVENT_FORMAT events ftrace: Using FTRACE_WARN_ON() to check "freed record" in ftrace_release() x86: kretprobe-booster interrupt emulation code fix ... Fix up trivial conflicts in arch/parisc/include/asm/ftrace.h include/linux/memory.h kernel/extable.c kernel/module.c
2009-04-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-cpumaskLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-cpumask: (36 commits) cpumask: remove cpumask allocation from idle_balance, fix numa, cpumask: move numa_node_id default implementation to topology.h, fix cpumask: remove cpumask allocation from idle_balance x86: cpumask: x86 mmio-mod.c use cpumask_var_t for downed_cpus x86: cpumask: update 32-bit APM not to mug current->cpus_allowed x86: microcode: cleanup x86: cpumask: use work_on_cpu in arch/x86/kernel/microcode_core.c cpumask: fix CONFIG_CPUMASK_OFFSTACK=y cpu hotunplug crash numa, cpumask: move numa_node_id default implementation to topology.h cpumask: convert node_to_cpumask_map[] to cpumask_var_t cpumask: remove x86 cpumask_t uses. cpumask: use cpumask_var_t in uv_flush_tlb_others. cpumask: remove cpumask_t assignment from vector_allocation_domain() cpumask: make Xen use the new operators. cpumask: clean up summit's send_IPI functions cpumask: use new cpumask functions throughout x86 x86: unify cpu_callin_mask/cpu_callout_mask/cpu_initialized_mask/cpu_sibling_setup_mask cpumask: convert struct cpuinfo_x86's llc_shared_map to cpumask_var_t cpumask: convert node_to_cpumask_map[] to cpumask_var_t x86: unify 32 and 64-bit node_to_cpumask_map ...
2009-04-04x2apic/intr-remap: decouple interrupt remapping from x2apicHan, Weidong
interrupt remapping must be enabled before enabling x2apic, but interrupt remapping doesn't depend on x2apic, it can be used separately. Enable interrupt remapping in init_dmars even x2apic is not supported. [dwmw2: Update Kconfig accordingly, fix build with INTR_REMAP && !X2APIC] Signed-off-by: Weidong Han <weidong.han@intel.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2009-04-03Intel IOMMU Suspend/Resume Support - Interrupt RemappingFenghua Yu
This patch enables suspend/resume for interrupt remapping. During suspend, interrupt remapping is disabled. When resume, interrupt remapping is enabled again. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2009-04-03x86: UV BAU messaging timeoutsCliff Wickman
This patch replaces a 'nop' uv_enable_timeouts() in the UV TLB shootdown code. (somehow, long ago that function got eviscerated) If any cpu in the destination node does not get interrupted by the message and post completion in a reasonable time the hardware should respond to the sender with an error. This function enables such timeouts. Tested on the UV hardware simulator. Signed-off-by: Cliff Wickman <cpw@sgi.com> LKML-Reference: <E1LpjXU-00007e-Qh@eag09.americas.sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-03x86/dma: unify definition of pci_unmap_addr* and pci_unmap_len macrosJoerg Roedel
Impact: unification of pci-dma macros and pci_32.h removal This patch unifies the definition of the pci_unmap_addr*, pci_unmap_len* and DECLARE_PCI_UNMAP* macros. This makes sense because the pci_unmap functions are no longer no-ops anymore when the kernel runs with CONFIG_DMA_API_DEBUG. Without an iommu or DMA_API_DEBUG it is a no-op on 32 bit because the dma mapping path returns a physical address and therefore the dma-api implementation has no internal state which needs to be destroyed with an unmap call. This unification also simplifies the port of x86_64 iommu drivers to 32 bit x86 and let us get rid of pci_32.h. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Acked-by: Stephen Hemminger <shemminger@vyatta.com>
2009-04-02Allow rwlocks to re-enable interruptsRobin Holt
Pass the original flags to rwlock arch-code, so that it can re-enable interrupts if implemented for that architecture. Initially, make __raw_read_lock_flags and __raw_write_lock_flags stubs which just do the same thing as non-flags variants. Signed-off-by: Petr Tesarik <ptesarik@suse.cz> Signed-off-by: Robin Holt <holt@sgi.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: <linux-arch@vger.kernel.org> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-02preadv/pwritev: Add preadv and pwritev system calls.Gerd Hoffmann
This patch adds preadv and pwritev system calls. These syscalls are a pretty straightforward combination of pread and readv (same for write). They are quite useful for doing vectored I/O in threaded applications. Using lseek+readv instead opens race windows you'll have to plug with locking. Other systems have such system calls too, for example NetBSD, check here: http://www.daemon-systems.org/man/preadv.2.html The application-visible interface provided by glibc should look like this to be compatible to the existing implementations in the *BSD family: ssize_t preadv(int d, const struct iovec *iov, int iovcnt, off_t offset); ssize_t pwritev(int d, const struct iovec *iov, int iovcnt, off_t offset); This prototype has one problem though: On 32bit archs is the (64bit) offset argument unaligned, which the syscall ABI of several archs doesn't allow to do. At least s390 needs a wrapper in glibc to handle this. As we'll need a wrappers in glibc anyway I've decided to push problem to glibc entriely and use a syscall prototype which works without arch-specific wrappers inside the kernel: The offset argument is explicitly splitted into two 32bit values. The patch sports the actual system call implementation and the windup in the x86 system call tables. Other archs follow as separate patches. Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <linux-api@vger.kernel.org> Cc: <linux-arch@vger.kernel.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-02sgi-gru: add macros for using the UV hub to send interruptsJack Steiner
Add macros for using the UV hub to send interrupts. Change the IPI code to use these macros. These macros will also be used in additional patches that will follow. Signed-off-by: Jack Steiner <steiner@sgi.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-02sgi-gru: add definitions of x86_64 GRU MMRsJack Steiner
Add definitions for x86_64 GRU MMRs. Signed-off-by: Jack Steiner <steiner@sgi.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-02sgi-gru: exclude UV definitions on 32-bit x86Jack Steiner
Eliminate compile errors on 32-bit X86 caused by UV. Signed-off-by: Jack Steiner <steiner@sgi.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-02Merge branch 'tracing/core-v2' into tracing-for-linusIngo Molnar
Conflicts: include/linux/slub_def.h lib/Kconfig.debug mm/slob.c mm/slub.c
2009-04-01Merge branch 'linux-next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6 * 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (88 commits) PCI: fix HT MSI mapping fix PCI: don't enable too much HT MSI mapping x86/PCI: make pci=lastbus=255 work when acpi is on PCI: save and restore PCIe 2.0 registers PCI: update fakephp for bus_id removal PCI: fix kernel oops on bridge removal PCI: fix conflict between SR-IOV and config space sizing powerpc/PCI: include pci.h in powerpc MSI implementation PCI Hotplug: schedule fakephp for feature removal PCI Hotplug: rename legacy_fakephp to fakephp PCI Hotplug: restore fakephp interface with complete reimplementation PCI: Introduce /sys/bus/pci/devices/.../rescan PCI: Introduce /sys/bus/pci/devices/.../remove PCI: Introduce /sys/bus/pci/rescan PCI: Introduce pci_rescan_bus() PCI: do not enable bridges more than once PCI: do not initialize bridges more than once PCI: always scan child buses PCI: pci_scan_slot() returns newly found devices PCI: don't scan existing devices ... Fix trivial append-only conflict in Documentation/feature-removal-schedule.txt
2009-04-01pm: cleanup includesMagnus Damm
Remove unused/duplicate cruft from asm/suspend.h: - x86_32: remove unused acpi code - powerpc: remove duplicate prototypes, see linux/suspend.h Signed-off-by: Magnus Damm <damm@igel.co.jp> Cc: Paul Mundt <lethal@linux-sh.org> Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-03-31Merge branch 'cpumask-for-linus' of ↵Rusty Russell
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip Conflicts: arch/x86/include/asm/topology.h drivers/oprofile/buffer_sync.c (Both cases: changed in Linus' tree, removed in Ingo's).
2009-03-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-cpumaskLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-cpumask: oprofile: Thou shalt not call __exit functions from __init functions cpumask: remove the now-obsoleted pcibus_to_cpumask(): generic cpumask: remove cpumask_t from core cpumask: convert rcutorture.c cpumask: use new cpumask_ functions in core code. cpumask: remove references to struct irqaction's mask field. cpumask: use mm_cpumask() wrapper: kernel/fork.c cpumask: use set_cpu_active in init/main.c cpumask: remove node_to_first_cpu cpumask: fix seq_bitmap_*() functions. cpumask: remove dangerous CPU_MASK_ALL_PTR, &CPU_MASK_ALL
2009-03-30Merge ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-lguest-and-virtio * git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-lguest-and-virtio: lguest: barrier me harder lguest: use bool instead of int lguest: use KVM hypercalls lguest: wire up pte_update/pte_update_defer lguest: fix spurious BUG_ON() on invalid guest stack. virtio: more neatening of virtio_ring macros. virtio: fix BAD_RING, START_US and END_USE macros
2009-03-30Merge branch 'linus' into cpumask-for-linusIngo Molnar
Conflicts: arch/x86/kernel/cpu/common.c
2009-03-30Merge branch 'iommu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (60 commits) dma-debug: make memory range checks more consistent dma-debug: warn of unmapping an invalid dma address dma-debug: fix dma_debug_add_bus() definition for !CONFIG_DMA_API_DEBUG dma-debug/x86: register pci bus for dma-debug leak detection dma-debug: add a check dma memory leaks dma-debug: add checks for kernel text and rodata dma-debug: print stacktrace of mapping path on unmap error dma-debug: Documentation update dma-debug: x86 architecture bindings dma-debug: add function to dump dma mappings dma-debug: add checks for sync_single_sg_* dma-debug: add checks for sync_single_range_* dma-debug: add checks for sync_single_* dma-debug: add checking for [alloc|free]_coherent dma-debug: add add checking for map/unmap_sg dma-debug: add checking for map/unmap_page/single dma-debug: add core checking functions dma-debug: add debugfs interface dma-debug: add kernel command line parameters dma-debug: add initialization code ... Fix trivial conflicts due to whitespace changes in arch/x86/kernel/pci-nommu.c
2009-03-30Merge branch 'x86-stage-3-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-stage-3-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (190 commits) Revert "cpuacct: reduce one NULL check in fast-path" Revert "x86: don't compile vsmp_64 for 32bit" x86: Correct behaviour of irq affinity x86: early_ioremap_init(), use __fix_to_virt(), because we are sure it's safe x86: use default_cpu_mask_to_apicid for 64bit x86: fix set_extra_move_desc calling x86, PAT, PCI: Change vma prot in pci_mmap to reflect inherited prot x86/dmi: fix dmi_alloc() section mismatches x86: e820 fix various signedness issues in setup.c and e820.c x86: apic/io_apic.c define msi_ir_chip and ir_ioapic_chip all the time x86: irq.c keep CONFIG_X86_LOCAL_APIC interrupts together x86: irq.c use same path for show_interrupts x86: cpu/cpu.h cleanup x86: Fix a couple of sparse warnings in arch/x86/kernel/apic/io_apic.c Revert "x86: create a non-zero sized bm_pte only when needed" x86: pci-nommu.c cleanup x86: io_delay.c cleanup x86: rtc.c cleanup x86: i8253 cleanup x86: kdebugfs.c cleanup ...
2009-03-30Merge branch 'next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (180 commits) powerpc: clean up ssi.txt, add definition for fsl,ssi-asynchronous powerpc/85xx: Add support for the "socrates" board (MPC8544). powerpc: Fix bugs introduced by sysfs changes powerpc: Sanitize stack pointer in signal handling code powerpc: Add write barrier before enabling DTL flags powerpc/83xx: Update ranges in gianfar node to match other dts powerpc/86xx: Move gianfar mdio nodes under the ethernet nodes powerpc/85xx: Move gianfar mdio nodes under the ethernet nodes powerpc/83xx: Move gianfar mdio nodes under the ethernet nodes powerpc/83xx: Add power management support for MPC837x boards powerpc/mm: Introduce early_init_mmu() on 64-bit powerpc/mm: Add option for non-atomic PTE updates to ppc64 powerpc/mm: Fix printk type warning in mmu_context_nohash powerpc/mm: Rename arch/powerpc/kernel/mmap.c to mmap_64.c powerpc/mm: Merge various PTE bits and accessors definitions powerpc/mm: Tweak PTE bit combination definitions powerpc/cell: Fix iommu exception reporting powerpc/mm: e300c2/c3/c4 TLB errata workaround powerpc/mm: Used free register to save a few cycles in SW TLB miss handling powerpc/mm: Remove unused register usage in SW TLB miss handling ...
2009-03-30x86: fix mismerge in arch/x86/include/asm/timer.hStephen Rothwell
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-03-30cpumask: remove node_to_first_cpuRusty Russell
Everyone defines it, and only one person uses it (arch/mips/sgi-ip27/ip27-nmi.c). So just open code it there. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: linux-mips@linux-mips.org
2009-03-30lguest: use KVM hypercallsMatias Zabaljauregui
Impact: cleanup This patch allow us to use KVM hypercalls Signed-off-by: Matias Zabaljauregui <zabaljauregui at gmail.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-03-30Merge commit 'origin/master' into nextBenjamin Herrenschmidt
Manual merge of: arch/powerpc/include/asm/elf.h drivers/i2c/busses/i2c-mpc.c
2009-03-28Merge branch 'linus' into core/iommuIngo Molnar
Conflicts: arch/x86/Kconfig