Age | Commit message (Collapse) | Author |
|
Add a generic implementation of the old mmap() syscall, which expects its
argument in a memory block and switch all architectures over to use it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Reviewed-by: H. Peter Anvin <hpa@zytor.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: James Morris <jmorris@namei.org>
Cc: Andreas Schwab <schwab@linux-m68k.org>
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Acked-by: Greg Ungerer <gerg@uclinux.org>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Add a generic implementation of the old select() syscall, which expects
its argument in a memory block and switch all architectures over to use
it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Reviewed-by: H. Peter Anvin <hpa@zytor.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: James Morris <jmorris@namei.org>
Acked-by: Andreas Schwab <schwab@linux-m68k.org>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Acked-by: Greg Ungerer <gerg@uclinux.org>
Acked-by: David Howells <dhowells@redhat.com>
Cc: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6: (62 commits)
msi-laptop: depends on RFKILL
msi-laptop: Detect 3G device exists by standard ec command
msi-laptop: Add resume method for set the SCM load again
msi-laptop: Support some MSI 3G netbook that is need load SCM
msi-laptop: Add threeg sysfs file for support query 3G state by standard 66/62 ec command
msi-laptop: Support standard ec 66/62 command on MSI notebook and nebook
Driver core: create lock/unlock functions for struct device
sysfs: fix for thinko with sysfs_bin_attr_init()
sysfs: Kill unused sysfs_sb variable.
sysfs: Pass super_block to sysfs_get_inode
driver core: Use sysfs_rename_link in device_rename
sysfs: Implement sysfs_rename_link
sysfs: Pack sysfs_dirent more tightly.
sysfs: Serialize updates to the vfs inode
sysfs: windfarm: init sysfs attributes
sysfs: Use sysfs_attr_init and sysfs_bin_attr_init on module dynamic attributes
sysfs: Document sysfs_attr_init and sysfs_bin_attr_init
sysfs: Use sysfs_attr_init and sysfs_bin_attr_init on dynamic attributes
sysfs: Use one lockdep class per sysfs attribute.
sysfs: Only take active references on attributes.
...
|
|
* git://git.kernel.org/pub/scm/linux/kernel/git/wim/linux-2.6-watchdog:
[WATCHDOG] i6300esb.c: change platform_driver to pci_driver
[WATCHDOG] i6300esb: fix unlock register with
[WATCHDOG] drivers/watchdog/wdt.c:wdt_ioctl(): make `ident' non-static
[WATCHDOG] change reboot_notifier to platform-shutdown method.
[WATCHDOG] watchdog_info constify
[WATCHDOG] gef_wdt: Author corrections following split of GE Fanuc joint venture
[WATCHDOG] iTCO_wdt: clean up probe(), modify err msg
[WATCHDOG] ep93xx: watchdog timer driver for TS-72xx SBCs cleanup
[WATCHDOG] support for max63xx watchdog timer chips
[WATCHDOG] ep93xx: added platform side support for TS-72xx WDT driver
[WATCHDOG] ep93xx: implemented watchdog timer driver for TS-72xx SBCs
|
|
Declare the smsgiucv prefix char pointer as "const" and use
use const char pointers in callback functions.
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
zfcp and qeth are setting flags for the qdio-layer, but these flags
are not used in qdio. Patch removes the flag definitions from qdio
and their settings in zfcp and qeth.
Cc: Jan Glauber <jang@linux.vnet.ibm.com>
Cc: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
This replaces direct xtime usage in the s390 arch with timekeeping accessors,
so we can further clean up the timekeeping core.
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
If there is no in kernel image caller modules will suffer:
ERROR: "copy_from_user_overflow" [net/core/pktgen.ko] undefined!
ERROR: "copy_from_user_overflow" [net/can/can-raw.ko] undefined!
ERROR: "copy_from_user_overflow" [fs/cifs/cifs.ko] undefined!
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
These are the non-static sysfs attributes that exist on
my test machine. Fix them to use sysfs_attr_init or
sysfs_bin_attr_init as appropriate. It simply requires
making a sysfs attribute present to see this. So this
is a little bit tedious but otherwise not too bad.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: WANG Cong <xiyou.wangcong@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
A pointer to a probe callback is passed to the core via
platform_driver_register and so the function must not disappear when the
.init sections are discarded. Otherwise (if also having HOTPLUG=y)
unbinding and binding a device to the driver via sysfs will result in an
oops as does a device being registered late.
An alternative to this patch is using platform_driver_probe instead of
platform_driver_register plus removing the pointer to the probe function
from the struct platform_driver.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ben Dooks <ben-linux@fluff.org>
Cc: Dmitry Baryshkov <dbaryshkov@gmail.com>
Cc: Eric Miao <eric.miao@marvell.com>
Cc: Liam Girdwood <liam.girdwood@wolfsonmicro.com>
Cc: Paul Sokolovsky <pmiscml@gmail.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Acked-by: Arnaud Patard <arnaud.patard@rtp-net.org>
Acked-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Acked-by: Kristoffer Ericson <kristoffer.ericson@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
Constify struct sysfs_ops.
This is part of the ops structure constification
effort started by Arjan van de Ven et al.
Benefits of this constification:
* prevents modification of data that is shared
(referenced) by many other structure instances
at runtime
* detects/prevents accidental (but not intentional)
modification attempts on archs that enforce
read-only kernel data at runtime
* potentially better optimized code as the compiler
can assume that the const data cannot be changed
* the compiler/linker move const data into .rodata
and therefore exclude them from false sharing
Signed-off-by: Emese Revfy <re.emese@gmail.com>
Acked-by: David Teigland <teigland@redhat.com>
Acked-by: Matt Domsch <Matt_Domsch@dell.com>
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Acked-by: Hans J. Koch <hjk@linutronix.de>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
In linux-next "sysdev: Pass attribute in sysdev_class attributes show/store"
forgot to convert one place in s390 code. Here is the missing part.
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
Passing the attribute to the low level IO functions allows all kinds
of cleanups, by sharing low level IO code without requiring
an own function for every piece of data.
Also drivers can extend the attributes with own data fields
and use that in the low level function.
Similar to sysdev_attributes and normal attributes.
This is a tree-wide sweep, converting everything in one go.
No functional changes in this patch other than passing the new
argument everywhere.
Tested on x86, the non x86 parts are uncompiled.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-mrst-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (30 commits)
x86, mrst: Fix whitespace breakage in apb_timer.c
x86, mrst: Fix APB timer per cpu clockevent
x86, mrst: Remove X86_MRST dependency on PCI_IOAPIC
x86, olpc: Use pci subarch init for OLPC
x86, pci: Add arch_init to x86_init abstraction
x86, mrst: Add Kconfig dependencies for Moorestown
x86, pci: Exclude Moorestown PCI code if CONFIG_X86_MRST=n
x86, numaq: Make CONFIG_X86_NUMAQ depend on CONFIG_PCI
x86, pci: Add sanity check for PCI fixed bar probing
x86, legacy_irq: Remove duplicate vector assigment
x86, legacy_irq: Remove left over nr_legacy_irqs
x86, mrst: Platform clock setup code
x86, apbt: Moorestown APB system timer driver
x86, mrst: Add vrtc platform data setup code
x86, mrst: Add platform timer info parsing code
x86, mrst: Fill in PCI functions in x86_init layer
x86, mrst: Add dummy legacy pic to platform setup
x86/PCI: Moorestown PCI support
x86, ioapic: Add dummy ioapic functions
x86, ioapic: Early enable ioapic for timer irq
...
Fixed up semantic conflict of new clocksources due to commit
17622339af25 ("clocksource: add argument to resume callback").
|
|
* git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6: (26 commits)
sh: Convert sh to use read/update_persistent_clock
sh: Move PMB debugfs entry initialization to later stage
sh: Fix up flush_cache_vmap() on SMP.
sh: fix up MMU reset with variable PMB mapping sizes.
sh: establish PMB mappings for NUMA nodes.
sh: check for existing mappings for bolted PMB entries.
sh: fixed virt/phys mapping helpers for PMB.
sh: make pmb iomapping configurable.
sh: reworked dynamic PMB mapping.
sh: Fix up cpumask_of_pcibus() for the NUMA build.
serial: sh-sci: Tidy up build warnings.
sh: Fix up ctrl_read/write stragglers in migor setup.
serial: sh-sci: Add DMA support.
dmaengine: shdma: extend .device_terminate_all() to record partial transfer
sh: merge sh7722 and sh7724 DMA register definitions
sh: activate runtime PM for dmaengine on sh7722 and sh7724
dmaengine: shdma: add runtime PM support.
dmaengine: shdma: separate DMA headers.
dmaengine: shdma: convert to platform device resources
dmaengine: shdma: fix DMA error handling.
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kyle/parisc-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/kyle/parisc-2.6:
parisc: use __ratelimit in unaligned.c
parisc: Convert to read/update_persistent_clock
parisc: Simplify param.h by including <asm-generic/param.h>
parisc: drop unnecessary cast in __ldcw_align() macro
parisc: add strict copy size checks (v2)
parisc: remove trailing space in messages
parisc: ditto sys_accept4
parisc: wire up sys_recvmmsg
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
[CPUFREQ] Fix cast warning in pcc driver.
[CPUFREQ] Processor Clocking Control interface driver
|
|
make the watchdog_info struct const where possible.
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
|
|
Replace open-coded rate limiting logic with __ratelimit().
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Kyle McMartin <kyle@mcmartin.ca>
|
|
This patch converts the parisc architecture to use the generic
read_persistent_clock and update_persistent_clock interfaces, reducing
the amount of arch specific code we have to maintain, and allowing for
further cleanups in the future.
I have not built or tested this patch, so help from arch maintainers
would be appreciated.
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Acked-by: Helge Deller <deller@gmx.de>
Signed-off-by: Kyle McMartin <kyle@mcmartin.ca>
|
|
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Kyle McMartin <kyle@mcmartin.ca>
|
|
__ldcw_align() can directly access the slock member of struct arch_spinlock_t
instead of using an ugly cast.
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Kyle McMartin <kyle@mcmartin.ca>
|
|
Add CONFIG_DEBUG_STRICT_USER_COPY_CHECKS, copied from the x86
implementation. Tested with 32 and 64bit kernel.
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Kyle McMartin <kyle@mcmartin.ca>
|
|
Signed-off-by: Frans Pop <elendil@planet.nl>
Cc: linux-parisc@vger.kernel.org
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Helge Deller <deller@gmx.de>
Signed-off-by: Kyle McMartin <kyle@mcmartin.ca>
|
|
tested with test_accept4.c from de11defebf00007677fb7ee91d9b089b78786fbb
Signed-off-by: Kyle McMartin <kyle@mcmartin.ca>
|
|
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Kyle McMartin <kyle@mcmartin.ca>
|
|
The current ELF dumper implementation can produce broken corefiles if
program headers exceed 65535. This number is determined by the number of
vmas which the process have. In particular, some extreme programs may use
more than 65535 vmas. (If you google max_map_count, you can find some
users facing this problem.) This kind of program never be able to generate
correct coredumps.
This patch implements ``extended numbering'' that uses sh_info field of
the first section header instead of e_phnum field in order to represent
upto 4294967295 vmas.
This is supported by
AMD64-ABI(http://www.x86-64.org/documentation.html) and
Solaris(http://docs.sun.com/app/docs/doc/817-1984/).
Of course, we are preparing patches for gdb and binutils.
Signed-off-by: Daisuke HATAYAMA <d.hatayama@jp.fujitsu.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Greg Ungerer <gerg@snapgear.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
elf_core_dump() and elf_fdpic_core_dump() use #ifdef and the corresponding
macro for hiding _multiline_ logics in functions. This patch removes
#ifdef and replaces ELF_CORE_EXTRA_* by corresponding functions. For
architectures not implemeonting ELF_CORE_EXTRA_*, we use weak functions in
order to reduce a range of modification.
This cleanup is for my next patches, but I think this cleanup itself is
worth doing regardless of my firnal purpose.
Signed-off-by: Daisuke HATAYAMA <d.hatayama@jp.fujitsu.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Greg Ungerer <gerg@snapgear.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The macro any_online_node() is prone to producing sparse warnings due to
the local symbol 'node'. Since all the in-tree users are really
requesting the first online node (the mask argument is either
NODE_MASK_ALL or node_online_map) just use the first_online_node macro and
remove the any_online_node macro since there are no users.
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Acked-by: David Rientjes <rientjes@google.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Milton Miller <miltonm@bga.com>
Cc: Nathan Fontenot <nfont@austin.ibm.com>
Cc: Geoff Levand <geoffrey.levand@am.sony.com>
Cc: Grant Likely <grant.likely@secretlab.ca>
Cc: J. Bruce Fields <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Benny Halevy <bhalevy@panasas.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Tell git to ignore the generated files under um, except:
include/shared/kern_constants.h
include/shared/user_constants.h
which will be moved to include/generated.
Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Assign tty only if line is not NULL.
[akpm@linux-foundation.org: simplification]
Signed-off-by: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
With id 1 the wrong bp was unwatched.
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
size_t desc_len cannot be less than 0, test before the subtraction.
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Convert cris to use GENERIC_TIME via the arch_getoffset() infrastructure,
reducing the amount of arch specific code we need to maintain.
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The initial -EINVAL value is overwritten by `retval = PTR_ERR(name)'. If
this isn't an error pointer and typenr is not 1, 6 or 9, then this retval,
a pointer cast to a long, is returned.
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Acked-by: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
No architecture except for frv has pci_dma_sync_single() and
pci_dma_sync_sg(). The APIs are deprecated.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The old anon_vma code can lead to scalability issues with heavily forking
workloads. Specifically, each anon_vma will be shared between the parent
process and all its child processes.
In a workload with 1000 child processes and a VMA with 1000 anonymous
pages per process that get COWed, this leads to a system with a million
anonymous pages in the same anon_vma, each of which is mapped in just one
of the 1000 processes. However, the current rmap code needs to walk them
all, leading to O(N) scanning complexity for each page.
This can result in systems where one CPU is walking the page tables of
1000 processes in page_referenced_one, while all other CPUs are stuck on
the anon_vma lock. This leads to catastrophic failure for a benchmark
like AIM7, where the total number of processes can reach in the tens of
thousands. Real workloads are still a factor 10 less process intensive
than AIM7, but they are catching up.
This patch changes the way anon_vmas and VMAs are linked, which allows us
to associate multiple anon_vmas with a VMA. At fork time, each child
process gets its own anon_vmas, in which its COWed pages will be
instantiated. The parents' anon_vma is also linked to the VMA, because
non-COWed pages could be present in any of the children.
This reduces rmap scanning complexity to O(1) for the pages of the 1000
child processes, with O(N) complexity for at most 1/N pages in the system.
This reduces the average scanning cost in heavily forking workloads from
O(N) to 2.
The only real complexity in this patch stems from the fact that linking a
VMA to anon_vmas now involves memory allocations. This means vma_adjust
can fail, if it needs to attach a VMA to anon_vma structures. This in
turn means error handling needs to be added to the calling functions.
A second source of complexity is that, because there can be multiple
anon_vmas, the anon_vma linking in vma_adjust can no longer be done under
"the" anon_vma lock. To prevent the rmap code from walking up an
incomplete VMA, this patch introduces the VM_LOCK_RMAP VMA flag. This bit
flag uses the same slot as the NOMMU VM_MAPPED_COPY, with an ifdef in mm.h
to make sure it is impossible to compile a kernel that needs both symbolic
values for the same bitflag.
Some test results:
Without the anon_vma changes, when AIM7 hits around 9.7k users (on a test
box with 16GB RAM and not quite enough IO), the system ends up running
>99% in system time, with every CPU on the same anon_vma lock in the
pageout code.
With these changes, AIM7 hits the cross-over point around 29.7k users.
This happens with ~99% IO wait time, there never seems to be any spike in
system time. The anon_vma lock contention appears to be resolved.
[akpm@linux-foundation.org: cleanups]
Signed-off-by: Rik van Riel <riel@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Rename for_each_bit to for_each_set_bit in the kernel source tree. To
permit for_each_clear_bit(), should that ever be added.
The patch includes a macro to map the old for_each_bit() onto the new
for_each_set_bit(). This is a (very) temporary thing to ease the migration.
[akpm@linux-foundation.org: add temporary for_each_bit()]
Suggested-by: Alexey Dobriyan <adobriyan@gmail.com>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Artem Bityutskiy <dedekind@infradead.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Signed-off-by: Mika Westerberg <mika.westerberg@iki.fi>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
|
|
* 'kvm-updates/2.6.34' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (145 commits)
KVM: x86: Add KVM_CAP_X86_ROBUST_SINGLESTEP
KVM: VMX: Update instruction length on intercepted BP
KVM: Fix emulate_sys[call, enter, exit]()'s fault handling
KVM: Fix segment descriptor loading
KVM: Fix load_guest_segment_descriptor() to inject page fault
KVM: x86 emulator: Forbid modifying CS segment register by mov instruction
KVM: Convert kvm->requests_lock to raw_spinlock_t
KVM: Convert i8254/i8259 locks to raw_spinlocks
KVM: x86 emulator: disallow opcode 82 in 64-bit mode
KVM: x86 emulator: code style cleanup
KVM: Plan obsolescence of kernel allocated slots, paravirt mmu
KVM: x86 emulator: Add LOCK prefix validity checking
KVM: x86 emulator: Check CPL level during privilege instruction emulation
KVM: x86 emulator: Fix popf emulation
KVM: x86 emulator: Check IOPL level during io instruction emulation
KVM: x86 emulator: fix memory access during x86 emulation
KVM: x86 emulator: Add Virtual-8086 mode of emulation
KVM: x86 emulator: Add group9 instruction decoding
KVM: x86 emulator: Add group8 instruction decoding
KVM: do not store wqh in irqfd
...
Trivial conflicts in Documentation/feature-removal-schedule.txt
|
|
Fix missing kernel-doc notation in mtrr/main.c:
Warning(arch/x86/kernel/cpu/mtrr/main.c:152): No description found for parameter 'info'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-probes-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Issue at least one memory barrier in stop_machine_text_poke()
perf probe: Correct probe syntax on command line help
perf probe: Add lazy line matching support
perf probe: Show more lines after last line
perf probe: Check function address range strictly in line finder
perf probe: Use libdw callback routines
perf probe: Use elfutils-libdw for analyzing debuginfo
perf probe: Rename probe finder functions
perf probe: Fix bugs in line range finder
perf probe: Update perf probe document
perf probe: Do not show --line option without dwarf support
kprobes: Add documents of jump optimization
kprobes/x86: Support kprobes jump optimization on x86
x86: Add text_poke_smp for SMP cross modifying code
kprobes/x86: Cleanup save/restore registers
kprobes/x86: Boost probes when reentering
kprobes: Jump optimization sysctl interface
kprobes: Introduce kprobes jump optimization
kprobes: Introduce generic insn_slot framework
kprobes/x86: Cleanup RELATIVEJUMP_INSTRUCTION to RELATIVEJUMP_OPCODE
|
|
This patch converts the sh architecture to use the generic
read_persistent_clock and update_persistent_clock interfaces, reducing
the amount of arch specific code we have to maintain, and allowing for
further cleanups in the future.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
|
|
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
sparc64: Make prom entry spinlock NMI safe.
sparc64: Kill off old sys_perfctr system call and state.
sparc: Update defconfigs.
sparc: Provide io{read,write}{16,32}be().
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx: (28 commits)
ioat: cleanup ->timer_fn() and ->cleanup_fn() prototypes
ioat3: interrupt coalescing
ioat: close potential BUG_ON race in the descriptor cleanup path
ioat2: kill pending flag
ioat3: use ioat2_quiesce()
ioat3: cleanup, don't enable DCA completion writes
DMAENGINE: COH 901 318 lli sg offset fix
DMAENGINE: COH 901 318 configure channel direction
DMAENGINE: COH 901 318 remove irq counting
DMAENGINE: COH 901 318 descriptor pool refactoring
DMAENGINE: COH 901 318 cleanups
dma: Add MPC512x DMA driver
Debugging options for the DMA engine subsystem
iop-adma: redundant/wrong tests in iop_*_count()?
dmatest: fix handling of an even number of xor_sources
dmatest: correct raid6 PQ test
fsldma: Fix cookie issues
fsldma: Fix cookie issues
dma: cases IPU_PIX_FMT_BGRA32, BGR32 and ABGR32 are the same in ipu_ch_param_set_size()
dma: make Open Firmware device id constant
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (52 commits)
init: Open /dev/console from rootfs
mqueue: fix typo "failues" -> "failures"
mqueue: only set error codes if they are really necessary
mqueue: simplify do_open() error handling
mqueue: apply mathematics distributivity on mq_bytes calculation
mqueue: remove unneeded info->messages initialization
mqueue: fix mq_open() file descriptor leak on user-space processes
fix race in d_splice_alias()
set S_DEAD on unlink() and non-directory rename() victims
vfs: add NOFOLLOW flag to umount(2)
get rid of ->mnt_parent in tomoyo/realpath
hppfs can use existing proc_mnt, no need for do_kern_mount() in there
Mirror MS_KERNMOUNT in ->mnt_flags
get rid of useless vfsmount_lock use in put_mnt_ns()
Take vfsmount_lock to fs/internal.h
get rid of insanity with namespace roots in tomoyo
take check for new events in namespace (guts of mounts_poll()) to namespace.c
Don't mess with generic_permission() under ->d_lock in hpfs
sanitize const/signedness for udf
nilfs: sanitize const/signedness in dealing with ->d_name.name
...
Fix up fairly trivial (famous last words...) conflicts in
drivers/infiniband/core/uverbs_main.c and security/tomoyo/realpath.c
|
|
... so the "sh_debugfs_root" is already available. Previously it
wasn't and in result its path was "/sys/kernel/debug/pmb" instead of
"/sys/kernel/debug/sh/pmb".
Signed-off-by: Pawel Moll <pawel.moll@st.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
|
|
Fix stop_machine_text_poke() to issue smp_mb() before exiting
waiting loop, and use cpu_relax() for waiting.
Changes in v2:
- Don't use ACCESS_ONCE().
Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: systemtap <systemtap@sources.redhat.com>
Cc: DLE <dle-develop@lists.sourceforge.net>
Cc: Jason Baron <jbaron@redhat.com>
LKML-Reference: <20100304033850.3819.74590.stgit@localhost6.localdomain6>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
flush_cache_all() uses broadcast IPIs, so we can't wrap in to that when
IRQs are disabled. The local cache flush manages to do what we need here
anyways, so just switch to that.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
|
|
Presently we run in to issues with the MMU resetting the CPU when
variable sized mappings are employed. This takes a slightly more
aggressive approach to keeping the TLB and cache state sane before
establishing the mappings in order to cut down on races observed on
SMP configurations.
At the same time, we bump the VMA range up to the 0xb000...0xc000 range,
as there still seems to be some undocumented behaviour in setting up
variable mappings in the 0xa000...0xb000 range, resulting in reset by the
TLB.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
|