aboutsummaryrefslogtreecommitdiff
path: root/drivers/base
AgeCommit message (Collapse)Author
2010-03-07sysdev: Convert cpu driver sysdev class attributesAndi Kleen
Using the new attribute argument convert the cpu driver class attributes to carry the node state. Then use a shared function to do what a lot of individual functions did before. This eliminates an ugly macro. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-03-07sysdev: Convert node driver class attributes to be data drivenAndi Kleen
Using the new attribute argument convert the node driver class attributes to carry the node state. Then use a shared function to do what a lot of individual functions did before. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-03-07sysdev: Pass attribute in sysdev_class attributes show/storeAndi Kleen
Passing the attribute to the low level IO functions allows all kinds of cleanups, by sharing low level IO code without requiring an own function for every piece of data. Also drivers can extend the attributes with own data fields and use that in the low level function. Similar to sysdev_attributes and normal attributes. This is a tree-wide sweep, converting everything in one go. No functional changes in this patch other than passing the new argument everywhere. Tested on x86, the non x86 parts are uncompiled. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-03-07Driver core: add platform_create_bundle() helperDmitry Torokhov
Many legacy-style module create singleton platform devices themselves, along with corresponding platform driver. Instead of replicating error handling code in all such drivers, provide a helper that allocates and registers a single platform device and a driver and binds them together. Signed-off-by: Dmitry Torokhov <dtor@mail.ru> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-03-07driver-core: fix race condition in get_device_parent()Tejun Heo
sysfs is creating several devices in cuse class concurrently and with CONFIG_SYSFS_DEPRECATED turned off, it triggers the following oops. BUG: unable to handle kernel NULL pointer dereference at 0000000000000038 IP: [<ffffffff81158b0a>] sysfs_addrm_start+0x4a/0xf0 PGD 75bb067 PUD 75be067 PMD 0 Oops: 0000 [#1] PREEMPT SMP last sysfs file: /sys/devices/system/cpu/cpu7/topology/core_siblings CPU 1 Modules linked in: cuse fuse Pid: 4737, comm: osspd Not tainted 2.6.31-work #77 RIP: 0010:[<ffffffff81158b0a>] [<ffffffff81158b0a>] sysfs_addrm_start+0x4a/0xf0 RSP: 0018:ffff88000042f8f8 EFLAGS: 00010296 RAX: ffff88000042ffd8 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff880007eef660 RDI: 0000000000000001 RBP: ffff88000042f918 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000001 R11: ffffffff81158b0a R12: ffff88000042f928 R13: 00000000fffffff4 R14: 0000000000000000 R15: ffff88000042f9a0 FS: 00007fe93905a950(0000) GS:ffff880008600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000038 CR3: 00000000077c9000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process osspd (pid: 4737, threadinfo ffff88000042e000, task ffff880007eef040) Stack: ffff880005da10e8 0000000011cc8d6e ffff88000042f928 ffff880003d28a28 <0> ffff88000042f988 ffffffff811592d7 0000000000000000 0000000000000000 <0> 0000000000000000 0000000000000000 ffff88000042f958 0000000011cc8d6e Call Trace: [<ffffffff811592d7>] create_dir+0x67/0xe0 [<ffffffff811593a8>] sysfs_create_dir+0x58/0xb0 [<ffffffff8128ca7c>] ? kobject_add_internal+0xcc/0x220 [<ffffffff812942e1>] ? vsnprintf+0x3c1/0xb90 [<ffffffff8128cab7>] kobject_add_internal+0x107/0x220 [<ffffffff8128cd37>] kobject_add_varg+0x47/0x80 [<ffffffff8128ce53>] kobject_add+0x53/0x90 [<ffffffff81357d84>] device_add+0xd4/0x690 [<ffffffff81356c2b>] ? dev_set_name+0x4b/0x70 [<ffffffffa001a884>] cuse_process_init_reply+0x2b4/0x420 [cuse] ... The problem is that kobject_add_internal() first adds a kobject to the kset and then try to create sysfs directory for it. If the creation fails, it remove the kobject from the kset. get_device_parent() accesses class_dirs kset while only holding class_dirs.list_lock to see whether the cuse class dir exists. But when it exists, it may not have finished initialization yet or may fail and get removed soon. In the above case, the former happened so the second one ends up trying to create subdirectory under NULL sysfs_dirent. Fix it by grabbing a mutex in get_device_parent(). Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Colin Guthrie <cguthrie@mandriva.org> Cc: stable <stable@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-03-06PM: Provide generic subsystem-level callbacksRafael J. Wysocki
There are subsystems whose power management callbacks only need to invoke the callbacks provided by device drivers. Still, their system sleep PM callbacks should play well with the runtime PM callbacks, so that devices suspended at run time can be left in that state for a system sleep transition. Provide a set of generic PM callbacks for such subsystems and define convenience macros for populating dev_pm_ops structures. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2010-02-26PM: Allow device drivers to use dpm_wait()Rafael J. Wysocki
There are some dependencies between devices (in particular, between EHCI USB controllers and their OHCI/UHCI siblings) which are not reflected by the structure of the device tree. With synchronous suspend and resume these dependencies are taken into accout automatically, because the devices in question are always registered in the right order, but to meet these constraints with asynchronous suspend and resume the drivers of these devices will need to use dpm_wait() in their suspend/resume routines, so introduce a helper function allowing them to do that. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2010-02-26PM: Start asynchronous resume threads upfrontRafael J. Wysocki
It has been shown by testing that total device resume time can be reduced significantly (by as much as 50% or more) if the async threads executing some devices' resume routines are all started before the main resume thread starts to handle the "synchronous" devices. This is a consequence of the fact that the slowest devices tend to be located at the end of dpm_list, so their resume routines are started very late. Consequently, they have to wait for all the preceding "synchronous" devices before their resume routines can be started by the main resume thread, even if they are "asynchronous". By starting their async threads upfront we effectively move those devices towards the beginning of dpm_list, without breaking their ordering with respect to their parents and children. As a result, their resume routines are started much earlier and we are able to save much more device resume time this way. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2010-02-26PM: Add facility for advanced testing of async suspend/resumeRafael J. Wysocki
Add configuration switch CONFIG_PM_ADVANCED_DEBUG for compiling in extra PM debugging/testing code allowing one to access some PM-related attributes of devices from the user space via sysfs. If CONFIG_PM_ADVANCED_DEBUG is set, add sysfs attribute power/async for every device allowing the user space to access the device's power.async_suspend flag and modify it, if desired. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2010-02-26PM: Add a switch for disabling/enabling asynchronous suspend/resumeRafael J. Wysocki
Add sysfs attribute /sys/power/pm_async allowing the user space to disable/enable asynchronous suspend/resume of devices. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2010-02-26PM: Asynchronous suspend and resume of devicesRafael J. Wysocki
Theoretically, the total time of system sleep transitions (suspend to RAM, hibernation) can be reduced by running suspend and resume callbacks of device drivers in parallel with each other. However, there are dependencies between devices such that we're not allowed to suspend the parent of a device before suspending the device itself. Analogously, we're not allowed to resume a device before resuming its parent. The most straightforward way to take these dependencies into accout is to start the async threads used for suspending and resuming devices at the core level, so that async_schedule() is called for each suspend and resume callback supposed to be executed asynchronously. For this purpose, introduce a new device flag, power.async_suspend, used to mark the devices whose suspend and resume callbacks are to be executed asynchronously (ie. in parallel with the main suspend/resume thread and possibly in parallel with each other) and helper function device_enable_async_suspend() allowing one to set power.async_suspend for given device (power.async_suspend is unset by default for all devices). For each device with the power.async_suspend flag set the PM core will use async_schedule() to execute its suspend and resume callbacks. The async threads started for different devices as a result of calling async_schedule() are synchronized with each other and with the main suspend/resume thread with the help of completions, in the following way: (1) There is a completion, power.completion, for each device object. (2) Each device's completion is reset before calling async_schedule() for the device or, in the case of devices with the power.async_suspend flags unset, before executing the device's suspend and resume callbacks. (3) During suspend, right before running the bus type, device type and device class suspend callbacks for the device, the PM core waits for the completions of all the device's children to be completed. (4) During resume, right before running the bus type, device type and device class resume callbacks for the device, the PM core waits for the completion of the device's parent to be completed. (5) The PM core completes power.completion for each device right after the bus type, device type and device class suspend (or resume) callbacks executed for the device have returned. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2010-02-26PM: Add parent information to timing messagesRafael J. Wysocki
Add parent information to the messages printed by the suspend/resume core when initcall_debug is set. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2010-02-26PM / Runtime: Add sysfs switch for disabling device run-time PMRafael J. Wysocki
Add new device sysfs attribute, power/control, allowing the user space to block the run-time power management of the devices. If this attribute is set to "on", the driver of the device won't be able to power manage it at run time (without breaking the rules) and the device will always be in the full power state (except when the entire system goes into a sleep state). Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Alan Stern <stern@rowland.harvard.edu>
2010-02-16class: Free the class private data in class_releaseLaurent Pinchart
Fix a memory leak by freeing the memory allocated in __class_register for the class private data. Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Acked-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Cc: stable <stable@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-20Revert "sysdev: fix prototype for memory_sysdev_class show/store functions"Greg Kroah-Hartman
This reverts commit 8ff410daa009c4b44be445ded5b0cec00abc0426 It should not have been sent to Linus's tree yet, as it depends on changes that are queued up in my driver-core for the .34 kernel merge. Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: "Zheng, Shaohui" <shaohui.zheng@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-20driver-core: fix devtmpfs crash on s390Heiko Carstens
On Mon, Jan 18, 2010 at 05:26:20PM +0530, Sachin Sant wrote: > Hello Heiko, > > Today while trying to boot next-20100118 i came across > the following Oops : > > Brought up 4 CPUs > Unable to handle kernel pointer dereference at virtual kernel address 0000000000 > 543000 > Oops: 0004 #1 SMP > Modules linked in: > CPU: 0 Not tainted 2.6.33-rc4-autotest-next-20100118-5-default #1 > Process swapper (pid: 1, task: 00000000fd792038, ksp: 00000000fd797a30) > Krnl PSW : 0704200180000000 00000000001eb0b8 (shmem_parse_options+0xc0/0x328) > R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:2 PM:0 EA:3 > Krnl GPRS: 000000000054388a 000000000000003d 0000000000543836 000000000000003d > 0000000000000000 0000000000483f28 0000000000536112 00000000fd797d00 > 00000000fd4ba100 0000000000000100 0000000000483978 0000000000543832 > 0000000000000000 0000000000465958 00000000001eb0b0 00000000fd797c58 > Krnl Code: 00000000001eb0aa: c0e5000994f1 brasl %r14,31da8c > 00000000001eb0b0: b9020022 ltgr %r2,%r2 > 00000000001eb0b4: a784010b brc 8,1eb2ca > >00000000001eb0b8: 92002000 mvi 0(%r2),0 > 00000000001eb0bc: a7080000 lhi %r0,0 > 00000000001eb0c0: 41902001 la %r9,1(%r2) > 00000000001eb0c4: b9040016 lgr %r1,%r6 > 00000000001eb0c8: b904002b lgr %r2,%r11 > Call Trace: > (<00000000fd797c50> 0xfd797c50) > <00000000001eb5da> shmem_fill_super+0x13a/0x25c > <0000000000228cfa> get_sb_single+0xbe/0xdc > <000000000034ffc0> dev_get_sb+0x2c/0x38 > <000000000066c602> devtmpfs_init+0x46/0xc0 > <000000000066c53e> driver_init+0x22/0x60 > <000000000064d40a> kernel_init+0x24e/0x3d0 > <000000000010a7ea> kernel_thread_starter+0x6/0xc > <000000000010a7e4> kernel_thread_starter+0x0/0xc > > I never tried to boot a kernel with DEVTMPFS enabled on a s390 box. > So am wondering if this is supported or not ? If you think this > is supported i will send a mail to community on this. There is nothing arch specific to devtmpfs. This part crashes because the kernel tries to modify the data read-only section which is write protected on s390. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-16sysdev: fix prototype for memory_sysdev_class show/store functionsWu Fengguang
The function prototype mismatches in call stack: [<ffffffff81494268>] print_block_size+0x58/0x60 [<ffffffff81487e3f>] sysdev_class_show+0x1f/0x30 [<ffffffff811d629b>] sysfs_read_file+0xcb/0x1f0 [<ffffffff81176328>] vfs_read+0xc8/0x180 Due to prototype mismatch, print_block_size() will sprintf() into *attribute instead of *buf, hence user space will read the initial zeros from *buf: $ hexdump /sys/devices/system/memory/block_size_bytes 0000000 0000 0000 0000 0000 0000008 After patch: cat /sys/devices/system/memory/block_size_bytes 0x8000000 This complements commits c29af9636 and 4a0b2b4dbe. Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: "Zheng, Shaohui" <shaohui.zheng@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-01-16memory-hotplug: add 0x prefix to HEX block_size_bytesWu Fengguang
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-01-11power: fix kernel-doc notationRandy Dunlap
Warning(drivers/base/power/main.c:453): No description found for parameter 'dev' Warning(drivers/base/power/main.c:453): No description found for parameter 'cb' Warning(drivers/base/power/main.c:719): No description found for parameter 'dev' Warning(drivers/base/power/main.c:719): No description found for parameter 'state' Warning(drivers/base/power/main.c:719): No description found for parameter 'cb' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-23devtmpfs: unlock mutex in case of string allocation errorKay Sievers
Reported-by: Kirill A. Shutemov <kirill@shutemov.name> Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-23Driver core: export platform_device_register_data as a GPL symbolMichael Hennerich
This allows MFD's to register/bind drivers for their sub devices while still being compiled as a module. Signed-off-by: Michael Hennerich <michael.hennerich@analog.com> Signed-off-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-23driver core: Prevent reference to freed memory on error pathPhil Carmody
priv is drv->p. So only free drv->p after we've finished using priv. Found using a static code analysis tool Signed-off-by: Phil Carmody <ext-phil.2.carmody@nokia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-23Driver-core: Fix bogus 0 error return in device_add()Thomas Gleixner
If device_add() is called with a device which does not have dev->p set up, then device_private_init() is called. If that succeeds, then the error variable is set to 0. Now if the dev_name(dev) check further down fails, then device_add() correctly terminates, but returns 0. That of course lets the driver progress. If later another driver uses this half set up device as parent then device_add() of the child device explodes and renders sysfs completely unusable. Set the error to -EINVAL if dev_name() check fails. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: "Hans J. Koch" <hjk@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-23Driver core: driver_attribute parameters can often be const*Phil Carmody
Many struct driver_attribute descriptors are purely read-only structures, and there's no need to change them. Therefore make the promise not to, which will let those descriptors be put in a ro section. Signed-off-by: Phil Carmody <ext-phil.2.carmody@nokia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-23Driver core: bin_attribute parameters can often be const*Phil Carmody
Many struct bin_attribute descriptors are purely read-only structures, and there's no need to change them. Therefore make the promise not to, which will let those descriptors be put in a ro section. Signed-off-by: Phil Carmody <ext-phil.2.carmody@nokia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-23Driver core: device_attribute parameters can often be const*Phil Carmody
Most device_attributes are const, and are begging to be put in a ro section. However, the create and remove file interfaces were failing to propagate the const promise which the only functions they call offer. Signed-off-by: Phil Carmody <ext-phil.2.carmody@nokia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-23devtmpfs: Convert dirlock to a mutexThomas Gleixner
devtmpfs has a rw_lock dirlock which serializes delete_path and create_path. This code was obviously never tested with the usual set of debugging facilities enabled. In the dirlock held sections the code calls: - vfs functions which take mutexes - kmalloc(, GFP_KERNEL) In both code pathes the might sleep warning triggers and spams dmesg. Convert the rw_lock to a mutex. There is no reason why this needs to be a rwlock. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: stable <stable@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-22Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6: PM: Runtime PM documentation update PM / Runtime: Use device type and device class callbacks PM: Use pm_runtime_put_sync in system resume PM: Measure device suspend and resume times PM: Make the initcall_debug style timing for suspend/resume complete
2009-12-22PM / Runtime: Use device type and device class callbacksRafael J. Wysocki
The power management of some devices is handled through device types and device classes rather than through bus types. Since these devices may also benefit from using the run-time power management core, extend it so that the device type and device class run-time PM callbacks can be taken into consideration by it if the bus type callback is not defined. Update the run-time PM core documentation to reflect this change. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2009-12-21PM: Use pm_runtime_put_sync in system resumeAlan Stern
This patch (as1317) fixes a bug in the PM core. When a device is resumed following a system sleep, the core decrements the device's runtime PM usage counter but doesn't issue an idle notification if the counter reaches 0. This could prevent an otherwise unused device from being runtime-suspended again after the system sleep. The fix is to call pm_runtime_put_sync() instead of pm_runtime_put_noidle(). Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2009-12-18mm: Add notifier in pageblock isolation for balloon driversRobert Jennings
Memory balloon drivers can allocate a large amount of memory which is not movable but could be freed to accomodate memory hotplug remove. Prior to calling the memory hotplug notifier chain the memory in the pageblock is isolated. Currently, if the migrate type is not MIGRATE_MOVABLE the isolation will not proceed, causing the memory removal for that page range to fail. Rather than failing pageblock isolation if the migrateteype is not MIGRATE_MOVABLE, this patch checks if all of the pages in the pageblock, and not on the LRU, are owned by a registered balloon driver (or other entity) using a notifier chain. If all of the non-movable pages are owned by a balloon, they can be freed later through the memory notifier chain and the range can still be isolated in set_migratetype_isolate(). Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Ingo Molnar <mingo@elte.hu> Cc: Brian King <brking@linux.vnet.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Gerald Schaefer <geralds@linux.vnet.ibm.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-12-18PM: Measure device suspend and resume timesRafael J. Wysocki
Measure and print the time of suspending and resuming all devices. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2009-12-18PM: Make the initcall_debug style timing for suspend/resume completeRafael J. Wysocki
Commit f2511774863487e61b56a97da07ebf8dd61d7836 (PM: Add initcall_debug style timing for suspend/resume) introduced basic timing instrumentation, needed for a scritps/bootgraph.pl equivalent or humans, but it missed the fact that bus types and device classes which haven't been switched to using struct dev_pm_ops objects yet need special handling. As a result, the suspend/resume timing information is only available for devices whose bus types or device classes use struct dev_pm_ops objects, so the majority of devices is not covered. Fix this by adding basic suspend/resume timing instrumentation for devices whose bus types and device classes still don't use struct dev_pm_ops objects for power management. To reduce code duplication move the timing code to helper functions. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2009-12-16Merge branch 'hwpoison' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6 * 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6: (34 commits) HWPOISON: Remove stray phrase in a comment HWPOISON: Try to allocate migration page on the same node HWPOISON: Don't do early filtering if filter is disabled HWPOISON: Add a madvise() injector for soft page offlining HWPOISON: Add soft page offline support HWPOISON: Undefine short-hand macros after use to avoid namespace conflict HWPOISON: Use new shake_page in memory_failure HWPOISON: Use correct name for MADV_HWPOISON in documentation HWPOISON: mention HWPoison in Kconfig entry HWPOISON: Use get_user_page_fast in hwpoison madvise HWPOISON: add an interface to switch off/on all the page filters HWPOISON: add memory cgroup filter memcg: add accessor to mem_cgroup.css memcg: rename and export try_get_mem_cgroup_from_page() HWPOISON: add page flags filter mm: export stable page flags HWPOISON: limit hwpoison injector to known page types HWPOISON: add fs/device filters HWPOISON: return 0 to indicate success reliably HWPOISON: make semantics of IGNORED/DELAYED clear ...
2009-12-16HWPOISON: Add soft page offline supportAndi Kleen
This is a simpler, gentler variant of memory_failure() for soft page offlining controlled from user space. It doesn't kill anything, just tries to invalidate and if that doesn't work migrate the page away. This is useful for predictive failure analysis, where a page has a high rate of corrected errors, but hasn't gone bad yet. Instead it can be offlined early and avoided. The offlining is controlled from sysfs, including a new generic entry point for hard page offlining for symmetry too. We use the page isolate facility to prevent re-allocation race. Normally this is only used by memory hotplug. To avoid races with memory allocation I am using lock_system_sleep(). This avoids the situation where memory hotplug is about to isolate a page range and then hwpoison undoes that work. This is a big hammer currently, but the simplest solution currently. When the page is not free or LRU we try to free pages from slab and other caches. The slab freeing is currently quite dumb and does not try to focus on the specific slab cache which might own the page. This could be potentially improved later. Thanks to Fengguang Wu and Haicheng Li for some fixes. [Added fix from Andrew Morton to adapt to new migrate_pages prototype] Signed-off-by: Andi Kleen <ak@linux.intel.com>
2009-12-15PM: rwsem.h need not be included into main.cRafael J. Wysocki
It is not necessary to include <linux/rwsem.h> into drivers/base/power/main.c, so don't do that. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2009-12-15PM: Remove unnecessary goto from device_resume_noirq()Rafael J. Wysocki
In device_resume_noirq() there is the 'End' label and the associated goto statement that aren't strictly necessary, so rework the code to get rid of them. Also modify device_suspend_noirq() so that it looks completely analogous to device_resume_noirq(). Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2009-12-15PM: Add initcall_debug style timing for suspend/resumeArjan van de Ven
In order to diagnose overall suspend/resume times, we need basic instrumentation to break down the total time into per device timing, similar to initcall_debug. This patch adds the basic timing instrumentation, needed for a scritps/bootgraph.pl equivalent or humans. The bootgraph.pl program is still a work in progress, but is far enough along to know that this patch is sufficient. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2009-12-15PM: allow for usage_count > 0 in pm_runtime_get()Alan Stern
This patch (as1308c) fixes __pm_runtime_get(). Currently the routine will resume a device if the prior usage count was 0. But this isn't right; thanks to pm_runtime_get_noresume() the usage count can be positive even while the device is suspended. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2009-12-15mm: slab-allocate memory section nodemask for large systemsDavid Rientjes
Nodemasks should not be allocated on the stack for large systems (when it is larger than 256 bytes) since there is a threat of overflow. This patch causes the unregister_mem_sect_under_nodes() nodemask to be allocated on the stack for smaller systems and be allocated by slab for larger systems. GFP_KERNEL is used since remove_memory_block() can block. Cc: Gary Hade <garyhade@us.ibm.com> Cc: Badari Pulavarty <pbadari@us.ibm.com> Cc: Alex Chiang <achiang@hp.com> Signed-off-by: David Rientjes <rientjes@google.com> Cc: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-15mm: add numa node symlink for cpu devices in sysfsAlex Chiang
You can discover which CPUs belong to a NUMA node by examining /sys/devices/system/node/node#/ However, it's not convenient to go in the other direction, when looking at /sys/devices/system/cpu/cpu#/ Yes, you can muck about in sysfs, but adding these symlinks makes life a lot more convenient. Signed-off-by: Alex Chiang <achiang@hp.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Gary Hade <garyhade@us.ibm.com> Cc: Badari Pulavarty <pbadari@us.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: David Rientjes <rientjes@google.com> Cc: Greg KH <greg@kroah.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: David Rientjes <rientjes@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-15mm: refactor unregister_cpu_under_node()Alex Chiang
By returning early if the node is not online, we can unindent the interesting code by two levels. No functional change. Signed-off-by: Alex Chiang <achiang@hp.com> Cc: Gary Hade <garyhade@us.ibm.com> Cc: Badari Pulavarty <pbadari@us.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: David Rientjes <rientjes@google.com> Cc: Greg KH <greg@kroah.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: David Rientjes <rientjes@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-15mm: refactor register_cpu_under_node()Alex Chiang
By returning early if the node is not online, we can unindent the interesting code by one level. No functional change. Signed-off-by: Alex Chiang <achiang@hp.com> Cc: Gary Hade <garyhade@us.ibm.com> Cc: Badari Pulavarty <pbadari@us.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: David Rientjes <rientjes@google.com> Cc: Greg KH <greg@kroah.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: David Rientjes <rientjes@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-15mm: add numa node symlink for memory section in sysfsAlex Chiang
Commit c04fc586c (mm: show node to memory section relationship with symlinks in sysfs) created symlinks from nodes to memory sections, e.g. /sys/devices/system/node/node1/memory135 -> ../../memory/memory135 If you're examining the memory section though and are wondering what node it might belong to, you can find it by grovelling around in sysfs, but it's a little cumbersome. Add a reverse symlink for each memory section that points back to the node to which it belongs. Signed-off-by: Alex Chiang <achiang@hp.com> Cc: Gary Hade <garyhade@us.ibm.com> Cc: Badari Pulavarty <pbadari@us.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Acked-by: David Rientjes <rientjes@google.com> Cc: Greg KH <greg@kroah.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: David Rientjes <rientjes@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-15hugetlb: offload per node attribute registrationsLee Schermerhorn
Offload the registration and unregistration of per node hstate sysfs attributes to a worker thread rather than attempt the allocation/attachment or detachment/freeing of the attributes in the context of the memory hotplug handler. I don't know that this is absolutely required, but the registration can sleep in allocations and other mem hot plug handlers do it this way. If it turns out this is NOT required, we can drop this patch. N.B., Only tested build, boot, libhugetlbfs regression. i.e., no memory hotplug testing. Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Reviewed-by: Andi Kleen <andi@firstfloor.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Nishanth Aravamudan <nacc@us.ibm.com> Cc: David Rientjes <rientjes@google.com> Cc: Adam Litke <agl@us.ibm.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: Eric Whitney <eric.whitney@hp.com> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-15hugetlb: handle memory hot-plug eventsLee Schermerhorn
Register per node hstate attributes only for nodes with memory. As suggested by David Rientjes. With Memory Hotplug, memory can be added to a memoryless node and a node with memory can become memoryless. Therefore, add a memory on/off-line notifier callback to [un]register a node's attributes on transition to/from memoryless state. N.B., Only tested build, boot, libhugetlbfs regression. i.e., no memory hotplug testing. Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Reviewed-by: Andi Kleen <andi@firstfloor.org> Acked-by: David Rientjes <rientjes@google.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Nishanth Aravamudan <nacc@us.ibm.com> Cc: Adam Litke <agl@us.ibm.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: Eric Whitney <eric.whitney@hp.com> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-15hugetlb: add per node hstate attributesLee Schermerhorn
Add the per huge page size control/query attributes to the per node sysdevs: /sys/devices/system/node/node<ID>/hugepages/hugepages-<size>/ nr_hugepages - r/w free_huge_pages - r/o surplus_huge_pages - r/o The patch attempts to re-use/share as much of the existing global hstate attribute initialization and handling, and the "nodes_allowed" constraint processing as possible. Calling set_max_huge_pages() with no node indicates a change to global hstate parameters. In this case, any non-default task mempolicy will be used to generate the nodes_allowed mask. A valid node id indicates an update to that node's hstate parameters, and the count argument specifies the target count for the specified node. From this info, we compute the target global count for the hstate and construct a nodes_allowed node mask contain only the specified node. Setting the node specific nr_hugepages via the per node attribute effectively ignores any task mempolicy or cpuset constraints. With this patch: (me):ls /sys/devices/system/node/node0/hugepages/hugepages-2048kB ./ ../ free_hugepages nr_hugepages surplus_hugepages Starting from: Node 0 HugePages_Total: 0 Node 0 HugePages_Free: 0 Node 0 HugePages_Surp: 0 Node 1 HugePages_Total: 0 Node 1 HugePages_Free: 0 Node 1 HugePages_Surp: 0 Node 2 HugePages_Total: 0 Node 2 HugePages_Free: 0 Node 2 HugePages_Surp: 0 Node 3 HugePages_Total: 0 Node 3 HugePages_Free: 0 Node 3 HugePages_Surp: 0 vm.nr_hugepages = 0 Allocate 16 persistent huge pages on node 2: (me):echo 16 >/sys/devices/system/node/node2/hugepages/hugepages-2048kB/nr_hugepages [Note that this is equivalent to: numactl -m 2 hugeadmin --pool-pages-min 2M:+16 ] Yields: Node 0 HugePages_Total: 0 Node 0 HugePages_Free: 0 Node 0 HugePages_Surp: 0 Node 1 HugePages_Total: 0 Node 1 HugePages_Free: 0 Node 1 HugePages_Surp: 0 Node 2 HugePages_Total: 16 Node 2 HugePages_Free: 16 Node 2 HugePages_Surp: 0 Node 3 HugePages_Total: 0 Node 3 HugePages_Free: 0 Node 3 HugePages_Surp: 0 vm.nr_hugepages = 16 Global controls work as expected--reduce pool to 8 persistent huge pages: (me):echo 8 >/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages Node 0 HugePages_Total: 0 Node 0 HugePages_Free: 0 Node 0 HugePages_Surp: 0 Node 1 HugePages_Total: 0 Node 1 HugePages_Free: 0 Node 1 HugePages_Surp: 0 Node 2 HugePages_Total: 8 Node 2 HugePages_Free: 8 Node 2 HugePages_Surp: 0 Node 3 HugePages_Total: 0 Node 3 HugePages_Free: 0 Node 3 HugePages_Surp: 0 Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Reviewed-by: Andi Kleen <andi@firstfloor.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Nishanth Aravamudan <nacc@us.ibm.com> Cc: David Rientjes <rientjes@google.com> Cc: Adam Litke <agl@us.ibm.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: Eric Whitney <eric.whitney@hp.com> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-14Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (34 commits) m68k: rename global variable vmalloc_end to m68k_vmalloc_end percpu: add missing per_cpu_ptr_to_phys() definition for UP percpu: Fix kdump failure if booted with percpu_alloc=page percpu: make misc percpu symbols unique percpu: make percpu symbols in ia64 unique percpu: make percpu symbols in powerpc unique percpu: make percpu symbols in x86 unique percpu: make percpu symbols in xen unique percpu: make percpu symbols in cpufreq unique percpu: make percpu symbols in oprofile unique percpu: make percpu symbols in tracer unique percpu: make percpu symbols under kernel/ and mm/ unique percpu: remove some sparse warnings percpu: make alloc_percpu() handle array types vmalloc: fix use of non-existent percpu variable in put_cpu_var() this_cpu: Use this_cpu_xx in trace_functions_graph.c this_cpu: Use this_cpu_xx for ftrace this_cpu: Use this_cpu_xx in nmi handling this_cpu: Use this_cpu operations in RCU this_cpu: Use this_cpu ops for VM statistics ... Fix up trivial (famous last words) global per-cpu naming conflicts in arch/x86/kvm/svm.c mm/slab.c
2009-12-12Merge branch 'next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (151 commits) powerpc: Fix usage of 64-bit instruction in 32-bit altivec code MAINTAINERS: Add PowerPC patterns powerpc/pseries: Track previous CPPR values to correctly EOI interrupts powerpc/pseries: Correct pseries/dlpar.c build break without CONFIG_SMP powerpc: Make "intspec" pointers in irq_host->xlate() const powerpc/8xx: DTLB Miss cleanup powerpc/8xx: Remove DIRTY pte handling in DTLB Error. powerpc/8xx: Start using dcbX instructions in various copy routines powerpc/8xx: Restore _PAGE_WRITETHRU powerpc/8xx: Add missing Guarded setting in DTLB Error. powerpc/8xx: Fixup DAR from buggy dcbX instructions. powerpc/8xx: Tag DAR with 0x00f0 to catch buggy instructions. powerpc/8xx: Update TLB asm so it behaves as linux mm expects. powerpc/8xx: Invalidate non present TLBs powerpc/pseries: Serialize cpu hotplug operations during deactivate Vs deallocate pseries/pseries: Add code to online/offline CPUs of a DLPAR node powerpc: stop_this_cpu: remove the cpu from the online map. powerpc/pseries: Add kernel based CPU DLPAR handling sysfs/cpu: Add probe/release files powerpc/pseries: Kernel DLPAR Infrastructure ...
2009-12-11Driver core: fix race in dev_driver_stringAlan Stern
This patch (as1310) works around a race in dev_driver_string(). If the device is unbound while the function is running, dev->driver might become NULL after we test it and before we dereference it. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Cc: stable <stable@kernel.org> Cc: Oliver Neukum <oliver@neukum.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>