Age | Commit message | Author
2008-10-20 | ext3: fix ext3_dx_readdir hash collision handling | Eugene Dashevsky
This fixes a bug where readdir() would return a directory entry twice if there was a hash collision in a hash-tree-indexed directory.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Eugene Dashevsky <eugene@ibrix.com>
Signed-off-by: Mike Snitzer <msnitzer@ibrix.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-20 | jbd: ordered data integrity fix | Hidehiro Kawai
In ordered mode, if a file data buffer being dirtied exists in the committing transaction, we write the buffer to the disk, move it from the committing transaction to the running transaction, then dirty it. But we must not remove the buffer from the committing transaction when the buffer couldn't be written out; otherwise the error would be missed and the committing transaction would not abort. This patch adds an error check before removing the buffer from the committing transaction.
Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-20 | ext3: add an option to control error handling on file data | Hidehiro Kawai
If the journal doesn't abort when it gets an I/O error in file data blocks, file data corruption will spread silently. Because most applications and commands do buffered writes without fsync(), they don't notice the I/O error. This is scary for mission-critical systems. On the other hand, if the journal aborts whenever it gets an I/O error in file data blocks, the system will easily become inoperable. So this patch introduces a filesystem option to determine whether to abort the journal or just call printk() when an I/O error occurs in file data. If you mount an ext3 fs with the data_err=abort option, it aborts on a file data write error. If you mount it with data_err=ignore, it doesn't abort; it just calls printk(). data_err=ignore is the default.
Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Jan Kara <jack@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
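A user-space model of the policy: parse the data_err= mount option, then decide what a failed data-block write does. This is not ext3's code. `enum data_err_mode`, `parse_data_err()` and `on_data_write_error()` are hypothetical names; only the option strings and the `ignore` default come from the commit. On a real system the option is passed at mount time, e.g. `mount -t ext3 -o data_err=abort /dev/sdXN /mnt` (device and mount point are placeholders).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical names; only "data_err=abort"/"data_err=ignore" come from the commit. */
enum data_err_mode { DATA_ERR_IGNORE, DATA_ERR_ABORT };

/* Scan a comma-separated option string such as "noatime,data_err=abort".
 * Returns 0 on success, -1 on an unrecognized data_err= value. */
static int parse_data_err(const char *opts, enum data_err_mode *mode)
{
    const char *p = opts;

    *mode = DATA_ERR_IGNORE;               /* default, as in the commit */
    while (*p) {
        size_t len = strcspn(p, ",");      /* length of the current option */

        if (len == 14 && strncmp(p, "data_err=abort", 14) == 0)
            *mode = DATA_ERR_ABORT;
        else if (len == 15 && strncmp(p, "data_err=ignore", 15) == 0)
            *mode = DATA_ERR_IGNORE;
        else if (len >= 9 && strncmp(p, "data_err=", 9) == 0)
            return -1;                     /* unknown data_err value */

        p += len;
        if (*p == ',')
            p++;                           /* skip the separator */
    }
    return 0;
}

/* Model of a failed file data write: returns 1 if the journal should be
 * aborted, 0 if the error is only logged. */
static int on_data_write_error(enum data_err_mode mode, unsigned long block)
{
    if (mode == DATA_ERR_ABORT)
        return 1;
    fprintf(stderr, "I/O error writing data block %lu (data_err=ignore)\n", block);
    return 0;
}
```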
2008-10-20 | ext3: fix ext3 block reservation early ENOSPC issue | Mingming Cao
We could run into an ENOSPC error on ext3 even when there are free blocks on the filesystem. The problem is triggered when the goal block group has 0 free blocks and the remaining block groups are skipped due to the check "free_blocks < windowsz/2". The current code could fall back to non-reservation allocation to prevent early ENOSPC after examining all the block groups with reservation on, but this code was bypassed if the reservation window had already been turned off, which is the case here.
This patch fixes two issues:
1) We don't need to turn off block reservation if the goal block group has 0 free blocks left; we should continue searching the rest of the block groups. The intention of the current code is to turn off block reservation if the goal allocation group has a few (some) free blocks left (not enough to make the desired reservation window), and to try to allocate in the goal block group to get better locality. But if the goal group has 0 free blocks, it should leave block reservation on and continue searching the next block groups, rather than turn off block reservation completely.
2) We don't need to check the window size if block reservation is off.
The problem was originally found and fixed in ext4.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
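The two fixes can be read as two small predicates. This is a hypothetical model of the commit's description, not the ext3 allocator: `goal_group_keeps_reservation()` and `should_search_group()` are invented names, and the "not enough for a window" test (`free_blocks < windowsz`) is an assumption. The `windowsz / 2` skip threshold is the check quoted in the commit.

```c
#include <stdbool.h>

/* Goal group: returns whether block reservation stays on.
 * Fix 1: give up the reservation only when the goal group has *some*
 * free blocks, but too few for a window. With 0 free blocks, keep it
 * on and let the search move to other groups. */
static bool goal_group_keeps_reservation(unsigned long free_blocks,
                                         unsigned long windowsz,
                                         bool reservation_on)
{
    if (reservation_on && free_blocks > 0 && free_blocks < windowsz)
        return false;
    return reservation_on;
}

/* Any other group: returns true if the group is worth searching. */
static bool should_search_group(unsigned long free_blocks,
                                unsigned long windowsz,
                                bool reservation_on)
{
    if (free_blocks == 0)
        return false;                 /* nothing to allocate here */
    /* Fix 2: the window-size heuristic only applies while reserving. */
    if (reservation_on && free_blocks < windowsz / 2)
        return false;
    return true;
}
```

In the failing scenario from the commit, reservation is already off, so a group with a handful of free blocks must still be searched; in this model `should_search_group(3, 8, false)` is true, whereas applying the window check unconditionally would skip it and end in a premature ENOSPC.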
2008-10-20 | ext3: don't try to resize if there are no reserved gdt blocks left | Josef Bacik
When you try to resize an ext3 fs after running out of reserved GDT blocks, you get an error that doesn't actually tell you what went wrong. It just says that the group descriptor block (gdb) it picked is not correct, which is the case since you don't have any reserved GDT blocks left. This patch adds a check to make sure you have reserved GDT blocks to use, and if not, prints a more relevant error.
Signed-off-by: Josef Bacik <jbacik@redhat.com>
Cc: <linux-ext4@vger.kernel.org>
Cc: Andreas Dilger <adilger@sun.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-20 | jbd: don't dirty original metadata buffer on abort | Hidehiro Kawai
Currently, original metadata buffers are dirtied when they are unfiled, whether or not the journal has aborted. Eventually these buffers will be written back to the filesystem by pdflush. This means some metadata buffers are written to the filesystem without journaling if the journal aborts. So if a journal abort and a system crash both happen, the filesystem can be left in an inconsistent state. Additionally, replaying journaled metadata can partly overwrite the latest metadata on the filesystem: if the journal aborts, journaled metadata are preserved and replayed during the next mount so as not to lose uncheckpointed metadata. This would also break the consistency of the filesystem. This patch prevents original metadata buffers from being dirtied on abort by clearing the BH_JBDDirty flag from those buffers. Thus, no metadata buffers are written to the filesystem without journaling.
Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Acked-by: Jan Kara <jack@suse.cz>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-20 | jbd: abort when failed to log metadata buffers | Hidehiro Kawai
If we fail to write metadata buffers to the journal space but succeed in writing the commit record, stale data can be written back to the filesystem as metadata in the recovery phase. To avoid this, when we fail to write out metadata buffers, abort the journal before writing the commit record.
Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Acked-by: Jan Kara <jack@suse.cz>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
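The ordering rule can be modeled as follows: wait for every metadata write, and if any of them failed, abort the journal instead of writing the commit record. `struct journal_model` and `commit_transaction()` are hypothetical; jbd's actual commit code differs in detail.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the journal state relevant to the fix. */
struct journal_model {
    bool aborted;
    bool commit_record_written;
};

/* write_ok[i] records whether writing metadata buffer i to the journal
 * succeeded. Returns 0 on a successful commit, -EIO if it was aborted. */
static int commit_transaction(struct journal_model *j,
                              const bool *write_ok, size_t nbufs)
{
    bool failed = false;

    for (size_t i = 0; i < nbufs; i++)
        if (!write_ok[i])
            failed = true;   /* remember the error, keep checking the rest */

    if (failed) {
        /* Abort *before* the commit record, so recovery never replays a
         * transaction whose journaled metadata may be stale. */
        j->aborted = true;
        return -EIO;
    }
    j->commit_record_written = true;
    return 0;
}
```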
2008-10-20 | phonedev: remove BKL | Richard Holden
The phone_device array is covered by the phone_lock mutex in all cases, and request_module no longer needs the BKL, so we can remove the only remaining instance of the BKL from phonedev.
Signed-off-by: Richard Holden <aciddeath@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-20 | fb: convert lock/unlock_kernel() into local fb mutex | Krzysztof Helt
Change lock_kernel()/unlock_kernel() to a local fb mutex. Each frame buffer instance has its own mutex. The one-line try_to_load() function is unrolled into request_module() calls in the two places it was used, for readability.
[righi.andrea@gmail.com: fb: fix NULL pointer BUG dereference in fb_open()]
Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl>
Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
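A user-space analogy of per-instance locking, using POSIX threads in place of kernel mutexes. `struct fb_model` and its helpers are hypothetical; they stand in for a mutex embedded in each framebuffer's own state. The point is that holding one instance's lock no longer excludes work on another instance, which a single global lock such as the BKL would.

```c
#include <pthread.h>

/* Hypothetical framebuffer instance with its own lock. */
struct fb_model {
    pthread_mutex_t lock;   /* protects only this instance */
    int xres, yres;
};

static void fb_model_init(struct fb_model *fb, int xres, int yres)
{
    pthread_mutex_init(&fb->lock, NULL);
    fb->xres = xres;
    fb->yres = yres;
}

/* Changing one framebuffer's mode serializes only against users of the
 * same framebuffer, not against every framebuffer in the system. */
static void fb_model_set_mode(struct fb_model *fb, int xres, int yres)
{
    pthread_mutex_lock(&fb->lock);
    fb->xres = xres;
    fb->yres = yres;
    pthread_mutex_unlock(&fb->lock);
}
```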
2008-10-20 | fb: push down the BKL in the ioctl handler | Alan Cox
Framebuffer is heavily BKL-dependent at the moment, so just wrap the ioctl handler in the driver as we push down.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Alan Cox <alan@redhat.com>
Cc: Krzysztof Helt <krzysztof.h1@poczta.fm>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-20 | gpiolib: fix oops in gpio_get_value_cansleep() | David Brownell
We can get the following oops from gpio_get_value_cansleep() when a GPIO controller doesn't provide a get() callback:
  Unable to handle kernel paging request for instruction fetch
  Faulting instruction address: 0x00000000
  Oops: Kernel access of bad area, sig: 11 [#1]
  [...]
  NIP [00000000] 0x0
  LR [c0182fb0] gpio_get_value_cansleep+0x40/0x50
  Call Trace:
  [c7b79e80] [c0183f28] gpio_value_show+0x5c/0x94
  [c7b79ea0] [c01a584c] dev_attr_show+0x30/0x7c
  [c7b79eb0] [c00d6b48] fill_read_buffer+0x68/0xe0
  [c7b79ed0] [c00d6c54] sysfs_read_file+0x94/0xbc
  [c7b79ef0] [c008f24c] vfs_read+0xb4/0x16c
  [c7b79f10] [c008f580] sys_read+0x4c/0x90
  [c7b79f40] [c0013a14] ret_from_syscall+0x0/0x38
It's OK to request the value of *any* GPIO; most GPIOs are bidirectional, so configuring them as outputs just enables an output driver and doesn't disable the input logic.
So the problem is that gpio_get_value_cansleep() isn't making the same sanity check that gpio_get_value() does: making sure this GPIO isn't one of the atypical "no input logic" cases.
Reported-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Cc: <stable@kernel.org> [2.6.27.x, 2.6.26.x, 2.6.25.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
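A simplified model of the missing sanity check. `struct gpio_chip_model` is a hypothetical stand-in for the kernel's `struct gpio_chip`, keeping only an optional `get()` callback and a `base` number. Reporting 0 when `get()` is absent is an assumption about the exact behavior; the essential part is testing the pointer before calling it.

```c
#include <stddef.h>

/* Hypothetical, stripped-down GPIO controller: get() is optional. */
struct gpio_chip_model {
    int (*get)(struct gpio_chip_model *chip, unsigned offset);
    int base;                 /* first GPIO number handled by this chip */
};

/* Check for a missing get() callback instead of jumping to address 0,
 * which is what produced the "instruction fetch" oops above. */
static int model_get_value_cansleep(struct gpio_chip_model *chip, unsigned gpio)
{
    return chip->get ? chip->get(chip, gpio - chip->base) : 0;
}

/* Example callback for a controller that does have input logic. */
static int always_high(struct gpio_chip_model *chip, unsigned offset)
{
    (void)chip;
    (void)offset;
    return 1;
}
```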
2008-10-20 | gpio: modify sysfs gpio export so that "value" displays as 0 or 1 | Steven A. Falco
gpiolib can export GPIOs to userspace via sysfs. This patch modifies gpio_value_show() so that any non-zero value is explicitly printed as "1", rather than whatever numerical value the lower-level driver returns.
Signed-off-by: Steve Falco <sfalco@harris.com>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
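The normalization amounts to C's double negation. `format_gpio_value()` is a hypothetical stand-in for the sysfs show routine: a driver that returns a raw register mask such as `0x40` for a high line would now be shown as `1`.

```c
#include <stdio.h>

/* !!raw maps 0 to 0 and any non-zero value to 1 before printing. */
static int format_gpio_value(char *buf, size_t len, int raw)
{
    return snprintf(buf, len, "%d\n", !!raw);
}
```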
2008-10-20 | rtc-cmos: export second NVRAM bank | David Brownell
Teach rtc-cmos about the second bank of registers found on most modern x86 systems, giving access to 128 more bytes of NVRAM. This version only sees that extra NVRAM when both register banks are provided as part of *one* PNP resource. Since the BIOS on some systems presents them using two I/O resources, and nothing merges them, this can't always show all the NVRAM. (We're supposed to be able to use PNP id PNP0b01 too, but BIOS tables don't often seem to use that particular option.)
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-20 | Altix serial: fix | roel kluin
In function sn_sal_switch_to_asynch(), drivers/serial/sn_console.c:713 contains:
  HZ * SN_SAL_UART_FIFO_DEPTH / SN_SAL_UART_FIFO_SPEED_CPS;
After preprocessing (see the defines in the patch) this becomes HZ * 16 / 9600 / 10, which, since the operators associate from left to right, is not equivalent to HZ * 16 / 960.
Looks-obviously-right-to: Tony Luck <tony.luck@intel.com>
Cc: Jes Sorensen <jes@sgi.com>
Acked-by: Pat Gefre <pfg@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
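The arithmetic is easy to reproduce. This sketch assumes HZ = 1000 (HZ is a kernel configuration choice) and uses its own macro names rather than the driver's. Parenthesizing the macro body is the usual cure for this class of bug; the exact change in the patch is not reproduced here.

```c
/* Assumption for illustration: HZ = 1000. */
#define MODEL_HZ 1000

#define FIFO_DEPTH            16
#define FIFO_SPEED_CPS_UNSAFE 9600/10     /* unparenthesized body */
#define FIFO_SPEED_CPS_SAFE   (9600/10)   /* always evaluates to 960 */

/* Expands to 1000 * 16 / 9600/10, evaluated left to right:
 * 16000 / 9600 == 1, then 1 / 10 == 0 in integer arithmetic. */
static long fifo_timeout_unsafe(void)
{
    return MODEL_HZ * FIFO_DEPTH / FIFO_SPEED_CPS_UNSAFE;   /* 0 */
}

/* Expands to 1000 * 16 / (960) == 16000 / 960 == 16. */
static long fifo_timeout_safe(void)
{
    return MODEL_HZ * FIFO_DEPTH / FIFO_SPEED_CPS_SAFE;     /* 16 */
}
```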
2008-10-20 | make probe_serial_gsc() static | Adrian Bunk
This patch makes the needlessly global probe_serial_gsc() static.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-20 | Char: sx, remove bogus iomap | Jiri Slaby
readl/writel are not expected to accept an iomap return value. Replace the bogus mapping with a standard ioremap.
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Cc: <R.E.Wolff@BitWizard.nl>
Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-20 | hwmon: applesmc: lighter wait mechanism, drastic improvement | Henrik Rydberg
The read failure ratio is sensitive to the delay between the first byte written and the first byte read; apparently the sensors cannot be rushed. Increasing the minimum wait time, without changing the total wait time, improves the failure ratio from an 8% chance that any of the sensors fails in one read down to 0.4%, on a Macbook Air. On a Macbook Pro 3,1, the effect is even more apparent. Reducing the number of status polls further improves the ratio to below 0.1%. Finally, increasing the total wait time brings the failure ratio down to virtually zero.
Signed-off-by: Henrik Rydberg <rydberg@euromail.se>
Tested-by: Bob McElrath <bob@mcelrath.org>
Cc: Nicolas Boichat <nicolas@boichat.ch>
Cc: "Mark M. Hoffman" <mhoffman@lightlink.com>
Cc: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-20 | hwmon: applesmc: Add support for Macbook Pro 3 | Henrik Rydberg
Add temperature sensor support for the Macbook Pro 3.
Signed-off-by: Henrik Rydberg <rydberg@euromail.se>
Cc: Nicolas Boichat <nicolas@boichat.ch>
Cc: Riki Oktarianto <rkoktarianto@gmail.com>
Cc: Mark M. Hoffman <mhoffman@lightlink.com>
Cc: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-20 | hwmon: applesmc: Add support for Macbook Pro 4 | Henrik Rydberg
Adds temperature sensor support for the Macbook Pro 4.
Signed-off-by: Henrik Rydberg <rydberg@euromail.se>
Cc: Nicolas Boichat <nicolas@boichat.ch>
Cc: Riki Oktarianto <rkoktarianto@gmail.com>
Cc: Mark M. Hoffman <mhoffman@lightlink.com>
Cc: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-20 | drivers/hwmon/applesmc.c: remove unneeded casts | Andrew Morton
dmi_system_id.driver_data is already void *.
Cc: Henrik Rydberg <rydberg@euromail.se>
Cc: Nicolas Boichat <nicolas@boichat.ch>
Cc: Riki Oktarianto <rkoktarianto@gmail.com>
Cc: Mark M. Hoffman <mhoffman@lightlink.com>
Cc: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-20 | hwmon: applesmc: add support for Macbook Air | Henrik Rydberg
This patch adds accelerometer, backlight and temperature sensor support for the Macbook Air.
Signed-off-by: Henrik Rydberg <rydberg@euromail.se>
Cc: Nicolas Boichat <nicolas@boichat.ch>
Cc: Riki Oktarianto <rkoktarianto@gmail.com>
Cc: Mark M. Hoffman <mhoffman@lightlink.com>
Cc: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-20 | hwmon: applesmc: allow for variable ALV0 and ALV1 package length | Henrik Rydberg
On some recent Macbooks, the package length for the light sensors ALV0 and ALV1 has changed from 6 to 10. This patch allows for a variable package length encompassing both variants.
Signed-off-by: Henrik Rydberg <rydberg@euromail.se>
Cc: Nicolas Boichat <nicolas@boichat.ch>
Cc: Riki Oktarianto <rkoktarianto@gmail.com>
Cc: Mark M. Hoffman <mhoffman@lightlink.com>
Cc: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-20 | hwmon: applesmc: prolong status wait | Henrik Rydberg
The time to wait for a status change while reading or writing the SMC ports is a balance between read reliability and system performance. The current setting yields roughly three errors in a thousand when simultaneously reading three different temperature values on a Macbook Air. This patch increases the setting to a value yielding roughly one error in ten thousand, with no noticeable system performance degradation.
Signed-off-by: Henrik Rydberg <rydberg@euromail.se>
Cc: Nicolas Boichat <nicolas@boichat.ch>
Cc: Riki Oktarianto <rkoktarianto@gmail.com>
Cc: Mark M. Hoffman <mhoffman@lightlink.com>
Cc: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-20 | hwmon: applesmc: fix the 'wait status failed: c != 8' problem | Henrik Rydberg
On many Macbooks since mid 2007 (the Pro, C2D and Air models), applesmc fails to read some or all SMC ports. This problem has various effects, such as flooded logfiles, malfunctioning temperature sensors, accelerometers failing to initialize, and difficulties getting backlight functionality to work properly. The root of the problem seems to be the command protocol. The current code sends out a command byte, then repeatedly polls for an ack before continuing to send or receive data. From the experiments leading to this patch, it seems the command protocol either never quite worked or has changed, so that one now sends a command byte, waits a little, polls for an ack, and if that fails, repeats the whole thing by sending the command byte again. This patch implements a send_command function according to this new interpretation of the protocol, and it should also work for earlier models.
Signed-off-by: Henrik Rydberg <rydberg@euromail.se>
Cc: Nicolas Boichat <nicolas@boichat.ch>
Cc: Riki Oktarianto <rkoktarianto@gmail.com>
Cc: Mark M. Hoffman <mhoffman@lightlink.com>
Cc: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
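The retry protocol described above, as a simulation: the port accesses are replaced by a fake device that only acknowledges once enough time has passed since the last command byte. `struct fake_smc`, `send_command_model()`, and doubling the wait between attempts are assumptions for illustration, not the driver's code or constants.

```c
#include <errno.h>

/* Hypothetical stand-in for the SMC command port. */
struct fake_smc {
    unsigned min_settle_us;  /* how long the fake SMC needs after a command */
    unsigned waited_us;      /* time elapsed since the last command byte */
    int commands_sent;
};

static void smc_send_byte(struct fake_smc *s) { s->waited_us = 0; s->commands_sent++; }
static void smc_delay(struct fake_smc *s, unsigned us) { s->waited_us += us; }
static int smc_acked(const struct fake_smc *s) { return s->waited_us >= s->min_settle_us; }

/* Send the command byte, wait, poll once for the ack; if it is missing,
 * resend the command and wait twice as long. min_wait_us must be > 0. */
static int send_command_model(struct fake_smc *s, unsigned min_wait_us,
                              unsigned max_wait_us)
{
    for (unsigned us = min_wait_us; us < max_wait_us; us <<= 1) {
        smc_send_byte(s);
        smc_delay(s, us);
        if (smc_acked(s))
            return 0;
    }
    return -EIO;             /* never acknowledged within the budget */
}
```

With a device that needs 100 us to settle and a starting wait of 16 us, the command is sent four times (after waits of 16, 32, 64 and 128 us) before the ack arrives. Resending the command, rather than just polling longer, is the behavior change the commit describes.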
2008-10-20 | hwmon: applesmc: specified number of bytes to read should match actual | Henrik Rydberg
In one place in the code, the specified number of bytes to read and the actual number of bytes read differ by one. This one-line patch fixes that inconsistency.
Signed-off-by: Henrik Rydberg <rydberg@euromail.se>
Cc: Nicolas Boichat <nicolas@boichat.ch>
Cc: Riki Oktarianto <rkoktarianto@gmail.com>
Cc: Mark M. Hoffman <mhoffman@lightlink.com>
Cc: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-20 | hwmon/pc87360 separate alarm files: add therm-min/max/crit-alarms | Jim Cromie
Adds therm-min/max/crit-alarm callbacks, sensor-device-attribute declarations, and references to those new declarations in the macro used to initialize the therm_group (of sysfs files). The thermistors use voltage channels to measure, so they don't have a fault alarm, but unlike the other voltages they do have an overtemp alarm, which we call crit (by convention).
[akpm@linux-foundation.org: cleanup]
Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
Cc: Jean Delvare <khali@linux-fr.org>
Cc: "Mark M. Hoffman" <mhoffman@lightlink.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-20 | hwmon/pc87360 separate alarm files: add dev_dbg help | Jim Cromie
Temp and vin status register values may be set by the chip specification, set again by the BIOS, or set by a previously loaded instance of this driver. The debug output nicely displays modprobe init=\d actions.
Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
Cc: Jean Delvare <khali@linux-fr.org>
Cc: "Mark M. Hoffman" <mhoffman@lightlink.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-20 | hwmon/pc87360 separate alarm files: define LDNI_MAX const | Jim Cromie
The driver handles 3 logical devices in a fixed-length array. Give this a #defined constant.
Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
Cc: Jean Delvare <khali@linux-fr.org>
Cc: "Mark M. Hoffman" <mhoffman@lightlink.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-20 | hwmon/pc87360 separate alarm files: add temp-min/max/crit/fault-alarms | Jim Cromie
Adds temp-min/max/crit/fault-alarm callbacks, sensor-device-attribute declarations, and references to those new declarations in the macro used to initialize the temp_group (of sysfs files).
[akpm@linux-foundation.org: cleanups]
Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
Cc: Jean Delvare <khali@linux-fr.org>
Cc: "Mark M. Hoffman" <mhoffman@lightlink.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-20 | hwmon/pc87360 separate alarm files: add in-min/max-alarms | Jim Cromie
Adds vin-min/max-alarm callbacks, sensor-device-attribute declarations, and references to those new declarations in the macro used to initialize the vin_group (of sysfs files).
[akpm@linux-foundation.org: cleanups]
Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
Cc: Jean Delvare <khali@linux-fr.org>
Cc: "Mark M. Hoffman" <mhoffman@lightlink.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-20 | hwmon/pc87360 separate alarm files: define some constants | Jim Cromie
Bring hwmon/pc87360 into agreement with Documentation/hwmon/sysfs-interface. The patchset adds separate limit alarms for voltages and temps; it also adds temp[123]_fault files. On my Soekris, temps 1 and 2 are unused/unconnected, so temp[123]_fault = 1,1,0 respectively. This agrees with /usr/bin/sensors, which has always shown them as OPEN. Temps 4, 5 and 6 are thermistor-based and don't have a fault bit in their status register.
This patch adds 2 different kinds of constants:
- CHAN_ALM_* constants for the (later) vin and temp alarm callbacks.
- CHAN_* conversion constants, used in _init_device, partly for RW1C (write-1-to-clear) bits.
Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
Cc: Jean Delvare <khali@linux-fr.org>
Cc: "Mark M. Hoffman" <mhoffman@lightlink.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
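RW1C status bits are why initialization code needs care with them: writing a 1 acknowledges (clears) an alarm, while writing a 0 leaves it alone. The model below uses hypothetical bit positions and names, not the PC87360 register layout.

```c
#include <stdint.h>

/* Hypothetical alarm bits in one channel status register. */
#define MODEL_ALM_MIN   0x01
#define MODEL_ALM_MAX   0x02
#define MODEL_ALM_CRIT  0x04
#define MODEL_FAULT     0x08

struct status_reg { uint8_t value; };

/* Hardware side: latch an alarm condition. */
static void hw_raise(struct status_reg *r, uint8_t bits)
{
    r->value |= bits;
}

/* Software write to RW1C bits: each 1 clears that bit, each 0 leaves it
 * untouched. Writing back the value just read therefore acknowledges
 * exactly the alarms that were seen. */
static void reg_write_rw1c(struct status_reg *r, uint8_t written)
{
    r->value &= (uint8_t)~written;
}
```

An alarm latched between the read and the write-back survives, because its bit was 0 in the value written.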
2008-10-20 | intel-iommu: typo fix and correct word in the comment | Ameya Palande
Fix a typo and replace an incorrect word in the comment.
Signed-off-by: Ameya Palande <2ameya@gmail.com>
Cc: "Ashok Raj" <ashok.raj@intel.com>
Cc: "Shaohua Li" <shaohua.li@intel.com>
Cc: "Anil S Keshavamurthy" <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-20 | kernel/configs.c: remove useless comments | WANG Cong
These comments are useless; remove them.
Signed-off-by: WANG Cong <wangcong@zeuux.org>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-20 | HP-WMI: additional keycode (or typo) | Eric Piel
On my HP 2510, pressing the (i) button generates an unknown keycode: 0x213b. So here is a patch adding support for it. However, since there already seems to be support for a similar button with keycode 0x231b, I wonder whether that could be a typo in the driver?
Signed-off-by: Eric Piel <eric.piel@tremplin-utc.net>
Cc: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-20 | Fix documentation of sysrq-q | Andi Kleen
I recently fell into the trap of expecting it to dump all timers, when it only dumps hrtimers. Fix the documentation.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-20 | uml: fix a compile error | WANG Cong
Fix:
  arch/um/sys-i386/signal.c: In function 'copy_sc_from_user':
  arch/um/sys-i386/signal.c:182: warning: dereferencing 'void *' pointer
  arch/um/sys-i386/signal.c:182: error: request for member '_fxsr_env' in something not a structure or union
Signed-off-by: WANG Cong <wangcong@zeuux.org>
Cc: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-20 | arch/m68k/bvme6000/rtc.c: remove duplicated include | Huang Weiyi
Remove the duplicated include of <linux/smp_lock.h> in arch/m68k/bvme6000/rtc.c.
Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-20 | container freezer: document the cgroup freezer subsystem. | Matt Helsley
Describe, in a documentation file, why we need the freezer subsystem and how to use it. Since the cgroups.txt file is focused on the subsystem-agnostic portions of cgroups, make a directory and move the old cgroups.txt file into it at the same time.
Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
Cc: Paul Menage <menage@google.com>
Cc: containers@lists.linux-foundation.org
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-20 | container freezer: rename check_if_frozen() | Matt Helsley
check_if_frozen() sounds like it should return something, when in fact it just updates the freezer state.
Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-20 | container freezer: make freezer state names less generic | Matt Helsley
Rename the cgroup freezer states to be less generic, to avoid any name collisions while also better describing what each state is.
Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-20 | container freezer: prevent frozen tasks or cgroups from changing | Matt Helsley
Don't let frozen tasks or cgroups change. This means frozen tasks can't leave their current cgroup for another cgroup. It also means that tasks cannot be added to or removed from a cgroup in the FROZEN state. We enforce these rules by checking for frozen tasks and cgroups in the can_attach() function.
Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
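The rule reduces to a predicate over the source cgroup, the destination cgroup and the task. `model_can_attach()` and the `MODEL_*` states are hypothetical names, and returning `-EBUSY` is an assumption about the error code.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical freezer states for the model. */
enum model_freezer_state { MODEL_THAWED, MODEL_FREEZING, MODEL_FROZEN };

/* A task may move from cgroup src to cgroup dst only if neither cgroup
 * is FROZEN and the task itself is not frozen. */
static int model_can_attach(enum model_freezer_state src,
                            enum model_freezer_state dst,
                            bool task_frozen)
{
    if (task_frozen)
        return -EBUSY;                 /* frozen tasks stay where they are */
    if (src == MODEL_FROZEN || dst == MODEL_FROZEN)
        return -EBUSY;                 /* FROZEN membership is fixed */
    return 0;
}
```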
2008-10-20 | container freezer: skip frozen cgroups during power management resume | Matt Helsley
When a system is resumed after a suspend, it will also unfreeze frozen cgroups. This patch modifies the resume sequence to skip the tasks that are part of a frozen control group.
Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
Acked-by: Serge E. Hallyn <serue@us.ibm.com>
Tested-by: Matt Helsley <matthltc@us.ibm.com>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-20 | container freezer: implement freezer cgroup subsystem | Matt Helsley
This patch implements a new freezer subsystem in the control groups framework. It provides a way to stop and resume execution of all tasks in a cgroup by writing in the cgroup filesystem.
The freezer subsystem in the container filesystem defines a file named freezer.state. Writing "FROZEN" to the state file will freeze all tasks in the cgroup. Subsequently writing "RUNNING" will unfreeze the tasks in the cgroup. Reading will return the current state.
* Examples of usage:
   # mkdir /containers/freezer
   # mount -t cgroup -ofreezer freezer /containers
   # mkdir /containers/0
   # echo $some_pid > /containers/0/tasks
 to get the status of the freezer subsystem:
   # cat /containers/0/freezer.state
   RUNNING
 to freeze all tasks in the container:
   # echo FROZEN > /containers/0/freezer.state
   # cat /containers/0/freezer.state
   FREEZING
   # cat /containers/0/freezer.state
   FROZEN
 to unfreeze all tasks in the container:
   # echo RUNNING > /containers/0/freezer.state
   # cat /containers/0/freezer.state
   RUNNING
This is the basic mechanism, which should do the right thing for user space tasks in a simple scenario.
It's important to note that freezing can be incomplete. In that case we return EBUSY. This means that some tasks in the cgroup are busy doing something that prevents us from completely freezing the cgroup at this time. After EBUSY, the cgroup will remain partially frozen, which is reflected by freezer.state reporting "FREEZING" when read. The state will remain "FREEZING" until one of these things happens:
1) Userspace cancels the freezing operation by writing "RUNNING" to the freezer.state file.
2) Userspace retries the freezing operation by writing "FROZEN" to the freezer.state file (writing "FREEZING" is not legal and returns EIO).
3) The tasks that blocked the cgroup from entering the "FROZEN" state disappear from the cgroup's set of tasks.
[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: export thaw_process]
Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
Acked-by: Serge E. Hallyn <serue@us.ibm.com>
Tested-by: Matt Helsley <matthltc@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
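A hypothetical model of the freezer.state semantics described in this commit, with `nr_busy` standing for tasks that currently refuse to freeze. The EBUSY and EIO results follow the text; re-checking the state on read (case 3) and returning `-EINVAL` for unrecognized strings are assumptions, and none of the names are the kernel's.

```c
#include <errno.h>
#include <string.h>

enum fz_state { FZ_RUNNING, FZ_FREEZING, FZ_FROZEN };

struct fz_cgroup {
    enum fz_state state;
    int nr_busy;             /* tasks that currently block freezing */
};

/* Model of writing a string to freezer.state. */
static int fz_write(struct fz_cgroup *cg, const char *val)
{
    if (strcmp(val, "RUNNING") == 0) {
        cg->state = FZ_RUNNING;          /* thaw, or cancel a partial freeze (case 1) */
        return 0;
    }
    if (strcmp(val, "FROZEN") == 0) {    /* first attempt or retry (case 2) */
        if (cg->nr_busy > 0) {
            cg->state = FZ_FREEZING;     /* partially frozen */
            return -EBUSY;
        }
        cg->state = FZ_FROZEN;
        return 0;
    }
    if (strcmp(val, "FREEZING") == 0)
        return -EIO;                     /* not a legal value to write */
    return -EINVAL;                      /* assumption: anything else */
}

/* Model of reading freezer.state. */
static const char *fz_read(struct fz_cgroup *cg)
{
    static const char *const names[] = { "RUNNING", "FREEZING", "FROZEN" };

    /* Case 3: once the blocking tasks are gone, the freeze completes. */
    if (cg->state == FZ_FREEZING && cg->nr_busy == 0)
        cg->state = FZ_FROZEN;
    return names[cg->state];
}
```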
2008-10-20 | container freezer: make refrigerator always available | Matt Helsley
Now that the TIF_FREEZE flag is available on all architectures, extract refrigerator() and freeze_task() from kernel/power/process.c and make them available to all. refrigerator() can now be used in a control group subsystem implementing a control group freezer.
Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
Acked-by: Serge E. Hallyn <serue@us.ibm.com>
Tested-by: Matt Helsley <matthltc@us.ibm.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-20 | container freezer: add TIF_FREEZE flag to all architectures | Matt Helsley
This patch series introduces a cgroup subsystem that uses the swsusp freezer to freeze a group of tasks. It's immediately useful for batch job management scripts. It should also be useful in the future for implementing container checkpoint/restart.
The freezer subsystem in the container filesystem defines a cgroup file named freezer.state. Reading freezer.state will return the current state of the cgroup. Writing "FROZEN" to the state file will freeze all tasks in the cgroup. Subsequently writing "RUNNING" will unfreeze the tasks in the cgroup.
* Examples of usage:
   # mkdir /containers/freezer
   # mount -t cgroup -ofreezer freezer /containers
   # mkdir /containers/0
   # echo $some_pid > /containers/0/tasks
 to get the status of the freezer subsystem:
   # cat /containers/0/freezer.state
   RUNNING
 to freeze all tasks in the container:
   # echo FROZEN > /containers/0/freezer.state
   # cat /containers/0/freezer.state
   FREEZING
   # cat /containers/0/freezer.state
   FROZEN
 to unfreeze all tasks in the container:
   # echo RUNNING > /containers/0/freezer.state
   # cat /containers/0/freezer.state
   RUNNING
This patch: the first step in making refrigerator() available to all architectures, even those without power management. The purpose of this change is to be able to use refrigerator() in a new control group subsystem that will implement a control group freezer.
[akpm@linux-foundation.org: fix sparc]
Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
Acked-by: Pavel Machek <pavel@suse.cz>
Acked-by: Serge E. Hallyn <serue@us.ibm.com>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Nigel Cunningham <nigel@tuxonice.net>
Tested-by: Matt Helsley <matthltc@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-20 | mm: extract do_pages_move() out of sys_move_pages() | Brice Goglin
To prepare for chunking, move the sys_move_pages() code that is used when nodes != NULL into do_pages_move(), and rename do_move_pages() to do_move_page_to_node_array().
Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
Acked-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-20  mm: don't vmalloc a huge page_to_node array for do_pages_stat()  Brice Goglin
do_pages_stat() does not really need any page_to_node entries. Just pass the pointers to the user-space page address array and to the user-space status array, and have do_pages_stat() traverse the former and fill the latter directly. Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr> Acked-by: Christoph Lameter <cl@linux-foundation.org> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
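do_pages_stat() is the path behind a move_pages() call with nodes == NULL, which only reports where each page currently lives. A minimal user-space sketch of that query mode, assuming a Linux system where the raw system call is reachable through syscall(2) and SYS_move_pages (libnuma's move_pages() wrapper in <numaif.h> would be the usual alternative). The helper name query_page_nodes() is hypothetical. On success each status[i] holds a node number or a negative errno for that page; kernels built without NUMA support, or sandboxes that filter the call, may instead fail with ENOSYS or EPERM.

```c
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Query the NUMA node of each page in pages[] by calling move_pages() with
 * nodes == NULL. pid 0 means the calling process; flags are 0.
 * Returns 0 on success, or -1 with errno set. */
static long query_page_nodes(unsigned long count, void **pages, int *status)
{
    return syscall(SYS_move_pages, (long)0, count, pages, (void *)NULL,
                   status, (long)0);
}
```

The status array is filled per page, which is why the kernel side can write results straight into it without staging a page_to_node array first.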
2008-10-20  mm: stop returning -ENOENT from sys_move_pages() if nothing got migrated  Brice Goglin
A patchset reworking sys_move_pages(). It removes the possibly large vmalloc by using multiple chunks when migrating large buffers. It also dramatically increases the throughput for large buffers, since the lookup in new_page_node() is now limited to a single chunk, so its quadratic cost is bounded by the chunk size instead of growing with the whole buffer. There is no need to use any radix-tree-like structure to improve this lookup.

sys_move_pages() duration on a machine with four quad-core Opteron 2347HE processors (1.9 GHz), migrating between nodes #2 and #3:

    length   move_pages (us)   move_pages+patch (us)
    4kB      126               98
    40kB     198               168
    400kB    963               937
    4MB      12503             11930
    40MB     246867            11848

Patches #1 and #4 are the important ones:
1) stop returning -ENOENT from sys_move_pages() if nothing got migrated
2) don't vmalloc a huge page_to_node array for do_pages_stat()
3) extract do_pages_move() out of sys_move_pages()
4) rework do_pages_move() to work on page_sized chunks
5) move_pages: no need to set pp->page to ZERO_PAGE(0) by default

This patch: There is no point in returning -ENOENT from sys_move_pages() if all pages were already on the right node, while we return 0 if even one page was not. Most applications don't know where their pages are allocated, so it's not an error to try to migrate them anyway. Just return 0 and let the application check the status array in user space if it needs details. This will make the upcoming chunked move_pages() support much easier. Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr> Acked-by: Christoph Lameter <cl@linux-foundation.org> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
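The chunking idea behind this series (stage a bounded number of requests at a time instead of allocating one array for all of them, and keep the linear lookup inside the current chunk) can be illustrated with a toy user-space model. This is not the kernel code: struct page_req, CHUNK_NR, find_req(), migrate_chunk() and move_in_chunks() are hypothetical, the chunk size of 4 is arbitrary (the series sizes its chunks to fit a page), and "migration" here just records the requested node as the per-page status.

```c
#include <stddef.h>

/* Hypothetical per-page request, loosely modelled on page_to_node. */
struct page_req {
    unsigned long addr; /* user address of the page */
    int node;           /* requested target node */
    int status;         /* per-page result */
};

#define CHUNK_NR 4 /* hypothetical chunk size */

/* Linear lookup in the spirit of new_page_node(): it only scans the current
 * chunk, so its cost is bounded by CHUNK_NR rather than by the whole array. */
static struct page_req *find_req(struct page_req *chunk, size_t n,
                                 unsigned long addr)
{
    for (size_t i = 0; i < n; i++)
        if (chunk[i].addr == addr)
            return &chunk[i];
    return NULL;
}

/* Stand-in for migrating one chunk: each page looks up its own request
 * (always found, since the page itself is in the chunk) and records the
 * target node as its status. */
static void migrate_chunk(struct page_req *chunk, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        struct page_req *r = find_req(chunk, n, chunk[i].addr);
        r->status = r->node;
    }
}

/* Process nr requests in chunks of at most CHUNK_NR entries, staging each
 * chunk in a fixed-size buffer so memory use does not grow with nr.
 * Returns the number of chunks processed. */
static size_t move_in_chunks(const unsigned long *addrs, const int *nodes,
                             int *status, size_t nr)
{
    struct page_req buf[CHUNK_NR];
    size_t chunks = 0;

    for (size_t start = 0; start < nr; start += CHUNK_NR) {
        size_t n = nr - start < CHUNK_NR ? nr - start : CHUNK_NR;
        for (size_t i = 0; i < n; i++) {
            buf[i].addr = addrs[start + i];
            buf[i].node = nodes[start + i];
            buf[i].status = -1;
        }
        migrate_chunk(buf, n);
        for (size_t i = 0; i < n; i++)
            status[start + i] = buf[i].status;
        chunks++;
    }
    return chunks;
}
```

With n pages and chunk size c, the per-page lookups cost roughly n * c comparisons in total instead of n * n, which is consistent with the 40MB row of the table above improving far more than the small buffers.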
2008-10-20  memory hotplug: release memory regions in PAGES_PER_SECTION chunks  Nathan Fontenot
During hotplug memory removal, memory regions should be released in PAGES_PER_SECTION-sized chunks. This mirrors the code in add_memory(), where resources are requested in PAGES_PER_SECTION-sized chunks. Attempting to release the entire memory region at once fails because there is no single resource covering all of the pages being removed; instead, the resources for those pages were split into PAGES_PER_SECTION-sized chunks when the memory was added. Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com> Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Acked-by: Yasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
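Why a one-shot release fails can be shown with a toy resource table in which each add request registers one resource per section and a release succeeds only for a range that exactly matches a registered resource. This is an illustrative model only: SECTION_PAGES stands in for PAGES_PER_SECTION, and struct toy_res, request_res(), release_res(), add_range() and remove_range() are hypothetical, not kernel APIs. nr_pages is assumed to be a multiple of SECTION_PAGES.

```c
#include <stdbool.h>
#include <stddef.h>

#define SECTION_PAGES 8 /* hypothetical stand-in for PAGES_PER_SECTION */
#define MAX_RES 16

/* One requested range [start, start + nr), registered as a unit. */
struct toy_res {
    unsigned long start, nr;
    bool busy;
};

static struct toy_res table[MAX_RES];

static bool request_res(unsigned long start, unsigned long nr)
{
    for (size_t i = 0; i < MAX_RES; i++)
        if (!table[i].busy) {
            table[i] = (struct toy_res){ start, nr, true };
            return true;
        }
    return false;
}

/* Release succeeds only if [start, start + nr) exactly matches a resource. */
static bool release_res(unsigned long start, unsigned long nr)
{
    for (size_t i = 0; i < MAX_RES; i++)
        if (table[i].busy && table[i].start == start && table[i].nr == nr) {
            table[i].busy = false;
            return true;
        }
    return false;
}

/* Adding memory requests one resource per section. */
static bool add_range(unsigned long start, unsigned long nr_pages)
{
    for (unsigned long p = start; p < start + nr_pages; p += SECTION_PAGES)
        if (!request_res(p, SECTION_PAGES))
            return false;
    return true;
}

/* Removal mirrors add_range() and releases section by section. */
static bool remove_range(unsigned long start, unsigned long nr_pages)
{
    for (unsigned long p = start; p < start + nr_pages; p += SECTION_PAGES)
        if (!release_res(p, SECTION_PAGES))
            return false;
    return true;
}
```

After add_range(0, 32), release_res(0, 32) fails because no single 32-page resource exists, while remove_range(0, 32) releases the four 8-page resources one at a time.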
2008-10-20  documentation: clarify dirty_ratio and dirty_background_ratio description  Andrea Righi
The current documentation of dirty_ratio and dirty_background_ratio is a bit misleading. The documentation says that they are "a percentage of total system memory", but the current page writeback policy instead applies the percentages to dirtyable memory, that is, free pages plus reclaimable pages. It is better to state this explicitly. Signed-off-by: Andrea Righi <righi.andrea@gmail.com> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
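The difference between "percentage of total memory" and "percentage of dirtyable memory" is easiest to see with numbers. A simplified sketch: struct dirty_limits and compute_dirty_limits() are hypothetical names, and the model deliberately ignores highmem handling and the other adjustments the kernel makes; it only applies each ratio to free plus reclaimable pages.

```c
/* Thresholds in pages. */
struct dirty_limits {
    unsigned long background; /* from dirty_background_ratio */
    unsigned long dirty;      /* from dirty_ratio */
};

/* Apply both ratios to dirtyable memory (free + reclaimable pages),
 * not to total RAM. */
static struct dirty_limits compute_dirty_limits(unsigned long free_pages,
                                                unsigned long reclaimable_pages,
                                                unsigned int background_ratio,
                                                unsigned int dirty_ratio)
{
    unsigned long dirtyable = free_pages + reclaimable_pages;
    struct dirty_limits l;

    l.background = dirtyable * background_ratio / 100;
    l.dirty = dirtyable * dirty_ratio / 100;
    return l;
}
```

For example, on a machine with 1,000,000 pages of RAM of which 200,000 are free and 300,000 are reclaimable, dirty_ratio = 10 gives a limit of 50,000 pages (10% of 500,000), not the 100,000 pages that "10% of total system memory" would suggest; dirty_background_ratio = 5 likewise gives 25,000 pages.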