aboutsummaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2007-10-18fuse: add file handle to getattr operationMiklos Szeredi
Add necessary protocol changes for supplying a file handle with the getattr operation. Step the API version to 7.9. This patch doesn't actually supply the file handle, because that needs some kind of VFS support, which we haven't yet been able to agree upon. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18ext3: support large blocksize up to PAGESIZETakashi Sato
This patch set supports large block size(>4k, <=64k) in ext3 just enlarging the block size limit. But it is NOT possible to have 64kB blocksize on ext3 without some changes to the directory handling code. The reason is that an empty 64kB directory block would have a rec_len == (__u16)2^16 == 0, and this would cause an error to be hit in the filesystem. The proposed solution is treat 64k rec_len with a an impossible value like rec_len = 0xffff to handle this. The Patch-set consists of the following 2 patches. [1/2] ext3: enlarge blocksize - Allow blocksize up to pagesize [2/2] ext3: fix rec_len overflow - prevent rec_len from overflow with 64KB blocksize Now on 64k page ppc64 box runs with this patch set we could create a 64k block size ext3, and able to handle empty directory block. Signed-off-by: Takashi Sato <sho@tnes.nec.co.jp> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Cc: <linux-ext4@vger.kernel.org> Acked-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18bit_spin_lock: use lock bitopsNick Piggin
Convert bit_spin_lock to new locking bitops. Slub can use the non-atomic store version to clear (Christoph?) Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18powerpc: lock bitopsNick Piggin
Add non-trivial lock bitops implementation for powerpc. Signed-off-by: Nick Piggin <npiggin@suse.de> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18mips: lock bitopsNick Piggin
mips can avoid one mb when acquiring a lock with test_and_set_bit_lock. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18mips: fix bitopsNick Piggin
Documentation/atomic_ops.txt defines these primitives must contain a memory barrier both before and after their memory operation. This is consistent with the atomic ops implementation on mips. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18ia64: lock bitopsNick Piggin
Convert ia64 to new bitops. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18alpha: lock bitopsNick Piggin
Alpha can avoid one mb when acquiring a lock with test_and_set_bit_lock. [bunk@kernel.org: alpha bitops.h must #include <asm/barrier.h>] Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18alpha: fix bitopsNick Piggin
Documentation/atomic_ops.txt defines these primitives must contain a memory barrier both before and after their memory operation. This is consistent with the atomic ops implementation on alpha. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18bitops: introduce lock opsNick Piggin
Introduce test_and_set_bit_lock / clear_bit_unlock bitops with lock semantics. Convert all architectures to use the generic implementation. Signed-off-by: Nick Piggin <npiggin@suse.de> Acked-By: David Howells <dhowells@redhat.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Haavard Skinnemoen <hskinnemoen@atmel.com> Cc: Bryan Wu <bryan.wu@analog.com> Cc: Mikael Starvik <starvik@axis.com> Cc: David Howells <dhowells@redhat.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Hirokazu Takata <takata@linux-m32r.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Greg Ungerer <gerg@uclinux.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Matthew Wilcox <willy@debian.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp> Cc: Richard Curnow <rc@rc0.org.uk> Cc: William Lee Irwin III <wli@holomorphy.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jeff Dike <jdike@addtoit.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp> Cc: Andi Kleen <ak@muc.de> Cc: Chris Zankel <chris@zankel.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18Redefine {un}register_hotcpu_notifier() !HOTPLUG_CPU stubsSatyam Sharma
The return of the present "do {} while" based stub definition of register_hotcpu_notifier() cannot be checked. This makes the stub asymmetric w.r.t. the real HOTPLUG_CPU=y implementation that is int-returning. So let us redefine this to be consistent with the full version. Also do the same for unregister_hotcpu_notifier(). We cannot define these as static inline functions due to an existing GCC bug (#33172). So define as macros that return appropriately instead (int '0' for the register_hotcpu_notifier case and void for unregister_hotcpu_notifier). Signed-off-by: Satyam Sharma <satyam@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18powerpc: add scaled time accountingMichael Neuling
This adds POWERPC specific hooks for scaled time accounting. POWER6 includes a SPURR register. The SPURR is based off the PURR register but is scaled based on CPU frequency and issue rates. This gives a more accurate account of the instructions used per task. The PURR and timebase will be constant relative to the wall clock, irrespective of the CPU frequency. This implementation reads the SPURR register in account_system_vtime which is only call called on context witch and hard and soft irq entry and exit. The percentage of user and system time is then estimated using the ratio of these accounted by the PURR. If the SPURR is not present, the PURR read. An earlier implementation of this patch read the SPURR whenever the PURR was read, which included the system call entry and exit path. Unfortunately this showed a performance regression on lmbench runs, so was re-implemented. I've included the lmbench results here when run bare metal on POWER6. 1st column is the unpatch results. 2nd column is the results using the below patch and the 3rd is the % diff of these results from the base. 4th and 5th columns are the results and % differnce from the base using the older patch (SPURR read in syscall entry/exit path). Base Scaled-Acct SPURR-in-syscall Result Result % diff Result % diff Simple syscall: 0.3086 0.3086 0.0000 0.3452 11.8600 Simple read: 0.4591 0.4671 1.7425 0.5044 9.86713 Simple write: 0.4364 0.4366 0.0458 0.4731 8.40971 Simple stat: 2.0055 2.0295 1.1967 2.0669 3.06158 Simple fstat: 0.5962 0.5876 -1.442 0.6368 6.80979 Simple open/close: 3.1283 3.1009 -0.875 3.2088 2.57328 Select on 10 fd's: 0.8554 0.8457 -1.133 0.8667 1.32101 Select on 100 fd's: 3.5292 3.6329 2.9383 3.6664 3.88756 Select on 250 fd's: 7.9097 8.1881 3.5197 8.2242 3.97613 Select on 500 fd's: 15.2659 15.836 3.7357 15.873 3.97814 Select on 10 tcp fd's: 0.9576 0.9416 -1.670 0.9752 1.83792 Select on 100 tcp fd's: 7.248 7.2254 -0.311 7.2685 0.28283 Select on 250 tcp fd's: 17.7742 17.707 -0.375 17.749 -0.1406 Select on 500 tcp fd's: 35.4258 35.25 -0.496 35.286 -0.3929 Signal handler installation: 0.6131 0.6075 -0.913 0.647 5.52927 Signal handler overhead: 2.0919 2.1078 0.7600 2.1831 4.35967 Protection fault: 0.7345 0.7478 1.8107 0.8031 9.33968 Pipe latency: 33.006 16.398 -50.31 33.475 1.42368 AF_UNIX sock stream latency: 14.5093 30.910 113.03 30.715 111.692 Process fork+exit: 219.8 222.8 1.3648 229.37 4.35623 Process fork+execve: 876.14 873.28 -0.32 868.66 -0.8533 Process fork+/bin/sh -c: 2830 2876.5 1.6431 2958 4.52296 File /var/tmp/XXX write bw: 1193497 1195536 0.1708 118657 -0.5799 Pagefaults on /var/tmp/XXX: 3.1272 3.2117 2.7020 3.2521 3.99398 Also, kernel compile times show no difference with this patch applied. [pbadari@us.ibm.com: Avoid unnecessary PURR reading] Signed-off-by: Michael Neuling <mikey@neuling.org> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Jay Lan <jlan@engr.sgi.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18add-scaled-time-to-taskstats-based-process-accounting fixMichael Neuling
This moves the new items to the end of the taskstats struct as requested by Balbir and yourself. Cc: Balbir Singh <balbir@in.ibm.com> Cc: Jay Lan <jlan@engr.sgi.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18Add scaled time to taskstats based process accountingMichael Neuling
This adds items to the taststats struct to account for user and system time based on scaling the CPU frequency and instruction issue rates. Adds account_(user|system)_time_scaled callbacks which architectures can use to account for time using this mechanism. Signed-off-by: Michael Neuling <mikey@neuling.org> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Jay Lan <jlan@engr.sgi.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18Char: cyclades, fix some -W warningsJiri Slaby
Most of them are signedness, the rest unused function parameters. Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18Char: cyclades, remove bottom half processingJiri Slaby
The work done in bottom half doesn't cost much cpu time (e.g. tty_hangup itself schedules its own bottom half), it's possible to do the work in isr directly and save hence some .text. Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Paul Fulghum <paulkf@microgate.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18V3 file capabilities: alter behavior of cap_setpcapAndrew Morgan
The non-filesystem capability meaning of CAP_SETPCAP is that a process, p1, can change the capabilities of another process, p2. This is not the meaning that was intended for this capability at all, and this implementation came about purely because, without filesystem capabilities, there was no way to use capabilities without one process bestowing them on another. Since we now have a filesystem support for capabilities we can fix the implementation of CAP_SETPCAP. The most significant thing about this change is that, with it in effect, no process can set the capabilities of another process. The capabilities of a program are set via the capability convolution rules: pI(post-exec) = pI(pre-exec) pP(post-exec) = (X(aka cap_bset) & fP) | (pI(post-exec) & fI) pE(post-exec) = fE ? pP(post-exec) : 0 at exec() time. As such, the only influence the pre-exec() program can have on the post-exec() program's capabilities are through the pI capability set. The correct implementation for CAP_SETPCAP (and that enabled by this patch) is that it can be used to add extra pI capabilities to the current process - to be picked up by subsequent exec()s when the above convolution rules are applied. Here is how it works: Let's say we have a process, p. It has capability sets, pE, pP and pI. Generally, p, can change the value of its own pI to pI' where (pI' & ~pI) & ~pP = 0. That is, the only new things in pI' that were not present in pI need to be present in pP. The role of CAP_SETPCAP is basically to permit changes to pI beyond the above: if (pE & CAP_SETPCAP) { pI' = anything; /* ie., even (pI' & ~pI) & ~pP != 0 */ } This capability is useful for things like login, which (say, via pam_cap) might want to raise certain inheritable capabilities for use by the children of the logged-in user's shell, but those capabilities are not useful to or needed by the login program itself. One such use might be to limit who can run ping. You set the capabilities of the 'ping' program to be "= cap_net_raw+i", and then only shells that have (pI & CAP_NET_RAW) will be able to run it. Without CAP_SETPCAP implemented as described above, login(pam_cap) would have to also have (pP & CAP_NET_RAW) in order to raise this capability and pass it on through the inheritable set. Signed-off-by: Andrew Morgan <morgan@kernel.org> Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: James Morris <jmorris@namei.org> Cc: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18sysctl: Error on bad sysctl tablesEric W. Biederman
After going through the kernels sysctl tables several times it has become clear that code review and testing is just not effective in prevent problematic sysctl tables from being used in the stable kernel. I certainly can't seem to fix the problems as fast as they are introduced. Therefore this patch adds sysctl_check_table which is called when a sysctl table is registered and checks to see if we have a problematic sysctl table. The biggest part of the code is the table of valid binary sysctl entries, but since we have frozen our set of binary sysctls this table should not need to change, and it makes it much easier to detect when someone unintentionally adds a new binary sysctl value. As best as I can determine all of the several hundred errors spewed on boot up now are legitimate. [bunk@kernel.org: kernel/sysctl_check.c must #include <linux/string.h>] Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18sysctl: properly register the irda binary sysctl numbersEric W. Biederman
Grumble. These numbers should have been in sysctl.h from the beginning if we ever expected anyone to use them. Oh well put them there now so we can find them and make maintenance easier. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Samuel Ortiz <samuel@sortiz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18sysctl: parport remove binary pathsEric W. Biederman
The sysctl binary paths don't look as if they even code work, .data is not filled in, and all of the proc_handlers look at extra1 and there is not strategy routine. So just kill the binary paths. In addition this patch removes the setting of extra1 on directories. It doesn't look like the parport code ever examines it, and it's bad sysctl form. [bunk@kernel.org: remove parport_device_num()] Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18sysctl: Factor out sysctl_data.Eric W. Biederman
There as been no easy way to wrap the default sysctl strategy routine except for returning 0. Which is not always what we want. The few instances I have seen that want different behaviour have written their own version of sysctl_data. While not too hard it is unnecessary code and has the potential for extra bugs. So to make these situations easier and make that part of sysctl more symetric I have factord sysctl_data out of do_sysctl_strategy and exported as a function everyone can use. Further having sysctl_data be an explicit function makes checking for badly formed sysctl tables much easier. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18sysctl core: Stop using the unnecessary ctl_table typedefEric W. Biederman
In sysctl.h the typedef struct ctl_table ctl_table violates coding style isn't needed and is a bit of a nuisance because it makes it harder to recognize ctl_table is a type name. So this patch removes it from the generic sysctl code. Hopefully I will have enough energy to send the rest of my patches will follow and to remove it from the rest of the kernel. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18stop using DMA_xxBIT_MASKAndrew Morton
Now that we have DMA_BIT_MASK(), these macros are pointless. Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18unify DMA_..BIT_MASK definitions: v3.1Borislav Petkov
Remove redundant DMA_..BIT_MASK definitions across two drivers. The computation of the majority of the bitmasks is done by the compiler. The initial split of the patch touching each a different file got removed due to possible git bisect breakage. Signed-off-by: Borislav Petkov <bbpetkov@yahoo.de> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Muli Ben-Yehuda <muli@il.ibm.com> Cc: Jeff Garzik <jeff@garzik.org> Cc: James Bottomley <James.Bottomley@steeleye.com> Reviewed-by: Satyam Sharma <satyam@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18Fix discrepancy between VDSO based gettimeofday() and sys_gettimeofday().Tony Breeds
On platforms that copy sys_tz into the vdso (currently only x86_64, soon to include powerpc), it is possible for the vdso to get out of sync if a user calls (admittedly unusual) settimeofday(NULL, ptr). This patch adds a hook for architectures that set CONFIG_GENERIC_TIME_VSYSCALL to ensure when sys_tz is updated they can also updatee their copy in the vdso. Signed-off-by: Tony Breeds <tony@bakeyournoodle.com> Cc: Andi Kleen <ak@suse.de> Cc: Tony Luck <tony.luck@intel.com> Acked-by: John Stultz <johnstul@us.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18Remove struct task_struct::io_waitAlexey Dobriyan
Hell knows what happened in commit 63b05203af57e7de4f3bb63b8b81d43bc196d32b during 2.6.9 development. Commit introduced io_wait field which remained write-only than and still remains write-only. Also garbage collect macros which "use" io_wait. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18Hibernation: Check if ACPI is enabled during restore in the right placeRafael J. Wysocki
The following scenario leads to total confusion of the platform firmware on some boxes (eg. HPC nx6325): * Hibernate with ACPI enabled * Resume passing "acpi=off" to the boot kernel To prevent this from happening it's necessary to check if ACPI is enabled (and enable it if that's not the case) _right_ _after_ control has been transfered from the boot kernel to the image kernel, before device_power_up() is called (ie. with interrupts disabled).  Enabling ACPI after calling device_power_up() turns out to be insufficient. For this reason, introduce new hibernation callback ->leave() that will be executed before device_power_up() by the restored image kernel.  To make it work, it also is necessary to move swsusp_suspend() from swsusp.c to disk.c (it's name is changed to "create_image", which is more up to the point). Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18Hibernation: Arbitrary boot kernel support on x86_64Rafael J. Wysocki
Make it possible to restore a hibernation image on x86_64 with the help of a kernel different from the one in the image. The idea is to split the core restoration code into two separate parts and to place each of them in a different page.  The first part belongs to the boot kernel and is executed as the last step of the image kernel's memory restoration procedure.  Before being executed, it is relocated to a safe page that won't be overwritten while copying the image kernel pages. The final operation performed by it is a jump to the second part of the core restoration code that belongs to the image kernel and has just been restored. This code makes the CPU switch to the image kernel's page tables and restores the state of general purpose registers (including the stack pointer) from before the hibernation. The main issue with this idea is that in order to jump to the second part of the core restoration code the boot kernel needs to know its address.  However, this address may be passed to it in the image header.  Namely, the part of the image header previously used for checking if the version of the image kernel is correct can be replaced with some architecture specific data that will allow the boot kernel to jump to the right address within the image kernel.  These data should also be used for checking if the image kernel is compatible with the boot kernel (as far as the memory restroration procedure is concerned). It can be done, for example, with the help of a "magic" value that has to be equal in both kernels, so that they can be regarded as compatible. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18serial: turn serial console suspend a boot rather than compile time optionAndres Salomon
Currently, there's a CONFIG_DISABLE_CONSOLE_SUSPEND that allows one to stop the serial console from being suspended when the rest of the machine goes to sleep. This is incredibly useful for debugging power management-related things; however, having it as a compile-time option has proved to be incredibly inconvenient for us (OLPC). There are plenty of times that we want serial console to not suspend, but for the most part we'd like serial console to be suspended. This drops CONFIG_DISABLE_CONSOLE_SUSPEND, and replaces it with a kernel boot parameter (no_console_suspend). By default, the serial console will be suspended along with the rest of the system; by passing 'no_console_suspend' to the kernel during boot, serial console will remain alive during suspend. For now, this is pretty serial console specific; further fixes could be applied to make this work for things like netconsole. Signed-off-by: Andres Salomon <dilinger@debian.org> Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: Nigel Cunningham <nigel@suspend2.net> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18freezer: introduce freezer-friendly waiting macrosRafael J. Wysocki
Introduce freezer-friendly wrappers around wait_event_interruptible() and wait_event_interruptible_timeout(), originally defined in <linux/wait.h>, to be used in freezable kernel threads. Make some of the freezable kernel threads use them. This is necessary for the freezer to stop sending signals to kernel threads, which is implemented in the next patch. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: Nigel Cunningham <nigel@nigel.suspend2.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18PM: Rename hibernation_ops to platform_hibernation_opsRafael J. Wysocki
Rename 'struct hibernation_ops' to 'struct platform_hibernation_ops' in analogy with 'struct platform_suspend_ops'. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: Len Brown <lenb@kernel.org> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18PM: Rework struct hibernation_opsRafael J. Wysocki
During hibernation we also need to tell the ACPI core that we're going to put the system into the S4 sleep state. For this reason, an additional method in 'struct hibernation_ops' is needed, playing the role of set_target() in 'struct platform_suspend_operations'. Moreover, the role of the .prepare() method is now different, so it's better to introduce another method, that in general may be different from .prepare(), that will be used to prepare the platform for creating the hibernation image (.prepare() is used anyway to notify the platform that we're going to enter the low power state after the image has been saved). Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18PM: Make suspend_ops staticRafael J. Wysocki
The variable suspend_ops representing the set of global platform-specific suspend-related operations, used by the PM core, need not be exported outside of kernel/power/main.c .  Make it static. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18PM: Rework struct platform_suspend_opsRafael J. Wysocki
There is no reason why the .prepare() and .finish() methods in 'struct platform_suspend_ops' should take any arguments, since architectures don't use these methods' argument in any practically meaningful way (ie. either the target system sleep state is conveyed to the platform by .set_target(), or there is only one suspend state supported and it is indicated to the PM core by .valid(), or .prepare() and .finish() aren't defined at all).  There also is no reason why .finish() should return any result. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: Len Brown <lenb@kernel.org> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18PM: Rename struct pm_ops and related thingsRafael J. Wysocki
The name of 'struct pm_ops' suggests that it is related to the power management in general, but in fact it is only related to suspend.  Moreover, its name should indicate what this structure is used for, so it seems reasonable to change it to 'struct platform_suspend_ops'.  In that case, the name of the global variable of this type used by the PM core and the names of related functions should be changed accordingly. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: Len Brown <lenb@kernel.org> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18PM: Move definition of struct pm_ops to suspend.hRafael J. Wysocki
Move the definition of 'struct pm_ops' and related functions from <linux/pm.h> to <linux/suspend.h> . There are, at least, the following reasons to do that: * 'struct pm_ops' is specifically related to suspend and not to the power management in general. * As long as 'struct pm_ops' is defined in <linux/pm.h>, any modification of it causes the entire kernel to be recompiled, which is unnecessary and annoying. * Some suspend-related features are already defined in <linux/suspend.h>, so it is logical to move the definition of 'struct pm_ops' into there. * 'struct hibernation_ops', being the hibernation-related counterpart of 'struct pm_ops', is defined in <linux/suspend.h> . Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: Len Brown <lenb@kernel.org> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17ext4: Convert ext4_extent_idx.ei_leaf to ext4_extent_idx.ei_leaf_loAneesh Kumar K.V
Convert ext4_extent_idx.ei_leaf ext4_extent_idx.ei_leaf_lo This helps in finding BUGs due to direct partial access of these split 48 bit values. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
2007-10-17ext4: Convert ext4_extent.ee_start to ext4_extent.ee_start_loAneesh Kumar K.V
Convert ext4_extent.ee_start to ext4_extent.ee_start_lo This helps in finding BUGs due to direct partial access of these split 48 bit values Also fix direct partial access in ext4 code Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
2007-10-17ext4: Convert s_r_blocks_count and s_free_blocks_countAneesh Kumar K.V
Convert s_r_blocks_count and s_free_blocks_count to s_r_blocks_count_lo and s_free_blocks_count_lo This helps in finding BUGs due to direct partial access of these split 64 bit values Also fix direct partial access in ext4 code Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2007-10-17ext4: Convert s_blocks_count to s_blocks_count_loAneesh Kumar K.V
Convert s_blocks_count to s_blocks_count_lo This helps in finding BUGs due to direct partial access of these split 64 bit values Also fix direct partial access in ext4 code Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
2007-10-17ext4: Convert bg_inode_bitmap and bg_inode_tableAneesh Kumar K.V
Convert bg_inode_bitmap and bg_inode_table to bg_inode_bitmap_lo and bg_inode_table_lo. This helps in finding BUGs due to direct partial access of these split 64 bit values Also fix one direct partial access Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
2007-10-17ext4: Convert bg_block_bitmap to bg_block_bitmap_loAneesh Kumar K.V
Convert bg_block_bitmap to bg_block_bitmap_lo This helps in catching some BUGS due to direct partial access of these split fields. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
2007-10-17ext4: FLEX_BG Kernel support v2.Jose R. Santos
This feature relaxes check restrictions on where each block groups meta data is located within the storage media. This allows for the allocation of bitmaps or inode tables outside the block group boundaries in cases where bad blocks forces us to look for new blocks which the owning block group can not satisfy. This will also allow for new meta-data allocation schemes to improve performance and scalability. Signed-off-by: Jose R. Santos <jrs@us.ibm.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-10-17ext4: Fix sparse warningsAneesh Kumar K.V
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-10-17Ext4: Uninitialized Block GroupsAndreas Dilger
In pass1 of e2fsck, every inode table in the fileystem is scanned and checked, regardless of whether it is in use. This is this the most time consuming part of the filesystem check. The unintialized block group feature can greatly reduce e2fsck time by eliminating checking of uninitialized inodes. With this feature, there is a a high water mark of used inodes for each block group. Block and inode bitmaps can be uninitialized on disk via a flag in the group descriptor to avoid reading or scanning them at e2fsck time. A checksum of each group descriptor is used to ensure that corruption in the group descriptor's bit flags does not cause incorrect operation. The feature is enabled through a mkfs option mke2fs /dev/ -O uninit_groups A patch adding support for uninitialized block groups to e2fsprogs tools has been posted to the linux-ext4 mailing list. The patches have been stress tested with fsstress and fsx. In performance tests testing e2fsck time, we have seen that e2fsck time on ext3 grows linearly with the total number of inodes in the filesytem. In ext4 with the uninitialized block groups feature, the e2fsck time is constant, based solely on the number of used inodes rather than the total inode count. Since typical ext4 filesystems only use 1-10% of their inodes, this feature can greatly reduce e2fsck time for users. With performance improvement of 2-20 times, depending on how full the filesystem is. The attached graph shows the major improvements in e2fsck times in filesystems with a large total inode count, but few inodes in use. In each group descriptor if we have EXT4_BG_INODE_UNINIT set in bg_flags: Inode table is not initialized/used in this group. So we can skip the consistency check during fsck. EXT4_BG_BLOCK_UNINIT set in bg_flags: No block in the group is used. So we can skip the block bitmap verification for this group. We also add two new fields to group descriptor as a part of uninitialized group patch. __le16 bg_itable_unused; /* Unused inodes count */ __le16 bg_checksum; /* crc16(sb_uuid+group+desc) */ bg_itable_unused: If we have EXT4_BG_INODE_UNINIT not set in bg_flags then bg_itable_unused will give the offset within the inode table till the inodes are used. This can be used by fsck to skip list of inodes that are marked unused. bg_checksum: Now that we depend on bg_flags and bg_itable_unused to determine the block and inode usage, we need to make sure group descriptor is not corrupt. We add checksum to group descriptor to detect corruption. If the descriptor is found to be corrupt, we mark all the blocks and inodes in the group used. Signed-off-by: Avantika Mathur <mathur@us.ibm.com> Signed-off-by: Andreas Dilger <adilger@clusterfs.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
2007-10-17ext4: remove #ifdef CONFIG_EXT4_INDEXEric Sandeen
CONFIG_EXT4_INDEX is not an exposed config option in the kernel, and it is unconditionally defined in ext4_fs.h. tune2fs is already able to turn off dir indexing, so at this point it's just cluttering up the code. Remove it. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-10-17ext4: Remove (partial, never completed) fragment supportColy Li
Fragment support in ext2/3/4 was never implemented, and it probably will never be implemented. So remove it from ext4. Signed-off-by: Coly Li <coyli@suse.de> Acked-by: Andreas Dilger <adilger@clusterfs.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2007-10-17jbd2: JBD_XXX to JBD2_XXX naming cleanupMingming Cao
change JBD_XXX macros to JBD2_XXX in JBD2/Ext4 Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2007-10-17JBD2: replace jbd_kmalloc with kmalloc directly.Mingming Cao
This patch cleans up jbd_kmalloc and replace it with kmalloc directly Signed-off-by: Mingming Cao <cmm@us.ibm.com>
2007-10-17JBD: replace jbd_kmalloc with kmalloc directlyMingming Cao
This patch cleans up jbd_kmalloc and replace it with kmalloc directly Signed-off-by: Mingming Cao <cmm@us.ibm.com>