path: root/fs
Age | Commit message | Author
2006-09-28 | [GFS2] inode-diet: Eliminate i_blksize from the inode structure | Theodore Ts'o
This eliminates the i_blksize field from struct inode. Filesystems that want to provide a per-inode st_blksize can do so by providing their own getattr routine instead of using the generic_fillattr() function. Note that some filesystems were providing pretty much random (and incorrect) values for i_blksize. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org>
2006-09-28 | [GFS2] inode_diet: Replace inode.u.generic_ip with inode.i_private (gfs) | Theodore Ts'o
The following patches reduce the size of the VFS inode structure by 28 bytes on a UP x86. (It would be more on an x86_64 system.) This is a 10% reduction in the inode size on a UP kernel that is configured in a production mode (i.e., with no spinlock or other debugging functions enabled; if you want to save memory taken up by in-core inodes, the first thing you should do is disable the debugging options; they are responsible for a huge amount of bloat in the VFS inode structure). This patch: The filesystem or device-specific pointer in the inode is inside a union, which is pretty pointless given that all 30+ users of this field have been using the void pointer. Get rid of the union and rename it to i_private, with a comment to explain who is allowed to use the void pointer. This is just a cleanup, but it allows us to reuse the union 'u' for something where the union will actually be used. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org>
2006-09-28 | Merge branch 'master' into gfs2 | Steven Whitehouse
2006-09-27 | [GFS2] Fix bug in Makefiles for lock modules | Steven Whitehouse
The Makefile had the wrong CONFIG_ variable in it so that in case GFS2 was y and the lock modules were m, they were not getting built properly. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-09-27 | [PATCH] de_thread: Use tsk not current | Eric W. Biederman
Ingo Oeser pointed out that because current expands to an inline function it is more space efficient and somewhat faster to simply keep a cached copy of current in another variable. This patch implements that for the de_thread function. (akpm: saves nearly 100 bytes of text on x86) Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 | [PATCH] fs/nfs/: make code static | Adrian Bunk
Signed-off-by: Adrian Bunk <bunk@stusta.de> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 | [PATCH] add newline to nfs dprintk | Martin Bligh
Add missing \n to dprintk Signed-off-by: Martin Bligh <mbligh@google.com> Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 | [PATCH] pid: Implement transfer_pid and use it to simplify de_thread | Eric W. Biederman
In de_thread we move pids from one process to another, a rather ugly case. The function transfer_pid makes it clear what we are doing, and makes the action atomic. This is useful if we ever want to atomically traverse the process group and session lists in an RCU-safe manner. Even without the atomic properties this change should be a win, as transfer_pid should be less code to execute than executing both attach_pid and detach_pid, and this should make de_thread slightly smaller as only a single function call needs to be emitted. The only downside is that the code might be slower to execute as the odds are against transfer_pid being in cache. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 | [PATCH] sysctl: Allow /proc/sys without sys_sysctl | Eric W. Biederman
Since sys_sysctl is deprecated, start allowing it to be compiled out. This should catch any remaining user space code that cares, and paves the way for further sysctl cleanups. [akpm@osdl.org: If sys_sysctl() is not compiled-in, emit a warning] Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 | [PATCH] alloc_fdtable() cleanup | Andrew Morton
free_fdset(NULL, ...) is legal. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 | [PATCH] reiserfs: warn about the useless nolargeio option | Adrian Bunk
Since the nolargeio option no longer has any effect, print a warning instead of setting a write-only variable. Signed-off-by: Adrian Bunk <bunk@stusta.de> Cc: Jeff Mahoney <jeffm@suse.com> Cc: Chris Mason <mason@suse.com> Cc: Hans Reiser <reiser@namesys.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 | [PATCH] inode-diet: Eliminate i_blksize from the inode structure | Theodore Ts'o
This eliminates the i_blksize field from struct inode. Filesystems that want to provide a per-inode st_blksize can do so by providing their own getattr routine instead of using the generic_fillattr() function. Note that some filesystems were providing pretty much random (and incorrect) values for i_blksize. [bunk@stusta.de: cleanup] [akpm@osdl.org: generic_fillattr() fix] Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
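The getattr override described above can be sketched in plain userspace C. This is a toy model, not kernel code: `toy_inode`, `toy_kstat`, `toy_generic_fillattr()` and `toy_fs_getattr()` are hypothetical stand-ins, and deriving the default block size from a log2 field is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the kernel's struct kstat and struct inode. */
struct toy_kstat {
    unsigned long blksize;
};

struct toy_inode {
    unsigned int i_blkbits;  /* log2 of the block size */
    /* Optional per-filesystem hook; NULL means "use the generic path". */
    void (*getattr)(const struct toy_inode *, struct toy_kstat *);
};

/* Generic path: derive the block size from i_blkbits (assumed behaviour). */
static void toy_generic_fillattr(const struct toy_inode *inode,
                                 struct toy_kstat *stat)
{
    stat->blksize = 1UL << inode->i_blkbits;
}

/* A hypothetical filesystem that reports a larger preferred I/O size. */
static void toy_fs_getattr(const struct toy_inode *inode,
                           struct toy_kstat *stat)
{
    toy_generic_fillattr(inode, stat);
    stat->blksize = 65536;  /* per-filesystem override */
}

/* Dispatch the way a stat() path might: hook first, generic otherwise. */
static unsigned long toy_stat_blksize(const struct toy_inode *inode)
{
    struct toy_kstat stat = { 0 };

    assert(inode != NULL);
    if (inode->getattr)
        inode->getattr(inode, &stat);
    else
        toy_generic_fillattr(inode, &stat);
    return stat.blksize;
}
```

With `i_blkbits` set to 12, an inode without a hook reports 4096, while one whose hook is `toy_fs_getattr` reports 65536. No per-inode block size field is stored in either case.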
2006-09-27 | [PATCH] inode-diet: Move i_cdev into a union | Theodore Ts'o
Move the i_cdev pointer in struct inode into a union. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 | [PATCH] inode-diet: Move i_bdev into a union | Theodore Ts'o
Move the i_bdev pointer in struct inode into a union. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
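To see why moving mutually exclusive pointers into a union shrinks a structure, here is a minimal userspace sketch. The struct names are hypothetical, and the real struct inode has many more members. It assumes an inode is never both a block device and a character device at once.

```c
#include <assert.h>
#include <stddef.h>

struct toy_bdev;  /* opaque, illustration only */
struct toy_cdev;  /* opaque, illustration only */

/* Before: both pointers always occupy space. */
struct toy_inode_before {
    struct toy_bdev *i_bdev;
    struct toy_cdev *i_cdev;
};

/* After: the two pointers share storage (C11 anonymous union). */
struct toy_inode_after {
    union {
        struct toy_bdev *i_bdev;
        struct toy_cdev *i_cdev;
    };
};

/* Bytes saved per object by overlaying the two pointers. */
static size_t toy_bytes_saved(void)
{
    assert(sizeof(struct toy_inode_after) <= sizeof(struct toy_inode_before));
    return sizeof(struct toy_inode_before) - sizeof(struct toy_inode_after);
}
```

On common platforms `toy_bytes_saved()` is one pointer's worth of memory per object. That is small per inode but adds up across a large in-core inode cache.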
2006-09-27 | [PATCH] inode_diet: Replace inode.u.generic_ip with inode.i_private | Theodore Ts'o
The following patches reduce the size of the VFS inode structure by 28 bytes on a UP x86. (It would be more on an x86_64 system.) This is a 10% reduction in the inode size on a UP kernel that is configured in a production mode (i.e., with no spinlock or other debugging functions enabled; if you want to save memory taken up by in-core inodes, the first thing you should do is disable the debugging options; they are responsible for a huge amount of bloat in the VFS inode structure). This patch: The filesystem or device-specific pointer in the inode is inside a union, which is pretty pointless given that all 30+ users of this field have been using the void pointer. Get rid of the union and rename it to i_private, with a comment to explain who is allowed to use the void pointer. This is just a cleanup, but it allows us to reuse the union 'u' for something where the union will actually be used. [judith@osdl.org: powerpc build fix] Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Judith Lebzelter <judith@osdl.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 | [PATCH] fat: cleanup fat_get_block(s) | OGAWA Hirofumi
get_blocks() was removed. So, this removes it on fat, and will take advantage of the multi block mapping. Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 | [PATCH] autofs4 needs to force fail return revalidate | Ian Kent
For a long time now I have had a problem with not being able to return a lookup failure on an existing directory. In autofs this corresponds to a mount failure on an autofs managed mount entry that is browsable (and so the mount point directory exists). While this problem has been present for a long time I've avoided resolving it because it was not very visible. But now that autofs v5 has "mount and expire on demand" of nested multiple mounts, such as is found when mounting an export list from a server, solving the problem cannot be avoided any longer. I've tried very hard to find a way to do this entirely within the autofs4 module but have not been able to find a satisfactory way to achieve it. So, I need to propose a change to the VFS. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 | [PATCH] NOMMU: move the fallback arch_vma_name() to a sensible place | David Howells
Move the fallback arch_vma_name() to a sensible place (kernel/signal.c). Currently it's in fs/proc/task_mmu.c, a file that is dependent on both CONFIG_PROC_FS and CONFIG_MMU being enabled, but it's used from kernel/signal.c from where it is called unconditionally. [akpm@osdl.org: build fix] Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 | [PATCH] NOMMU: Implement /proc/pid/maps for NOMMU | David Howells
Implement /proc/pid/maps for NOMMU by reading the vm_area_list attached to current->mm->context.vmlist. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 | [PATCH] NOMMU: Set BDI capabilities for /dev/mem and /dev/kmem | David Howells
Set the backing device info capabilities for /dev/mem and /dev/kmem to permit direct sharing under no-MMU conditions and full mapping capabilities under MMU conditions. Make the BDI used by these available to all directly mappable character devices. Also comment the capabilities for /dev/zero. [akpm@osdl.org: ifdef reductions] Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 | [PATCH] Really ignore kmem_cache_destroy return value | Alexey Dobriyan
* Roughly half of the callers already do it by not checking the return value.
* Code in drivers/acpi/osl.c does the following to be sure: (void)kmem_cache_destroy(cache);
* Those who check it printk something; however, slab_error already printed the name of the failed cache.
* XFS BUGs on a failed kmem_cache_destroy, which is not a decision a low-level filesystem driver should make. Converted to ignore.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 | [PATCH] fs: Removing useless casts | Panagiotis Issaris
* Removing useless casts
* Removing useless wrapper
* Conversion from kmalloc+memset to kzalloc
Signed-off-by: Panagiotis Issaris <takis@issaris.org> Acked-by: Dave Kleikamp <shaggy@austin.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 | [PATCH] fs: Conversions from kmalloc+memset to k(z|c)alloc | Panagiotis Issaris
Conversions from kmalloc+memset to kzalloc. Signed-off-by: Panagiotis Issaris <takis@issaris.org> Jffs2-bit-acked-by: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
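The shape of this conversion can be illustrated with a userspace analogy: `malloc()` followed by `memset()` collapses into a single `calloc()` call, much as kmalloc+memset collapses into kzalloc. The sketch below uses only the C standard library; `toy_record` and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct toy_record {
    int id;
    char name[32];
};

/* Old pattern: allocate, then clear by hand (two steps, easy to forget one). */
static struct toy_record *toy_alloc_old(void)
{
    struct toy_record *r = malloc(sizeof(*r));

    if (!r)
        return NULL;
    memset(r, 0, sizeof(*r));
    return r;
}

/* New pattern: a single call that returns zeroed memory. */
static struct toy_record *toy_alloc_new(void)
{
    return calloc(1, sizeof(struct toy_record));
}

/* Helper: true if every byte of the record is zero. */
static int toy_is_zeroed(const struct toy_record *r)
{
    static const struct toy_record zero;  /* static storage is zero-filled */

    assert(r != NULL);
    return memcmp(r, &zero, sizeof(zero)) == 0;
}
```

Both allocators return equivalent zeroed objects. The single-call form is shorter and removes the chance of a mismatched size between the allocation and the clear.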
2006-09-27 | [PATCH] more ext3 16T overflow fixes | Eric Sandeen
Some of the changes in balloc.c are just cosmetic, as Andreas pointed out - if they overflow they'll then underflow and things are fine. 5th hunk actually fixes an overflow problem. Also check for potential overflows in inode & block counts when resizing. Signed-off-by: Eric Sandeen <esandeen@redhat.com> Cc: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 | [PATCH] ext3: Fix sparse warnings | Dave Kleikamp
Fixing up some endian-ness warnings in preparation to clone ext4 from ext3. Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 | [PATCH] ext3: More whitespace cleanups | Dave Kleikamp
More white space cleanups in preparation of cloning ext4 from ext3. Removing spaces that precede a tab. Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 | [PATCH] ext3: wrong error behavior | Vasily Averin
The SWsoft Virtuozzo/OpenVZ Linux kernel team has discovered that ext3 error behavior has been broken in Linux kernels since the 2.5.x versions by the following patch:

    2002/10/31 02:15:26-05:00 tytso@snap.thunk.org
    Default mount options from superblock for ext2/3 filesystems
    http://linux.bkbits.net:8080/linux-2.6/gnupatch@3dc0d88eKbV9ivV4ptRNM8fBuA3JBQ

When an ext3 file system is mounted with errors=continue (EXT3_ERRORS_CONTINUE), errors should be ignored when possible. However, at present, on any error the kernel aborts the journal and remounts the filesystem read-only. This behavior was hit a number of times and noted to differ from that of 2.4.x kernels.

This patch fixes this:
- do nothing in case of EXT3_ERRORS_CONTINUE,
- set EXT3_MOUNT_ABORT and call journal_abort() in all other cases,
- panic() should be called after ext3_commit_super() to save the sb marked as EXT3_ERROR_FS.

Signed-off-by: Vasily Averin <vvs@sw.ru> Acked-by: Kirill Korotaev <dev@sw.ru> Cc: Theodore Ts'o <tytso@mit.edu> Cc: "Stephen C. Tweedie" <sct@redhat.com> Cc: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 | [PATCH] ext3: more comments about block allocation/reservation code | Mingming Cao
Signed-off-by: Mingming Cao <cmm@us.ibm.com> Acked-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 | [PATCH] ext3: turn on reservation dump on block allocation errors | Mingming Cao
In the past there were a few kernel panics related to failures of block reservation tree operations (insert/remove etc). It would be very useful to get the block allocation reservation map info when such an error happens. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 | [PATCH] JBD: 16T fixes | Eric Sandeen
These are a few places I've found in jbd that look like they may not be 16T-safe, or consistent with the use of unsigned longs for block containers. Problems here would be somewhat hard to hit; they would require journal blocks past the 8T boundary, which would not be terribly common. Still, should fix. (Some of these have come from the ext4 work on jbd as well.) I think there's one more possibility that the wrap() function may not be safe IF your last block in the journal butts right up against the 2^32 block boundary, but that seems like a VERY remote possibility, and I'm not worrying about it at this point. Signed-off-by: Eric Sandeen <esandeen@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
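The 8T and 16T figures follow from block-number width. As a rough sketch, assuming 4 KiB blocks (the message does not state the block size), 2^31 blocks is 8 TiB, the limit of a signed 32-bit block number, and 2^32 blocks is 16 TiB, the limit of an unsigned one. The helper names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Size in bytes of a filesystem with nblocks blocks of block_size bytes. */
static uint64_t toy_fs_bytes(uint64_t nblocks, uint32_t block_size)
{
    assert(block_size != 0);
    return nblocks * block_size;
}

/* Can this block number be held in a signed 32-bit variable? */
static int toy_fits_signed32(uint64_t block)
{
    return block <= (uint64_t)INT32_MAX;
}

/* Can this block number be held in an unsigned 32-bit variable? */
static int toy_fits_unsigned32(uint64_t block)
{
    return block <= (uint64_t)UINT32_MAX;
}
```

Under that assumption, a block number just past the 8 TiB mark no longer fits a signed 32-bit type but still fits an unsigned one. This is consistent with the message's preference for unsigned containers.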
2006-09-27 | [PATCH] ext3: inode numbers are unsigned long | Eric Sandeen
This is primarily format string fixes, with changes to ialloc.c where large inode counts could overflow, and also pass around journal_inum as an unsigned long, just to be pedantic about it.... Signed-off-by: Eric Sandeen <esandeen@redhat.com> Cc: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 | [PATCH] ext2: fix mounts at 16T | Eric Sandeen
Signed-off-by: Eric Sandeen <esandeen@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 | [PATCH] fix ext3 mounts at 16T | Eric Sandeen
I need to do some actual IO testing now, but this gets things mounting for a 16T ext3 filesystem. (A patched-up e2fsprogs is needed too; I'll send that off the kernel list.)

This patch fixes these issues in the kernel:

o sbi->s_groups_count overflows in ext3_fill_super()

    sbi->s_groups_count = (le32_to_cpu(es->s_blocks_count) -
                           le32_to_cpu(es->s_first_data_block) +
                           EXT3_BLOCKS_PER_GROUP(sb) - 1) /
                          EXT3_BLOCKS_PER_GROUP(sb);

  At 16T, s_blocks_count is already maxed out; adding EXT3_BLOCKS_PER_GROUP(sb) overflows it and groups_count comes out to 0. Not really what we want, and causes a failed mount. Feel free to check my math (actually, please do!), but changing it this way should work & avoid the overflow:

    (A + B - 1)/B changed to: ((A - 1)/B) + 1

o ext3_check_descriptors() overflows range checks

  ext3_check_descriptors() iterates over all block groups making sure that various bits are within the right block ranges... on the last pass through, it is checking the error case

    [item] >= block + EXT3_BLOCKS_PER_GROUP(sb)

  where "block" is the first block in the last block group. The last block in this group (and the last one that will fit in 32 bits) is block + EXT3_BLOCKS_PER_GROUP(sb) - 1. block + EXT3_BLOCKS_PER_GROUP(sb) wraps back around to 0. So, make things clearer with "first_block" and "last_block", where those are first and last, inclusive, and use <, > rather than <, >=. Finally, the last block group may be smaller than the rest, so account for this on the last pass through:

    last_block = sb->s_blocks_count - 1;

(A similar patch could be done for ext2; does anyone in their right mind use ext2 at 16T? I'll send an ext2 patch doing the same thing if that's warranted.)

Signed-off-by: Eric Sandeen <esandeen@redhat.com> Cc: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
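Both overflows in this message can be reproduced with 32-bit unsigned arithmetic in userspace. The sketch below is an illustration, not the ext3 code. The function names are hypothetical, and the example values in the usage note (a block count of 2^32 - 1 and 32768 blocks per group) are assumptions chosen to match a maxed-out 32-bit counter.

```c
#include <assert.h>
#include <stdint.h>

/* (A + B - 1) / B: the sum wraps modulo 2^32 when A is near the maximum. */
static uint32_t groups_round_up_naive(uint32_t a, uint32_t b)
{
    return (uint32_t)(a + b - 1) / b;
}

/* ((A - 1) / B) + 1: same result for A > 0, with no intermediate overflow. */
static uint32_t groups_round_up_safe(uint32_t a, uint32_t b)
{
    assert(a > 0 && b > 0);
    return ((a - 1) / b) + 1;
}

/* Exclusive upper bound: first + per_group can wrap to 0 in the last group. */
static int in_group_naive(uint32_t item, uint32_t first, uint32_t per_group)
{
    uint32_t end = (uint32_t)(first + per_group);  /* may wrap */

    return item >= first && item < end;
}

/* Inclusive bounds: last_block is representable, so nothing wraps. */
static int in_group_inclusive(uint32_t item, uint32_t first_block,
                              uint32_t last_block)
{
    return item >= first_block && item <= last_block;
}
```

With `a = 0xFFFFFFFF` and `b = 32768`, the naive form yields 0 groups while the rewritten form yields 131072. For the last group, starting at block `0xFFFF8000`, the exclusive bound wraps to 0 and rejects every block, whereas the inclusive check against `0xFFFFFFFE` accepts valid ones.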
2006-09-27 | [PATCH] jbd: use BUILD_BUG_ON in journal init | Alexey Dobriyan
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Stephen Tweedie <sct@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
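For readers unfamiliar with compile-time assertions of this kind, here is a minimal userspace sketch. `TOY_BUILD_BUG_ON` uses the well-known negative-array-size trick. Its exact form, the `toy_superblock` layout and the 1024-byte requirement are assumptions for illustration, not taken from the jbd patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * If cond is true the array size becomes -1 and compilation fails;
 * if cond is false the expression is a harmless sizeof of char[1].
 */
#define TOY_BUILD_BUG_ON(cond) ((void)sizeof(char[1 - 2 * !!(cond)]))

/* Hypothetical on-disk structure that must be exactly 1024 bytes. */
struct toy_superblock {
    uint32_t magic;
    uint32_t blocksize;
    uint8_t  pad[1016];
};

static unsigned int toy_journal_init(void)
{
    /* Checked by the compiler: costs nothing at run time. */
    TOY_BUILD_BUG_ON(sizeof(struct toy_superblock) != 1024);

    /* Run-time equivalent, shown only for contrast; it never fires here. */
    assert(sizeof(struct toy_superblock) == 1024);

    return (unsigned int)sizeof(struct toy_superblock);
}
```

Changing `pad` to a different length makes the translation unit fail to compile, which catches a layout mistake at build time rather than at first use.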
2006-09-27 | [PATCH] ext3 and jbd cleanup: remove whitespace | Mingming Cao
Remove whitespace from ext3 and jbd, before we clone ext4. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-27 | [PATCH] jbd: add lock annotation to jbd_sync_bh | Josh Triplett
jbd_sync_bh releases journal->j_list_lock. Add a lock annotation to this function so that sparse can check callers for lock pairing, and so that sparse will not complain about this function since it intentionally uses the lock in this manner. Signed-off-by: Josh Triplett <josh@freedesktop.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
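A lock annotation of this kind can be sketched as a macro that expands to a checker attribute under sparse and to nothing under an ordinary compiler. In the toy below, `TOY_RELEASES`, `toy_lock` and the helper functions are hypothetical, and the attribute form is an assumption modelled on sparse's context attribute.

```c
#include <assert.h>

/* Only a static checker sees the attribute; the compiler sees nothing. */
#ifdef __CHECKER__
# define TOY_RELEASES(x) __attribute__((context(x, 1, 0)))
#else
# define TOY_RELEASES(x)
#endif

/* Toy lock that just tracks whether it is held. */
struct toy_lock {
    int held;
};

static void toy_acquire(struct toy_lock *l)
{
    assert(!l->held);
    l->held = 1;
}

static void toy_release(struct toy_lock *l)
{
    assert(l->held);
    l->held = 0;
}

/*
 * Entered with the lock held, returns with it released. The annotation
 * documents this intentional imbalance for the checker and the reader.
 */
static void toy_sync_and_release(struct toy_lock *l) TOY_RELEASES(l)
{
    /* ... work that must happen under the lock would go here ... */
    toy_release(l);
}
```

A caller takes the lock with `toy_acquire()`, calls `toy_sync_and_release()`, and must not unlock again afterwards. The annotation makes that contract visible at the function's declaration.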
2006-09-26 | Merge branch 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6 | Linus Torvalds
* 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6: (225 commits)
  [PATCH] Don't set calgary iommu as default y
  [PATCH] i386/x86-64: New Intel feature flags
  [PATCH] x86: Add a cumulative thermal throttle event counter.
  [PATCH] i386: Make the jiffies compares use the 64bit safe macros.
  [PATCH] x86: Refactor thermal throttle processing
  [PATCH] Add 64bit jiffies compares (for use with get_jiffies_64)
  [PATCH] Fix unwinder warning in traps.c
  [PATCH] x86: Allow disabling early pci scans with pci=noearly or disallowing conf1
  [PATCH] x86: Move direct PCI scanning functions out of line
  [PATCH] i386/x86-64: Make all early PCI scans dependent on CONFIG_PCI
  [PATCH] Don't leak NT bit into next task
  [PATCH] i386/x86-64: Work around gcc bug with noreturn functions in unwinder
  [PATCH] Fix some broken white space in ia32_signal.c
  [PATCH] Initialize argument registers for 32bit signal handlers.
  [PATCH] Remove all traces of signal number conversion
  [PATCH] Don't synchronize time reading on single core AMD systems
  [PATCH] Remove outdated comment in x86-64 mmconfig code
  [PATCH] Use string instructions for Core2 copy/clear
  [PATCH] x86: - restore i8259A eoi status on resume
  [PATCH] i386: Split multi-line printk in oops output.
  ...
2006-09-26 | Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6 | Linus Torvalds
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6: (47 commits)
  Driver core: Don't call put methods while holding a spinlock
  Driver core: Remove unneeded routines from driver core
  Driver core: Fix potential deadlock in driver core
  PCI: enable driver multi-threaded probe
  Driver Core: add ability for drivers to do a threaded probe
  sysfs: add proper sysfs_init() prototype
  drivers/base: check errors
  drivers/base: Platform notify needs to occur before drivers attach to the device
  v4l-dev2: handle __must_check
  add CONFIG_ENABLE_MUST_CHECK
  add __must_check to device management code
  Driver core: fixed add_bind_files() definition
  Driver core: fix comments in drivers/base/power/resume.c
  sysfs_remove_bin_file: no return value, dump_stack on error
  kobject: must_check fixes
  Driver core: add ability for devices to create and remove bin files
  Class: add support for class interfaces for devices
  Driver core: create devices/virtual/ tree
  Driver core: add device_rename function
  Driver core: add ability for classes to handle devices properly
  ...
2006-09-26 | [PATCH] binfmt_elf: consistently use loff_t | Andrew Morton
As David Howells <dhowells@redhat.com> points out, binfmt_elf sometimes uses off_t, sometimes uses loff_t. Use loff_t throughout. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-26 | [PATCH] ZVC: Support NR_SLAB_RECLAIMABLE / NR_SLAB_UNRECLAIMABLE | Christoph Lameter
Remove the atomic counter for slab_reclaim_pages and replace the counter and NR_SLAB with two ZVC counters that account for unreclaimable and reclaimable slab pages: NR_SLAB_RECLAIMABLE and NR_SLAB_UNRECLAIMABLE. Change the check in vmscan.c to refer to NR_SLAB_RECLAIMABLE. The intent seems to be to check for slab pages that could be freed. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-26 | [PATCH] reduce MAX_NR_ZONES: make display of highmem counters conditional on CONFIG_HIGHMEM | Christoph Lameter
Do not display HIGHMEM memory sizes if CONFIG_HIGHMEM is not set. Make HIGHMEM-dependent texts and the display of highmem counters optional: some texts depend on CONFIG_HIGHMEM. Remove those strings and remove the display of highmem counter values if CONFIG_HIGHMEM is not set. [akpm@osdl.org: remove some ifdefs] Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-26 | [PATCH] mm: tracking shared dirty pages | Peter Zijlstra
Tracking of dirty pages in shared writeable mmap()s.

The idea is simple: write protect clean shared writeable pages, catch the write-fault, make writeable and set dirty. On page write-back clean all the PTE dirty bits and write protect them once again.

The implementation is a tad harder, mainly because the default backing_dev_info capabilities were too loosely maintained. Hence it is not enough to test the backing_dev_info for cap_account_dirty.

The current heuristic is as follows; a VMA is eligible when:
- it's shared writeable: (vm_flags & (VM_WRITE|VM_SHARED)) == (VM_WRITE|VM_SHARED)
- it is not a 'special' mapping: (vm_flags & (VM_PFNMAP|VM_INSERTPAGE)) == 0
- the backing_dev_info is cap_account_dirty: mapping_cap_account_dirty(vma->vm_file->f_mapping)
- f_op->mmap() didn't change the default page protection

Pages from remap_pfn_range() are explicitly excluded because their COW semantics are already horrid enough (see vm_normal_page() in do_wp_page()) and because they don't have a backing store anyway.

mprotect() is taught about the new behaviour as well. However, it overrides the last condition.

Cleaning the pages on write-back is done with page_mkclean(), a new rmap call. It can be called on any page, but is currently only implemented for mapped pages; if the page is found to be of a VMA that accounts dirty pages, it will also wrprotect the PTE.

Finally, in fs/buffer.c:try_to_free_buffers(), remove clear_page_dirty() from under ->private_lock. This seems to be safe, since ->private_lock is used to serialize access to the buffers, not the page itself. This is needed because clear_page_dirty() will call into page_mkclean() and would thereby violate locking order.

[dhowells@redhat.com: Provide a page_mkclean() implementation for NOMMU]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-26 | [PATCH] jbd: fix commit of ordered data buffers | Jan Kara
The original commit code assumes that when a buffer on the BJ_SyncData list is locked, it is being written to disk. But this is not true, and hence it can lead to potential data loss on a crash. Also, the code didn't account for the fact that journal_dirty_data() can steal buffers from the committing transaction, and hence could write buffers that no longer belong to the committing transaction. Finally, it could possibly happen that we tried writing out one buffer several times. The patch below tries to solve these problems by a complete rewrite of the data commit code. We go through buffers on t_sync_datalist, lock buffers needing write out and store them in an array. Buffers are also immediately refiled to the BJ_Locked list or unfiled (if the write out is completed). When the array is full or we have to block on a buffer lock, we submit all accumulated buffers for IO. [suitable for 2.6.18.x around the 2.6.19-rc2 timeframe] Signed-off-by: Jan Kara <jack@suse.cz> Cc: Badari Pulavarty <pbadari@us.ibm.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-26 | [PATCH] Check return value of copy_to_user in compat_sys_pselect7 | Andi Kleen
Fix:

    linux/fs/compat.c: In function compat_sys_pselect7
    linux/fs/compat.c:1869: warning: ignoring return value of copy_to_user, declared with attribute warn_unused_result

To make it easier to handle I changed the semantics to not try to write out a timespec if an error occurred. I hope that's ok.

Cc: dwmw2@infradead.org Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-26 | [PATCH] i386/x86-64: Don't randomize stack top when no randomization personality is set | Andi Kleen
Based on patch from Frank van Maarseveen <frankvm@frankvm.com>, but extended. Signed-off-by: Andi Kleen <ak@suse.de>
2006-09-25 | sysfs: add proper sysfs_init() prototype | Andrew Morton
Don't be crufty. Mark it __must_check too. Cc: "Randy.Dunlap" <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-09-25 | sysfs_remove_bin_file: no return value, dump_stack on error | Randy.Dunlap
Make sysfs_remove_bin_file() void. If it detects an error, printk the file name and call dump_stack(). sysfs_hash_and_remove() now returns an error code indicating its success or failure so that sysfs_remove_bin_file() can know success/failure. Convert the only driver that checked the return value of sysfs_remove_bin_file(). Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-09-25 | SYSFS: allow sysfs_create_link to create symlinks in the root of sysfs | Greg Kroah-Hartman
This is needed to make the compatible link for /sys/block in the future. Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-09-25 | Debugfs: kernel-doc fixes for debugfs | Randy Dunlap
Fix kernel-doc and typos/spellos in fs/debugfs/. Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-09-25 | sysfs: Make poll behaviour consistent | Juha Yrjölä
When no events have been reported by sysfs_notify(), sd->s_events was previously set to zero. The initial value for new readers is also zero, so poll was blocking, regardless of whether the attribute was read by the process or not. Make poll behave consistently by setting the initial value of sd->s_events to non-zero. Signed-off-by: Juha Yrjola <juha.yrjola@solidboot.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>