aboutsummaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2008-07-11ext4: Fix mb_find_next_bit not to return larger than maxAneesh Kumar K.V
Some architectures implement ext4_find_next_bit and ext4_find_next_zero_bit in such a way that they return greater than max for some input values. Make sure mb_find_next_bit and mb_find_next_zero_bit return the right values. On 2.6.25 we have include/asm-x86/bitops_32.h static inline unsigned find_first_bit(const unsigned long *addr, unsigned size) { unsigned x = 0; while (x < size) { unsigned long val = *addr++; if (val) return __ffs(val) + x; x += (sizeof(*addr)<<3); } return x; } This can return value greater than size. Reported and fixed here for lustre https://bugzilla.lustre.org/show_bug.cgi?id=15932 https://bugzilla.lustre.org/attachment.cgi?id=17205 Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-07-11ext4: validate directory entry data before useDuane Griffin
ext4_dx_find_entry uses ext4_next_entry without verifying that the entry is valid. If its rec_len == 0 this causes an infinite loop. Refactor the loop to check the validity of entries before checking whether they match and moving onto the next one. There are other uses of ext4_next_entry in this file which also look problematic. They should be reviewed and fixed if/when we have a test-case that triggers them. This patch fixes the first case (image hdb.25.softlockup.gz) reported in http://bugzilla.kernel.org/show_bug.cgi?id=10882. Signed-off-by: Duane Griffin <duaneg@dghda.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2008-07-11ext4: handle deleting corrupted indirect blocksDuane Griffin
While freeing indirect blocks we attach a journal head to the parent buffer head, free the blocks, then journal the parent. If the indirect block list is corrupted and points to the parent the journal head will be detached when the block is cleared, causing an OOPS. Check for that explicitly and handle it gracefully. This patch fixes the third case (image hdb.20000057.nullderef.gz) reported in http://bugzilla.kernel.org/show_bug.cgi?id=10882. Signed-off-by: Duane Griffin <duaneg@dghda.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2008-07-11ext4: handle corrupted orphan list at mountDuane Griffin
If the orphan node list includes valid, untruncatable nodes with nlink > 0 the ext4_orphan_cleanup loop which attempts to delete them will not do so, causing it to loop forever. Fix by checking for such nodes in the ext4_orphan_get function. This patch fixes the second case (image hdb.20000009.softlockup.gz) reported in http://bugzilla.kernel.org/show_bug.cgi?id=10882. Signed-off-by: Duane Griffin <duaneg@dghda.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2008-07-12cifs: fix wksidarr declaration to be big-endian friendlyJeff Layton
The current definition of wksidarr works fine on little endian arches (since cpu_to_le32 is a no-op there), but on big-endian arches, it fails to compile with this error: error: braced-group within expression allowed only inside a function The problem is that this static declaration has cpu_to_le32 embedded within it, and that expands into a function macro. We need to use __constant_cpu_to_le32() instead. Signed-off-by: Jeff Layton <jlayton@redhat.com> Cc: Steven French <sfrench@us.ibm.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-12cifs: fix inode leak in cifs_get_inode_info_unixJeff Layton
Try this: mount a share with unix extensions create a file on it umount the share You'll get the following message in the ring buffer: VFS: Busy inodes after unmount of cifs. Self-destruct in 5 seconds. Have a nice day... ...the problem is that cifs_get_inode_info_unix is creating and hashing a new inode even when it's going to return error anyway. The first lookup when creating a file returns an error so we end up leaking this inode before we do the actual create. This appears to be a regression caused by commit 0e4bbde94fdc33f5b3d793166b21bf768ca3e098. The following patch seems to fix it for me, and fixes a minor formatting nit as well. Signed-off-by: Jeff Layton <jlayton@redhat.com> Acked-by: Steven French <sfrench@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-11Fix reference counting race on log buffersDave Chinner
When we release the iclog, we do an atomic_dec_and_lock to determine if we are the last reference and need to trigger update of log headers and writeout. However, in xlog_state_get_iclog_space() we also need to check if we have the last reference count there. If we do, we release the log buffer, otherwise we decrement the reference count. But the compare and decrement in xlog_state_get_iclog_space() is not atomic, so both places can see a reference count of 2 and neither will release the iclog. That leads to a filesystem hang. Close the race by replacing the atomic_read() and atomic_dec() pair with atomic_add_unless() to ensure that they are executed atomically. Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Tim Shimmin <tes@sgi.com> Tested-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-10exec: fix stack excutability without PT_GNU_STACKHugh Dickins
Kernel Bugzilla #11063 points out that on some architectures (e.g. x86_32) exec'ing an ELF without a PT_GNU_STACK program header should default to an executable stack; but this got broken by the unlimited argv feature because stack vma is now created before the right personality has been established: so breaking old binaries using nested function trampolines. Therefore re-evaluate VM_STACK_FLAGS in setup_arg_pages, where stack vm_flags used to be set, before the mprotect_fixup. Checking through our existing VM_flags, none would have changed since insert_vm_struct: so this seems safer than finding a way through the personality labyrinth. Reported-by: pageexec@freemail.hu Signed-off-by: Hugh Dickins <hugh@veritas.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-10ocfs2: Fix flags in ocfs2_file_lockMark Fasheh
The stack-glue merge changed the way we use flags in dlmglue in that we now use the fs/dlm equivalents. Unfortunately, a merge error left the new flock code only partially updated. This took a while to show up though, because the lock level constants are actually identical between o2dlm and fs/dlm. The *_CONVERT and *_NOQUEUE flags have different values though, which is eventually causing a crash in flags_to_o2dlm(). Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-07-08Merge branch 'upstream-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2: [PATCH] ocfs2/dlm: Fixes oops in dlm_new_lockres()
2008-07-08Merge branch 'hotfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6Linus Torvalds
* 'hotfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: SUNRPC: Fix an rpcbind breakage for the case of IPv6 lookups SUNRPC: Fix a double-free in rpcbind NFS: Fix readdir cache invalidation
2008-07-08reiserfs: discard prealloc in reiserfs_delete_inodeJeff Mahoney
With the removal of struct file from the xattr code, reiserfs_file_release() isn't used anymore, so the prealloc isn't discarded. This causes hangs later down the line. This patch adds it to reiserfs_delete_inode. In most cases it will be a no-op due to it already having been called, but will avoid hangs with xattrs. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-08NFS: Fix readdir cache invalidationTrond Myklebust
invalidate_inode_pages2_range() takes page offset arguments, not byte ranges. Another thought is that individual pages might perhaps get evicted by VM pressure, in which case we might perhaps want to re-read not only the evicted page, but all subsequent pages too (in case the server returns more/less data per page so that the alignment of the next entry changes). We should therefore remove the condition that we only do this on page->index==0. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-07[PATCH] ocfs2/dlm: Fixes oops in dlm_new_lockres()Sunil Mushran
Patch fixes a race that can result in an oops while adding a lockres to the dlm lockres tracking list. Bug introduced by mainline commit 29576f8bb54045be944ba809d4fca1ad77c94165. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-07-05Fix pagemap_read() use of struct mm_walkAndrew Morton
Fix some issues in pagemap_read noted by Alexey: - initialize pagemap_walk.mm to "mm" , so the code starts working as advertised - initialize ->private to "&pm" so it wouldn't immediately oops in pagemap_pte_hole() - unstatic struct pagemap_walk, so two threads won't fsckup each other (including those started by root, including flipping ->mm when you don't have permissions) - pagemap_read() contains two calls to ptrace_may_attach(), second one looks unneeded. - avoid possible kmalloc(0) and integer wraparound. Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Matt Mackall <mpm@selenic.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> [ Personally, I'd just remove the functionality entirely - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-05Fix clear_refs_write() use of struct mm_walkAndrew Morton
Don't use a static entry, so as to prevent races during concurrent use of this function. Reported-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Matt Mackall <mpm@selenic.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-04security: filesystem capabilities: fix fragile setuid fixup codeAndrew G. Morgan
This commit includes a bugfix for the fragile setuid fixup code in the case that filesystem capabilities are supported (in access()). The effect of this fix is gated on filesystem capability support because changing securebits is only supported when filesystem capabilities support is configured.) [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Andrew G. Morgan <morgan@kernel.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-04add kernel-doc for simple_read_from_buffer and memory_read_from_bufferAkinobu Mita
Add kernel-doc comments describing simple_read_from_buffer and memory_read_from_buffer. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Christoph Hellwig <hch@lst.de> Cc: "Randy.Dunlap" <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-04ntfs: update help textJess Guerrero
The url in the help text for ntfs should be updated. Acked-by: Anton Altaparmakov <aia21@cantab.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-04ecryptfs: remove unnecessary mux from ecryptfs_init_ecryptfs_miscdev()Michael Halcrow
The misc_mtx should provide all the protection required to keep the daemon hash table sane during miscdev registration. Since this mutex is causing gratuitous lockdep warnings, this patch removes it. Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com> Reported-by: Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-04reiserfs: add missing unlock to an error path in reiserfs_quota_write()Jan Kara
When write in reiserfs_quota_write() fails, we have to properly release i_mutex. One error path has been missing the unlock... Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-04ext4: add missing unlock to an error path in ext4_quota_write()Jan Kara
When write in ext4_quota_write() fails, we have to properly release i_mutex. One error path has been missing the unlock... Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-04ext3: add missing unlock to error path in ext3_quota_write()Jan Kara
When write in ext3_quota_write() fails, we have to properly release i_mutex. One error path has been missing the unlock... Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-039p: fix O_APPEND in legacy modeEric Van Hensbergen
The legacy protocol's open operation doesn't handle an append operation (it is expected that the client take care of it). We were incorrectly passing the extended protocol's flag through even in legacy mode. This was reported in bugzilla report #10689. This patch fixes the problem by disallowing extended protocol open modes from being passed in legacy mode and implemented append functionality on the client side by adding a seek after the open. Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2008-07-01Properly notify block layer of sync writesJens Axboe
fsync_buffers_list() and sync_dirty_buffer() both issue async writes and then immediately wait on them. Conceptually, that makes them sync writes and we should treat them as such so that the IO schedulers can handle them appropriately. This patch fixes a write starvation issue that Lin Ming reported, where xx is stuck for more than 2 minutes because of a large number of synchronous IO in the system: INFO: task kjournald:20558 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kjournald D ffff810010820978 6712 20558 2 ffff81022ddb1d10 0000000000000046 ffff81022e7baa10 ffffffff803ba6f2 ffff81022ecd0000 ffff8101e6dc9160 ffff81022ecd0348 000000008048b6cb 0000000000000086 ffff81022c4e8d30 0000000000000000 ffffffff80247537 Call Trace: [<ffffffff803ba6f2>] kobject_get+0x12/0x17 [<ffffffff80247537>] getnstimeofday+0x2f/0x83 [<ffffffff8029c1ac>] sync_buffer+0x0/0x3f [<ffffffff8066d195>] io_schedule+0x5d/0x9f [<ffffffff8029c1e7>] sync_buffer+0x3b/0x3f [<ffffffff8066d3f0>] __wait_on_bit+0x40/0x6f [<ffffffff8029c1ac>] sync_buffer+0x0/0x3f [<ffffffff8066d48b>] out_of_line_wait_on_bit+0x6c/0x78 [<ffffffff80243909>] wake_bit_function+0x0/0x23 [<ffffffff8029e3ad>] sync_dirty_buffer+0x98/0xcb [<ffffffff8030056b>] journal_commit_transaction+0x97d/0xcb6 [<ffffffff8023a676>] lock_timer_base+0x26/0x4b [<ffffffff8030300a>] kjournald+0xc1/0x1fb [<ffffffff802438db>] autoremove_wake_function+0x0/0x2e [<ffffffff80302f49>] kjournald+0x0/0x1fb [<ffffffff802437bb>] kthread+0x47/0x74 [<ffffffff8022de51>] schedule_tail+0x28/0x5d [<ffffffff8020cac8>] child_rip+0xa/0x12 [<ffffffff80243774>] kthread+0x0/0x74 [<ffffffff8020cabe>] child_rip+0x0/0x12 Lin Ming confirms that this patch fixes the issue. I've run tests with it for the past week and no ill effects have been observed, so I'm proposing it for inclusion into 2.6.26. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-06-29Merge branch 'for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6: udf: Fix regression in UDF anchor block detection
2008-06-29Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: [patch 2/3] vfs: dcache cleanups [patch 1/3] vfs: dcache sparse fixes [patch 3/3] vfs: make d_path() consistent across mount operations [patch 4/4] flock: remove unused fields from file_lock_operations [patch 3/4] vfs: fix ERR_PTR abuse in generic_readlink [patch 2/4] fs: make struct file arg to d_path const [patch 1/4] vfs: path_{get,put}() cleanups [patch for 2.6.26 4/4] vfs: utimensat(): fix write access check for futimens() [patch for 2.6.26 3/4] vfs: utimensat(): fix error checking for {UTIME_NOW,UTIME_OMIT} case [patch for 2.6.26 1/4] vfs: utimensat(): ignore tv_sec if tv_nsec == UTIME_OMIT or UTIME_NOW [patch for 2.6.26 2/4] vfs: utimensat(): be consistent with utime() for immutable and append-only files [PATCH] fix cgroup-inflicted breakage in block_dev.c
2008-06-24[GFS2] fix gfs2 block allocation (cleaned up)Benjamin Marzinski
This patch fixes bz 450641. This patch changes the computation for zero_metapath_length(), which it renames to metapath_branch_start(). When you are extending the metadata tree, The indirect blocks that point to the new data block must either diverge from the existing tree either at the inode, or at the first indirect block. They can diverge at the first indirect block because the inode has room for 483 pointers while the indirect blocks have room for 509 pointers, so when the tree is grown, there is some free space in the first indirect block. What metapath_branch_start() now computes is the height where the first indirect block for the new data block is located. It can either be 1 (if the indirect block diverges from the inode) or 2 (if it diverges from the first indirect block). Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-06-24[GFS2] BUG: unable to handle kernel paging request at ffff81002690e000Bob Peterson
This patch fixes bugzilla bug bz448866: gfs2: BUG: unable to handle kernel paging request at ffff81002690e000. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-06-24Merge branch 'master' into for_mmJan Kara
2008-06-24udf: Fix regression in UDF anchor block detectionTomas Janousek
In some cases it could happen that some block passed test in udf_check_anchor_block() even though udf_read_tagged() refused to read it later (e.g. because checksum was not correct). This patch makes udf_check_anchor_block() use udf_read_tagged() so that the checking is stricter. This fixes the regression (certain disks unmountable) caused by commit 423cf6dc04eb79d441bfda2b127bc4b57134b41d. Signed-off-by: Tomas Janousek <tomi@nomi.cz> Signed-off-by: Jan Kara <jack@suse.cz>
2008-06-23NFS: nfs_updatepage(): don't mark page as dirty if an error occurredTrond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-06-23NFS: Fix filehandle size comparisons in the mount codeTrond Myklebust
Fix a sign issue in xdr_decode_fhstatus3() Fix incorrect comparison in nfs_validate_mount_data() Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-06-23NFS: Reduce the NFS mount code stack usage.Trond Myklebust
This appears to fix the Oops reported in http://bugzilla.kernel.org/show_bug.cgi?id=10826 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-06-23[patch 2/3] vfs: dcache cleanupsMiklos Szeredi
Comment from Al Viro: add prepend_name() wrapper. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-06-23[patch 1/3] vfs: dcache sparse fixesMiklos Szeredi
Fix the following sparse warnings: fs/dcache.c:2183:19: warning: symbol 'filp_cachep' was not declared. Should it be static? fs/dcache.c:115:3: warning: context imbalance in 'dentry_iput' - unexpected unlock fs/dcache.c:188:2: warning: context imbalance in 'dput' - different lock contexts for basic block fs/dcache.c:400:2: warning: context imbalance in 'prune_one_dentry' - different lock contexts for basic block fs/dcache.c:431:22: warning: context imbalance in 'prune_dcache' - different lock contexts for basic block fs/dcache.c:563:2: warning: context imbalance in 'shrink_dcache_sb' - different lock contexts for basic block fs/dcache.c:1385:6: warning: context imbalance in 'd_delete' - wrong count at exit fs/dcache.c:1636:2: warning: context imbalance in '__d_unalias' - unexpected unlock fs/dcache.c:1735:2: warning: context imbalance in 'd_materialise_unique' - different lock contexts for basic block Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reviewed-by: Matthew Wilcox <willy@linux.intel.com> Acked-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-06-23[patch 3/3] vfs: make d_path() consistent across mount operationsAndreas Gruenbacher
The path that __d_path() computes can become slightly inconsistent when it races with mount operations: it grabs the vfsmount_lock when traversing mount points but immediately drops it again, only to re-grab it when it reaches the next mount point. The result is that the filename computed is not always consisent, and the file may never have had that name. (This is unlikely, but still possible.) Fix this by grabbing the vfsmount_lock for the whole duration of __d_path(). Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Signed-off-by: John Johansen <jjohansen@suse.de> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Acked-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-06-23[patch 4/4] flock: remove unused fields from file_lock_operationsDenis V. Lunev
fl_insert and fl_remove are not used right now in the kernel. Remove them. Signed-off-by: Denis V. Lunev <den@openvz.org> Cc: Matthew Wilcox <matthew@wil.cx> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: "J. Bruce Fields" <bfields@fieldses.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-06-23[patch 3/4] vfs: fix ERR_PTR abuse in generic_readlinkMarcin Slusarz
generic_readlink calls ERR_PTR for negative and positive values (vfs_readlink returns length of "link"), but it should not (not an errno) and does not need to. Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Acked-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-06-23[patch 2/4] fs: make struct file arg to d_path constJan Engelhardt
Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-06-23[patch 1/4] vfs: path_{get,put}() cleanupsJan Blunck
Here are some more places where path_{get,put}() can be used instead of dput()/mntput() pair. Signed-off-by: Jan Blunck <jblunck@suse.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-06-23[patch for 2.6.26 4/4] vfs: utimensat(): fix write access check for futimens()Michael Kerrisk
The POSIX.1 draft spec for futimens()/utimensat() says: Only a process with the effective user ID equal to the user ID of the file, *or with write access to the file*, or with appropriate privileges may use futimens() or utimensat() with a null pointer as the times argument or with both tv_nsec fields set to the special value UTIME_NOW. The important piece here is "with write access to the file", and this matters for futimens(), which deals with an argument that is a file descriptor referring to the file whose timestamps are being updated, The standard is saying that the "writability" check is based on the file permissions, not the access mode with which the file is opened. (This behavior is consistent with the semantics of FreeBSD's futimes().) However, Linux is currently doing the latter -- futimens(fd, times) is a library function implemented as utimensat(fd, NULL, times, 0) and within the utimensat() implementation we have the code: f = fget(dfd); // dfd is 'fd' ... if (f) { if (!(f->f_mode & FMODE_WRITE)) goto mnt_drop_write_and_out; The check should instead be based on the file permissions. Thanks to Miklos for pointing out how to do this check. Miklos also pointed out a simplification that could be made to my first version of this patch, since the checks for the pathname and file descriptor cases can now be conflated. Acked-by: Miklos Szeredi <miklos@szeredi.hu> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Ulrich Drepper <drepper@redhat.com> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-06-23[patch for 2.6.26 3/4] vfs: utimensat(): fix error checking for ↵Michael Kerrisk
{UTIME_NOW,UTIME_OMIT} case The POSIX.1 draft spec for utimensat() says: Only a process with the effective user ID equal to the user ID of the file or with appropriate privileges may use futimens() or utimensat() with a non-null times argument that does not have both tv_nsec fields set to UTIME_NOW and does not have both tv_nsec fields set to UTIME_OMIT. If this condition is violated, then the error EPERM should result. However, the current implementation does not generate EPERM if one tv_nsec field is UTIME_NOW while the other is UTIME_OMIT. It should give this error for that case. This patch: a) Repairs that problem. b) Removes the now unneeded nsec_special() helper function. c) Adds some comments to explain the checks that are being performed. Thanks to Miklos, who provided comments on the previous iteration of this patch. As a result, this version is a little simpler and and its logic is better structured. Miklos suggested an alternative idea, migrating the is_owner_or_cap() checks into fs/attr.c:inode_change_ok() via the use of an ATTR_OWNER_CHECK flag. Maybe we could do that later, but for now I've gone with this version, which is IMO simpler, and can be more easily read as being correct. Acked-by: Miklos Szeredi <miklos@szeredi.hu> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Ulrich Drepper <drepper@redhat.com> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-06-23[patch for 2.6.26 1/4] vfs: utimensat(): ignore tv_sec if tv_nsec == ↵Michael Kerrisk
UTIME_OMIT or UTIME_NOW The POSIX.1 draft spec for utimensat() says that if a times[n].tv_nsec field is UTIME_OMIT or UTIME_NOW, then the value in the corresponding tv_sec field is ignored. See the last sentence of this para, from the spec: If the tv_nsec field of a timespec structure has the special value UTIME_NOW, the file's relevant timestamp shall be set to the greatest value supported by the file system that is not greater than the current time. If the tv_nsec field has the special value UTIME_OMIT, the file's relevant timestamp shall not be changed. In either case, the tv_sec field shall be ignored. However the current Linux implementation requires the tv_sec value to be zero (or the EINVAL error results). This requirement should be removed. Acked-by: Miklos Szeredi <miklos@szeredi.hu> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Ulrich Drepper <drepper@redhat.com> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-06-23[patch for 2.6.26 2/4] vfs: utimensat(): be consistent with utime() for ↵Michael Kerrisk
immutable and append-only files This patch fixes utimensat() to make its behavior consistent with that of utime()/utimes() when dealing with files marked immutable and append-only. The current utimensat() implementation also returns EPERM if 'times' is non-NULL and the tv_nsec fields are both UTIME_NOW. For consistency, the (times != NULL && times[0].tv_nsec == UTIME_NOW && times[1].tv_nsec == UTIME_NOW) case should be treated like the traditional utimes() case where 'times' is NULL. That is, the call should succeed for a file marked append-only and should give the error EACCES if the file is marked as immutable. The simple way to do this is to set 'times' to NULL if (times[0].tv_nsec == UTIME_NOW && times[1].tv_nsec == UTIME_NOW). This is also the natural approach, since POSIX.1 semantics consider the times == {{x, UTIME_NOW}, {y, UTIME_NOW}} to be exactly equivalent to the case for times == NULL. (Thanks to Miklos for pointing this out.) Patch 3 in this series relies on the simplification provided by this patch. Acked-by: Miklos Szeredi <miklos@szeredi.hu> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Ulrich Drepper <drepper@redhat.com> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-06-23[PATCH] fix cgroup-inflicted breakage in block_dev.cAl Viro
devcgroup_inode_permission() expects MAY_FOO, not FMODE_FOO; kindly keep your misdesign consistent if you positively have to inflict it on the kernel. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-06-22Fix performance regression on lmbench select benchmarkLinus Torvalds
Christian Borntraeger reported that reinstating cond_resched() with CONFIG_PREEMPT caused a performance regression on lmbench: For example select file 500: 23 microseconds 32 microseconds and that's really because we totally unnecessarily do the cond_resched() in the innermost loop of select(), which is just silly. This moves it out from the innermost loop (which only ever loops ove the bits in a single "unsigned long" anyway), which makes the performance regression go away. Reported-and-tested-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-06-21Merge branch 'for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: Ext4: Fix online resize block group descriptor corruption
2008-06-20Ext4: Fix online resize block group descriptor corruptionFrederic Bohe
This is the patch for the group descriptor table corruption during online resize pointed out by Theodore Tso. The problem was caused by the fact that the ext4 group descriptor can be either 32 or 64 bytes long. Only the 64 bytes structure was taken into account. Signed-off-by: Frederic Bohe <frederic.bohe@bull.net> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-06-18Merge branch 'for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6: udf: restore UDFFS_DEBUG to being undefined by default