aboutsummaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2009-08-31autofs4 - fix missed case when changing to use struct pathIan Kent
In the recent change by Al Viro that changes verious subsystems to use "struct path" one case was missed in the autofs4 module which causes mounts to no longer expire. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-08-28inotify: update the group mask on mark additionEric Paris
Seperating the addition and update of marks in inotify resulted in a regression in that inotify never gets events. The inotify group mask is always 0. This mask should be updated any time a new mark is added. Signed-off-by: Eric Paris <eparis@redhat.com>
2009-08-28inotify: fix length reporting and size checkingEric Paris
0db501bd0610ee0c0 introduced a regresion in that it now sends a nul terminator but the length accounting when checking for space or reporting to userspace did not take this into account. This corrects all of the rounding logic. Signed-off-by: Eric Paris <eparis@redhat.com>
2009-08-28inotify: do not send a block of zeros when no pathname is availableBrian Rogers
When an event has no pathname, there's no need to pad it with a null byte and therefore generate an inotify_event sized block of zeros. This fixes a regression introduced by commit 0db501bd0610ee0c0aca84d927f90bcccd09e2bd where my system wouldn't finish booting because some process was being confused by this. Signed-off-by: Brian Rogers <brian@xyzw.org> Signed-off-by: Eric Paris <eparis@redhat.com>
2009-08-27Merge branch 'for-linus' of git://git.infradead.org/users/eparis/notifyLinus Torvalds
* 'for-linus' of git://git.infradead.org/users/eparis/notify: inotify: Ensure we alwasy write the terminating NULL. inotify: fix locking around inotify watching in the idr inotify: do not BUG on idr entries at inotify destruction inotify: seperate new watch creation updating existing watches
2009-08-27Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs: 9p: update documentation pointers 9p: remove unnecessary v9fses->options which duplicates the mount string net/9p: insulate the client against an invalid error code sent by a 9p server 9p: Add missing cast for the error return value in v9fs_get_inode 9p: Remove redundant inode uid/gid assignment 9p: Fix possible regressions when ->get_sb fails. 9p: Fix v9fs show_options 9p: Fix possible memleak in v9fs_inode_from fid. 9p: minor comment fixes 9p: Fix possible inode leak in v9fs_get_inode. 9p: Check for error in return value of v9fs_fid_add
2009-08-27AFS: Stop readlink() on AFS crashing due to NULL 'file' ptrDavid Howells
kAFS crashes when asked to read a symbolic link because page_getlink() passes a NULL file pointer to read_mapping_page(), but afs_readpage() expects a file pointer from which to extract a key. Modify afs_readpage() to request the appropriate key from the calling process's keyrings if a file struct is not supplied with one attached. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Anton Blanchard <anton@samba.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-08-27inotify: Ensure we alwasy write the terminating NULL.Eric W. Biederman
Before the rewrite copy_event_to_user always wrote a terqminating '\0' byte to user space after the filename. Since the rewrite that terminating byte was skipped if your filename is exactly a multiple of event_size. Ouch! So add one byte to name_size before we round up and use clear_user to set userspace to zero like /dev/zero does instead of copying the strange nul_inotify_event. I can't quite convince myself len_to_zero will never exceed 16 and even if it doesn't clear_user should be more efficient and a more accurate reflection of what the code is trying to do. Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: Eric Paris <eparis@redhat.com>
2009-08-27inotify: fix locking around inotify watching in the idrEric Paris
The are races around the idr storage of inotify watches. It's possible that a watch could be found from sys_inotify_rm_watch() in the idr, but it could be removed from the idr before that code does it's removal. Move the locking and the refcnt'ing so that these have to happen atomically. Signed-off-by: Eric Paris <eparis@redhat.com>
2009-08-27inotify: do not BUG on idr entries at inotify destructionEric Paris
If an inotify watch is left in the idr when an fsnotify group is destroyed this will lead to a BUG. This is not a dangerous situation and really indicates a programming bug and leak of memory. This patch changes it to use a WARN and a printk rather than killing people's boxes. Signed-off-by: Eric Paris <eparis@redhat.com>
2009-08-27inotify: seperate new watch creation updating existing watchesEric Paris
There is nothing known wrong with the inotify watch addition/modification but this patch seperates the two code paths to make them each easy to verify as correct. Signed-off-by: Eric Paris <eparis@redhat.com>
2009-08-25Merge branch 'for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: ext3: Improve error message that changing journaling mode on remount is not possible ext3: Update Kconfig description of EXT3_DEFAULTS_TO_ORDERED
2009-08-24NFSv4: Fix an infinite looping problem with the nfs4_state_managerTrond Myklebust
Commit 76db6d9500caeaa774a3e32a997eba30bbdc176b (nfs41: add session setup to the state manager) introduces an infinite loop possibility in the NFSv4 state manager. By first checking nfs4_has_session() before clearing the NFS4CLNT_SESSION_SETUP flag, it allows for a situation where someone sets that flag, but it never gets cleared, and so the state manager loops. In fact commit c3fad1b1aaf850bf692642642ace7cd0d64af0a3 (nfs41: add session reset to state manager) causes this to happen every time we get a network partition error. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Tested-by: Daniel J Blueman <daniel.blueman@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-08-24Merge branch 'upstream-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: ocfs2/dlm: Wait on lockres instead of erroring cancel requests ocfs2: Add missing lock name ocfs2: Don't oops in ocfs2_kill_sb on a failed mount ocfs2: release the buffer head in ocfs2_do_truncate. ocfs2: Handle quota file corruption more gracefully
2009-08-24mm: fix hugetlb bug due to user_shm_unlock callHugh Dickins
2.6.30's commit 8a0bdec194c21c8fdef840989d0d7b742bb5d4bc removed user_shm_lock() calls in hugetlb_file_setup() but left the user_shm_unlock call in shm_destroy(). In detail: Assume that can_do_hugetlb_shm() returns true and hence user_shm_lock() is not called in hugetlb_file_setup(). However, user_shm_unlock() is called in any case in shm_destroy() and in the following atomic_dec_and_lock(&up->__count) in free_uid() is executed and if up->__count gets zero, also cleanup_user_struct() is scheduled. Note that sched_destroy_user() is empty if CONFIG_USER_SCHED is not set. However, the ref counter up->__count gets unexpectedly non-positive and the corresponding structs are freed even though there are live references to them, resulting in a kernel oops after a lots of shmget(SHM_HUGETLB)/shmctl(IPC_RMID) cycles and CONFIG_USER_SCHED set. Hugh changed Stefan's suggested patch: can_do_hugetlb_shm() at the time of shm_destroy() may give a different answer from at the time of hugetlb_file_setup(). And fixed newseg()'s no_id error path, which has missed user_shm_unlock() ever since it came in 2.6.9. Reported-by: Stefan Huber <shuber2@gmail.com> Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Tested-by: Stefan Huber <shuber2@gmail.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-08-24ext3: Improve error message that changing journaling mode on remount is not ↵Jan Kara
possible This patch makes the error message about changing journaling mode on remount more descriptive. Some people are going to hit this error now due to commit bbae8bcc49bc4d002221dab52c79a50a82e7cd1f if they configure a kernel to default to data=writeback mode. The problem happens if they have data=ordered set for the root filesystem in /etc/fstab but not in the kernel command line (and they don't use initrd). Their filesystem then gets mounted as data=writeback by kernel but then their boot fails because init scripts won't be able to remount the filesystem rw. Better error message will hopefully make it easier for them to find the error in their setup and bother us less with error reports :). Signed-off-by: Jan Kara <jack@suse.cz>
2009-08-24ext3: Update Kconfig description of EXT3_DEFAULTS_TO_ORDEREDTheodore Ts'o
The old description for this configuration option was perhaps not completely balanced in terms of describing the tradeoffs of using a default of data=writeback vs. data=ordered. Despite the fact that old description very strongly recomended disabling this feature, all of the major distributions have elected to preserve the existing 'legacy' default, which is a strong hint that it perhaps wasn't telling the whole story. This revised description has been vetted by a number of ext3 developers as being better at informing the user about the tradeoffs of enabling or disabling this configuration feature. Cc: linux-ext4@vger.kernel.org Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz>
2009-08-24kernel_read: redefine offset typeMimi Zohar
vfs_read() offset is defined as loff_t, but kernel_read() offset is only defined as unsigned long. Redefine kernel_read() offset as loff_t. Cc: stable@kernel.org Signed-off-by: Mimi Zohar <zohar@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>
2009-08-21Re-introduce page mapping check in mark_buffer_dirty()Linus Torvalds
In commit a8e7d49aa7be728c4ae241a75a2a124cdcabc0c5 ("Fix race in create_empty_buffers() vs __set_page_dirty_buffers()"), I removed a test for a NULL page mapping unintentionally when some of the code inside __set_page_dirty() was moved to the callers. That removal generally didn't matter, since a filesystem would serialize truncation (which clears the page mapping) against writing (which marks the buffer dirty), so locking at a higher level (either per-page or an inode at a time) should mean that the buffer page would be stable. And indeed, nothing bad seemed to happen. Except it turns out that apparently reiserfs does something odd when under load and writing out the journal, and we have a number of bugzilla entries that look similar: http://bugzilla.kernel.org/show_bug.cgi?id=13556 http://bugzilla.kernel.org/show_bug.cgi?id=13756 http://bugzilla.kernel.org/show_bug.cgi?id=13876 and it looks like reiserfs depended on that check (the common theme seems to be "data=journal", and a journal writeback during a truncate). I suspect reiserfs should have some additional locking, but in the meantime this should get us back to the pre-2.6.29 behavior. Pattern-pointed-out-by: Roland Kletzing <devzero@web.de> Cc: stable@kernel.org (2.6.29 and 2.6.30) Cc: Jeff Mahoney <jeffm@suse.com> Cc: Nick Piggin <npiggin@suse.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-08-21Merge branch 'btrfs' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds
* 'btrfs' of git://git.kernel.dk/linux-2.6-block: btrfs: fix inode rbtree corruption
2009-08-21btrfs: fix inode rbtree corruptionFrom: Nick Piggin
Node may not be inserted over existing node. This causes inode tree corruption and I was seeing crashes in inode_tree_del which I can not reproduce after this patch. The other way to fix this would be to tie inode lifetime in the rbtree with inode while not in freeing state. I had a look at this but it is not so trivial at this point. At least this patch gets things working again. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Chris Mason <chris.mason@oracle.com> Acked-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-08-20ocfs2/dlm: Wait on lockres instead of erroring cancel requestsGoldwyn Rodrigues
In case a downconvert is queued, and a flock receives a signal, BUG_ON(lockres->l_action != OCFS2_AST_INVALID) is triggered because a lock cancel triggers a dlmunlock while an AST is scheduled. To avoid this, allow a LKM_CANCEL to pass through, and let it wait on __dlm_wait_on_lockres(). Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.de> Acked-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-08-20ocfs2: Add missing lock nameJan Kara
There is missing name for NFSSync cluster lock. This makes lockdep unhappy because we end up passing NULL to lockdep when initializing lock key. Fix it. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-08-19Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: nilfs2: fix oopses with doubly mounted snapshots nilfs2: missing a read lock for segment writer in nilfs_attach_checkpoint()
2009-08-18mm: revert "oom: move oom_adj value"KOSAKI Motohiro
The commit 2ff05b2b (oom: move oom_adj value) moveed the oom_adj value to the mm_struct. It was a very good first step for sanitize OOM. However Paul Menage reported the commit makes regression to his job scheduler. Current OOM logic can kill OOM_DISABLED process. Why? His program has the code of similar to the following. ... set_oom_adj(OOM_DISABLE); /* The job scheduler never killed by oom */ ... if (vfork() == 0) { set_oom_adj(0); /* Invoked child can be killed */ execve("foo-bar-cmd"); } .... vfork() parent and child are shared the same mm_struct. then above set_oom_adj(0) doesn't only change oom_adj for vfork() child, it's also change oom_adj for vfork() parent. Then, vfork() parent (job scheduler) lost OOM immune and it was killed. Actually, fork-setting-exec idiom is very frequently used in userland program. We must not break this assumption. Then, this patch revert commit 2ff05b2b and related commit. Reverted commit list --------------------- - commit 2ff05b2b4e (oom: move oom_adj value from task_struct to mm_struct) - commit 4d8b9135c3 (oom: avoid unnecessary mm locking and scanning for OOM_DISABLE) - commit 8123681022 (oom: only oom kill exiting tasks with attached memory) - commit 933b787b57 (mm: copy over oom_adj value at fork time) Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Paul Menage <menage@google.com> Cc: David Rientjes <rientjes@google.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Nick Piggin <npiggin@suse.de> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-08-18vfs: make get_sb_pseudo set s_maxbytes to value that can be cast to signedJeff Layton
get_sb_pseudo sets s_maxbytes to ~0ULL which becomes negative when cast to a signed value. Fix it to use MAX_LFS_FILESIZE which casts properly to a positive signed value. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Steve French <smfrench@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Robert Love <rlove@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-08-19nilfs2: fix oopses with doubly mounted snapshotsRyusuke Konishi
will fix kernel oopses like the following: # mount -t nilfs2 -r -o cp=20 /dev/sdb1 /test1 # mount -t nilfs2 -r -o cp=20 /dev/sdb1 /test2 # umount /test1 # umount /test2 BUG: sleeping function called from invalid context at arch/x86/mm/fault.c:1069 in_atomic(): 0, irqs_disabled(): 1, pid: 3886, name: umount.nilfs2 1 lock held by umount.nilfs2/3886: #0: (&type->s_umount_key#31){+.+...}, at: [<c10b398a>] deactivate_super+0x52/0x6c irq event stamp: 1219 hardirqs last enabled at (1219): [<c135c774>] __mutex_unlock_slowpath+0xf8/0x119 hardirqs last disabled at (1218): [<c135c6d5>] __mutex_unlock_slowpath+0x59/0x119 softirqs last enabled at (1214): [<c1033316>] __do_softirq+0x1a5/0x1ad softirqs last disabled at (1205): [<c1033354>] do_softirq+0x36/0x5a Pid: 3886, comm: umount.nilfs2 Not tainted 2.6.31-rc6 #55 Call Trace: [<c1023549>] __might_sleep+0x107/0x10e [<c13603c0>] do_page_fault+0x246/0x397 [<c136017a>] ? do_page_fault+0x0/0x397 [<c135e753>] error_code+0x6b/0x70 [<c136017a>] ? do_page_fault+0x0/0x397 [<c104f805>] ? __lock_acquire+0x91/0x12fd [<c1050a62>] ? __lock_acquire+0x12ee/0x12fd [<c1050a62>] ? __lock_acquire+0x12ee/0x12fd [<c1050b2b>] lock_acquire+0xba/0xdd [<d0d17d3f>] ? nilfs_detach_segment_constructor+0x2f/0x2fa [nilfs2] [<c135d4fe>] down_write+0x2a/0x46 [<d0d17d3f>] ? nilfs_detach_segment_constructor+0x2f/0x2fa [nilfs2] [<d0d17d3f>] nilfs_detach_segment_constructor+0x2f/0x2fa [nilfs2] [<c104ea2c>] ? mark_held_locks+0x43/0x5b [<c104ecb1>] ? trace_hardirqs_on_caller+0x10b/0x133 [<c104ece4>] ? trace_hardirqs_on+0xb/0xd [<d0d09ac1>] nilfs_put_super+0x2f/0xca [nilfs2] [<c10b3352>] generic_shutdown_super+0x49/0xb8 [<c10b33de>] kill_block_super+0x1d/0x31 [<c10e6599>] ? vfs_quota_off+0x0/0x12 [<c10b398f>] deactivate_super+0x57/0x6c [<c10c4bc3>] mntput_no_expire+0x8c/0xb4 [<c10c5094>] sys_umount+0x27f/0x2a4 [<c10c50c6>] sys_oldumount+0xd/0xf [<c10031a4>] sysenter_do_call+0x12/0x38 ... This turns out to be a bug brought by an -rc1 patch ("nilfs2: simplify remaining sget() use"). In the patch, a new "put resource" function, nilfs_put_sbinfo() was introduced to delay freeing nilfs_sb_info struct. But the nilfs_put_sbinfo() mistakenly used atomic_dec_and_test() function to check the reference count, and it caused the nilfs_sb_info was freed when user mounted a snapshot twice. This bug also suggests there was unseen memory leak in usual mount /umount operations for nilfs. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-08-18nilfs2: missing a read lock for segment writer in nilfs_attach_checkpoint()Zhang Qiang
'ns_cno' of structure 'the_nilfs' must be protected from segment writer, in other words, the caller of nilfs_get_checkpoint should hold read lock for nilfs->ns_segctor_sem. This patch adds the lock/unlock operations in nilfs_attach_checkpoint() when calling nilfs_cpfile_get_checkpoint(). Signed-off-by: Zhang Qiang <zhangqiang.buaa@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-08-179p: remove unnecessary v9fses->options which duplicates the mount stringAbhishek Kulkarni
The mount options string is saved in sb->s_options. This patch removes the redundant duplicating of the mount options. Also, since we are not displaying anything special in show options, we replace v9fs_show_options with generic_show_options for now. Signed-off-by: Abhishek Kulkarni <adkulkar@umail.iu.edu> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2009-08-179p: Add missing cast for the error return value in v9fs_get_inodeAbhishek Kulkarni
Cast the error return value (ENOMEM) in v9fs_get_inode() to its correct type using ERR_PTR. Signed-off-by: Abhishek Kulkarni <adkulkar@umail.iu.edu> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2009-08-17ocfs2: Don't oops in ocfs2_kill_sb on a failed mountJan Kara
If we fail to mount the filesystem, we have to be careful not to dereference uninitialized structures in ocfs2_kill_sb. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-08-179p: Remove redundant inode uid/gid assignmentAbhishek Kulkarni
Remove a redundant update of inode's i_uid and i_gid after v9fs_get_inode() since the latter already sets up a new inode and sets the proper uid and gid values. Signed-off-by: Abhishek Kulkarni <adkulkar@umail.iu.edu> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2009-08-179p: Fix possible regressions when ->get_sb fails.Abhishek Kulkarni
->get_sb can fail causing some badness. this patch fixes * clear sb->fs_s_info in kill_sb. * deactivate_locked_super() calls kill_sb (v9fs_kill_super) which closes the destroys the client, clunks all its fids and closes the v9fs session. Attempting to do it twice will cause an oops. Signed-off-by: Abhishek Kulkarni <adkulkar@umail.iu.edu> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2009-08-179p: Fix v9fs show_optionsAbhishek Kulkarni
Add the delimiter ',' before the options when they are passed and check if no option parameters are passed to prevent displaying NULL in /proc/mounts. Signed-off-by: Abhishek Kulkarni <adkulkar@umail.iu.edu> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2009-08-179p: Fix possible memleak in v9fs_inode_from fid.Abhishek Kulkarni
Add missing p9stat_free in v9fs_inode_from_fid to avoid any possible leaks. Signed-off-by: Abhishek Kulkarni <adkulkar@umail.iu.edu> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2009-08-179p: minor comment fixesAbhishek Kulkarni
Fix the comments -- mostly the improper and/or missing descriptions of function parameters. Signed-off-by: Abhishek Kulkarni <adkulkar@umail.iu.edu> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2009-08-179p: Fix possible inode leak in v9fs_get_inode.Abhishek Kulkarni
Add a missing iput when cleaning up if v9fs_get_inode fails after returning a valid inode. Signed-off-by: Abhishek Kulkarni <adkulkar@umail.iu.edu> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2009-08-179p: Check for error in return value of v9fs_fid_addAbhishek Kulkarni
Check if v9fs_fid_add was successful or not based on its return value. Signed-off-by: Abhishek Kulkarni <adkulkar@umail.iu.edu> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2009-08-17Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfsLinus Torvalds
* 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: fix locking in xfs_iget_cache_hit
2009-08-17inotify: start watch descriptor count at 1Eric Paris
The inotify_add_watch man page specifies that inotify_add_watch() will return a non-negative integer. However, historically the inotify watches started at 1, not at 0. Turns out that the inotifywait program provided by the inotify-tools package doesn't properly handle a 0 watch descriptor. In 7e790dd5 we changed from starting at 1 to starting at 0. This patch starts at 1, just like in previous kernels, but also just like in previous kernels it's possible for it to wrap back to 0. This preserves the kernel functionality exactly like it was before the patch (neither method broke the spec) Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-08-17inotify: tail drop inotify q_overflow eventsEric Paris
In f44aebcc the tail drop logic of events with no file backing (q_overflow and in_ignored) was reversed so IN_IGNORED events would never be tail dropped. This now means that Q_OVERFLOW events are NOT tail dropped. The fix is to not tail drop IN_IGNORED, but to tail drop Q_OVERFLOW. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-08-17notify: unused event private raceEric Paris
inotify decides if private data it passed to get added to an event was used by checking list_empty(). But it's possible that the event may have been dequeued and the private event removed so it would look empty. The fix is to use the return code from fsnotify_add_notify_event rather than looking at the list. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-08-17ocfs2: release the buffer head in ocfs2_do_truncate.Tao Ma
In ocfs2_do_truncate, we forget to release last_eb_bh which will cause memleak. So call brelse in the end. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-08-17ocfs2: Handle quota file corruption more gracefullyJan Kara
ocfs2_read_virt_blocks() does BUG when we try to read a block from a file beyond its end. Since this can happen due to filesystem corruption, it is not really an appropriate answer. Make ocfs2_read_quota_block() check the condition and handle it by calling ocfs2_error() and returning EIO. [ Modified to print ip_blkno in the error - Joel ] Reported-by: Tristan Ye <tristan.ye@oracle.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-08-17xfs: fix locking in xfs_iget_cache_hitChristoph Hellwig
The locking in xfs_iget_cache_hit currently has numerous problems: - we clear the reclaim tag without i_flags_lock which protects modifications to it - we call inode_init_always which can sleep with pag_ici_lock held (this is oss.sgi.com BZ #819) - we acquire and drop i_flags_lock a lot and thus provide no consistency between the various flags we set/clear under it This patch fixes all that with a major revamp of the locking in the function. The new version acquires i_flags_lock early and only drops it once we need to call into inode_init_always or before calling xfs_ilock. This patch fixes a bug seen in the wild where we race modifying the reclaim tag. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Felix Blyakher <felixb@sgi.com> Reviewed-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Felix Blyakher <felixb@sgi.com>
2009-08-15poll/select: initialize triggered field of struct poll_wqueuesGuillaume Knispel
The triggered field of struct poll_wqueues introduced in commit 5f820f648c92a5ecc771a96b3c29aa6e90013bba ("poll: allow f_op->poll to sleep"). It was first set to 1 in pollwake() (now __pollwake() ), tested and later set to 0 in poll_schedule_timeout(), but not initialized before. As a result when the process needs to sleep, triggered was likely to be non-zero even if pollwake() is not called before the first poll_schedule_timeout(), meaning schedule_hrtimeout_range() would not be called and an extra loop calling all ->poll() would be done. This patch initialize triggered to 0 in poll_initwait() so the ->poll() are not called twice before the process goes to sleep when it needs to. Signed-off-by: Guillaume Knispel <gknispel@proformatique.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Tejun Heo <tj@kernel.org> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-08-14GFS2: Fix permissions on "recover" fileSteven Whitehouse
Although this file is only ever written and not read by userspace, it seems that the utils are opening this file O_RDWR, so we need to allow that. Also fixes the whitespace which seemed to be broken. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: David Teigland <teigland@redhat.com>
2009-08-13Merge branch 'upstream-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: (22 commits) ocfs2: Fix possible deadlock when extending quota file ocfs2: keep index within status_map[] ocfs2: Initialize the cluster we're writing to in a non-sparse extend ocfs2: Remove redundant BUG_ON in __dlm_queue_ast() ocfs2/quota: Release lock for error in ocfs2_quota_write. ocfs2: Define credit counts for quota operations ocfs2: Remove syncjiff field from quota info ocfs2: Fix initialization of blockcheck stats ocfs2: Zero out padding of on disk dquot structure ocfs2: Initialize blocks allocated to local quota file ocfs2: Mark buffer uptodate before calling ocfs2_journal_access_dq() ocfs2: Make global quota files blocksize aligned ocfs2: Use ocfs2_rec_clusters in ocfs2_adjust_adjacent_records. ocfs2: Fix deadlock on umount ocfs2: Add extra credits and access the modified bh in update_edge_lengths. ocfs2: Fail ocfs2_get_block() immediately when a block needs allocation ocfs2: Fix error return in ocfs2_write_cluster() ocfs2: Fix compilation warning for fs/ocfs2/xattr.c ocfs2: Initialize count in aio_write before generic_write_checks ocfs2: log the actual return value of ocfs2_file_aio_write() ...
2009-08-12Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfsLinus Torvalds
* 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: fix spin_is_locked assert on uni-processor builds xfs: check for dinode realtime flag corruption use XFS_CORRUPTION_ERROR in xfs_btree_check_sblock xfs: switch to NOFS allocation under i_lock in xfs_attr_rmtval_get xfs: switch to NOFS allocation under i_lock in xfs_readlink_bmap xfs: switch to NOFS allocation under i_lock in xfs_attr_rmtval_set xfs: switch to NOFS allocation under i_lock in xfs_buf_associate_memory xfs: switch to NOFS allocation under i_lock in xfs_dir_cilookup_result xfs: switch to NOFS allocation under i_lock in xfs_da_buf_make xfs: switch to NOFS allocation under i_lock in xfs_da_state_alloc xfs: switch to NOFS allocation under i_lock in xfs_getbmap xfs: avoid memory allocation under m_peraglock in growfs code
2009-08-12NFS: Fix an O_DIRECT Oops...Trond Myklebust
We can't call nfs_readdata_release()/nfs_writedata_release() without first initialising and referencing args.context. Doing so inside nfs_direct_read_schedule_segment()/nfs_direct_write_schedule_segment() causes an Oops. We should rather be calling nfs_readdata_free()/nfs_writedata_free() in those cases. Looking at the O_DIRECT code, the "struct nfs_direct_req" is already referencing the nfs_open_context for us. Since the readdata and writedata structures carry a reference to that, we can simplify things by getting rid of the extra nfs_open_context references, so that we can replace all instances of nfs_readdata_release()/nfs_writedata_release(). Reported-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Tested-by: Catalin Marinas <catalin.marinas@arm.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>