aboutsummaryrefslogtreecommitdiff
path: root/fs/nfs/write.c
AgeCommit message (Collapse)Author
2008-02-07NFS: Fix a potential file corruption issue when writingTrond Myklebust
If the inode is flagged as having an invalid mapping, then we can't rely on the PageUptodate() flag. Ensure that we don't use the "anti-fragmentation" write optimisation in nfs_updatepage(), since that will cause NFS to write out areas of the page that are no longer guaranteed to be up to date. A potential corruption could occur in the following scenario: client 1 client 2 =============== =============== fd=open("f",O_CREAT|O_WRONLY,0644); write(fd,"fubar\n",6); // cache last page close(fd); fd=open("f",O_WRONLY|O_APPEND); write(fd,"foo\n",4); close(fd); fd=open("f",O_WRONLY|O_APPEND); write(fd,"bar\n",4); close(fd); ----- The bug may lead to the file "f" reading 'fubar\n\0\0\0\nbar\n' because client 2 does not update the cached page after re-opening the file for write. Instead it keeps it marked as PageUptodate() until someone calls invaldate_inode_pages2() (typically by calling read()). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-02-05Pagecache zeroing: zero_user_segment, zero_user_segments and zero_userChristoph Lameter
Simplify page cache zeroing of segments of pages through 3 functions zero_user_segments(page, start1, end1, start2, end2) Zeros two segments of the page. It takes the position where to start and end the zeroing which avoids length calculations and makes code clearer. zero_user_segment(page, start, end) Same for a single segment. zero_user(page, start, length) Length variant for the case where we know the length. We remove the zero_user_page macro. Issues: 1. Its a macro. Inline functions are preferable. 2. The KM_USER0 macro is only defined for HIGHMEM. Having to treat this special case everywhere makes the code needlessly complex. The parameter for zeroing is always KM_USER0 except in one single case that we open code. Avoiding KM_USER0 makes a lot of code not having to be dealing with the special casing for HIGHMEM anymore. Dealing with kmap is only necessary for HIGHMEM configurations. In those configurations we use KM_USER0 like we do for a series of other functions defined in highmem.h. Since KM_USER0 is depends on HIGHMEM the existing zero_user_page function could not be a macro. zero_user_* functions introduced here can be be inline because that constant is not used when these functions are called. Also extract the flushing of the caches to be outside of the kmap. [akpm@linux-foundation.org: fix nfs and ntfs build] [akpm@linux-foundation.org: fix ntfs build some more] Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Steven French <sfrench@us.ibm.com> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Cc: <linux-ext4@vger.kernel.org> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Anton Altaparmakov <aia21@cantab.net> Cc: Mark Fasheh <mark.fasheh@oracle.com> Cc: David Chinner <dgc@sgi.com> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Cc: Steven French <sfrench@us.ibm.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-01Merge branch 'task_killable' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/willy/misc * 'task_killable' of git://git.kernel.org/pub/scm/linux/kernel/git/willy/misc: (22 commits) Remove commented-out code copied from NFS NFS: Switch from intr mount option to TASK_KILLABLE Add wait_for_completion_killable Add wait_event_killable Add schedule_timeout_killable Use mutex_lock_killable in vfs_readdir Add mutex_lock_killable Use lock_page_killable Add lock_page_killable Add fatal_signal_pending Add TASK_WAKEKILL exit: Use task_is_* signal: Use task_is_* sched: Use task_contributes_to_load, TASK_ALL and TASK_NORMAL ptrace: Use task_is_* power: Use task_is_* wait: Use TASK_NORMAL proc/base.c: Use task_is_* proc/array.c: Use TASK_REPORT perfmon: Use task_is_* ... Fixed up conflicts in NFS/sunrpc manually..
2008-01-30NFS: Fix minor mixed sign comparison in NFS client's write logicChuck Lever
Clean up: PAGE_CACHE_SIZE is unsigned, and nfs_pageio_init() takes a size_t. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30NFS/SUNRPC: Convert users of rpc_init_task+rpc_execute to rpc_run_task()Trond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30NFS: Clean up the (commit|read|write)_setup() callback routinesTrond Myklebust
Move the common code for setting up the nfs_write_data and nfs_read_data structures into fs/nfs/read.c, fs/nfs/write.c and fs/nfs/direct.c. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30SUNRPC: Clean up the initialisation of priority queue scheduling info.Trond Myklebust
We want the default scheduling priority (priority == 0) to remain RPC_PRIORITY_NORMAL. Also ensure that the priority wait queue scheduling is per process id instead of sometimes being per thread, and sometimes being per inode. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30SUNRPC: Cleanup of rpc_task initialisationTrond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30NFS: Clean up the write request locking.Trond Myklebust
Ensure that we set/clear NFS_PAGE_TAG_LOCKED when the nfs_page is hashed. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-12-06NFS: Switch from intr mount option to TASK_KILLABLEMatthew Wilcox
By using the TASK_KILLABLE infrastructure, we can get rid of the 'intr' mount option. We have to use _killable everywhere instead of _interruptible as we get rid of rpc_clnt_sigmask/sigunmask. Signed-off-by: Liam R. Howlett <howlett@gmail.com> Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
2007-11-26NFS: make nfs_wb_page_priority() staticAdrian Bunk
nfs_wb_page_priority() can now become static. Signed-off-by: Adrian Bunk <bunk@kernel.org> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J. Bruce Fields" <bfields@fieldses.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-19NFS: Fix a writeback race...Trond Myklebust
This patch fixes a regression that was introduced by commit 44dd151d5c21234cc534c47d7382f5c28c3143cd We cannot zero the user page in nfs_mark_uptodate() any more, since a) We'd be modifying the page without holding the page lock b) We can race with other updates of the page, most notably because of the call to nfs_wb_page() in nfs_writepage_setup(). Instead, we do the zeroing in nfs_update_request() if we see that we're creating a request that might potentially be marked as up to date. Thanks to Olivier Paquet for reporting the bug and providing a test-case. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-17mm: count reclaimable pages per BDIPeter Zijlstra
Count per BDI reclaimable pages; nr_reclaimable = nr_dirty + nr_unstable. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17nfs: remove congestion_end()Peter Zijlstra
These patches aim to improve balance_dirty_pages() and directly address three issues: 1) inter device starvation 2) stacked device deadlocks 3) inter process starvation 1 and 2 are a direct result from removing the global dirty limit and using per device dirty limits. By giving each device its own dirty limit is will no longer starve another device, and the cyclic dependancy on the dirty limit is broken. In order to efficiently distribute the dirty limit across the independant devices a floating proportion is used, this will allocate a share of the total limit proportional to the device's recent activity. 3 is done by also scaling the dirty limit proportional to the current task's recent dirty rate. This patch: nfs: remove congestion_end(). It's redundant, clear_bdi_congested() already wakes the waiters. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J. Bruce Fields" <bfields@fieldses.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-09NFS: Remove nfs_begin_data_update/nfs_end_data_updateTrond Myklebust
The lower level routines in fs/nfs/proc.c, fs/nfs/nfs3proc.c and fs/nfs/nfs4proc.c should already be dealing with the revalidation issues. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09NFS: Replace file->private_data with calls to nfs_file_open_context()Trond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09NFS: Fall back to synchronous writes when a background write errors...Trond Myklebust
This helps prevent huge queues of background writes from building up whenever the server runs out of disk or quota space, or if someone changes the file access modes behind our backs. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09NFS: Writeback optimisationTrond Myklebust
Schedule writes using WB_SYNC_NONE first, then come back for a second pass using WB_SYNC_ALL. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09NFS: Clean up NFS writeback flush codeTrond Myklebust
The only user of nfs_sync_mapping_range() is nfs_getattr(), which uses it to flush out the entire inode without sending a commit. We therefore replace nfs_sync_mapping_range with a more appropriate helper. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09NFS: Clean up nfs_writepages()Trond Myklebust
Just call write_cache_pages directly instead of hacking the writeback control structure in order to find out if we were called from writepages() or directly from the VM. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09NFS: Clean up write code...Trond Myklebust
The addition of nfs_page_mkwrite means that We should no longer need to create requests inside nfs_writepage() Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-09-01NFS: Fix a write request leak in nfs_invalidate_page()Trond Myklebust
Ryusuke Konishi says: The recent truncate_complete_page() clears the dirty flag from a page before calling a_ops->invalidatepage(), ^^^^^^ static void truncate_complete_page(struct address_space *mapping, struct page *page) { ... cancel_dirty_page(page, PAGE_CACHE_SIZE); <--- Inserted here at kernel 2.6.20 if (PagePrivate(page)) do_invalidatepage(page, 0); ---> will call a_ops->invalidatepage() ... } and this is disturbing nfs_wb_page_priority() from calling nfs_writepage_locked() that is expected to handle the pending request (=nfs_page) associated with the page. int nfs_wb_page_priority(struct inode *inode, struct page *page, int how) { ... if (clear_page_dirty_for_io(page)) { ret = nfs_writepage_locked(page, &wbc); if (ret < 0) goto out; } ... } Since truncate_complete_page() will get rid of the page after a_ops->invalidatepage() returns, the request (=nfs_page) associated with the page becomes a garbage in nfs_inode->nfs_page_tree. ------------------------ Fix this by ensuring that nfs_wb_page_priority() recognises that it may also need to clear out non-dirty pages that have an nfs_page associated with them. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-20mm: Remove slab destructors from kmem_cache_create().Paul Mundt
Slab destructors were no longer supported after Christoph's c59def9f222d44bb7e2f0a559f2906191a0862d7 change. They've been BUGs for both slab and slub, and slob never supported them either. This rips out support for the dtor pointer from kmem_cache_create() completely and fixes up every single callsite in the kernel (there were about 224, not including the slab allocator definitions themselves, or the documentation references). Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2007-07-10NFS: Replace NFS_I(inode)->req_lock with inode->i_lockTrond Myklebust
There is no justification for keeping a special spinlock for the exclusive use of the NFS writeback code. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10NFS: Remove the redundant 'dirty' and 'commit' lists from nfs_inodeTrond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10NFS cleanup: speed up nfs_scan_commit using radix tree tagsTrond Myklebust
Add a tag for requests that are waiting for a COMMIT Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10NFS cleanup: Rename NFS_PAGE_TAG_WRITEBACK to NFS_PAGE_TAG_LOCKEDTrond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10NFS: Convert struct nfs_page to use krefsTrond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10NFS: Replace vfsmount and dentry in nfs_open_context with struct pathTrond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10NFS: Don't mark a written page as uptodate until it is on diskTrond Myklebust
The write may fail, so we should not mark the page as uptodate until we are certain that the data has been accepted and written to disk by the server. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-05-24NFS: Avoid a deadlock situation on writeTrond Myklebust
When processes are allowed to attempt to lock a non-contiguous range of nfs write requests, it is possible for generic_writepages to 'wrap round' the address space, and call writepage() on a request that is already locked by the same process. We avoid the deadlock by checking if the page index is contiguous with the list of nfs write requests that is already held in our nfs_pageio_descriptor prior to attempting to lock a new request. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-05-14NFS: Fix some 'sparse' warnings...Trond Myklebust
- fs/nfs/dir.c:610:8: warning: symbol 'nfs_llseek_dir' was not declared. Should it be static? - fs/nfs/dir.c:636:5: warning: symbol 'nfs_fsync_dir' was not declared. Should it be static? - fs/nfs/write.c:925:19: warning: symbol 'req' shadows an earlier one - fs/nfs/write.c:61:6: warning: symbol 'nfs_commit_rcu_free' was not declared. Should it be static? - fs/nfs/nfs4proc.c:793:5: warning: symbol 'nfs4_recover_expired_lease' was not declared. Should it be static? Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-05-14NFS: use zero_user_pageNate Diller
Use zero_user_page() instead of the newly deprecated memclear_highpage_flush(). Signed-off-by: Nate Diller <nate.diller@gmail.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J. Bruce Fields" <bfields@fieldses.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-05-08nfs: fix congestion control: use atomic_longsPeter Zijlstra
Change the atomic_t in struct nfs_server to atomic_long_t in anticipation of machines that can handle 8+TB of (4K) pages under writeback. However I suspect other things in NFS will start going *bang* by then. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08header cleaning: don't include smp_lock.h when not usedRandy Dunlap
Remove includes of <linux/smp_lock.h> where it is not used/needed. Suggested by Al Viro. Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc, sparc64, and arm (all 59 defconfigs). Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-30NFS: Use pgoff_t in structures and functions that pass page cache offsetsTrond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-04-30NFS: Clean up nfs_sync_mapping_wait()Trond Myklebust
It has no business touching wbc->pages_skipped. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-04-30NFS: Fix a buffer overflow in the allocation of struct nfs_read/writedataTrond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-04-30NFS: Fix a race when doing NFS write coalescingTrond Myklebust
Currently we do write coalescing in a very inefficient manner: one pass in generic_writepages() in order to lock the pages for writing, then one pass in nfs_flush_mapping() and/or nfs_sync_mapping_wait() in order to gather the locked pages for coalescing into RPC requests of size "wsize". In fact, it turns out there is actually a deadlock possible here since we only start I/O on the second pass. If the user signals the process while we're in nfs_sync_mapping_wait(), for instance, then we may exit before starting I/O on all the requests that have been queued up. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-04-30NFS: Another cleanup of the read/write request coalescing codeTrond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-04-30NFS: Cleanup the coalescing codeTrond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-04-30NFS: Don't wait for congestion in nfs_update_request()Trond Myklebust
It is redundant, and will interfere with the call to balance_dirty_pages_ratelimited_nr in generic_file_write(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-04-30NFS: Fix nfs_set_page_dirty()Trond Myklebust
Be more careful about testing page->mapping. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-04-20NFS: Fix race in nfs_set_page_dirtyTrond Myklebust
Protect nfs_set_page_dirty() against races with nfs_inode_add_request. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-20NFS: Fix the 'desynchronized value of nfs_i.ncommit' errorTrond Myklebust
Redirtying a request that is already marked for commit will screw up the accounting for NR_UNSTABLE_NFS as well as nfs_i.ncommit. Ensure that all requests on the commit queue are labelled with the PG_NEED_COMMIT flag, and avoid moving them onto the dirty list inside nfs_page_mark_flush(). Also inline nfs_mark_request_dirty() into nfs_page_mark_flush() for atomicity reasons. Avoid dropping the spinlock until we're done marking the request in the radix tree and have added it to the ->dirty list. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-20NFS: Don't clear PG_writeback until after we've processed unstable writesTrond Myklebust
Ensure that we don't release the PG_writeback lock until after the page has either been redirtied, or queued on the nfs_inode 'commit' list. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-20NFS: clean up the unstable write codeTrond Myklebust
Get rid of the inlined #ifdefs. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-15NFS: Fix a list corruption problemTrond Myklebust
We must remove the request from whatever list it is currently on before we can add it to the dirty list. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-14NFS: Ensure PG_writeback is cleared when writeback failsTrond Myklebust
If the writebacks are cancelled via nfs_cancel_dirty_list, or due to the memory allocation failing in nfs_flush_one/nfs_flush_multi, then we must ensure that the PG_writeback flag is cleared. Also ensure that we actually own the PG_writeback flag whenever we schedule a new writeback by making nfs_set_page_writeback() return the value of test_set_page_writeback(). The PG_writeback page flag ends up replacing the functionality of the PG_FLUSHING nfs_page flag, so we rip that out too. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-16[PATCH] nfs: fix congestion controlPeter Zijlstra
The current NFS client congestion logic is severly broken, it marks the backing device congested during each nfs_writepages() call but doesn't mirror this in nfs_writepage() which makes for deadlocks. Also it implements its own waitqueue. Replace this by a more regular congestion implementation that puts a cap on the number of active writeback pages and uses the bdi congestion waitqueue. Also always use an interruptible wait since it makes sense to be able to SIGKILL the process even for mounts without 'intr'. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Christoph Lameter <clameter@engr.sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>