aboutsummaryrefslogtreecommitdiff
path: root/fs/nfs/direct.c
AgeCommit message (Collapse)Author
2006-04-19NFS,SUNRPC: Fix compiler warnings if CONFIG_PROC_FS & CONFIG_SYSCTL are unsetTrond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-25Merge git://git.linux-nfs.org/pub/linux/nfs-2.6Linus Torvalds
* git://git.linux-nfs.org/pub/linux/nfs-2.6: (103 commits) SUNRPC,RPCSEC_GSS: spkm3--fix config dependencies SUNRPC,RPCSEC_GSS: spkm3: import contexts using NID_cast5_cbc LOCKD: Make nlmsvc_traverse_shares return void LOCKD: nlmsvc_traverse_blocks return is unused SUNRPC,RPCSEC_GSS: fix krb5 sequence numbers. NFSv4: Dont list system.nfs4_acl for filesystems that don't support it. SUNRPC,RPCSEC_GSS: remove unnecessary kmalloc of a checksum SUNRPC: Ensure rpc_call_async() always calls tk_ops->rpc_release() SUNRPC: Fix memory barriers for req->rq_received NFS: Fix a race in nfs_sync_inode() NFS: Clean up nfs_flush_list() NFS: Fix a race with PG_private and nfs_release_page() NFSv4: Ensure the callback daemon flushes signals SUNRPC: Fix a 'Busy inodes' error in rpc_pipefs NFS, NLM: Allow blocking locks to respect signals NFS: Make nfs_fhget() return appropriate error values NFSv4: Fix an oops in nfs4_fill_super lockd: blocks should hold a reference to the nlm_file NFSv4: SETCLIENTID_CONFIRM should handle NFS4ERR_DELAY/NFS4ERR_RESOURCE NFSv4: Send the delegation stateid for SETATTR calls ...
2006-03-24[PATCH] cpuset memory spread: slab cache formatPaul Jackson
Rewrap the overly long source code lines resulting from the previous patch's addition of the slab cache flag SLAB_MEM_SPREAD. This patch contains only formatting changes, and no function change. Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-24[PATCH] cpuset memory spread: slab cache filesystemsPaul Jackson
Mark file system inode and similar slab caches subject to SLAB_MEM_SPREAD memory spreading. If a slab cache is marked SLAB_MEM_SPREAD, then anytime that a task that's in a cpuset with the 'memory_spread_slab' option enabled goes to allocate from such a slab cache, the allocations are spread evenly over all the memory nodes (task->mems_allowed) allowed to that task, instead of favoring allocation on the node local to the current cpu. The following inode and similar caches are marked SLAB_MEM_SPREAD: file cache ==== ===== fs/adfs/super.c adfs_inode_cache fs/affs/super.c affs_inode_cache fs/befs/linuxvfs.c befs_inode_cache fs/bfs/inode.c bfs_inode_cache fs/block_dev.c bdev_cache fs/cifs/cifsfs.c cifs_inode_cache fs/coda/inode.c coda_inode_cache fs/dquot.c dquot fs/efs/super.c efs_inode_cache fs/ext2/super.c ext2_inode_cache fs/ext2/xattr.c (fs/mbcache.c) ext2_xattr fs/ext3/super.c ext3_inode_cache fs/ext3/xattr.c (fs/mbcache.c) ext3_xattr fs/fat/cache.c fat_cache fs/fat/inode.c fat_inode_cache fs/freevxfs/vxfs_super.c vxfs_inode fs/hpfs/super.c hpfs_inode_cache fs/isofs/inode.c isofs_inode_cache fs/jffs/inode-v23.c jffs_fm fs/jffs2/super.c jffs2_i fs/jfs/super.c jfs_ip fs/minix/inode.c minix_inode_cache fs/ncpfs/inode.c ncp_inode_cache fs/nfs/direct.c nfs_direct_cache fs/nfs/inode.c nfs_inode_cache fs/ntfs/super.c ntfs_big_inode_cache_name fs/ntfs/super.c ntfs_inode_cache fs/ocfs2/dlm/dlmfs.c dlmfs_inode_cache fs/ocfs2/super.c ocfs2_inode_cache fs/proc/inode.c proc_inode_cache fs/qnx4/inode.c qnx4_inode_cache fs/reiserfs/super.c reiser_inode_cache fs/romfs/inode.c romfs_inode_cache fs/smbfs/inode.c smb_inode_cache fs/sysv/inode.c sysv_inode_cache fs/udf/super.c udf_inode_cache fs/ufs/super.c ufs_inode_cache net/socket.c sock_inode_cache net/sunrpc/rpc_pipe.c rpc_inode_cache The choice of which slab caches to so mark was quite simple. I marked those already marked SLAB_RECLAIM_ACCOUNT, except for fs/xfs, dentry_cache, inode_cache, and buffer_head, which were marked in a previous patch. Even though SLAB_RECLAIM_ACCOUNT is for a different purpose, it marks the same potentially large file system i/o related slab caches as we need for memory spreading. Given that the rule now becomes "wherever you would have used a SLAB_RECLAIM_ACCOUNT slab cache flag before (usually the inode cache), use the SLAB_MEM_SPREAD flag too", this should be easy enough to maintain. Future file system writers will just copy one of the existing file system slab cache setups and tend to get it right without thinking. Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-20NFS: O_DIRECT needs to use a completionTrond Myklebust
Now that we have aio writes, it is possible for dreq->outstanding to be zero, but for the I/O not to have completed. Convert struct nfs_direct_req to use a completion to signal when the I/O is done. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20NFS: Clean up nfs_get_user_pagesTrond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20NFS: fix compiler warnings on 64-bit platformsChuck Lever
Introduced by NFS aio+dio patches. Test plan: Compile kernel with CONFIG_NFS enabled on 64-bit hardware. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20NFS: Debugging code for nfs_direct_(read|write)_schedule()Trond Myklebust
Make sure that we're doing our list accounting correctly. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20NFS: O_DIRECT async IO may lose contextTrond Myklebust
The struct nfs_direct_req currently keeps a pointer to the file descriptor without referencing it. This may cause problems if the parent process is killed. The nfs_open_context should normally have all the information that we're currently using the filp for, and unlike fput(), is safe to release from an rpciod process context. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20nfs: Use UNSTABLE + COMMIT for NFS O_DIRECT writesTrond Myklebust
Currently NFS O_DIRECT writes use FILE_SYNC so that a COMMIT is not necessary. This simplifies the internal logic, but this could be a difficult workload for some servers. Instead, let's send UNSTABLE writes, and after they all complete, send a COMMIT for the dirty range. After the COMMIT returns successfully, then do the wake_up or fire off aio_complete(). Test plan: Async direct I/O tests against Solaris (or any server that requires committed unstable writes). Reboot server during test. Based on an earlier patch by Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20NFS: fix data_update accounting in NFS direct I/O pathChuck Lever
^C against "iozone -I" is hitting the assertion in nfs_clear_inode(). Test plan: "iozone -i0 -I -a -c" against a slow server, then control C. This should not cause an oops. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20NFS: Replace atomic_t variables in nfs_direct_req with a single spin lockChuck Lever
Three atomic_t variables cause a lot of bus locking. Because they are all used in the same places in the code, just use a single spin lock. Now that the atomic_t variables are gone, we can remove the request size limitation since the code no longer depends on the limited width of atomic_t on some platforms. Test plan: Compile with CONFIG_NFS and CONFIG_NFS_DIRECTIO enabled. Millions of fsx operations, iozone, OraSim. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20NFS: clean up comments and tab damage in direct.cChuck Lever
Clean up tab damage and comments. Replace "file_offset" with more commonly used "pos". Test plan: Compile with CONFIG_NFS and CONFIG_NFS_DIRECTIO enabled. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20NFS: support EIOCBQUEUED return in direct write pathChuck Lever
For async iocb's, the NFS direct write path now returns EIOCBQUEUED, and calls aio_complete when all the requested writes are finished. The synchronous part of the NFS direct write path behaves exactly as it was before. Shared mapped NFS files will have some coherency difficulties when accessed concurrently with aio+dio. Will need to explore how this is handled in the local file system case. Test plan: aio-stress with "-O". OraSim. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20NFS: make iocb available everywhere in direct write pathChuck Lever
Pass the iocb argument all the way down to the direct write request scheduler, and make it available in nfs_direct_write_result. Test plan: Compile the kernel with CONFIG_NFS and CONFIG_NFS_DIRECTIO enabled. Millions of fsx-odirect ops. OraSim. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20NFS: remove support for multi-segment iovs in the direct write pathChuck Lever
Eliminate the persistent use of automatic storage in all parts of the NFS client's direct write path to pave the way for introducing support for aio against files opened with the O_DIRECT flag. Test plan: Compile the kernel with CONFIG_NFS and CONFIG_NFS_DIRECTIO enabled. Millions of fsx-odirect ops. OraSim. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20NFS: make direct write path generate write requests concurrentlyChuck Lever
Duplicate infrastructure from direct read path that will allow write path to generate multiple write requests concurrently. This will enable us to add support for aio in this path. Temporarily we will lose the ability to do UNSTABLE writes followed by a COMMIT in the direct write path. However, all applications I am aware of that use NFS O_DIRECT currently write in relatively small chunks, so this should not be inconvenient in any way. Test plan: Millions of fsx-odirect ops. OraSim. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20NFS: create common routine for handling direct I/O completionChuck Lever
Factor out the common piece of completing an NFS direct I/O request. Test plan: Compile kernel with CONFIG_NFS and CONFIG_NFS_DIRECTIO enabled. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20NFS: create common routine for allocating nfs_direct_reqChuck Lever
Factor out a small common piece of the path that allocate nfs_direct_req structures. Test plan: Compile kernel with CONFIG_NFS and CONFIG_NFS_DIRECTIO enabled. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20NFS: create common routine for waiting for direct I/O to completeChuck Lever
We're about to add asynchrony to the NFS direct write path. Begin by abstracting out the common pieces in the read path. The first piece is nfs_direct_read_wait, which works the same whether the process is waiting for a read or a write. Test plan: Compile kernel with CONFIG_NFS and CONFIG_NFS_DIRECTIO enabled. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20NFS: support EIOCBQUEUED return in direct read pathChuck Lever
For async iocb's, the NFS direct read path should return EIOCBQUEUED and call aio_complete when all the requested reads are finished. The synchronous part of the NFS direct read path behaves exactly as it was before. Test plan: aio-stress with "-O". OraSim. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20NFS: make iocb available everywhere in direct read pathChuck Lever
Pass the iocb argument all the way down to the direct read request scheduler, and make it available in nfs_direct_read_result. Test plan: Compile the kernel with CONFIG_NFS and CONFIG_NFS_DIRECTIO enabled. Millions of fsx-odirect ops. OraSim. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20NFS: remove support for multi-segment iovs in the direct read pathChuck Lever
Eliminate the persistent use of automatic storage in all parts of the NFS client's direct read path to pave the way for introducing support for aio against files opened with the O_DIRECT flag. Test plan: Compile the kernel with CONFIG_NFS and CONFIG_NFS_DIRECTIO enabled. Millions of fsx-odirect ops. OraSim. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20NFS: use size_t type for holding rsize bytes in NFS O_DIRECT read pathChuck Lever
size_t is used for holding byte counts, so use it for variables storing rsize. Note that the write path will be updated as we add support for async O_DIRECT writes. Test plan: Need to verify that existing comparisons against new size_t variables behave correctly. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20NFS: update comments and function definitions in fs/nfs/direct.cChuck Lever
Update to latest coding style standards. Remove block comments on statically defined functions, and place function definitions all on one line. Test plan: Compile kernel with CONFIG_NFS and CONFIG_NFS_DIRECTIO. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20NFS: clean up NFS client's a_ops->direct_IO methodChuck Lever
The NFS client's a_ops->direct_IO method, nfs_direct_IO, is required to be present to allow NFS files to be opened with O_DIRECT, but is never called because the NFS client shunts reads and writes to files opened with O_DIRECT directly to its own routines. Gut the nfs_direct_IO function. This eliminates the only part of the NFS client's direct I/O path that requires support for multi-segment iovs, allowing further simplification in subsequent patches. Test plan: Compile the kernel with CONFIG_NFS and CONFIG_NFS_DIRECTIO enabled. Millions of fsx-odirect ops. OraSim. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20NFS: Cleanup of NFS read codeTrond Myklebust
Same callback hierarchy inversion as for the NFS write calls. This patch is not strictly speaking needed by the O_DIRECT code, but avoids confusing differences between the asynchronous read and write code. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-20NFS: add I/O performance countersChuck Lever
Invoke the byte and event counter macros where we want to count bytes and events. Clean-up: fix a possible NULL dereference in nfs_lock, and simplify nfs_file_open. Test-plan: fsx and iozone on UP and SMP systems, with and without pre-emption. Watch for memory overwrite bugs, and performance loss (significantly more CPU required per op). Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-03-14[PATCH] NFS: Fix a potential panic in O_DIRECTTrond Myklebust
Based on an original patch by Mike O'Connor and Greg Banks of SGI. Mike states: A normal user can panic an NFS client and cause a local DoS with 'judicious'(?) use of O_DIRECT. Any O_DIRECT write to an NFS file where the user buffer starts with a valid mapped page and contains an unmapped page, will crash in this way. I haven't followed the code, but O_DIRECT reads with similar user buffers will probably also crash albeit in different ways. Details: when nfs_get_user_pages() calls get_user_pages(), it detects and correctly handles get_user_pages() returning an error, which happens if the first page covered by the user buffer's address range is unmapped. However, if the first page is mapped but some subsequent page isn't, get_user_pages() will return a positive number which is less than the number of pages requested (this behaviour is sort of analagous to a short write() call and appears to be intentional). nfs_get_user_pages() doesn't detect this and hands off the array of pages (whose last few elements are random rubbish from the newly allocated array memory) to it's caller, whence they go to nfs_direct_write_seg(), which then totally ignores the nr_pages it's given, and calculates its own idea of how many pages are in the array from the user buffer length. Needless to say, when it comes to transmit those uninitialised page* pointers, we see a crash in the network stack. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-02-01NFSv3: fix sync_retry in direct i/o NFSDirk Mueller
Only do a sync_retry if the memcmp failed. Signed-off-by: Dirk Mueller <dmueller@suse.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06NFS: Make directIO aware of compound pages...Trond Myklebust
...and avoid calling set_page_dirty on them Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06NFS: support large reads and writes on the wireChuck Lever
Most NFS server implementations allow up to 64KB reads and writes on the wire. The Solaris NFS server allows up to a megabyte, for instance. Now the Linux NFS client supports transfer sizes up to 1MB, too. This will help reduce protocol and context switch overhead on read/write intensive NFS workloads, and support larger atomic read and write operations on servers that support them. Test-plan: Connectathon and iozone on mount point with wsize=rsize>32768 over TCP. Tests with NFS over UDP to verify the maximum RPC payload size cap. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06NFS: use generic_write_checks() to sanity check direct writesChuck Lever
Replace ad hoc write parameter sanity checking in nfs_file_direct_write() with a call to generic_write_checks(). This should make the proper checks modulo the O_LARGEFILE flag, and should catch NFSv2-specific limitations by virtue of i_sb->s_maxbytes. Test plan: Posix compliance testing with both NFSv2 and NFSv3. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06NFSv4: stateful NFSv4 RPC call interfaceTrond Myklebust
The NFSv4 model requires us to complete all RPC calls that might establish state on the server whether or not the user wants to interrupt it. We may also need to schedule new work (including new RPC calls) in order to cancel the new state. The asynchronous RPC model will allow us to ensure that RPC calls always complete, but in order to allow for "synchronous" RPC, we want to add the ability to wait for completion. The waits are, of course, interruptible. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-01-06RPC: Clean up RPC task structureTrond Myklebust
Shrink the RPC task structure. Instead of storing separate pointers for task->tk_exit and task->tk_release, put them in a structure. Also pass the user data pointer as a parameter instead of passing it via task->tk_calldata. This enables us to nest callbacks. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-12-19NFS: Fix another O_DIRECT raceTrond Myklebust
Ensure we call unmap_mapping_range() and sync dirty pages to disk before doing an NFS direct write. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-11-04NFS,SUNRPC,NLM: fix unused variable warnings when CONFIG_SYSCTL is disabledChuck Lever
Fix some dprintk's so that NLM, NFS client, and RPC client compile cleanly if CONFIG_SYSCTL is disabled. Test plan: Compile kernel with CONFIG_NFS enabled and CONFIG_SYSCTL disabled. Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-06-23[PATCH] Remove f_error field from struct fileChristoph Lameter
The following patch removes the f_error field and all checks of f_error. Trond said: f_error was introduced for NFS, and made sense when we were guaranteed always to have a file pointer around when write errors occurred. Since then, we have (for various reasons) had to introduce the nfs_open_context in order to track the file read/write state, and it made sense to move our f_error tracking there too. Signed-off-by: Christoph Lameter <christoph@lameter.com> Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-22[PATCH] NFS: Fix the file size revalidationTrond Myklebust
Instead of looking at whether or not the file is open for writes before we accept to update the length using the server value, we should rather be looking at whether or not we are currently caching any writes. Failure to do so means in particular that we're not updating the file length correctly after obtaining a POSIX or BSD lock. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-04-16Linux-2.6.12-rc2Linus Torvalds
Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!