aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2008-07-09SUNRPC: Use only rpcbind v2 for AF_INET requestsChuck Lever
Some server vendors support the higher versions of rpcbind only for AF_INET6. The kernel doesn't need to use v3 or v4 for AF_INET anyway, so change the kernel's rpcbind client to query AF_INET servers over rpcbind v2 only. This has a few interesting benefits: 1. If the rpcbind request is going over TCP, and the server doesn't support rpcbind versions 3 or 4, the client reduces by two the number of ephemeral ports left in TIME_WAIT for each rpcbind request. This will help during NFS mount storms. 2. The rpcbind interaction with servers that don't support rpcbind versions 3 or 4 will use less network traffic. Also helpful during mount storms. 3. We can eliminate the kernel build option that controls whether the kernel's rpcbind client uses rpcbind version 3 and 4 for AF_INET servers. Less complicated kernel configuration... Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09SUNRPC: Use GETADDR for rpcbind version 4 queriesChuck Lever
Some rpcbind servers that do support rpcbind version 4 do not support the GETVERSADDR procedure. Use GETADDR for querying rpcbind servers via rpcbind version 4 instead of GETVERSADDR. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09SUNRPC: Use rpcbind version 2 GETPORTChuck Lever
Clean up: Change the version 2 procedure name to GETPORT. It's the same procedure number as GETADDR, but version 2 implementations usually refer to it as GETPORT. This also now matches the procedure name used in the version 2 procedure entry in the rpcb_next_version[] array, making it slightly less confusing. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09SUNRPC: Document some naked integers in rpcbind clientChuck Lever
Clean up: Replace naked integers that represent rpcbind protocol versions. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09SUNRPC: More useful debugging output for rpcb clientChuck Lever
Clean up dprintk's in rpcb client's XDR decoder functions. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09nfs4: fix potential race with rapid nfs_callback_up/down cycleJeff Layton
If the nfsv4 callback thread is rapidly brought up and down, it's possible that nfs_callback_svc might never get a chance to run. If this happens, the cleanup at thread exit might never occur, throwing the refcounting off and nfs_callback_info in an incorrect state. Move the clean functions into nfs_callback_down. Also change the nfs_callback_info struct to track the svc_rqst rather than svc_serv since we need to know that to call svc_exit_thread. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09nfs4: remove BKL from nfs_callback_up and nfs_callback_downJeff Layton
The nfs_callback_mutex is sufficient protection. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09nfs: initialize timeout variable in nfs4_proc_setclientid_confirmBenny Halevy
gcc (4.3.0) rightfully warns about this: /usr0/export/dev/bhalevy/git/linux-pnfs-bh-nfs41/fs/nfs/nfs4proc.c: In function ‘nfs4_proc_setclientid_confirm’: /usr0/export/dev/bhalevy/git/linux-pnfs-bh-nfs41/fs/nfs/nfs4proc.c:2936: warning: ‘timeout’ may be used uninitialized in this function nfs4_delay that's passed a pointer to 'timeout' is looking at its value and sets it up to some value in the range: NFS4_POLL_RETRY_MIN..NFS4_POLL_RETRY_MAX if (*timeout <= 0) *timeout = NFS4_POLL_RETRY_MIN; if (*timeout > NFS4_POLL_RETRY_MAX) *timeout = NFS4_POLL_RETRY_MAX; Therefore it will end up set to some sane, though rather indeterministic, value. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09NFS: handle interface identifiers in incoming IPv6 addressesChuck Lever
Add support in the kernel NFS client's address parser for interface identifiers. IPv6 link-local addresses require an additional "interface identifier", which is a network device name or an integer that indexes the array of local network interfaces. They are suffixed to the address with a '%'. For example: fe80::215:c5ff:fe3b:e1b2%2 indicates an interface index of 2. Or fe80::215:c5ff:fe3b:e1b2%eth0 indicates that requests should be routed through the eth0 device. Without the interface ID, link-local addresses are not usable for NFS. Both the kernel NFS client mount option parser and the mount.nfs command can take either form. The mount.nfs command always passes the address through getnameinfo(3), which usually re-writes interface indices as device names. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09NFS: Add string length argument to nfs_parse_server_addressChuck Lever
To make nfs_parse_server_address() more generally useful, allow it to accept input strings that are not terminated with '\0'. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09NFS: Support raw IPv6 address hostnames during NFS mount operationChuck Lever
Traditionally the mount command has looked for a ":" to separate the server's hostname from the export path in the mounted on device name, like this: mount server:/export /mounted/on/dir The server's hostname is "server" and the export path is "/export". You can also substitute a specific IPv4 network address for the server hostname, like this: mount 192.168.0.55:/export /mounted/on/dir Raw IPv6 addresses present a problem, however, because they look something like this: fe80::200:5aff:fe00:30b Note the use of colons. To get around the presence of colons, copy the Solaris convention used for mounting IPv6 servers by address: wrap a raw IPv6 address with square brackets. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09NFS: Use common device name parsing logic for NFSv4 and NFSv2/v3Chuck Lever
To support passing a raw IPv6 address as a server hostname, we need to expand the logic that handles splitting the passed-in device name into a server hostname and export path Start by pulling device name parsing out of the mount option validation functions and into separate helper functions. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09NFS: Fix a dependency on CONFIG_NFS_V4 in nfs_remountTrond Myklebust
Fix the 'nfs4_fs_type' undeclared error in nfs_remount when compiling sans NFSv4... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Jeff Layton <jlayton@redhat.com>
2008-07-09NFS: Allow redirtying of a completed unstable write.Trond Myklebust
Currently, if an unstable write completes, we cannot redirty the page in order to reflect a new change in the page data until after we've sent a COMMIT request. This patch allows a page rewrite to proceed without the unnecessary COMMIT step, putting it immediately back onto the dirty page list, undoing the VM unstable write accounting, and removing the NFS_PAGE_TAG_COMMIT tag from the NFS radix tree. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09NFS: Clean up nfs_update_request()Trond Myklebust
Simplify the loop in nfs_update_request by moving into a separate function the code that attempts to update an existing cached NFS write. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09NFS: missing newline in NFS mount debugging messageChuck Lever
Clean up. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09NFS: Treat "intr" and "nointr" options as deprecatedChuck Lever
Clean up: the "intr" and "nointr" mount options were recently retired. Document this in the NFS mount option parser. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09NFS: Allow any value for the "retry" optionChuck Lever
The kernel NFS mount option parser should ignore the retry= mount option since it is meaningful only in user space. Today it expects a number rather than arbitrary text, so it ignores the option if the value is numeric, but chokes if there are other characters in the value. Change it to allow any text (except ",") as its value. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09NFS: Ensure we zap only the access and acl caches when setting new aclsTrond Myklebust
...and ensure that we obey the NFS_INO_INVALID_ACL flag when retrieving the acls. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09NFS: Fix a warning in nfs4_async_handle_errorTrond Myklebust
We're not modifying the nfs_server when we call nfs_inc_server_stats and friends, so allow the compiler to pass 'const' pointers too. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09NFS: Move fs/nfs/iostat.h to include/linuxChuck Lever
The fs/nfs/iostat.h header has definitions that were designed to be exposed to user space. Move these definitions under include/linux so user space can use the definitions in applications that read /proc/self/mountstats. Also address a handful of coding style issues called out by checkpatch.pl in fs/nfs/iostat.h. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09NFS: Remove the redundant file_open entry from struct nfs_rpc_opsTrond Myklebust
All instances are set to nfs_open(), so we should just remove the redundant indirection. Ditto for the file_release op Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09SUNRPC: Ensure all transports set rq_xtime consistentlyChuck Lever
The RPC client uses the rq_xtime field in each RPC request to determine the round-trip time of the request. Currently, the rq_xtime field is initialized by each transport just before it starts enqueing a request to be sent. However, transports do not handle initializing this value consistently; sometimes they don't initialize it at all. To make the measurement of request round-trip time consistent for all RPC client transport capabilities, pull rq_xtime initialization into the RPC client's generic transport logic. Now all transports will get a standardized RTT measure automatically, from: xprt_transmit() to xprt_complete_rqst() This makes round-trip time calculation more accurate for the TCP transport. The socket ->sendmsg() method can return "-EAGAIN" if the socket's output buffer is full, so the TCP transport's ->send_request() method may call the ->sendmsg() method repeatedly until it gets all of the request's bytes queued in the socket's buffer. Currently, the TCP transport sets the rq_xtime field every time through that loop so the final value is the timestamp just before the *last* call to the underlying socket's ->sendmsg() method. After this patch, the rq_xtime field contains a timestamp that reflects the time just before the *first* call to ->sendmsg(). This is consequential under heavy workloads because large requests often take multiple ->sendmsg() calls to get all the bytes of a request queued. The TCP transport causes the request to sleep until the remote end of the socket has received enough bytes to clear space in the socket's local output buffer. This delay can be quite significant. The method introduced by this patch is a more accurate measure of RTT for stream transports, since the server can cause enough back pressure to delay (ie increase the latency of) requests from the client. Additionally, this patch corrects the behavior of the RDMA transport, which entirely neglected to initialize the rq_xtime field. RPC performance metrics for RDMA transports now display correct RPC request round trip times. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Acked-by: Tom Talpey <thomas.talpey@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09NFS: Fix the ftruncate() credential problemTrond Myklebust
ftruncate() access checking is supposed to be performed at open() time, just like reads and writes. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09rpc: minor cleanup of scheduler callback code\\\"J. Bruce Fields\\\
Try to make the comment here a little more clear and concise. Also, this macro definition seems unnecessary. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09rpc: remove some unused macros\\\"J. Bruce Fields\\\
There used to be a print_hexl() function that used isprint(), now gone. I don't know why NFS_NGROUPS and CA_RUN_AS_MACHINE were here. I also don't know why another #define that's actually used was marked "unused". Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09rpc: eliminate unused variable in auth_gss upcall code\\\"J. Bruce Fields\\\
Also, a minor comment grammar fix in the same file. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09rpc: bring back cl_chattyOlga Kornievskaia
The cl_chatty flag alows us to control whether a given rpc client leaves "server X not responding, timed out" messages in the syslog. Such messages make sense for ordinary nfs clients (where an unresponsive server means applications on the mountpoint are probably hanging), but not for the callback client (which can fail more commonly, with the only result just of disabling some optimizations). Previously cl_chatty was removed, do to lack of users; reinstate it, and use it for the nfsd's callback client. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09NFS: implement option checking when remounting NFS filesystems (resend)Jeff Layton
When remounting an NFS or NFS4 filesystem, the new NFS options are not respected, yet the remount will still return success. This patch adds a remount_fs sb op for NFS that checks any new nfs mount options against the existing ones and fails the mount if any have changed. This is only implemented for string-based mount options since doing this with binary options isn't really feasible. This is essentially the same as the original patch I sent out, but adds a check to see if the addr= option has changed. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09fs/nfs/nfsroot.c: remove CVS keywordAdrian Bunk
This patch removes a CVS keyword that wasn't updated for a long time from a comment. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09SUNRPC: Remove obsolete messages during transport connectChuck Lever
Recent changes to the RPC client's transport connect logic make connect status values ECONNREFUSED and ECONNRESET impossible. Clean up xprt_connect_status() to account for these changes. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09NFS: Fix trace debugging nits in write.cChuck Lever
Clean up: fix a few dprintk messages that still need to show the RPC task ID correctly, and be sure we use the preferred %lld or %llu instead of %Ld or %Lu. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09NFS: Use NFSDBG_FILE for all fopsChuck Lever
Clean up: some fops use NFSDBG_FILE, some use NFSDBG_VFS. Let's use NFSDBG_FILE for all fops, and consistently report file names instead of inode numbers. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09NFS: Add debugging facility for NFS aopsChuck Lever
Recent work in fs/nfs/file.c neglected to add appropriate trace debugging for the NFS client's address space operations. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09NFS: Make nfs_open methods consistentChuck Lever
Clean up: Report the same debugging info and count function calls the same for files and directories in nfs_opendir() and nfs_file_open(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09NFS: Make nfs_llseek methods consistentChuck Lever
Clean up: Report the same debugging info in nfs_llseek_dir() and nfs_llseek_file(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09NFS: Make nfs_fsync methods consistentChuck Lever
Clean up: Report the same debugging info, count function calls the same, and use similar function naming in nfs_fsync_dir() and nfs_fsync(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09SUNRPC: Display some debugging information as text rather than numbersChuck Lever
In rpc_show_tasks(), display the program name, version number, procedure name and tk_action as human-readable variable-length text fields rather than columnar numbers. Doing the symbol lookup here helps in cases where we have actual debugging output from a kernel log, but don't have access to the kernel image or RPC module that generated the output. Sample output: -pid- flgs status -client- --rqstp- -timeout ---ops-- 5608 0001 -11 eeb42690 f6d93710 0 f8fa1764 nfsv3 WRITE a:call_transmit_status q:none 5609 0001 -11 eeb42690 f6d937e0 0 f8fa1764 nfsv3 WRITE a:call_status q:xprt_sending 5610 0001 -11 eeb42690 f6d93230 0 f8fa1764 nfsv3 WRITE a:call_status q:xprt_sending 5611 0001 -11 eeb42690 f6d93300 0 f8fa1764 nfsv3 WRITE a:call_status q:xprt_sending 5612 0001 -11 eeb42690 f6d93090 0 f8fa1764 nfsv3 WRITE a:call_status q:xprt_sending 5613 0001 -11 eeb42690 f6d933d0 0 f8fa1764 nfsv3 WRITE a:call_status q:xprt_sending 5614 0001 -11 eeb42690 f6d93cc0 0 f8fa1764 nfsv3 WRITE a:call_status q:xprt_sending 5615 0001 -11 eeb42690 f6d93a50 0 f8fa1764 nfsv3 WRITE a:call_status q:xprt_sending 5616 0001 -11 eeb42690 f6d93640 0 f8fa1764 nfsv3 WRITE a:call_status q:xprt_sending 5617 0001 -11 eeb42690 f6d93b20 0 f8fa1764 nfsv3 WRITE a:call_status q:xprt_sending 5618 0001 -11 eeb42690 f6d93160 0 f8fa1764 nfsv3 WRITE a:call_status q:xprt_sending Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09SUNRPC: Refactor rpc_show_tasksChuck Lever
Clean up: move the logic that displays each task to its own function. This removes indentation and makes future changes easier. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09SUNRPC: Don't display the rpc_show_tasks header if there are no tasksChuck Lever
Clean up: don't display the rpc_show_tasks column header unless there is at least one task to display. As far as I can tell, it is safe to let the list_for_each_entry macro decide that each list is empty. scripts/checkpatch.pl also wants a KERN_FOO at the start of any newly added printk() calls, so this and subsequent patches will also add KERN_INFO. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09SUNRPC: Rename "call_" functions that are no longer FSM statesChuck Lever
The RPC client uses a finite state machine to move RPC tasks through each step of an RPC request. Each state is contained in a function in net/sunrpc/clnt.c, and named call_foo. Some of the functions named call_foo have changed over the past few years and are no longer states in the FSM. These include: call_encode, call_header, and call_verify. As a clean up, rename the functions that have changed. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09SUNRPC: Add a function to display the name of an RPC procedureChuck Lever
Improve debugging messages in call_start() and call_verify() by having them show the RPC procedure name instead of the procedure number. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09NFS: Update help text for CONFIG_NFS_FSChuck Lever
Clean up: refresh the help text for Kconfig items related to the NFS client. Remove obsolete URLs, and make the language consistent among the options. Also move the ROOT_NFS config option next to the options related to the NFS client. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09NFS: do_setlk(): don't flush caches when we have a delegationTrond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09SUNRPC: Use GFP_NOFS when allocating credentialsTrond Myklebust
Since the credentials may be allocated during the call to rpc_new_task(), which again may be called by a memory allocator... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09NFS: Revert commit 44dd151dTrond Myklebust
Revert commit 44dd151d "NFS: Don't mark a written page as uptodate until it is on disk". While it is true that the write may fail, that is always the case. There is no reason why we should treat data on pages that are not already marked as PG_uptodate as being special. The only thing we gain is a noticeable slowdown when re-reading these pages. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09NFS: Optimise append writes with holesTrond Myklebust
If a file is being extended, and we're creating a hole, we might as well declare the entire page to be up to date. This patch significantly improves the write performance for sparse files in the case where lseek(SEEK_END) is used to append several non-contiguous writes at intervals of < PAGE_SIZE. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09SUNRPC: An ENOMEM error from call_encode is always fatalTrond Myklebust
The special 'ENOMEM' case that was previously flagged as non-fatal is bogus: auth_gss always returns EAGAIN for non-fatal errors, and may in fact return ENOMEM in the special case where xdr_buf_read_netobj runs out of preallocated buffer space (invariably a _fatal_ error, since there is no provision for preallocating larger buffers). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09SUNRPC: Ensure we exit early in case of an encode errorTrond Myklebust
All errors from call_encode(), with exception of EAGAIN are fatal, so we should immediately return instead of proceeding to xprt_transmit(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09NFS: Add correct bounds checking to NFSv2 locksTrond Myklebust
NFSv2 file locking currently fails the Connectathon tests, because the calls to the VFS locking code do not return an EINVAL error if the struct file_lock overflows the 32-bit boundaries. The problem is due to the fact that we occasionally call helpers from fs/locks.c in order to avoid RPC calls to the server when we know that a local process holds the lock. These helpers are, of course, always 64-bit enabled, so EINVAL is not returned in cases when it would if the call had gone to the NLM code. For consistency, we therefore add support for a bounds-checking helper. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>