aboutsummaryrefslogtreecommitdiff
path: root/net/sunrpc/xprtrdma
AgeCommit message (Collapse)Author
2008-04-23SVCRDMA: Add check for XPT_CLOSE in svc_rdma_sendTom Tucker
SVCRDMA: Add check for XPT_CLOSE in svc_rdma_send The svcrdma transport can crash if a send is waiting for an empty SQ slot and the connection is closed due to an asynchronous error. The crash is caused when svc_rdma_send attempts to send on a deleted QP. Signed-off-by: Tom Tucker <tom@opengridcomputing.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-04-16IB/core: Add support for "send with invalidate" work requestsRoland Dreier
Add a new IB_WR_SEND_WITH_INV send opcode that can be used to mark a "send with invalidate" work request as defined in the iWARP verbs and the InfiniBand base memory management extensions. Also put "imm_data" and a new "invalidate_rkey" member in a new "ex" union in struct ib_send_wr. The invalidate_rkey member can be used to pass in an R_Key/STag to be invalidated. Add this new union to struct ib_uverbs_send_wr. Add code to copy the invalidate_rkey field in ib_uverbs_post_send(). Fix up low-level drivers to deal with the change to struct ib_send_wr, and just remove the imm_data initialization from net/sunrpc/xprtrdma/, since that code never does any send with immediate operations. Also, move the existing IB_DEVICE_SEND_W_INV flag to a new bit, since the iWARP drivers currently in the tree set the bit. The amso1100 driver at least will silently fail to honor the IB_SEND_INVALIDATE bit if passed in as part of userspace send requests (since it does not implement kernel bypass work request queueing). Remove the flag from all existing drivers that set it until we know which ones are OK. The values chosen for the new flag is not consecutive to avoid clashing with flags defined in the XRC patches, which are not merged yet but which are already in use and are likely to be merged soon. This resurrects a patch sent long ago by Mikkel Hagen <mhagen@iol.unh.edu>. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-03-26SVCRDMA: Check num_sge when setting LAST_CTXT bitTom Tucker
The RDMACTXT_F_LAST_CTXT bit was getting set incorrectly when the last chunk in the read-list spanned multiple pages. This resulted in a kernel panic when the wrong context was used to build the RPC iovec page list. RDMA_READ is used to fetch RPC data from the client for NFS_WRITE requests. A scatter-gather is used to map the advertised client side buffer to the server-side iovec and associated page list. WR contexts are used to convey which scatter-gather entries are handled by each WR. When the write data is large, a single RPC may require multiple RDMA_READ requests so the contexts for a single RPC are chained together in a linked list. The last context in this list is marked with a bit RDMACTXT_F_LAST_CTXT so that when this WR completes, the CQ handler code can enqueue the RPC for processing. The code in rdma_read_xdr was setting this bit on the last two contexts on this list when the last read-list chunk spanned multiple pages. This caused the svc_rdma_recvfrom logic to incorrectly build the RPC and caused the kernel to crash because the second-to-last context doesn't contain the iovec page list. Modified the condition that sets this bit so that it correctly detects the last context for the RPC. Signed-off-by: Tom Tucker <tom@opengridcomputing.com> Tested-by: Roland Dreier <rolandd@cisco.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-24SVCRDMA: Use only 1 RDMA read scatter entry for iWARP adaptersRoland Dreier
The iWARP protocol limits RDMA read requests to a single scatter entry. NFS/RDMA has code in rdma_read_max_sge() that is supposed to limit the sge_count for RDMA read requests to 1, but the code to do that is inside an #ifdef RDMA_TRANSPORT_IWARP block. In the mainline kernel at least, RDMA_TRANSPORT_IWARP is an enum and not a preprocessor #define, so the #ifdef'ed code is never compiled. In my test of a kernel build with -j8 on an NFS/RDMA mount, this problem eventually leads to trouble starting with: svcrdma: Error posting send = -22 svcrdma : RDMA_READ error = -22 and things go downhill from there. The trivial fix is to delete the #ifdef guard. The check seems to be a remnant of when the NFS/RDMA code was not merged and needed to compile against multiple kernel versions, although I don't think it ever worked as intended. In any case now that the code is upstream there's no need to test whether the RDMA_TRANSPORT_IWARP constant is defined or not. Without this patch, my kernel build on an NFS/RDMA mount using NetEffect adapters quickly and 100% reproducibly failed with an error like: ld: final link failed: Software caused connection abort With the patch applied I was able to complete a kernel build on the same setup. (Tom Tucker says this is "actually an _ancient_ remnant when it had to compile against iWARP vs. non-iWARP enabled OFA trees.") Signed-off-by: Roland Dreier <rolandd@cisco.com> Acked-by: Tom Tucker <tom@opengridcomputing.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-12SVCRDMA: Fix erroneous BUG_ON in send_writeTom Tucker
The assertion that checks for sge context overflow is incorrectly hard-coded to 32. This causes a kernel bug check when using big-data mounts. Changed the BUG_ON to use the computed value RPCSVC_MAXPAGES. Signed-off-by: Tom Tucker <tom@opengridcomputing.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-12SVCRDMA: Add xprt refs to fix close/unmount crashTom Tucker
RDMA connection shutdown on an SMP machine can cause a kernel crash due to the transport close path racing with the I/O tasklet. Additional transport references were added as follows: - A reference when on the DTO Q to avoid having the transport deleted while queued for I/O. - A reference while there is a QP able to generate events. - A reference until the DISCONNECTED event is received on the CM ID Signed-off-by: Tom Tucker <tom@opengridcomputing.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-07SUNRPC: Fix a nfs4 over rdma transport oopsTom Talpey
Prevent an RPC oops when freeing a dynamically allocated RDMA buffer, used in certain special-case large metadata operations. Signed-off-by: Tom Talpey <tmt@netapp.com> Signed-off-by: James Lentini <jlentini@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-02-10SUNPRC: Fix printk format warningRoland Dreier
net/sunrpc/xprtrdma/svc_rdma_sendto.c:160: warning: format '%llx' expects type 'long long unsigned int', but argument 3 has type 'u64' Signed-off-by: Roland Dreier <rolandd@cisco.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01rdma: makefileTom Tucker
Add the svcrdma module to the xprtrdma makefile. Signed-off-by: Tom Tucker <tom@opengridcomputing.com> Acked-by: Neil Brown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01rdma: ONCRPC RDMA protocol marshallingTom Tucker
This logic parses the ONCRDMA protocol headers that precede the actual RPC header. It is placed in a separate file to keep all protocol aware code in a single place. Signed-off-by: Tom Tucker <tom@opengridcomputing.com> Acked-by: Neil Brown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01rdma: SVCRDMA sendtoTom Tucker
This file implements the RDMA transport sendto function. A RPC reply on an RDMA transport consists of some number of RDMA_WRITE requests followed by an RDMA_SEND request. The sendto function parses the ONCRPC RDMA reply header to determine how to send the reply back to the client. The send queue is sized so as to be able to send complete replies for requests in most cases. In the event that there are not enough SQ WR slots to reply, e.g. big data, the send will block the NFSD thread. The I/O callback functions in svc_rdma_transport.c that reap WR completions wake any waiters blocked on the SQ. In general, the goal is not to block NFSD threads and the has_wspace method stall requests when the SQ is nearly full. Signed-off-by: Tom Tucker <tom@opengridcomputing.com> Acked-by: Neil Brown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01rdma: SVCRDMA recvfromTom Tucker
This file implements the RDMA transport recvfrom function. The function dequeues work reqeust completion contexts from an I/O list that it shares with the I/O tasklet in svc_rdma_transport.c. For ONCRPC RDMA, an RPC may not be complete when it is received. Instead, the RDMA header that precedes the RPC message informs the transport where to get the RPC data from on the client and where to place it in the RPC message before it is delivered to the server. The svc_rdma_recvfrom function therefore, parses this RDMA header and issues any necessary RDMA operations to fetch the remainder of the RPC from the client. Special handling is required when the request involves an RDMA_READ. In this case, recvfrom submits the RDMA_READ requests to the underlying transport driver and then returns 0. When the transport completes the last RDMA_READ for the request, it enqueues it on a read completion queue and enqueues the transport. The recvfrom code favors this queue over the regular DTO queue when satisfying reads. Signed-off-by: Tom Tucker <tom@opengridcomputing.com> Acked-by: Neil Brown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01rdma: SVCRDMA Core Transport ServicesTom Tucker
This file implements the core transport data management and I/O path. The I/O path for RDMA involves receiving callbacks on interrupt context. Since all the svc transport locks are _bh locks we enqueue the transport on a list, schedule a tasklet to dequeue data indications from the RDMA completion queue. The tasklet in turn takes _bh locks to enqueue receive data indications on a list for the transport. The svc_rdma_recvfrom transport function dequeues data from this list in an NFSD thread context. Signed-off-by: Tom Tucker <tom@opengridcomputing.com> Acked-by: Neil Brown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01rdma: SVCRDMA Transport ModuleTom Tucker
This file implements the RDMA transport module initialization and termination logic and registers the transport sysctl variables. Signed-off-by: Tom Tucker <tom@opengridcomputing.com> Acked-by: Neil Brown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-01-30SUNRPC: Clean up functions that free address_strings arrayChuck Lever
Clean up: document the rule (kfree) and the exceptions (RPC_DISPLAY_PROTO and RPC_DISPLAY_NETID) when freeing the objects in a transport's address_strings array. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30SUNRPC: Add support for per-client timeout valuesTrond Myklebust
In order to be able to support setting the timeo and retrans parameters on a per-mountpoint basis, we move the rpc_timeout structure into the rpc_clnt. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30SUNRPC: Clean up the transport timeout initialisationTrond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30SUNRPC: Check a return resultChuck Lever
Minor: Replace an empty if statement with a debugging dprintk. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Thomas Talpey <Thomas.Talpey@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30SUNRPC: Fix an unnecessary implicit type cast in rpcrdma_count_chunks()Chuck Lever
Nit: rl_nchunks is an unsigned integer, so pass it into rpcrdma_count_chunks() via an unsigned integer argument. This eliminates a harmless mixed sign comparison in rpcrdma_count_chunks() Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Thomas Talpey <Thomas.Talpey@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30SUNRPC: Prevent mixed sign comparisons in rpcrdma_convert_iovs()Chuck Lever
Keep the type of the buffer position the same during iovec conversion to reduce the likelihood of unexpected results from comparisons and length computations. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Thomas Talpey <Thomas.Talpey@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30SUNRPC: Rename xprt_disconnect()Trond Myklebust
xprt_disconnect() should really only be called when the transport shutdown is completed, and it is time to wake up any pending tasks. Rename it to xprt_disconnect_done() in order to reflect the semantical change. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-28[SUNRPC]: Use htonl() where appropriate.YOSHIFUJI Hideaki
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-12-11SUNRPC xprtrdma: fix XDR tail buf marshalling for all opsJames Lentini
rpcrdma_convert_iovs is passed an xdr_buf representing either an RPC request or an RPC reply. In the case of a request, several calculations and tests involving pos are unnecessary. In the case of a reply, several calculations and tests involving pos are incorrect (the code tests pos against the reply xdr buf's len field, which is always 0 at the time rpcrdma_convert_iovs is executed). This change removes the incorrect/unnecessary calculations and tests involving pos. This fixes an observed problem when reading certain file sizes over NFS/RDMA. Signed-off-by: Tom Tucker <tom@opengridcomputing.com> Signed-off-by: Tom Talpey <talpey@netapp.com> Signed-off-by: James Lentini <jlentini@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-11-26SUNRPC: remove NFS/RDMA client's binary sysctlsJames Lentini
Support for binary sysctls is being deprecated in 2.6.24. Since there are no applications using the NFS/RDMA client's binary sysctls, it makes sense to remove them. The patch below does this while leaving the /proc/sys interface unchanged. Please consider this for 2.6.24. Signed-off-by: James Lentini <jlentini@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-11-14sunrpc/xprtrdma/transport.c: fix use-after-freeAdrian Bunk
Fix an obvious use-after-free spotted by the Coverity checker. Signed-off-by: Adrian Bunk <bunk@kernel.org> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Neil Brown <neilb@suse.de> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-30[SUNRPC] rpc_rdma: we need to cast u64 to unsigned long long for printingStephen Rothwell
as some architectures have unsigned long for u64. net/sunrpc/xprtrdma/rpc_rdma.c: In function 'rpcrdma_create_chunks': net/sunrpc/xprtrdma/rpc_rdma.c:222: warning: format '%llx' expects type 'long long unsigned int', but argument 4 has type 'u64' net/sunrpc/xprtrdma/rpc_rdma.c:234: warning: format '%llx' expects type 'long long unsigned int', but argument 5 has type 'u64' net/sunrpc/xprtrdma/rpc_rdma.c: In function 'rpcrdma_count_chunks': net/sunrpc/xprtrdma/rpc_rdma.c:577: warning: format '%llx' expects type 'long long unsigned int', but argument 4 has type 'u64 Noticed on PowerPC pseries_defconfig build. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-29SUNRPC endianness annotationsAl Viro
rpcrdma stuff lacks endianness annotations for on-the-wire data. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16net/sunrpc/xprtrdma/verbs.c printk warning fixAndrew Morton
sparc64: net/sunrpc/xprtrdma/verbs.c:1264: warning: long long unsigned int format, u64 arg (arg 3) net/sunrpc/xprtrdma/verbs.c:1264: warning: long long unsigned int format, u64 arg (arg 4) Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "David S. Miller" <davem@davemloft.net> Cc: "J. Bruce Fields" <bfields@fieldses.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-09RPCRDMA: rpc rdma verbs interface implementation\"Talpey, Thomas\
This implements the interface from rpcrdma to the RDMA verbs interface supported by Infniband and iWARP. Signed-off-by: Tom Talpey <talpey@netapp.com> Signed-off-by: James Lentini <jlentini@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09RPCRDMA: rpc rdma protocol implementation\"Talpey, Thomas\
This implements the marshaling and unmarshaling of the rpcrdma transport headers. Connection management is also addressed. Signed-off-by: Tom Talpey <talpey@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09RPCRDMA: rpc rdma transport switch\"Talpey, Thomas\
This implements the configuration and building of the core transport switch implementation of the rpcrdma transport. Stubs are provided for the rpcrdma protocol handling, and the infiniband/iwarp verbs interface. These are provided in following patches. Signed-off-by: Tom Talpey <talpey@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>