aboutsummaryrefslogtreecommitdiff
path: root/include/linux/sunrpc/svc.h
AgeCommit message (Collapse)Author
2009-06-17nfs41: sunrpc: add a struct svc_xprt pointer to struct svc_serv for ↵Andy Adamson
backchannel use This svc_xprt is passed on to the callback service thread to be later used to processes incoming svc_rqst's Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2009-06-17nfs41: Backchannel bc_svc_process()Ricardo Labiaga
Implement the NFSv4.1 backchannel service. Invokes the common callback processing logic svc_process_common() to authenticate the call and dispatch the appropriate NFSv4.1 XDR decoder and operation procedure. It then invokes bc_send() to send the reply over the same connection. bc_send() is implemented in a separate patch. At this time there is no slot validation or reply cache handling. [nfs41: Preallocate rpc_rqst receive buffer for handling callbacks] Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [Move bc_svc_process() declaration to correct patch] Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2009-06-17nfs41: client callback structuresRicardo Labiaga
Adds new list of rpc_xprt structures, and a readers/writers lock to protect the list. The list is used to preallocate resources for the backchannel during backchannel requests. Callbacks are not expected to cause significant latency, so only one callback will be allowed at this time. It also adds a pointer to the NFS callback service so that requests can be directed to it for processing. New callback members added to svc_serv. The NFSv4.1 callback service will sleep on the svc_serv->svc_cb_waitq until new callback requests arrive. The request will be queued in svc_serv->svc_cb_list. This patch adds this list, the sleep queue and spinlock to svc_serv. [nfs41: NFSv4.1 callback support] Signed-off-by: Ricardo Labiaga <ricardo.labiaga@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2009-04-06Merge branch 'for-2.6.30' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
* 'for-2.6.30' of git://linux-nfs.org/~bfields/linux: (81 commits) nfsd41: define nfsd4_set_statp as noop for !CONFIG_NFSD_V4 nfsd41: define NFSD_DRC_SIZE_SHIFT in set_max_drc nfsd41: Documentation/filesystems/nfs41-server.txt nfsd41: CREATE_EXCLUSIVE4_1 nfsd41: SUPPATTR_EXCLCREAT attribute nfsd41: support for 3-word long attribute bitmask nfsd: dynamically skip encoded fattr bitmap in _nfsd4_verify nfsd41: pass writable attrs mask to nfsd4_decode_fattr nfsd41: provide support for minor version 1 at rpc level nfsd41: control nfsv4.1 svc via /proc/fs/nfsd/versions nfsd41: add OPEN4_SHARE_ACCESS_WANT nfs4_stateid bmap nfsd41: access_valid nfsd41: clientid handling nfsd41: check encode size for sessions maxresponse cached nfsd41: stateid handling nfsd: pass nfsd4_compound_state* to nfs4_preprocess_{state,seq}id_op nfsd41: destroy_session operation nfsd41: non-page DRC for solo sequence responses nfsd41: Add a create session replay cache nfsd41: create_session operation ...
2009-04-03nfsd41: hard page limit for DRCAndy Adamson
Use no more than 1/128th of the number of free pages at nfsd startup for the v4.1 DRC. This is an arbitrary default which should probably end up under the control of an administrator. Signed-off-by: Andy Adamson <andros@netapp.com> [moved added fields in struct svc_serv under CONFIG_NFSD_V4_1] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [fix set_max_drc calculation of sv_drc_max_pages] [moved NFSD_DRC_SIZE_SHIFT's declaration up in header file] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-04-03nfsd: don't use the deferral service, return NFS4ERR_DELAYAndy Adamson
On an NFSv4.1 server cache miss that causes an upcall, NFS4ERR_DELAY will be returned. It is up to the NFSv4.1 client to resend only the operations that have not been processed. Initialize rq_usedeferral to 1 in svc_process(). It sill be turned off in nfsd4_proc_compound() only when NFSv4.1 Sessions are used. Note: this isn't an adequate solution on its own. It's acceptable as a way to get some minimal 4.1 up and working, but we're going to have to find a way to avoid returning DELAY in all common cases before 4.1 can really be considered ready. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfsd41: reverse rq_nodeferral negative logic] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [sunrpc: initialize rq_usedeferral] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-03-28SUNRPC: Remove @family argument from svc_create() and svc_create_pooled()Chuck Lever
Since an RPC service listener's protocol family is specified now via svc_create_xprt(), it no longer needs to be passed to svc_create() or svc_create_pooled(). Remove that argument from the synopsis of those functions, and remove the sv_family field from the svc_serv struct. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-28SUNRPC: Pass a family argument to svc_register()Chuck Lever
The sv_family field is going away. Instead of using sv_family, have the svc_register() function take a protocol family argument. Since this argument represents a protocol family, and not an address family, this argument takes an int, as this is what is passed to sock_create_kern(). Also make sure svc_register's helpers are checking for PF_FOO instead of AF_FOO. The value of [AP]F_FOO are equivalent; this is simply a symbolic change to reflect the semantics of the value stored in that variable. sock_create_kern() should return EPFNOSUPPORT if the passed-in protocol family isn't supported, but it uses EAFNOSUPPORT for this case. We will stick with that tradition here, as svc_register() is called by the RPC server in the same path as sock_create_kern(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-18knfsd: add file to export stats about nfsd poolsGreg Banks
Add /proc/fs/nfsd/pool_stats to export to userspace various statistics about the operation of rpc server thread pools. This patch is based on a forward-ported version of knfsd-add-pool-thread-stats which has been shipping in the SGI "Enhanced NFS" product since 2006 and which was previously posted: http://article.gmane.org/gmane.linux.nfs/10375 It has also been updated thus: * moved EXPORT_SYMBOL() to near the function it exports * made the new struct struct seq_operations const * used SEQ_START_TOKEN instead of ((void *)1) * merged fix from SGI PV 990526 "sunrpc: use dprintk instead of printk in svc_pool_stats_*()" by Harshula Jayasuriya. * merged fix from SGI PV 964001 "Crash reading pool_stats before nfsds are started". Signed-off-by: Greg Banks <gnb@sgi.com> Signed-off-by: Harshula Jayasuriya <harshula@sgi.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-03-18knfsd: avoid overloading the CPU scheduler with enormous load averagesGreg Banks
Avoid overloading the CPU scheduler with enormous load averages when handling high call-rate NFS loads. When the knfsd bottom half is made aware of an incoming call by the socket layer, it tries to choose an nfsd thread and wake it up. As long as there are idle threads, one will be woken up. If there are lot of nfsd threads (a sensible configuration when the server is disk-bound or is running an HSM), there will be many more nfsd threads than CPUs to run them. Under a high call-rate low service-time workload, the result is that almost every nfsd is runnable, but only a handful are actually able to run. This situation causes two significant problems: 1. The CPU scheduler takes over 10% of each CPU, which is robbing the nfsd threads of valuable CPU time. 2. At a high enough load, the nfsd threads starve userspace threads of CPU time, to the point where daemons like portmap and rpc.mountd do not schedule for tens of seconds at a time. Clients attempting to mount an NFS filesystem timeout at the very first step (opening a TCP connection to portmap) because portmap cannot wake up from select() and call accept() in time. Disclaimer: these effects were observed on a SLES9 kernel, modern kernels' schedulers may behave more gracefully. The solution is simple: keep in each svc_pool a counter of the number of threads which have been woken but have not yet run, and do not wake any more if that count reaches an arbitrary small threshold. Testing was on a 4 CPU 4 NIC Altix using 4 IRIX clients, each with 16 synthetic client threads simulating an rsync (i.e. recursive directory listing) workload reading from an i386 RH9 install image (161480 regular files in 10841 directories) on the server. That tree is small enough to fill in the server's RAM so no disk traffic was involved. This setup gives a sustained call rate in excess of 60000 calls/sec before being CPU-bound on the server. The server was running 128 nfsds. Profiling showed schedule() taking 6.7% of every CPU, and __wake_up() taking 5.2%. This patch drops those contributions to 3.0% and 2.2%. Load average was over 120 before the patch, and 20.9 after. This patch is a forward-ported version of knfsd-avoid-nfsd-overload which has been shipping in the SGI "Enhanced NFS" product since 2006. It has been posted before: http://article.gmane.org/gmane.linux.nfs/10374 Signed-off-by: Greg Banks <gnb@sgi.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-01-06sunrpc: add sv_maxconn field to svc_serv (try #3)Jeff Layton
svc_check_conn_limits() attempts to prevent denial of service attacks by having the service close old connections once it reaches a threshold. This threshold is based on the number of threads in the service: (serv->sv_nrthreads + 3) * 20 Once we reach this, we drop the oldest connections and a printk pops to warn the admin that they should increase the number of threads. Increasing the number of threads isn't an option however for services like lockd. We don't want to eliminate this check entirely for such services but we need some way to increase this limit. This patch adds a sv_maxconn field to the svc_serv struct. When it's set to 0, we use the current method to calculate the max number of connections. RPC services can then set this on an as-needed basis. Signed-off-by: Jeff Layton <jlayton@redhat.com> Acked-by: Neil Brown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-09-29SUNRPC: Make svc_addr's argument a constantChuck Lever
Clean up: Add extra type safety and squelch a few compiler complaints in upcoming patches. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-09-29SUNRPC: Support IPv6 when registering kernel RPC servicesChuck Lever
In order to advertise NFS-related services on IPv6 interfaces via rpcbind, the kernel RPC server implementation must use rpcb_v4_register() instead of rpcb_register(). A new kernel build option allows distributions to use the legacy v2 call until they integrate an appropriate user-space rpcbind daemon that can support IPv6 RPC services. I tried adding some automatic logic to fall back if registering with a v4 protocol request failed, but there are too many corner cases. So I just made it a compile-time switch that distributions can throw when they've replaced portmapper with rpcbind. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-09-29SUNRPC: Add address family field to svc_serv data structureChuck Lever
Introduce and initialize an address family field in the svc_serv structure. This field will determine what family to use for the service's listener sockets and what families are advertised via the local rpcbind daemon. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-06-23sunrpc: remove sv_kill_signal field from svc_serv structJeff Layton
Since we no longer make any distinction between shutdown signals with nfsd, then it becomes easier to just standardize on a particular signal to use to bring it down (SIGINT, in this case). Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-06-23knfsd: convert knfsd to kthread APIJeff Layton
This patch is rather large, but I couldn't figure out a way to break it up that would remain bisectable. It does several things: - change svc_thread_fn typedef to better match what kthread_create expects - change svc_pool_map_set_cpumask to be more kthread friendly. Make it take a task arg and and get rid of the "oldmask" - have svc_set_num_threads call kthread_create directly - eliminate __svc_create_thread Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-04-23SUNRPC: remove svc_create_thread()Jeff Layton
Now that the nfs4 callback thread uses the kthread API, there are no more users of svc_create_thread(). Remove it. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-10nfsd: clean up svc_reserve_auth()J. Bruce Fields
This is a void function attempting to return the return value from another void function, which seems harmless but extremely weird, and apparently makes some compilers complain. While we're there, clean up a little (e.g. the switch statement had a minor style problem and seemed overkill as long as there's only one case). Thanks to Trond for noticing this. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-02-01SUNRPC: spin svc_rqst initialization to its own functionJeff Layton
Move the initialzation in __svc_create_thread that happens prior to thread creation to a new function. Export the function to allow services to have better control over the svc_rqst structs. Also rearrange the rqstp initialization to prevent NULL pointer dereferences in svc_exit_thread in case allocations fail. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01svc: Add transport hdr size for defer/revisitTom Tucker
Some transports have a header in front of the RPC header. The current defer/revisit processing considers only the iov_len and arg_len to determine how much to back up when saving the original request to revisit. Add a field to the rqstp structure to save the size of the transport header so svc_defer can correctly compute the start of a request. Signed-off-by: Tom Tucker <tom@opengridcomputing.com> Acked-by: Neil Brown <neilb@suse.de> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Greg Banks <gnb@sgi.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01svc: Removing remaining references to rq_sock in rqstpTom Tucker
This functionally empty patch removes rq_sock and unamed union from rqstp structure. Signed-off-by: Tom Tucker <tom@opengridcomputing.com> Acked-by: Neil Brown <neilb@suse.de> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Greg Banks <gnb@sgi.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01svc: Make deferral processing xprt independentTom Tucker
This patch moves the transport independent sk_deferred list to the svc_xprt structure and updates the svc_deferred_req structure to keep pointers to svc_xprt's directly. The deferral processing code is also moved out of the transport dependent recvfrom functions and into the generic svc_recv path. Signed-off-by: Tom Tucker <tom@opengridcomputing.com> Acked-by: Neil Brown <neilb@suse.de> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Greg Banks <gnb@sgi.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01svc: Add transport specific xpo_release functionTom Tucker
The svc_sock_release function releases pages allocated to a thread. For UDP this frees the receive skb. For RDMA it will post a receive WR and bump the client credit count. Signed-off-by: Tom Tucker <tom@opengridcomputing.com> Acked-by: Neil Brown <neilb@suse.de> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Greg Banks <gnb@sgi.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01svc: Change the svc_sock in the rqstp structure to a transportTom Tucker
The rqstp structure contains a pointer to the transport for the RPC request. This functionaly trivial patch adds an unamed union with pointers to both svc_sock and svc_xprt. Ultimately the union will be removed and only the rq_xprt field will remain. This allows incrementally extracting transport independent interfaces without one gigundo patch. Signed-off-by: Tom Tucker <tom@opengridcomputing.com> Acked-by: Neil Brown <neilb@suse.de> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Greg Banks <gnb@sgi.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2007-07-17knfsd: nfsd: set rq_client to ip-address-determined-domainJ. Bruce Fields
We want it to be possible for users to restrict exports both by IP address and by pseudoflavor. The pseudoflavor information has previously been passed using special auth_domains stored in the rq_client field. After the preceding patch that stored the pseudoflavor in rq_pflavor, that's now superfluous; so now we use rq_client for the ip information, as auth_null and auth_unix do. However, we keep around the special auth_domain in the rq_gssclient field for backwards compatibility purposes, so we can still do upcalls using the old "gss/pseudoflavor" auth_domain if upcalls using the unix domain to give us an appropriate export. This allows us to continue supporting old mountd. In fact, for this first patch, we always use the "gss/pseudoflavor" auth_domain (and only it) if it is available; thus rq_client is ignored in the auth_gss case, and this patch on its own makes no change in behavior; that will be left to later patches. Note on idmap: I'm almost tempted to just replace the auth_domain in the idmap upcall by a dummy value--no version of idmapd has ever used it, and it's unlikely anyone really wants to perform idmapping differently depending on the where the client is (they may want to perform *credential* mapping differently, but that's a different matter--the idmapper just handles id's used in getattr and setattr). But I'm updating the idmapd code anyway, just out of general backwards-compatibility paranoia. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17knfsd: nfsd4: store pseudoflavor in requestAndy Adamson
Add a new field to the svc_rqst structure to record the pseudoflavor that the request was made with. For now we record the pseudoflavor but don't use it for anything. Signed-off-by: Andy Adamson <andros@citi.umich.edu> Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-10sendfile: convert nfsd to splice_direct_to_actor()Jens Axboe
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-05-09RPC: add wrapper for svc_reserve to account for checksumJeff Layton
When the kernel calls svc_reserve to downsize the expected size of an RPC reply, it fails to account for the possibility of a checksum at the end of the packet. If a client mounts a NFSv2/3 with sec=krb5i/p, and does I/O then you'll generally see messages similar to this in the server's ring buffer: RPC request reserved 164 but used 208 While I was never able to verify it, I suspect that this problem is also the root cause of some oopses I've seen under these conditions: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=227726 This is probably also a problem for other sec= types and for NFSv4. The large reserved size for NFSv4 compound packets seems to generally paper over the problem, however. This patch adds a wrapper for svc_reserve that accounts for the possibility of a checksum. It also fixes up the appropriate callers of svc_reserve to call the wrapper. For now, it just uses a hardcoded value that I determined via testing. That value may need to be revised upward as things change, or we may want to eventually add a new auth_op that attempts to calculate this somehow. Unfortunately, there doesn't seem to be a good way to reliably determine the expected checksum length prior to actually calculating it, particularly with schemes like spkm3. Signed-off-by: Jeff Layton <jlayton@redhat.com> Acked-by: Neil Brown <neilb@suse.de> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Acked-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-06[PATCH] knfsd: remove CONFIG_IPV6 ifdefs from sunrpc server codeNeilBrown
They don't really save that much, and aren't worth the hassle. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-12[PATCH] knfsd: SUNRPC: support IPv6 addresses in RPC server's UDP receive pathChuck Lever
Add support for IPv6 addresses in the RPC server's UDP receive path. [akpm@linux-foundation.org: cleanups] Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-12[PATCH] knfsd: SUNRPC: Make rq_daddr field address-version independentChuck Lever
The rq_daddr field must support larger addresses. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-12[PATCH] knfsd: SUNRPC: Provide room in svc_rqst for larger addressesChuck Lever
Expand the rq_addr field to allow it to contain larger addresses. Specifically, we replace a 'sockaddr_in' with a 'sockaddr_storage', then everywhere the 'sockaddr_in' was referenced, we use instead an accessor function (svc_addr_in) which safely casts the _storage to _in. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-12[PATCH] knfsd: SUNRPC: Use sockaddr_storage to store address in svc_deferred_reqChuck Lever
Sockaddr_storage will allow us to store arbitrary socket addresses in the svc_deferred_req struct. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-12[PATCH] knfsd: SUNRPC: Add a function to format the address in an svc_rqst ↵Chuck Lever
for printing There are loads of places where the RPC server assumes that the rq_addr fields contains an IPv4 address. Top among these are error and debugging messages that display the server's IP address. Let's refactor the address printing into a separate function that's smart enough to figure out the difference between IPv4 and IPv6 addresses. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-26[PATCH] knfsd: fix an NFSD bug with full sized, non-page-aligned readsNeilBrown
NFSd assumes that largest number of pages that will be needed for a request+response is 2+N where N pages is the size of the largest permitted read/write request. The '2' are 1 for the non-data part of the request, and 1 for the non-data part of the reply. However, when a read request is not page-aligned, and we choose to use ->sendfile to send it directly from the page cache, we may need N+1 pages to hold the whole reply. This can overflow and array and cause an Oops. This patch increases size of the array for holding pages by one and makes sure that entry is NULL when it is not in use. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2006-10-20[PATCH] fix svc_procfunc declarationAl Viro
svc_procfunc instances return __be32, not int Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no> Acked-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-06[PATCH] knfsd: tidy up up meaning of 'buffer size' in nfsd/sunrpcNeilBrown
There is some confusion about the meaning of 'bufsz' for a sunrpc server. In some cases it is the largest message that can be sent or received. In other cases it is the largest 'payload' that can be included in a NFS message. In either case, it is not possible for both the request and the reply to be this large. One of the request or reply may only be one page long, which fits nicely with NFS. So we remove 'bufsz' and replace it with two numbers: 'max_payload' and 'max_mesg'. Max_payload is the size that the server requests. It is used by the server to check the max size allowed on a particular connection: depending on the protocol a lower limit might be used. max_mesg is the largest single message that can be sent or received. It is calculated as the max_payload, rounded up to a multiple of PAGE_SIZE, and with PAGE_SIZE added to overhead. Only one of the request and reply may be this size. The other must be at most one page. Cc: Greg Banks <gnb@sgi.com> Cc: "J. Bruce Fields" <bfields@fieldses.org> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04[PATCH] knfsd: register all RPC programs with portmapper by defaultOlaf Kirch
The NFSACL patches introduced support for multiple RPC services listening on the same transport. However, only the first of these services was registered with portmapper. This was perfectly fine for nfsacl, as you traditionally do not want these to show up in a portmapper listing. The patch below changes the default behavior to always register all services listening on a given transport, but retains the old behavior for nfsacl services. Signed-off-by: Olaf Kirch <okir@suse.de> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04[PATCH] knfsd: Prepare knfsd for support of rsize/wsize of up to 1MB, over TCPGreg Banks
The limit over UDP remains at 32K. Also, make some of the apparently arbitrary sizing constants clearer. The biggest change here involves replacing NFSSVC_MAXBLKSIZE by a function of the rqstp. This allows it to be different for different protocols (udp/tcp) and also allows it to depend on the servers declared sv_bufsiz. Note that we don't actually increase sv_bufsz for nfs yet. That comes next. Signed-off-by: Greg Banks <gnb@melbourne.sgi.com> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04[PATCH] knfsd: Avoid excess stack usage in svc_tcp_recvfromNeilBrown
.. by allocating the array of 'kvec' in 'struct svc_rqst'. As we plan to increase RPCSVC_MAXPAGES from 8 upto 256, we can no longer allocate an array of this size on the stack. So we allocate it in 'struct svc_rqst'. However svc_rqst contains (indirectly) an array of the same type and size (actually several, but they are in a union). So rather than waste space, we move those arrays out of the separately allocated union and into svc_rqst to share with the kvec moved out of svc_tcp_recvfrom (various arrays are used at different times, so there is no conflict). Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-04[PATCH] knfsd: Replace two page lists in struct svc_rqst with oneNeilBrown
We are planning to increase RPCSVC_MAXPAGES from about 8 to about 256. This means we need to be a bit careful about arrays of size RPCSVC_MAXPAGES. struct svc_rqst contains two such arrays. However the there are never more that RPCSVC_MAXPAGES pages in the two arrays together, so only one array is needed. The two arrays are for the pages holding the request, and the pages holding the reply. Instead of two arrays, we can simply keep an index into where the first reply page is. This patch also removes a number of small inline functions that probably server to obscure what is going on rather than clarify it, and opencode the needed functionality. Also remove the 'rq_restailpage' variable as it is *always* 0. i.e. if the response 'xdr' structure has a non-empty tail it is always in the same pages as the head. check counters are initilised and incr properly check for consistant usage of ++ etc maybe extra some inlines for common approach general review Signed-off-by: Neil Brown <neilb@suse.de> Cc: Magnus Maatta <novell@kiruna.se> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02[PATCH] knfsd: make rpc threads pools numa awareGreg Banks
Actually implement multiple pools. On NUMA machines, allocate a svc_pool per NUMA node; on SMP a svc_pool per CPU; otherwise a single global pool. Enqueue sockets on the svc_pool corresponding to the CPU on which the socket bh is run (i.e. the NIC interrupt CPU). Threads have their cpu mask set to limit them to the CPUs in the svc_pool that owns them. This is the patch that allows an Altix to scale NFS traffic linearly beyond 4 CPUs and 4 NICs. Incorporates changes and feedback from Neil Brown, Trond Myklebust, and Christoph Hellwig. Signed-off-by: Greg Banks <gnb@melbourne.sgi.com> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02[PATCH] knfsd: add svc_set_num_threadsGreg Banks
Currently knfsd keeps its own list of all nfsd threads in nfssvc.c; add a new way of managing the list of all threads in a svc_serv. Add svc_create_pooled() to allow creation of a svc_serv whose threads are managed by the sunrpc code. Add svc_set_num_threads() to manage the number of threads in a service, either per-pool or globally across the service. Signed-off-by: Greg Banks <gnb@melbourne.sgi.com> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02[PATCH] knfsd: add svc_getGreg Banks
add svc_get() for those occasions when we need to temporarily bump up svc_serv->sv_nrthreads as a pseudo refcount. Signed-off-by: Greg Banks <gnb@melbourne.sgi.com> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02[PATCH] knfsd: split svc_serv into poolsGreg Banks
Split out the list of idle threads and pending sockets from svc_serv into a new svc_pool structure, and allocate a fixed number (in this patch, 1) of pools per svc_serv. The new structure contains a lock which takes over several of the duties of svc_serv->sv_lock, which is now relegated to protecting only sv_tempsocks, sv_permsocks, and sv_tmpcnt in svc_serv. The point is to move the hottest fields out of svc_serv and into svc_pool, allowing a following patch to arrange for a svc_pool per NUMA node or per CPU. This is a major step towards making the NFS server NUMA-friendly. Signed-off-by: Greg Banks <gnb@melbourne.sgi.com> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02[PATCH] knfsd: move tempsock aging to a timerGreg Banks
Following are 11 patches from Greg Banks which combine to make knfsd more Numa-aware. They reduce hitting on 'global' data structures, and create some data-structures that can be node-local. knfsd threads are bound to a particular node, and the thread to handle a new request is chosen from the threads that are attach to the node that received the interrupt. The distribution of threads across nodes can be controlled by a new file in the 'nfsd' filesystem, though the default approach of an even spread is probably fine for most sites. Some (old) numbers that show the efficacy of these patches: N == number of NICs == number of CPUs == nmber of clients. Number of NUMA nodes == N/2 N Throughput, MiB/s CPU usage, % (max=N*100) Before After Before After --- ------ ---- ----- ----- 4 312 435 350 228 6 500 656 501 418 8 562 804 690 589 This patch: Move the aging of RPC/TCP connection sockets from the main svc_recv() loop to a timer which uses a mark-and-sweep algorithm every 6 minutes. This reduces the amount of work that needs to be done in the main RPC loop and the length of time we need to hold the (effectively global) svc_serv->sv_lock. [akpm@osdl.org: cleanup] Signed-off-by: Greg Banks <gnb@melbourne.sgi.com> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02[PATCH] knfsd: Drop 'serv' option to svc_recv and svc_processNeilBrown
It isn't needed as it is available in rqstp->rq_server, and dropping it allows some local vars to be dropped. [akpm@osdl.org: build fix] Cc: "J. Bruce Fields" <bfields@fieldses.org> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-02[PATCH] knfsd: add a callback for when last rpc thread finishesNeilBrown
nfsd has some cleanup that it wants to do when the last thread exits, and there will shortly be some more. So collect this all into one place and define a callback for an rpc service to call when the service is about to be destroyed. [akpm@osdl.org: cleanups, build fix] Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-09-28[SUNRPC]: trivial endianness annotationsAlexey Dobriyan
pure s/u32/__be32/ [AV: large part based on Alexey's patches] Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-28[SUNRPC]: svc_{get,put}nl()Alexey Dobriyan
* add svc_getnl(): Take network-endian value from buffer, convert to host-endian and return it. * add svc_putnl(): Take host-endian value, convert to network-endian and put it into a buffer. * annotate svc_getu32()/svc_putu32() as dealing with network-endian. * convert to svc_getnl(), svc_putnl(). [AV: in large part it's a carved-up Alexey's patch] Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>