path: root/include/linux
2006-12-07  [PATCH] new scheme to preempt swap token  (Ashwin Chaugule)
The new swap token patches replace the current token traversal algorithm. The old algorithm had a crude timeout parameter that was used to hand over the token from one task to another. This algorithm transfers the token to the tasks that are in need of it. The urgency for the token is based on the number of times a task is required to swap in pages. Accordingly, the priority of a task is incremented if it has been badly affected by swap-outs. To ensure that the token doesn't bounce around rapidly, the token holders are given a priority boost. The priority of tasks is also decremented if their rate of swap-ins keeps falling. This way, the check for whether to preempt the swap token is a matter of comparing two tasks' priority fields. [akpm@osdl.org: cleanups] Signed-off-by: Ashwin Chaugule <ashwin.chaugule@celunite.com> Cc: Rik van Riel <riel@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
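As an illustration of the priority comparison the message describes, here is a minimal, self-contained sketch; the struct, field and helper names are hypothetical, and the real code (mm/thrash.c) differs:

    /* Hypothetical, simplified model of the priority-based handover described
     * above; the real implementation uses different names and locking. */
    struct task_swap_state {
            unsigned long swap_in_count;    /* how often this task had to swap in */
            int prio;                       /* rises with swap-ins, decays as the rate falls */
    };

    /* Preempting the swap token reduces to comparing two priority fields;
     * the current holder gets a boost so the token does not bounce around. */
    static int should_preempt_swap_token(const struct task_swap_state *contender,
                                         const struct task_swap_state *holder,
                                         int holder_boost)
    {
            return contender->prio > holder->prio + holder_boost;
    }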
2006-12-07  [PATCH] memory page_alloc zonelist caching reorder structure  (Paul Jackson)
Rearrange the struct members in the 'struct zonelist_cache' structure, so as to put the readonly (once initialized) z_to_n[] array first, where it will come right after the zones[] array in struct zonelist. This pretty much eliminates the chance that the two frequently written elements of 'struct zonelist_cache', the fullzones bitmap and last_full_zap times, will end up on the same cache line as the performance sensitive, frequently read, never (after init) written zones[] array. Keeping frequently written data off frequently read cache lines is good for performance. Thanks to Rohit Seth for the suggestion. Signed-off-by: Paul Jackson <pj@sgi.com> Cc: Rohit Seth <rohitseth@google.com> Cc: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
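In sketch form, the resulting member order looks like this (a hedged reconstruction from the description above; the array-bound macro and exact types follow mmzone.h conventions and may differ):

    /* Sketch of the reordered structure: the read-mostly z_to_n[] comes first,
     * right after zonelist->zones[], while the frequently written fullzones
     * bitmap and last_full_zap timestamp land on later cache lines. */
    struct zonelist_cache {
            unsigned short z_to_n[MAX_ZONES_PER_ZONELIST];      /* zone -> node, read-only after init */
            DECLARE_BITMAP(fullzones, MAX_ZONES_PER_ZONELIST);  /* written often: zone recently full? */
            unsigned long last_full_zap;                        /* jiffies of last bitmap clear */
    };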
2006-12-07  [PATCH] memory page_alloc zonelist caching speedup  (Paul Jackson)
Optimize the critical zonelist scanning for free pages in the kernel memory allocator by caching the zones that were found to be full recently, and skipping them. It remembers the zones in a zonelist that were short of free memory in the last second. And it stashes a zone-to-node table in the zonelist struct, to optimize that conversion (minimize its cache footprint).

Recent changes: this differs in a significant way from a similar patch that I posted a week ago. Now, instead of having a nodemask_t of recently full nodes, I have a bitmask of recently full zones. This solves a problem that last week's patch had, which on systems with multiple zones per node (such as the DMA zone) would take seeing any of these zones full as meaning that all zones on that node were full. Also I changed names - from "zonelist faster" to "zonelist cache", as that seemed to better convey what we're doing here - caching some of the key zonelist state (for faster access).

See below for some performance benchmark results. After all that discussion with David on why I didn't need them, I went and got some ;). I wanted to verify that I had not hurt the normal case of memory allocation noticeably. At least for my one little microbenchmark, I found (1) the normal case wasn't affected, and (2) workloads that forced scanning across multiple nodes for memory improved, with up to 10% fewer system CPU cycles and lower elapsed clock time ('sys' and 'real'). Good. See details below.

I didn't have the logic in get_page_from_freelist() for various full nodes and zone reclaim failures correct. That should be fixed up now - notice the new goto labels zonelist_scan, this_zone_full, and try_next_zone, in get_page_from_freelist().

There are two reasons I pursued this alternative, over some earlier proposals that would have focused on optimizing the fake numa emulation case by caching the last useful zone:

1) Contrary to what I said before, we (SGI, on large ia64 sn2 systems) have seen real customer loads where the cost to scan the zonelist was a problem, due to many nodes being full of memory before we got to a node we could use. Or at least, I think we have. This was related to me by another engineer, based on experiences from some time past. So this is not guaranteed. Most likely, though. The following approach should help such real numa systems just as much as it helps fake numa systems, or any combination thereof.

2) The effort to distinguish fake from real numa, using node_distance, so that we could cache a fake numa node and optimize choosing it over equivalent-distance fake nodes, while continuing to properly scan all real nodes in distance order, was going to require a nasty blob of zonelist and node distance munging. The following approach has no new dependency on node distances or zone sorting.

See the comment in the patch below for a description of what it actually does.

Technical details of note (or controversy):

- See the use of "zlc_active" and "did_zlc_setup" below, to delay adding any work for this new mechanism until we've looked at the first zone in the zonelist. I figured the odds of the first zone having the memory we needed were high enough that we should just look there first, then get fancy only if we need to keep looking.

- Some odd hackery was needed to add items to struct zonelist, while not tripping up the custom zonelists built by the mm/mempolicy.c code for MPOL_BIND. My usual wordy comments below explain this. Search for "MPOL_BIND".

- Some per-node data in the struct zonelist is now modified frequently, with no locking. Multiple CPU cores on a node could hit and mangle this data. The theory is that this is just performance hint data, and the memory allocator will work just fine despite any such mangling. The fields at risk are the struct 'zonelist_cache' fields 'fullzones' (a bitmask) and 'last_full_zap' (unsigned long jiffies). It should all be self-correcting after at most a one-second delay.

- This still does a linear scan of the same length as before. All I've optimized is making the scan faster, not algorithmically shorter. It is now able to scan a compact array of 'unsigned short' in the case of many full nodes, so one cache line should cover quite a few nodes, rather than each node hitting another one or two new and distinct cache lines.

- If both Andi and Nick don't find this too complicated, I will be (pleasantly) flabbergasted.

- I removed the comment claiming we only use one cache line's worth of zonelist. We seem, at least in the fake numa case, to have put the lie to that claim.

- I pay no attention to the various watermarks and such in this performance hint. A node could be marked full for one watermark, and then skipped over when searching for a page using a different watermark. I think that's actually quite ok, as it will tend to slightly increase the spreading of memory over other nodes, away from a memory stressed node.

===============

Performance - some benchmark results and analysis:

This benchmark runs a memory hog program that uses multiple threads to touch a lot of memory as quickly as it can. Multiple runs were made, touching 12, 38, 64 or 90 GBytes out of the total 96 GBytes on the system, and using 1, 19, 37, or 55 threads (on a 56 CPU system). System, user and real (elapsed) timings were recorded for each run, shown in units of seconds, in the table below.

Two kernels were tested - 2.6.18-mm3 and the same kernel with this zonelist caching patch added. The table also shows the percentage improvement the zonelist caching sys time is over (lower than) the stock *-mm kernel.

 number            2.6.18-mm3        zonelist-cache      delta (< 0 good)    percent
 GBs   threads    sys  user  real    sys  user  real      sys  user  real    systime better
  12         1    153    24   177    151    24   176       -2     0    -1         1%
  12        19     99    22     8     99    22     8        0     0     0         0%
  12        37    111    25     6    112    25     6        1     0     0        -0%
  12        55    115    25     5    110    23     5       -5    -2     0         4%
  38         1    502    74   576    497    73   570       -5    -1    -6         0%
  38        19    426    78    48    373    76    39      -53    -2    -9        12%
  38        37    544    83    36    547    82    36        3    -1     0        -0%
  38        55    501    77    23    511    80    24       10     3     1        -1%
  64         1    917   125  1042    890   124  1014      -27    -1   -28         2%
  64        19   1118   138   119    965   141   103     -153     3   -16        13%
  64        37   1202   151    94   1136   150    81      -66    -1   -13         5%
  64        55   1118   141    61   1072   140    58      -46    -1    -3         4%
  90         1   1342   177  1519   1275   174  1450      -67    -3   -69         4%
  90        19   2392   199   192   2116   189   176     -276   -10   -16        11%
  90        37   3313   238   175   2972   225   145     -341   -13   -30        10%
  90        55   1948   210   104   1843   213   100     -105     3    -4         5%

Notes:

1) This test ran a memory hog program that started a specified number N of threads, and had each thread allocate and touch 1/N'th of the total memory to be used in the test run in a single loop, writing a constant word to memory, one store every 4096 bytes. Watching this test during some earlier trial runs, I would see each of these threads sit down on one CPU and stay there, for the remainder of the pass, a different CPU for each thread.

2) The 'real' column is not comparable to the 'sys' or 'user' columns. The 'real' column is seconds wall clock time elapsed, from beginning to end of that test pass. The 'sys' and 'user' columns are total CPU seconds spent on that test pass. For a 19 thread test run, for example, the sum of 'sys' and 'user' could be up to 19 times the number of 'real' elapsed wall clock seconds.

3) Tests were run on a fresh, single-user boot, to minimize the amount of memory already in use at the start of the test, and to minimize the amount of background activity that might interfere.

4) Tests were done on a 56 CPU, 28 Node system with 96 GBytes of RAM.

5) Notice that the 'real' time gets large for the single thread runs, even though the measured 'sys' and 'user' times are modest. I'm not sure what that means - probably something to do with it being slow for one thread to be accessing memory a long way away. Perhaps the fake numa system, running ostensibly the same workload, would not show this substantial degradation of 'real' time for one thread on many nodes -- let's hope not.

6) The high thread count passes (one thread per CPU - on 55 of 56 CPUs) ran quite efficiently, as one might expect. Each pair of threads needed to allocate and touch the memory on the node the two threads shared, a pleasantly parallelizable workload.

7) The intermediate thread count passes, when asking for a lot of memory forcing them to go to a few neighboring nodes, improved the most with this zonelist caching patch.

Conclusions:

* This zonelist cache patch probably makes little difference one way or the other for most workloads on real numa hardware, if those workloads avoid heavy off-node allocations.

* For memory intensive workloads requiring substantial off-node allocations on real numa hardware, this patch improves both kernel and elapsed timings by up to ten percent.

* For fake numa systems, I'm optimistic, but will have to leave that up to Rohit Seth to actually test (once I get him a 2.6.18 backport).

Signed-off-by: Paul Jackson <pj@sgi.com> Cc: Rohit Seth <rohitseth@google.com> Cc: Christoph Lameter <clameter@engr.sgi.com> Cc: David Rientjes <rientjes@cs.washington.edu> Cc: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
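The cached scan described above can be sketched as follows. This is a condensed illustration, not the verbatim mm/page_alloc.c code: the zlc_* helper signatures are simplified, and alloc_from_zone() is a hypothetical stand-in for the existing per-zone allocation path.

    /* Sketch of the zonelist scan with the full-zone cache: the cache costs
     * nothing until the first zone disappoints, then recently-full zones are
     * skipped; a final pass ignores the (possibly stale) hint. */
    struct page *get_page_from_freelist_sketch(struct zonelist *zonelist,
                                               unsigned int order, gfp_t gfp_mask)
    {
            struct zone **z;
            struct page *page = NULL;
            int zlc_active = 0;             /* no cache work on the first attempt */

    zonelist_scan:
            for (z = zonelist->zones; *z != NULL; z++) {
                    if (zlc_active && !zlc_zone_worth_trying(zonelist, z))
                            continue;                       /* cached as full recently: skip */

                    page = alloc_from_zone(*z, order, gfp_mask);    /* hypothetical helper */
                    if (page != NULL)
                            break;

                    if (!zlc_active) {                      /* first miss: arm the cache */
                            zlc_setup(zonelist);
                            zlc_active = 1;
                    }
                    zlc_mark_zone_full(zonelist, z);        /* remember the miss */
            }
            if (page == NULL && zlc_active) {
                    zlc_active = 0;         /* hint may be stale: one last full scan */
                    goto zonelist_scan;
            }
            return page;
    }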
2006-12-07  [PATCH] Get rid of zone_table[]  (Christoph Lameter)
The zone table is mostly not needed. If we have a node in the page flags then we can get to the zone via NODE_DATA(), which is much more likely to be already in the CPU cache. In the case of SMP and UP, NODE_DATA() is a constant pointer which allows us to access an exact replica of the zone table in the node_zones field. In all of the above cases there will be no need at all for the zone table. The only remaining case is if, in a NUMA system, the node numbers do not fit into the page flags. In that case we make sparsemem generate a table that maps sections to nodes and use that table to figure out the node number. This table is sized to fit in a single cache line for the known 32-bit NUMA platform, which makes it very likely that the information can be obtained without a cache miss. For sparsemem the zone table seems to have been fairly large, based on the maximum possible number of sections and the number of zones per node. There is some memory saving from removing zone_table. The main benefit is to reduce the cache footprint of the VM from the frequent lookups of zones. Plus it simplifies the page allocator. [akpm@osdl.org: build fix] Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
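For reference, the lookup that replaces zone_table[] indexing is essentially the following (a sketch close to what include/linux/mm.h ends up with; the config-dependent variants are omitted):

    /* Sketch: with the node number in page->flags, the zone is reached through
     * NODE_DATA() and node_zones[] instead of a global zone_table[]. */
    static inline struct zone *page_zone_sketch(const struct page *page)
    {
            return &NODE_DATA(page_to_nid(page))->node_zones[page_zonenum(page)];
    }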
2006-12-07  [PATCH] add bottom_half.h  (Andrew Morton)
With CONFIG_SMP=n: drivers/input/ff-memless.c:384: warning: implicit declaration of function 'local_bh_disable' drivers/input/ff-memless.c:393: warning: implicit declaration of function 'local_bh_enable' Really linux/spinlock.h should include linux/interrupt.h. But interrupt.h includes sched.h which will need spinlock.h. So the patch breaks the _bh declarations out into a separate header and includes it in both interrupt.h and spinlock.h. Cc: "Randy.Dunlap" <rdunlap@xenotime.net> Cc: Andi Kleen <ak@suse.de> Cc: <stable@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
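In outline, the split-out header looks roughly like this (a minimal sketch; the real include/linux/bottom_half.h carries the full set of declarations and config variants):

    #ifndef _LINUX_BH_H
    #define _LINUX_BH_H

    /* The _bh declarations, broken out so that both linux/interrupt.h and
     * linux/spinlock.h can include them without dragging in sched.h. */
    extern void local_bh_disable(void);
    extern void local_bh_enable(void);
    extern void local_bh_enable_ip(unsigned long ip);

    #endif /* _LINUX_BH_H */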
2006-12-07  [TG3]: Add TG3_FLG2_IS_NIC flag.  (Michael Chan)
Add the TG3_FLG2_IS_NIC flag to unambiguously determine whether the device is a NIC or onboard. Previously, the EEPROM_WRITE_PROT flag was overloaded to also mean onboard. With the separation, we can support some devices that are onboard but do not use EEPROM write protect. Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-07  [TG3]: Add 5787F device ID.  (Michael Chan)
Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-06  audit: Add auditing to ipsec  (Joy Latten)
An audit message occurs when an ipsec SA or ipsec policy is created/deleted. Signed-off-by: Joy Latten <latten@austin.ibm.com> Signed-off-by: James Morris <jmorris@namei.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-06  [NETFILTER]: nf_conntrack: fix warning in PPTP helper  (Yasuyuki Kozakai)
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-06  [CRYPTO] api: Remove unused functions  (Adrian Bunk)
This patch removes the following no longer used functions:
- api.c: crypto_alg_available()
- digest.c: crypto_digest_init()
- digest.c: crypto_digest_update()
- digest.c: crypto_digest_final()
- digest.c: crypto_digest_digest()
Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2006-12-06  [IPSEC]: Add support for AES-XCBC-MAC  (Kazunori MIYAZAWA)
The glue of xfrm. Signed-off-by: Kazunori MIYAZAWA <miyazawa@linux-ipv6.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2006-12-06  [GENETLINK]: Move command capabilities to flags.  (Jamal Hadi Salim)
This patch moves command capabilities to command flags. Other than being cleaner, this saves several bytes. We increment the nlctrl version so as to signal to user space not to expect the attributes. We will try to be careful not to do this too often ;-> Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-07  [PATCH] i386: Preserve EFI run time regions with memmap parameter  (Artiom Myaskouvskey)
When using the memmap kernel parameter in an EFI boot, we should also add the memory regions of runtime services to the memory map, to enable their mapping later. AK: merged and cleaned up the patch Signed-off-by: Artiom Myaskouvskey <artiom.myaskouvskey@intel.com> Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07  [PATCH] i386: call efi_get_time during suspend  (Artiom Myaskouvskey)
The function efi_get_time is called not only during the kernel init phase but also during suspend (from get_cmos_time). When it is called from get_cmos_time, the corresponding runtime service should be called in virtual and not in physical mode. Signed-off-by: Artiom Myaskouvskey <artiom.myaskouvskey@intel.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: "Narayanan, Chandramouli" <chandramouli.narayanan@intel.com> Cc: "Jiossy, Rami" <rami.jiossy@intel.com> Cc: "Satt, Shai" <shai.satt@intel.com> Cc: Andi Kleen <ak@suse.de> Cc: Matt Domsch <Matt_Domsch@dell.com> Signed-off-by: Andrew Morton <akpm@osdl.org>
2006-12-07  [PATCH] i386: change the 'no_control' field to 'hotpluggable' in the struct cpu  (Siddha, Suresh B)
Change the 'no_control' field in the cpu struct to the more positive and descriptive term 'hotpluggable', and change/clean up the logic accordingly. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de> Cc: "Li, Shaohua" <shaohua.li@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org>
2006-12-07  [PATCH] i386: cpu_detect extraction  (Rusty Russell)
Both lhype and Xen want to call the core of the x86 cpu detect code before calling start_kernel. (extracted from larger patch) AK: folded in start_kernel header patch Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org>
2006-12-07  [PATCH] Generic: Move __user cast into probe_kernel_address  (Andi Kleen)
Callers of probe_kernel_address shouldn't need to know that it is internally implemented with __get_user, so move the __user cast into probe_kernel_address itself. Signed-off-by: Andi Kleen <ak@suse.de>
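Illustratively, the macro ends up shaped roughly like this (a simplified sketch; the real definition also disables pagefaults and temporarily switches get_fs()/set_fs()):

    /* Sketch only: the __user/__force cast now lives inside the macro, so
     * callers can hand in a plain kernel pointer. */
    #define probe_kernel_address(addr, retval)                              \
            ({                                                              \
                    long ret;                                               \
                    ret = __get_user(retval,                                \
                            (__force typeof(retval) __user *)(addr));       \
                    ret;                                                    \
            })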
2006-12-07  [PATCH] i386: Relocatable kernel support  (Eric W. Biederman)
This patch modifies the i386 kernel so that, if CONFIG_RELOCATABLE is selected, it can be loaded at any 4K-aligned address below 1G. The technique used is to compile the decompressor with -fPIC and modify it so the decompressor is fully relocatable. For the main kernel, relocations are generated, resulting in a kernel that is relocatable with no runtime overhead and no need to modify the source code. A reserved 32-bit word in the parameters has been assigned to serve as a stack so we can figure out where we are running. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com> Signed-off-by: Andi Kleen <ak@suse.de>
2006-12-07  [PATCH] x86: all cpu backtrace  (Andrew Morton)
When a spinlock lockup occurs, arrange for the NMI code to emit an all-cpu backtrace, so we get to see which CPU is holding the lock, and where. Cc: Andi Kleen <ak@muc.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Andi Kleen <ak@suse.de>
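A hedged sketch of the consumer side follows; the trigger function name here follows the later mainline trigger_all_cpu_backtrace() interface and may not match this patch's initial spelling exactly:

    /* When the spinlock debug code decides a lock is stuck, ask every CPU
     * (via NMI) to dump its stack, so the holder shows up in the log. */
    static void report_spinlock_lockup(void *lock_addr)
    {
            printk(KERN_EMERG "BUG: possible spinlock lockup, lock at %p\n",
                   lock_addr);
            trigger_all_cpu_backtrace();
    }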
2006-12-06  SUNRPC: Remove pprintk() from net/sunrpc/xprt.c  (Chuck Lever)
These appear to be deprecated. Removing them also gets rid of some sparse noise. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06  SUNRPC: Rename skb_reader_t and friends  (Chuck Lever)
Clean-up: hch suggested that the RPC client shouldn't pollute the name space used by the generic skb manipulation routines in net/core/skbuff.c. Rename a couple of types in xdr.h to adhere to this convention. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06  SUNRPC: skb_read_bits is the same as xs_tcp_copy_data  (Chuck Lever)
Clean-up: eliminate xs_tcp_copy_data -- it's exactly the same logic as the common routine skb_read_bits. The UDP and TCP socket read code now share the same routine for copying data into an xdr_buf. Now that skb_read_bits() is exported, rename it to avoid confusing it with a generic skb_* function. As these functions are XDR-specific, they should not have names that suggest they are of generic use. Also rename skb_read_and_csum_bits() to be consistent. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06  SUNRPC: Make address format buffers more generic  (Chuck Lever)
For now we will assume that all transports will use the address format buffers in the rpc_xprt struct to store their addresses. Change rpc_peer2str() to be a generic routine to handle this, and get rid of the print_address() op in the rpc_xprt_ops vector. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06  SUNRPC: move saved socket callback functions to a private data structure  (Chuck Lever)
Move the three fields for saving socket callback functions out of the rpc_xprt structure and into a private data structure maintained in net/sunrpc/xprtsock.c. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06  SUNRPC: Move the UDP socket bufsize parameters to a private data structure  (Chuck Lever)
Move the socket-specific buffer size parameters for UDP sockets to a private data structure maintained in net/sunrpc/xprtsock.c. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06  SUNRPC: Move rpc_xprt socket connect fields into private data structure  (Chuck Lever)
Move the socket-specific connection management fields out of the generic rpc_xprt structure into a private data structure maintained in net/sunrpc/xprtsock.c. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06  SUNRPC: Move TCP state flags into xprtsock.c  (Chuck Lever)
Move "XPRT_LAST_FRAG" and friends from xprt.h into xprtsock.c, and rename them to use the naming scheme in use in xprtsock.c. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06  SUNRPC: Move TCP receive state variables into private data structure  (Chuck Lever)
Move the TCP receive state variables from the generic rpc_xprt structure to a private structure maintained inside net/sunrpc/xprtsock.c. Also rename a function/variable pair to refer to RPC fragment headers instead of record markers, to be consistent with types defined in sunrpc/*.h. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06  SUNRPC: Remove sock and inet fields from rpc_xprt  (Chuck Lever)
The "sock" and "inet" fields are socket-specific. Move them to a private data structure maintained entirely within net/sunrpc/xprtsock.c Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06  rpcgss: krb5: ignore seed  (J. Bruce Fields)
We're currently not actually using seed or seed_init. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06  rpcgss: krb5: sanity check sealalg value in the downcall  (J. Bruce Fields)
The sealalg is checked in several places, giving the impression it could be either SEAL_ALG_NONE or SEAL_ALG_DES. But in fact SEAL_ALG_NONE seems to be sufficient only for making MICs, and all the contexts we get must be capable of wrapping as well. So the sealalg must be SEAL_ALG_DES. As with signalg, just check for the right value in the downcall and ignore it otherwise. Similarly, tighten expectations for the sealalg on incoming tokens, in case we do support other values eventually. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
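In sketch form, the import-time check is just the following (constant and error names per the existing krb5 code; the surrounding downcall parsing is omitted):

    /* Every imported context must be able to wrap, not just make MICs, so
     * only SEAL_ALG_DES is accepted from the downcall. */
    static int check_sealalg(u32 sealalg)
    {
            if (sealalg != SEAL_ALG_DES)
                    return -EINVAL;         /* reject the context */
            return 0;
    }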
2006-12-06  rpcgss: simplify make_checksum  (J. Bruce Fields)
We're doing some pointless translation between krb5 constants and kernel crypto string names. Also clean up some related spkm3 code as necessary. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06  gss: krb5: remove signalg and sealalg  (J. Bruce Fields)
We designed the krb5 context import without completely understanding the context. Now it's clear that there are a number of fields that we ignore, or that we depend on having one single value. In particular, we only support one value of signalg currently; so let's check the signalg field in the downcall (in case we decide there's something else we could support here eventually), but ignore it otherwise. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06  rpc: spkm3 update  (Olga Kornievskaia)
This updates the spkm3 code to bring it up to date with our current understanding of the spkm3 spec. In doing so, we're changing the downcall format used by gssd in the spkm3 case, which will cause an incompatibility with old userland spkm3 support. Since the old code a) didn't implement the protocol correctly, and b) was never distributed except in the form of some experimental patches from the citi web site, we're assuming this is OK. We do detect the old downcall format and print a warning (and fail). We also include a version number in the new downcall format, to be used in the future in case any further change is required. In some more detail:
- fix integrity support
- removed dependency on NIDs; OIDs are used instead
- known OID values for algorithms added
- fixed some context fields and types
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06  rpc: move process_xdr_buf  (Olga Kornievskaia)
Since process_xdr_buf() is useful outside of the kerberos-specific code, we move it to net/sunrpc/xdr.c, export it, and rename it in keeping with xdr_* naming convention of xdr.c. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06  rpc: gss: eliminate print_hexl()'s  (J. Bruce Fields)
Dumping all this data to the logs is wasteful (even when debugging is turned off), and creates too much output to be useful when it's turned on. Fix a minor style bug or two while we're at it. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06  NFS: Ensure we only call set_page_writeback() under the page lock  (Trond Myklebust)
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06  NFS: Make nfs_updatepage() mark the page as dirty.  (Trond Myklebust)
This will ensure that we can call set_page_writeback() from within nfs_writepage(), which is always called with the page lock set. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06  NFS: Ensure that nfs_wb_page() calls writepage when necessary.  (Trond Myklebust)
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06  NFS: Add nfs_set_page_dirty()  (Trond Myklebust)
We will want to allow nfs_writepage() to distinguish between pages that have been marked as dirty by the VM, and those that have been marked as dirty by nfs_updatepage(). In the former case, the entire page will want to be written out, and so any requests that were pending need to be flushed out first. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06  NFS: Remove nfs_writepage_sync()  (Trond Myklebust)
Maintaining two parallel ways of doing synchronous writes is rather pointless. This patch gets rid of the legacy nfs_writepage_sync(), and replaces it with the faster asynchronous writes. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06  NFS: cleanup of nfs_sync_inode_wait()  (Trond Myklebust)
Allow callers to directly pass it a struct writeback_control. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06  NFS: Clean up nfs_scan_dirty()  (Trond Myklebust)
Pass down a struct writeback_control. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06  SUNRPC: Make the transport-specific setup routine allocate rpc_xprt  (Chuck Lever)
Change the location where the rpc_xprt structure is allocated so each transport implementation can allocate a private area from the same chunk of memory. Note also that xprt->ops->destroy, rather than xprt_destroy, is now responsible for freeing rpc_xprt when the transport is destroyed. Test plan: Connectathon. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
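The layout this enables is the usual container pattern, sketched below; the struct and function names here are illustrative of what xprtsock.c keeps private, not the exact identifiers:

    /* One allocation holds the generic rpc_xprt followed by transport-private
     * state; the generic code only ever sees the embedded rpc_xprt. */
    struct sock_xprt_sketch {
            struct rpc_xprt xprt;           /* must stay first */
            struct socket   *sock;          /* private, socket-specific fields */
            struct sock     *inet;
    };

    static struct rpc_xprt *xs_setup_sketch(void)
    {
            struct sock_xprt_sketch *xs = kzalloc(sizeof(*xs), GFP_KERNEL);

            if (xs == NULL)
                    return NULL;
            /* ->ops->destroy later frees the whole container, e.g. via
             * container_of(xprt, struct sock_xprt_sketch, xprt). */
            return &xs->xprt;
    }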
2006-12-06  SUNRPC: minor optimization of "xid" field in rpc_xprt  (Chuck Lever)
Move the xid field in the rpc_xprt structure to be in the same cache line as the reserve_lock, since these are used at the same time. Test plan: None. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06  SUNRPC: Clean up argument types in xdr.c  (Trond Myklebust)
Converts various integer buffer offsets and sizes to unsigned integer. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06  SUNRPC: Fix up missing BKL in asynchronous RPC callback functions  (Trond Myklebust)
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06  SUNRPC: Give cloned RPC clients their own rpc_pipefs directory  (Trond Myklebust)
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2006-12-06  SUNRPC: Fix a potential race in rpc_wake_up_task()  (Trond Myklebust)
Use RCU to ensure that we can safely call rpc_finish_wakeup after we've called __rpc_do_wake_up_task. If not, there is a theoretical race, in which the rpc_task finishes executing, and gets freed first. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
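Conceptually the fix looks like the following sketch; __rpc_do_wake_up_task() and rpc_finish_wakeup() are named in the message, while rpc_start_wakeup() stands in for the existing guard that claims the wakeup:

    /* Hold the RCU read lock across the wakeup so the rpc_task cannot be
     * freed (its memory must now be released after an RCU grace period)
     * between __rpc_do_wake_up_task() and rpc_finish_wakeup(). */
    static void rpc_wake_up_task_sketch(struct rpc_task *task)
    {
            rcu_read_lock();
            if (rpc_start_wakeup(task)) {
                    __rpc_do_wake_up_task(task);
                    rpc_finish_wakeup(task);
            }
            rcu_read_unlock();
    }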
2006-12-06  Fix a second potential rpc_wakeup race...  (Trond Myklebust)
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>