path: root/fs
Age | Commit message | Author
2009-04-07 | nilfs2: segment usage file | Koji Sato
This adds a meta data file which stores the allocation state of segments. [konishi.ryusuke@lab.ntt.co.jp: fix wrong counting of checkpoints and dirty segments] Signed-off-by: Koji Sato <sato.koji@lab.ntt.co.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-07 | nilfs2: checkpoint file | Koji Sato
This adds a meta data file which holds checkpoint entries in its data blocks. Signed-off-by: Koji Sato <sato.koji@lab.ntt.co.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-07 | nilfs2: inode map file | Ryusuke Konishi
This adds a meta data file which stores on-disk inodes in its data blocks. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Yoshiji Amagai <amagai.yoshiji@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-07 | nilfs2: disk address translator | Koji Sato
This adds the disk address translation file (DAT) whose primary function is to convert virtual disk block numbers to actual disk block numbers. The virtual block numbers of NILFS are associated with checkpoint generation numbers, and this file also provides functions to manage the lifetime information of each virtual block number. Signed-off-by: Koji Sato <sato.koji@lab.ntt.co.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
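The DAT's job can be pictured as a table indexed by virtual block number, where each entry records the actual disk block and the range of checkpoints during which that virtual block is live. The sketch below is a simplified in-memory model, not the NILFS2 on-disk format: `struct dat_entry`, `dat_translate()`, `dat_is_live()` and the half-open lifetime interval are all hypothetical choices made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory model of one DAT entry (not the on-disk layout). */
struct dat_entry {
    uint64_t blocknr;   /* actual disk block number; 0 means unassigned */
    uint64_t start_cno; /* first checkpoint in which the block is live */
    uint64_t end_cno;   /* checkpoint where it stopped being live; UINT64_MAX = still live */
};

/* Translate a virtual block number into an actual one.
 * Returns 0 and stores the result in *blocknr, or -1 if the virtual
 * block number is out of range or has no actual block assigned. */
int dat_translate(const struct dat_entry *dat, size_t nentries,
                  uint64_t vblocknr, uint64_t *blocknr)
{
    if (vblocknr >= nentries || dat[vblocknr].blocknr == 0)
        return -1;
    *blocknr = dat[vblocknr].blocknr;
    return 0;
}

/* Lifetime check: is the virtual block referenced by checkpoint cno?
 * Modelled here as the half-open interval [start_cno, end_cno). */
int dat_is_live(const struct dat_entry *e, uint64_t cno)
{
    return e->start_cno <= cno && cno < e->end_cno;
}
```

Keeping the lifetime next to the mapping is what lets a cleaner ask, per virtual block, whether any checkpoint still needs it.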
2009-04-07 | nilfs2: persistent object allocator | Ryusuke Konishi
This adds common functions to allocate or deallocate entries with bitmaps on a meta data file. This feature is used by the DAT and ifile. Signed-off-by: Koji Sato <sato.koji@lab.ntt.co.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Yoshiji Amagai <amagai.yoshiji@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
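Allocating entries "with bitmaps" boils down to finding a clear bit, setting it, and clearing it again on free. The sketch below keeps a single in-memory bitmap; the real allocator spreads its bitmaps over blocks of a meta data file, and the names `palloc_alloc()` and `palloc_free()` are hypothetical.

```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Allocate the lowest free entry: find a clear bit, set it, return its index.
 * Returns -1 when all nentries entries are in use. */
long palloc_alloc(unsigned long *bitmap, size_t nentries)
{
    for (size_t i = 0; i < nentries; i++) {
        unsigned long mask = 1UL << (i % BITS_PER_WORD);
        unsigned long *word = &bitmap[i / BITS_PER_WORD];

        if (!(*word & mask)) {
            *word |= mask;
            return (long)i;
        }
    }
    return -1;
}

/* Release an entry by clearing its bit so it can be reused. */
void palloc_free(unsigned long *bitmap, size_t index)
{
    bitmap[index / BITS_PER_WORD] &= ~(1UL << (index % BITS_PER_WORD));
}
```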
2009-04-07 | nilfs2: meta data file | Ryusuke Konishi
This adds the meta data file, which serves common buffer functions to the DAT, sufile, cpfile, ifile, and so forth. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-07 | nilfs2: buffer and page operations | Ryusuke Konishi
This adds common routines for buffer/page operations used in B-tree node caches, meta data files, or the segment constructor (log writer). NILFS uses copy functions for buffers and pages for the following reasons: 1) Relocation required for COW: since NILFS changes the addresses of on-disk blocks, buffers in the page cache must be moved when they are not addressed by a file offset. If the buffer size is smaller than the page size, this involves a partial copy of pages. 2) Freezing mmapped pages: NILFS calculates checksums for each log to ensure its validity. If page data changes after the checksum calculation, this validity check will not work correctly. To avoid this failure for mmapped pages, NILFS freezes their data by copying. 3) Copy-on-write for DAT pages: NILFS makes clones of DAT page caches in a copy-on-write manner during GC processes, and this ensures atomicity and consistency of the DAT in the transient state. In addition, NILFS uses two obsolete functions, nilfs_mark_buffer_dirty() and nilfs_clear_page_dirty(). * nilfs_mark_buffer_dirty() was required to avoid NULL pointer dereference faults: since the page cache of B-tree node pages or the data page cache of pseudo inodes does not have a valid mapping->host, calling mark_buffer_dirty() for their buffers causes the fault; it calls __mark_inode_dirty(NULL) through __set_page_dirty(). * nilfs_clear_page_dirty() was needed in two cases: 1) For B-tree node pages and data pages of the dat/gcdat, NILFS2 clears page dirty flags when it copies pages back from the cloned cache (gcdat->{i_mapping,i_btnode_cache}) to its original cache (dat->{i_mapping,i_btnode_cache}). 2) Some B-tree operations like insertion or deletion may dispose of buffers in dirty state, and this requires cancelling the dirty state of their pages. clear_page_dirty_for_io() caused faults because it does not clear the dirty tag on the page cache.
Signed-off-by: Seiji Kihara <kihara.seiji@lab.ntt.co.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-07 | nilfs2: B-tree node cache | Ryusuke Konishi
This adds routines for B-tree node buffers. Signed-off-by: Seiji Kihara <kihara.seiji@lab.ntt.co.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-07 | nilfs2: direct block mapping | Koji Sato
This adds block mappings using direct pointers which are stored in the i_bmap array of inode. Signed-off-by: Koji Sato <sato.koji@lab.ntt.co.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-07 | nilfs2: B-tree based block mapping | Koji Sato
This adds declarations and functions of the NILFS2 B-tree. Two variants are integrated in the NILFS2 B-tree. The B-tree for most files points to the child nodes or data blocks with virtual block addresses, whereas the B-tree of the DAT uses actual block addresses. Signed-off-by: Koji Sato <sato.koji@lab.ntt.co.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-07 | nilfs2: integrated block mapping | Koji Sato
This adds structures and operations for the block mapping (bmap for short). NILFS2 uses direct mappings for short files or B-tree based mappings for longer files. Every on-disk data block is held with inodes and managed through this block mapping. The nilfs_bmap structure and a set of functions here provide this capability to the NILFS2 inode. [penberg@cs.helsinki.fi: remove a bunch of bmap wrapper macros] [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Koji Sato <sato.koji@lab.ntt.co.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-07 | nilfs2: add inode and other major structures | Ryusuke Konishi
This adds the following common structures of the NILFS2 file system. * nilfs_inode_info structure: gives the in-memory inode. * nilfs_sb_info structure: keeps per-mount state and a special inode for the ifile. This structure is attached to the super_block structure. * the_nilfs structure: keeps shared state and locks among a read/write mount and snapshot mounts. This keeps special inodes for the sufile, cpfile, dat, and another dat inode used during GC (gcdat). This also has a hash table of dummy inodes to cache disk blocks during GC (gcinodes). * nilfs_transaction_info structure: keeps per-task state while nilfs is writing logs or doing indivisible inode or namespace operations. This structure is used to identify context during log making and to store the nest level of the lock which ensures atomicity of file system operations. Signed-off-by: Koji Sato <sato.koji@lab.ntt.co.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-07 | fs/romfs: return f_fsid for statfs(2) | Coly Li
Make romfs return f_fsid info for statfs(2). Signed-off-by: Coly Li <coly.li@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-07 | namespaces: move proc_net_get_sb to a generic fs/super.c helper | Serge E. Hallyn
The mqueuefs filesystem will use this helper as well. Proc's main get_sb could also be made to use it, but that will require a bit more rework. Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-07 | /proc/pid/maps: don't show pgoff of pure ANON VMAs | KAMEZAWA Hiroyuki
Recently, it has been argued that the output of /proc/pid/maps is ugly when a 32-bit binary runs on a 64-bit host. /proc/pid/maps outputs the vma's pgoff member, but vma->pgoff is useless information if the vma is anonymous. With this patch, /proc/pid/maps shows just 0 if there is no file backing store. [akpm@linux-foundation.org: coding-style fixes] [kamezawa.hiroyu@jp.fujitsu.com: coding-style fixes] Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mike Waychison <mikew@google.com> Reported-by: Ying Han <yinghan@google.com> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
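The rule is small enough to show directly: print the file offset only when a file backs the mapping, otherwise print 0. This is a user-space sketch with a hypothetical `struct vma_model` and an assumed 4 KiB page size; it is not the kernel's `show_map()` code.

```c
#include <stddef.h>
#include <stdio.h>

#define MODEL_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Hypothetical, simplified stand-in for a VMA. */
struct vma_model {
    unsigned long start, end;
    unsigned long pgoff; /* offset in pages; meaningless without a file */
    const void *file;    /* NULL for anonymous mappings */
};

/* Byte offset for the "offset" column: 0 when there is no backing file. */
unsigned long long maps_offset(const struct vma_model *vma)
{
    return vma->file ? (unsigned long long)vma->pgoff << MODEL_PAGE_SHIFT : 0;
}

/* Format the start of a maps line: "start-end offset". */
int format_maps_line(char *buf, size_t len, const struct vma_model *vma)
{
    return snprintf(buf, len, "%08lx-%08lx %08llx",
                    vma->start, vma->end, maps_offset(vma));
}
```

For an anonymous VMA the line therefore reads `00001000-00002000 00000000` regardless of what `pgoff` happens to hold.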
2009-04-07 | ramfs: fix double freeing s_fs_info on failed mount | Ingo Molnar
If ramfs mount fails, s_fs_info will be freed twice in ramfs_fill_super() and ramfs_kill_sb(), leading to kernel oops. Consolidate and beautify the code. Make sure s_fs_info and s_root are in known good states. Acked-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
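One way to put `s_fs_info` "in a known good state" is to free it on the failure path and immediately reset the pointer to NULL, so the later teardown, which frees it unconditionally, frees NULL (a no-op) instead of the same allocation twice. The sketch below models this in user space; `struct sb_model`, `fill_super_model()` and `kill_sb_model()` are hypothetical stand-ins, not the ramfs functions.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the relevant superblock fields. */
struct sb_model {
    void *s_fs_info; /* private per-mount data */
    void *s_root;    /* root; NULL until setup succeeds */
};

/* fill_super: on any failure, release what was allocated and reset the
 * pointers so that nothing is left for the teardown path to free again. */
int fill_super_model(struct sb_model *sb, int fail_later)
{
    sb->s_fs_info = calloc(1, 16);
    if (!sb->s_fs_info)
        goto fail;
    if (fail_later)          /* e.g. option parsing or root allocation failed */
        goto fail;
    sb->s_root = sb;         /* placeholder for a real root */
    return 0;
fail:
    free(sb->s_fs_info);
    sb->s_fs_info = NULL;    /* known good state */
    sb->s_root = NULL;
    return -1;
}

/* kill_sb: frees unconditionally; free(NULL) is a no-op, so no double free. */
void kill_sb_model(struct sb_model *sb)
{
    free(sb->s_fs_info);
    sb->s_fs_info = NULL;
    sb->s_root = NULL;
}
```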
2009-04-06 | NFS: Fix a double free in nfs_parse_mount_options() | Trond Myklebust
Due to an apparent typo, commit a67d18f89f5782806135aad4ee012ff78d45aae7 (NFS: load the rpc/rdma transport module automatically) led to the 'proto=' mount option doing a double free, while Opt_mountproto leaks a string. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-06 | ext3: make default data ordering mode configurable | Linus Torvalds
This makes the default ext3 data ordering mode (when no explicit ordering is set) configurable, so as to allow people to default to 'data=writeback' and get the resulting latency improvements. This is a non-issue if a filesystem has been explicitly set to some ordering (with 'tune2fs'). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-06 | Merge branch 'linux-next' of git://git.infradead.org/ubifs-2.6 | Linus Torvalds
* 'linux-next' of git://git.infradead.org/ubifs-2.6: UBIFS: fix recovery bug UBIFS: add R/O compatibility UBIFS: fix compiler warnings UBIFS: fully sort GCed nodes UBIFS: fix commentaries UBIFS: introduce a helpful variable UBIFS: use KERN_CONT UBIFS: fix lprops committing bug UBIFS: fix bogus assertion UBIFS: fix bug where page is marked uptodate when out of space UBIFS: amend key_hash return value UBIFS: improve find function interface UBIFS: list usage cleanup UBIFS: fix dbg_chk_lpt_sz()
2009-04-06 | Merge git://git.infradead.org/mtd-2.6 | Linus Torvalds
* git://git.infradead.org/mtd-2.6: (53 commits) [MTD] struct device - replace bus_id with dev_name(), dev_set_name() [MTD] [NOR] Fixup for Numonyx M29W128 chips [MTD] mtdpart: Make ecc_stats more realistic. powerpc/85xx: TQM8548: Update DTS file for multi-chip support powerpc: NAND: FSL UPM: document new bindings [MTD] [NAND] FSL-UPM: Add wait flags to support board/chip specific delays [MTD] [NAND] FSL-UPM: add multi chip support [MTD] [NOR] Add device parent info to physmap_of [MTD] [NAND] Add support for NAND on the Socrates board [MTD] [NAND] Add support for 4KiB pages. [MTD] sysfs support should not depend on CONFIG_PROC_FS [MTD] [NAND] Add parent info for CAFÉ controller [MTD] support driver model updates [MTD] driver model updates (part 2) [MTD] driver model updates [MTD] [NAND] move gen_nand's probe function to .devinit.text [MTD] [MAPS] move sa1100 flash's probe function to .devinit.text [MTD] fix use after free in register_mtd_blktrans [MTD] [MAPS] Drop now unused sharpsl-flash map [MTD] ofpart: Check name property to determine partition nodes. ... Manually fix trivial conflict in drivers/mtd/maps/Makefile
2009-04-06 | Merge branch 'kmemtrace-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip | Linus Torvalds
* 'kmemtrace-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: kmemtrace: trace kfree() calls with NULL or zero-length objects kmemtrace: small cleanups kmemtrace: restore original tracing data binary format, improve ABI kmemtrace: kmemtrace_alloc() must fill type_id kmemtrace: use tracepoints kmemtrace, rcu: don't include unnecessary headers, allow kmemtrace w/ tracepoints kmemtrace, rcu: fix rcupreempt.c data structure dependencies kmemtrace, rcu: fix rcu_tree_trace.c data structure dependencies kmemtrace, rcu: fix linux/rcutree.h and linux/rcuclassic.h dependencies kmemtrace, mm: fix slab.h dependency problem in mm/failslab.c kmemtrace, kbuild: fix slab.h dependency problem in lib/decompress_unlzma.c kmemtrace, kbuild: fix slab.h dependency problem in lib/decompress_bunzip2.c kmemtrace, kbuild: fix slab.h dependency problem in lib/decompress_inflate.c kmemtrace, squashfs: fix slab.h dependency problem in squasfs kmemtrace, befs: fix slab.h dependency problem kmemtrace, security: fix linux/key.h header file dependencies kmemtrace, fs: fix linux/fdtable.h header file dependencies kmemtrace, fs: uninline simple_transaction_set() kmemtrace, fs, security: move alloc_secdata() and free_secdata() to linux/security.h
2009-04-06 | Merge branch 'for-2.6.30' of git://linux-nfs.org/~bfields/linux | Linus Torvalds
* 'for-2.6.30' of git://linux-nfs.org/~bfields/linux: (81 commits) nfsd41: define nfsd4_set_statp as noop for !CONFIG_NFSD_V4 nfsd41: define NFSD_DRC_SIZE_SHIFT in set_max_drc nfsd41: Documentation/filesystems/nfs41-server.txt nfsd41: CREATE_EXCLUSIVE4_1 nfsd41: SUPPATTR_EXCLCREAT attribute nfsd41: support for 3-word long attribute bitmask nfsd: dynamically skip encoded fattr bitmap in _nfsd4_verify nfsd41: pass writable attrs mask to nfsd4_decode_fattr nfsd41: provide support for minor version 1 at rpc level nfsd41: control nfsv4.1 svc via /proc/fs/nfsd/versions nfsd41: add OPEN4_SHARE_ACCESS_WANT nfs4_stateid bmap nfsd41: access_valid nfsd41: clientid handling nfsd41: check encode size for sessions maxresponse cached nfsd41: stateid handling nfsd: pass nfsd4_compound_state* to nfs4_preprocess_{state,seq}id_op nfsd41: destroy_session operation nfsd41: non-page DRC for solo sequence responses nfsd41: Add a create session replay cache nfsd41: create_session operation ...
2009-04-06 | nfsd41: define NFSD_DRC_SIZE_SHIFT in set_max_drc | Benny Halevy
Fixes the following compiler error, seen when CONFIG_NFSD_V4 is not set: fs/nfsd/nfssvc.c: In function 'set_max_drc': fs/nfsd/nfssvc.c:240: error: 'NFSD_DRC_SIZE_SHIFT' undeclared Reported-by: Alexander Beregalov <a.beregalov@gmail.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-04-06 | block: switch sync_dirty_buffer() over to WRITE_SYNC | Jens Axboe
We should now have the logic in place to handle this properly without regressing on the write performance, so re-enable the sync writes. Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-06 | block: Add flag for telling the IO schedulers NOT to anticipate more IO | Jens Axboe
By default, CFQ will anticipate more IO from a given io context if the previously completed IO was sync. This used to be fine, since the only sync IO was reads and O_DIRECT writes. But with more "normal" sync writes being used now, we don't want to anticipate for those. Add a bio/request flag that informs the IO scheduler that this is a sync request that we should not idle for. Introduce WRITE_ODIRECT specifically for O_DIRECT writes, and make sure that the other sync writes set this flag. Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
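The scheduling decision described above reduces to one predicate: idle after a completed request only if it was sync and not marked "do not idle". The flag names below are hypothetical (the message does not name the new flag), and this is a model of the decision, not CFQ code.

```c
#include <stdbool.h>

/* Hypothetical request flags modelled on the description above. */
enum {
    REQ_MODEL_SYNC   = 1u << 0, /* request is synchronous */
    REQ_MODEL_NOIDLE = 1u << 1, /* sync, but do not idle waiting for more IO */
};

/* Should the scheduler keep the queue idle after this request completes,
 * anticipating more IO from the same io context? */
bool should_anticipate(unsigned int flags)
{
    return (flags & REQ_MODEL_SYNC) && !(flags & REQ_MODEL_NOIDLE);
}
```

Under this model, O_DIRECT writes (WRITE_ODIRECT) would carry only the sync flag and keep anticipation, while the other sync writes add the no-idle flag.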
2009-04-06 | jbd2: use WRITE_SYNC_PLUG instead of WRITE_SYNC | Jens Axboe
When you are going to be submitting several sync writes, we want to give the IO scheduler a chance to merge some of them. Instead of using the implicitly unplugging WRITE_SYNC variant, use WRITE_SYNC_PLUG and rely on sync_buffer() doing the unplug when someone does a wait_on_buffer()/lock_buffer(). Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
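Why plugging helps merging: while the queue is plugged, requests accumulate and adjacent ones can be coalesced; only the unplug (here, triggered by a waiter) sends them to the device. The queue below is a hypothetical toy model of that idea with back-merging only; it is not the kernel block layer, and `submit_plugged()`/`unplug()` are illustrative names.

```c
#include <stddef.h>

#define MAX_PENDING 16

/* Hypothetical request: a contiguous run of sectors. */
struct req_model {
    unsigned long sector, nr;
};

struct queue_model {
    struct req_model pending[MAX_PENDING];
    size_t npending;
    size_t dispatched; /* number of requests sent to the device so far */
};

/* Queue a write while plugged. If it starts right where the last pending
 * request ends, merge it (back merge). Returns -1 if no slot is free. */
int submit_plugged(struct queue_model *q, unsigned long sector, unsigned long nr)
{
    if (q->npending > 0) {
        struct req_model *last = &q->pending[q->npending - 1];

        if (last->sector + last->nr == sector) {
            last->nr += nr;
            return 0;
        }
    }
    if (q->npending == MAX_PENDING)
        return -1;
    q->pending[q->npending].sector = sector;
    q->pending[q->npending].nr = nr;
    q->npending++;
    return 0;
}

/* Unplug, e.g. because someone is now waiting on one of the buffers:
 * dispatch everything that accumulated. */
void unplug(struct queue_model *q)
{
    q->dispatched += q->npending;
    q->npending = 0;
}
```

Three writes at sectors 0, 8 and 100 become two dispatched requests, whereas an implicitly unplugging submit would dispatch three.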
2009-04-06 | jbd: use WRITE_SYNC_PLUG instead of WRITE_SYNC | Jens Axboe
When you are going to be submitting several sync writes, we want to give the IO scheduler a chance to merge some of them. Instead of using the implicitly unplugging WRITE_SYNC variant, use WRITE_SYNC_PLUG and rely on sync_buffer() doing the unplug when someone does a wait_on_buffer()/lock_buffer(). Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-06 | block: fsync_buffers_list() should use SWRITE_SYNC_PLUG | Jens Axboe
Then it can submit all the buffers without unplugging for each one. We will kick off the pending IO if we come across a new address space. Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-05 | Merge branch 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip | Linus Torvalds
* 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (413 commits) tracing, net: fix net tree and tracing tree merge interaction tracing, powerpc: fix powerpc tree and tracing tree interaction ring-buffer: do not remove reader page from list on ring buffer free function-graph: allow unregistering twice trace: make argument 'mem' of trace_seq_putmem() const tracing: add missing 'extern' keywords to trace_output.h tracing: provide trace_seq_reserve() blktrace: print out BLK_TN_MESSAGE properly blktrace: extract duplidate code blktrace: fix memory leak when freeing struct blk_io_trace blktrace: fix blk_probes_ref chaos blktrace: make classic output more classic blktrace: fix off-by-one bug blktrace: fix the original blktrace blktrace: fix a race when creating blk_tree_root in debugfs blktrace: fix timestamp in binary output tracing, Text Edit Lock: cleanup tracing: filter fix for TRACE_EVENT_FORMAT events ftrace: Using FTRACE_WARN_ON() to check "freed record" in ftrace_release() x86: kretprobe-booster interrupt emulation code fix ... Fix up trivial conflicts in arch/parisc/include/asm/ftrace.h include/linux/memory.h kernel/extable.c kernel/module.c
2009-04-04 | Make non-compat preadv/pwritev use native register size | Linus Torvalds
Instead of always splitting the file offset into 32-bit 'high' and 'low' parts, just split them into the largest natural word-size - which in C terms is 'unsigned long'. This allows 64-bit architectures to avoid the unnecessary 32-bit shifting and masking for native format (while the compat interfaces will obviously always have to do it). This also changes the order of 'high' and 'low' to be "low first". Why? Because when we have it like this, the 64-bit system calls now don't use the "pos_high" argument at all, and it makes more sense for the native system call to simply match the user-mode prototype. This results in a much more natural calling convention, and allows the compiler to generate much more straightforward code. On x86-64, we now generate

    testq %rcx, %rcx        # pos_l
    js .L122                #,
    movq %rcx, -48(%rbp)    # pos_l, pos

from the C source

    loff_t pos = pos_from_hilo(pos_h, pos_l);
    ...
    if (pos < 0)
        return -EINVAL;

and the 'pos_h' register isn't even touched. It used to generate code like

    mov %r8d, %r8d          # pos_low, pos_low
    salq $32, %rcx          #, tmp71
    movq %r8, %rax          # pos_low, pos.386
    orq %rcx, %rax          # tmp71, pos.386
    js .L122                #,
    movq %rax, -48(%rbp)    # pos.386, pos

which isn't _that_ horrible, but it does show how the natural word size is just a more sensible interface (same arguments will hold in the user level glibc wrapper function, of course, so the kernel side is just half of the equation!)

Note: in all cases the user code wrapper can again be the same. You can just do

    #define HALF_BITS (sizeof(unsigned long)*4)
    __syscall(PWRITEV, fd, iov, count, offset, (offset >> HALF_BITS) >> HALF_BITS);

or something like that. That way the user mode wrapper will also be nicely passing in a zero (it won't actually have to do the shifts, the compiler will understand what is going on) for the last argument. And that is a good idea, even if nobody will necessarily ever care: if we ever do move to a 128-bit loff_t, this particular system call might be left alone. Of course, that will be the least of our worries if we really ever need to care, so this may not be worth really caring about. [ Fixed for lost 'loff_t' cast noticed by Andrew Morton ] Acked-by: Gerd Hoffmann <kraxel@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: linux-api@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: Ingo Molnar <mingo@elte.hu> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
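The split-and-join scheme above can be checked in user space. The join mirrors the kernel's `pos_from_hilo()` named in the message (shift by half the word width twice, so on 64-bit the high word is shifted out entirely and no full-width shift is ever performed); `pos_to_lohi()` is a hypothetical user-side helper. Both use unsigned arithmetic and return `long long` in place of `loff_t`, which are assumptions of this sketch.

```c
#include <limits.h>

/* Half the width of unsigned long: 16 on 32-bit, 32 on 64-bit. */
#define HALF_LONG_BITS (sizeof(unsigned long) * CHAR_BIT / 2)

/* User side (hypothetical helper): low word first. On 64-bit the double
 * shift yields 0, so the high argument is always zero there. */
void pos_to_lohi(long long pos, unsigned long *low, unsigned long *high)
{
    *low = (unsigned long)pos;
    *high = (unsigned long)(((unsigned long long)pos >> HALF_LONG_BITS) >> HALF_LONG_BITS);
}

/* Kernel side, modelled on pos_from_hilo(): on 64-bit, high contributes
 * nothing; on 32-bit, it supplies the upper 32 bits of the offset. */
long long pos_from_hilo(unsigned long high, unsigned long low)
{
    unsigned long long v =
        (((unsigned long long)high << HALF_LONG_BITS) << HALF_LONG_BITS) | low;
    return (long long)v;
}
```

The same two functions round-trip an offset on both 32-bit and 64-bit builds, which is the point of the "user code wrapper can again be the same" remark.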
2009-04-03 | nfsd41: CREATE_EXCLUSIVE4_1 | Benny Halevy
Implement the CREATE_EXCLUSIVE4_1 open mode conforming to http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion1-26 This mode allows the client to atomically create a file if it doesn't exist while setting some of its attributes. It must be implemented if the server supports persistent reply cache and/or pnfs. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-04-03 | nfsd41: SUPPATTR_EXCLCREAT attribute | Benny Halevy
Return bitmask for supported EXCLUSIVE4_1 create attributes. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-04-03 | nfsd41: support for 3-word long attribute bitmask | Andy Adamson
Also, use client minorversion to generate supported attrs Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-04-03 | nfsd: dynamically skip encoded fattr bitmap in _nfsd4_verify | Benny Halevy
_nfsd4_verify currently skips 3 words from the beginning of the encoded buffer. With support for 3-word attr bitmaps in nfsd41, nfsd4_encode_fattr may encode 1, 2, or 3 words, and not always 2 as it used to, hence we need to determine how much to skip from the encoded bitmap length. Note: This patch may be applied over pre-nfsd41 nfsd. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
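An encoded attribute bitmap is a length word followed by that many bitmap words, so the old fixed skip of 3 words corresponds to 1 length word plus 2 bitmap words. The sketch computes the skip from the length word instead. `encoded_bitmap_words()` is a hypothetical helper, and it assumes the buffer has already been converted to host byte order (XDR itself is big-endian).

```c
#include <stddef.h>
#include <stdint.h>

/* Number of 32-bit words occupied by an encoded bitmap: one length word
 * plus `length` bitmap words. Returns 0 if the buffer is too short.
 * Assumption: buf is already in host byte order. */
size_t encoded_bitmap_words(const uint32_t *buf, size_t nwords)
{
    if (nwords < 1)
        return 0;
    uint32_t len = buf[0];
    if (len > nwords - 1)
        return 0;
    return 1 + (size_t)len;
}
```

The attribute values would then start at `buf + encoded_bitmap_words(buf, nwords)`, whether the bitmap has 1, 2 or 3 words.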
2009-04-03 | nfsd41: pass writable attrs mask to nfsd4_decode_fattr | Benny Halevy
In preparation for EXCLUSIVE4_1 Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-04-03 | nfsd41: control nfsv4.1 svc via /proc/fs/nfsd/versions | Benny Halevy
Support enabling and disabling nfsv4.1 via /proc/fs/nfsd/versions by writing the strings "+4.1" or "-4.1" respectively. Use user mode nfs-utils (rpc.nfsd option) to enable. This will allow us to get rid of CONFIG_NFSD_V4_1. [nfsd41: disable support for minorversion by default] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
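Parsing one such token is straightforward: a leading `+` or `-` selects enable or disable, and the rest names the version. The sketch handles only the "4.1" token and is a hypothetical model, not the nfsd `versions` write handler, which accepts a whole list of versions.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical parser for a single "+4.1" / "-4.1" token.
 * Returns 0 and updates *v41_enabled on success, -1 otherwise. */
int parse_version_token(const char *tok, bool *v41_enabled)
{
    if (tok[0] != '+' && tok[0] != '-')
        return -1;
    if (strcmp(tok + 1, "4.1") != 0)
        return -1;
    *v41_enabled = (tok[0] == '+');
    return 0;
}
```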
2009-04-03 | nfsd41: add OPEN4_SHARE_ACCESS_WANT nfs4_stateid bmap | Andy Adamson
Separate the access bits from the want bits and enable __set_bit to work correctly with st_access_bmap. Signed-off-by: Andy Adamson<andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-04-03 | nfsd41: access_valid | Andy Adamson
For nfs41, the open share flags are used also for delegation "wants" and "signals". Check that they are valid. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
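Separating the access bits from the "want" and "signal" bits, and validating each field, can be sketched as masking and range checks. The bit layout below is our reading of the NFSv4.1 `OPEN4_SHARE_ACCESS_*` constants (access in the low byte, delegation wants in 0xFF00, signals in 0x30000) and should be checked against the specification; the macro names and `access_valid_model()` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout, following the NFSv4.1 OPEN4_SHARE_ACCESS_* constants. */
#define SHARE_ACCESS_MASK  0x000FFu /* READ = 1, WRITE = 2, BOTH = 3 */
#define SHARE_WANT_MASK    0x0FF00u /* delegation "wants" */
#define SHARE_WHEN_MASK    0x30000u /* delegation "signals" */
#define SHARE_ACCESS_READ  0x1u
#define SHARE_ACCESS_BOTH  0x3u
#define SHARE_WANT_CANCEL  0x0500u  /* largest defined want value */
#define SHARE_PUSH_DELEG_WHEN_UNCONTENDED 0x20000u

bool access_valid_model(uint32_t x, uint32_t minorversion)
{
    uint32_t access = x & SHARE_ACCESS_MASK;

    /* The access part must be READ, WRITE or BOTH. */
    if (access < SHARE_ACCESS_READ || access > SHARE_ACCESS_BOTH)
        return false;
    x &= ~SHARE_ACCESS_MASK;
    if (minorversion != 0 && x != 0) {
        /* Only v4.1 clients may set want/signal bits; check each field. */
        if ((x & SHARE_WANT_MASK) > SHARE_WANT_CANCEL)
            return false;
        if ((x & SHARE_WHEN_MASK) > SHARE_PUSH_DELEG_WHEN_UNCONTENDED)
            return false;
        x &= ~(SHARE_WANT_MASK | SHARE_WHEN_MASK);
    }
    return x == 0; /* any leftover bit is invalid */
}
```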
2009-04-03 | nfsd41: clientid handling | Andy Adamson
Extract the clientid from the sessionid to set the op_clientid on open. Verify that the clid for other stateful ops is zero for minorversion != 0. Do all other checks for stateful ops without sessions. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Andy Adamson <andros@netapp.com> [fixed whitespace indent] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfsd41 remove sl_session from nfsd4_open] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-04-03 | nfsd41: check encode size for sessions maxresponse cached | Andy Adamson
Calculate the space the compound response has taken after encoding the current operation. pad: add on 8 bytes for the next operation's op_code and status so that there is room to cache a failure on the next operation. Compare this length to the session se_fmaxresp_cached and return nfserr_rep_too_big_to_cache if the length is too large. Our se_fmaxresp_cached will always be a multiple of PAGE_SIZE, and so will be at least a page and will therefore hold the xdr_buf head. Signed-off-by: Andy Adamson <andros@netapp.com> [nfsd41: non-page DRC for solo sequence responses] [fixed nfsd4_check_drc_limit cosmetics] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfsd41: use cstate session in nfsd4_check_drc_limit] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
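The size check is simple arithmetic: the length encoded so far plus 8 bytes (the next operation's opcode and status, one 4-byte XDR word each) must fit within the session's cached-response limit. `check_drc_limit()` is a hypothetical helper, and -1 stands in for `nfserr_rep_too_big_to_cache`.

```c
#include <stddef.h>

/* Room to cache a failure of the next operation: opcode + status, 4 bytes each. */
#define NEXT_OP_PAD 8

/* Returns 0 if the response can be cached, -1 if it is too big to cache. */
int check_drc_limit(size_t encoded_len, size_t maxresp_cached)
{
    return encoded_len + NEXT_OP_PAD <= maxresp_cached ? 0 : -1;
}
```

With a one-page limit of 4096 bytes, a 4088-byte response still fits, but 4089 bytes does not.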
2009-04-03 | nfsd41: stateid handling | Andy Adamson
When sessions are used, stateful operation sequenceid and stateid handling are not used. When sessions are used, on the first open set the seqid to 1, mark the state confirmed and skip seqid processing. When sessions are used, the stateid generation number is ignored when it is zero, whereas without sessions bad_stateid or stale stateid is returned. Add flags to propagate session use to all stateful ops and down to check_stateid_generation. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Andy Adamson <andros@netapp.com> [nfsd4_has_session should return a boolean, not u32] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfsd41: pass nfsd4_compoundres * to nfsd4_process_open1] [nfsd41: calculate HAS_SESSION in nfs4_preprocess_stateid_op] [nfsd41: calculate HAS_SESSION in nfs4_preprocess_seqid_op] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
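The generation check with the session exception can be sketched as a comparison against the current generation, short-circuited when a session is in use and the client sent generation 0. The function and enum names are hypothetical, and which error each mismatch maps to (newer than current vs. older than current) is our assumption.

```c
#include <stdbool.h>
#include <stdint.h>

enum stateid_check { STATEID_OK, STATEID_BAD, STATEID_OLD };

enum stateid_check check_generation_model(uint32_t in_gen, uint32_t cur_gen,
                                          bool has_session)
{
    /* With sessions, generation 0 is ignored (treated as "current"). */
    if (has_session && in_gen == 0)
        return STATEID_OK;
    if (in_gen > cur_gen)
        return STATEID_BAD; /* never issued by the server */
    if (in_gen < cur_gen)
        return STATEID_OLD; /* superseded generation */
    return STATEID_OK;
}
```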
2009-04-03 | nfsd: pass nfsd4_compound_state* to nfs4_preprocess_{state,seq}id_op | Benny Halevy
Currently we only use cstate->current_fh; the rest of the compound state will also be used by nfsd41 code. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-04-03 | nfsd41: destroy_session operation | Benny Halevy
Implement the destroy_session operation conforming to http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion1-26 [use sessionid_lock spin lock] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-04-03 | nfsd41: non-page DRC for solo sequence responses | Andy Adamson
A session inactivity time compound (lease renewal) or a compound where the sequence operation has sa_cachethis set to FALSE does not require any pages to be held in the v4.1 DRC. This is because struct nfsd4_slot already caches the session information. Add logic to the nfs41 server to not cache response pages for solo sequence responses. Return nfserr_replay_uncached_rep on the operation following the sequence operation when sa_cachethis is FALSE. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfsd41: use cstate session in nfsd4_replay_cache_entry] [nfsd41: rename nfsd4_no_page_in_cache] [nfsd41 rename nfsd4_enc_no_page_replay] [nfsd41 nfsd4_is_solo_sequence] [nfsd41 change nfsd4_not_cached return] Signed-off-by: Andy Adamson <andros@netapp.com> [changed return type to bool] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfsd41 drop parens in nfsd4_is_solo_sequence call] Signed-off-by: Andy Adamson <andros@netapp.com> [changed "== 0" to "!"] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-04-03 | nfsd41: Add a create session replay cache | Andy Adamson
Replace the nfs4_client cl_seqid field with a single struct nfs41_slot used for the create session replay cache. The CREATE_SESSION slot sets the sl_session pointer to NULL. Otherwise, the slot and its replay cache are used just like the session slots. Fix the unconfirmed create_session replay response by initializing the create_session slot sequence id to 0. A future patch will set the CREATE_SESSION cache when a SEQUENCE operation precedes the CREATE_SESSION operation. This compound is currently only cached in the session slot table. Signed-off-by: Andy Adamson<andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfsd41: use bool inuse for slot state] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfsd41: revert portion of nfsd4_set_cache_entry] Signed-off-by: Andy Adamson <andros@netpp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-04-03 | nfsd41: create_session operation | Andy Adamson
Implement the create_session operation conforming to http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion1-26 Look up the client id (generated by the server on exchange_id, given by the client on create_session). If neither a confirmed nor an unconfirmed client is found, then the client id is stale. If a confirmed client is found (i.e. we already received create_session for it), then compare the sequence id to determine if it's a replay or possibly a mis-ordered rpc. If the seqid is in order, update the confirmed client seqid and proceed with updating the session parameters. If an unconfirmed client_id is found, then verify the creds and seqid. If both match, move the client id to the confirmed state and proceed with processing the create_session. Currently, we do not support persistent sessions or RDMA. alloc_init_session generates a new sessionid and creates a session structure. NFSD_PAGES_PER_SLOT is used for the max response cached calculation, and for the counting of DRC pages using the hard limits set in struct svc_serv. A note on NFSD_PAGES_PER_SLOT: Other patches in this series allow for NFSD_PAGES_PER_SLOT + 1 pages to be cached in a DRC slot when the response size is less than NFSD_PAGES_PER_SLOT * PAGE_SIZE but xdr_buf pages are used. For example, a READDIR operation will encode a small amount of data in the xdr_buf head, and then the READDIR data in the xdr_buf pages. So the hard-limit calculation underestimates a session's page usage by the number of cached operations that use the xdr_buf pages. Yet another patch caches no pages for the solo sequence operation, or any compound where cache_this is False. So the hard-limit calculation overestimates a session's page usage by the number of these operations in the cache. TODO: improve resource pre-allocation and negotiate session parameters accordingly. Respect and possibly adjust backchannel attributes.
Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: Dean Hildebrand <dhildeb@us.ibm.com> [nfsd41: remove headerpadsz from channel attributes] Our client and server only support a headerpadsz of 0. [nfsd41: use DRC limits in fore channel init] [nfsd41: do not change CREATE_SESSION back channel attrs] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [use sessionid_lock spin lock] [nfsd41: use bool inuse for slot state] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfsd41 remove sl_session from alloc_init_session] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [simplify nfsd4_encode_create_session error handling] [nfsd41: fix comment style in init_forechannel_attrs] [nfsd41: allocate struct nfsd4_session and slot table in one piece] [nfsd41: no need to INIT_LIST_HEAD in alloc_init_session just prior to list_add] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
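The replay / in-order / mis-ordered decision for a confirmed client can be sketched as a comparison with the last recorded CREATE_SESSION sequence id. Treating "exactly one greater, with 32-bit wraparound" as in order is our reading of the description; the enum and function names are hypothetical.

```c
#include <stdint.h>

enum cs_seq_result { CS_NEW_REQUEST, CS_REPLAY, CS_MISORDERED };

/* Classify an incoming CREATE_SESSION sequence id against the last one
 * recorded for a confirmed client. */
enum cs_seq_result classify_cs_seqid(uint32_t seqid, uint32_t last_seqid)
{
    if (seqid == last_seqid)
        return CS_REPLAY;                  /* answer from the replay cache */
    if (seqid == (uint32_t)(last_seqid + 1u))
        return CS_NEW_REQUEST;             /* in order: process and record it */
    return CS_MISORDERED;
}
```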
2009-04-03 | nfsd41: clear DRC cache on free_session | Andy Adamson
Signed-off-by: Andy Adamson<andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-04-03 | nfsd41: nfsd DRC logic | Andy Adamson
Replay a request in nfsd4_sequence. Add a minorversion to struct nfsd4_compound_state. Pass the current slot to nfs4svc_encode_compoundres via struct nfsd4_compoundres to set an NFSv4.1 DRC entry. Signed-off-by: Andy Adamson<andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfsd41: use bool inuse for slot state] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfsd41: use cstate session in nfs4svc_encode_compoundres] [nfsd41 replace nfsd4_set_cache_entry] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-04-03 | nfsd41: hard page limit for DRC | Andy Adamson
Use no more than 1/128th of the number of free pages at nfsd startup for the v4.1 DRC. This is an arbitrary default which should probably end up under the control of an administrator. Signed-off-by: Andy Adamson <andros@netapp.com> [moved added fields in struct svc_serv under CONFIG_NFSD_V4_1] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [fix set_max_drc calculation of sv_drc_max_pages] [moved NFSD_DRC_SIZE_SHIFT's declaration up in header file] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
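The 1/128th limit is a right shift by 7. The constant name below echoes NFSD_DRC_SIZE_SHIFT, which appears in the 2009-04-06 entry of this log, but its value of 7 is derived from 1/128 = 2^-7 rather than quoted from the patch, and `drc_max_pages()` is a hypothetical helper.

```c
/* 1/128 of the free pages == free_pages >> 7 (value assumed from 1/128 = 2^-7). */
#define DRC_SIZE_SHIFT 7

unsigned long drc_max_pages(unsigned long free_pages_at_startup)
{
    return free_pages_at_startup >> DRC_SIZE_SHIFT;
}
```

For example, 1 GiB of free memory in 4 KiB pages is 262144 pages, giving a DRC budget of 2048 pages (8 MiB).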
2009-04-03 | nfsd41: DRC save, restore, and clear functions | Andy Adamson
Cache all the result pages, including the rpc header in rq_respages[0], for a request in the slot table cache entry. Cache the statp pointer from nfsd_dispatch, which points into rq_respages[0] just past the rpc header. When setting a cache entry, calculate and save the length of the nfs data minus the rpc header for rq_respages[0]. When replaying a cache entry, replace the cached rpc header with the replayed request rpc result header, unless there is not enough room in the cached result's first page. In that case, use the cached rpc header. The session's fore channel maxresponse size cached is set to NFSD_PAGES_PER_SLOT * PAGE_SIZE. For compounds we are caching with operations such as READDIR that use the xdr_buf->pages to hold data, we choose to cache the extra page of data rather than copying data from xdr_buf->pages into the xdr_buf->head page. [nfsd41: limit cache to maxresponsesize_cached] [nfsd41: mv nfsd4_set_statp under CONFIG_NFSD_V4_1] [nfsd41: rename nfsd4_move_pages] [nfsd41: rename page_no variable] [nfsd41: rename nfsd4_set_cache_entry] [nfsd41: fix nfsd41_copy_replay_data comment] [nfsd41: add to nfsd4_set_cache_entry] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>