aboutsummaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2009-09-14nilfs2: An unassigned variable is assigned to a never used structure memberZhang Qiang
nilfs2: In procedure 'nilfs_get_sb()', when a nilfs filesysttem is mounted for the first time, local variable 'nilfs->ns_last_cno' is used before loading the latest checkpoint number from disk (in 'nilfs_fill_super'). 'nilfs->ns_last_cno' is assigned to 'sd.cno', but 'sd.cno' has never been used in the procedure. Signed-off-by: Zhang Qiang <zhangqiang.buaa@gmail.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14nilfs2: use GFP_NOIO for bio_alloc instead of GFP_NOWAITRyusuke Konishi
Alberto Bertogli advised me about bio_alloc() use in nilfs: On Sat, 13 Jun 2009 22:52:40 -0300, Alberto Bertogli wrote: > By the way, those bio_alloc()s are using GFP_NOWAIT but it looks > like they could use at least GFP_NOIO or GFP_NOFS, since the caller > can (and sometimes do) sleep. The only caller is nilfs_submit_bh(), > which calls nilfs_submit_seg_bio() which can sleep calling > wait_for_completion(). This takes in the comment and replaces the use of GFP_NOWAIT flag with GFP_NOIO. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14nilfs2: stop using periodic write_super callbackJiro SEKIBA
This removes nilfs_write_super and commit super block in nilfs internal thread, instead of periodic write_super callback. VFS layer calls ->write_super callback periodically. However, it looks like that calling back is ommited when disk I/O is busy. And when cleanerd (nilfs GC) is runnig, disk I/O tend to be busy thus nilfs superblock is not synchronized as nilfs designed. To avoid it, syncing superblock by nilfs thread instead of pdflush. Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14nilfs2: clean up nilfs_write_superJiro SEKIBA
Separate conditions that check if syncing super block and alternative super block are required as inline functions to reuse the conditions. Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14nilfs2: fix disorder of nilfs_write_super in nilfs_sync_fsJiro SEKIBA
This fixes disorder of nilfs_write_super in nilfs_sync_fs. Commiting super block must be the end of the function so that every changes are reflected. ->sync_fs() is not called frequently so this makes nilfs_sync_fs call nilfs_commit_super instead of nilfs_write_super. Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14nilfs2: remove redundant super block commitJiro SEKIBA
This removes redundant super block commit. nilfs_write_super will call nilfs_commit_super to store super block into block device. However, nilfs_put_super will call nilfs_commit_super right after calling nilfs_write_super. So calling nilfs_write_super in nilfs_put_super would be redundant. Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14nilfs2: implement nilfs_show_options to display mount options in /proc/mountsJiro SEKIBA
This is a patch to display mount options in procfs. Mount options will show up in the /proc/mounts as other fs does. ... /dev/sda6 /mnt nilfs2 ro,relatime,barrier=off,cp=3,order=strict 0 0 ... Signed-off-by: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14nilfs2: always lookup disk block address before reading metadata blockRyusuke Konishi
The current metadata file code skips disk address lookup for its data block if the buffer has a mapped flag. This has a potential risk to cause read request to be performed against the stale block address that GC moved, and it may lead to meta data corruption. The mapped flag is safe if the buffer has an uptodate flag, otherwise it may prevent necessary update of disk address in the next read. This will avoid the potential problem by ensuring disk address lookup before reading metadata block even for buffers with the mapped flag. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14nilfs2: use semaphore to protect pointer to a writable FS-instanceRyusuke Konishi
will get rid of nilfs_get_writer() and nilfs_put_writer() pair used to retain a writable FS-instance for a period. The pair functions were making up some kind of recursive lock with a mutex, but they became overkill since the commit 201913ed746c7724a40d33ee5a0b6a1fd2ef3193. Furthermore, they caused the following lockdep warning because the mutex can be released by a task which didn't lock it: ===================================== [ BUG: bad unlock balance detected! ] ------------------------------------- kswapd0/422 is trying to release lock (&nilfs->ns_writer_mutex) at: [<c1359ff5>] mutex_unlock+0x8/0xa but there are no more locks to release! other info that might help us debug this: no locks held by kswapd0/422. stack backtrace: Pid: 422, comm: kswapd0 Not tainted 2.6.31-rc4-nilfs #51 Call Trace: [<c1358f97>] ? printk+0xf/0x18 [<c104fea7>] print_unlock_inbalance_bug+0xcc/0xd7 [<c11578de>] ? prop_put_global+0x3/0x35 [<c1050195>] lock_release+0xed/0x1dc [<c1359ff5>] ? mutex_unlock+0x8/0xa [<c1359f83>] __mutex_unlock_slowpath+0xaf/0x119 [<c1359ff5>] mutex_unlock+0x8/0xa [<d1284add>] nilfs_mdt_write_page+0xd8/0xe1 [nilfs2] [<c1092653>] shrink_page_list+0x379/0x68d [<c109171b>] ? isolate_pages_global+0xb4/0x18c [<c1092bd2>] shrink_list+0x26b/0x54b [<c10930be>] shrink_zone+0x20c/0x2a2 [<c10936b7>] kswapd+0x407/0x591 [<c1091667>] ? isolate_pages_global+0x0/0x18c [<c1040603>] ? autoremove_wake_function+0x0/0x33 [<c10932b0>] ? kswapd+0x0/0x591 [<c104033b>] kthread+0x69/0x6e [<c10402d2>] ? kthread+0x0/0x6e [<c1003e33>] kernel_thread_helper+0x7/0x1a This patch uses a reader/writer semaphore instead of the own lock and kills this warning. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14nilfs2: fix format string compile warning (ino_t)Heiko Carstens
Unlike on most other architectures ino_t is an unsigned int on s390. So add an explicit cast to avoid this compile warning: fs/nilfs2/recovery.c: In function 'recover_dsync_blocks': fs/nilfs2/recovery.c:555: warning: format '%lu' expects type 'long unsigned int', but argument 3 has type 'ino_t' Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14nilfs2: fix ignored error code in __nilfs_read_inode()Ryusuke Konishi
The __nilfs_read_inode function is ignoring the error code returned from nilfs_read_inode_common(), and wrongly delivers a success code (zero) when it escapes from the function in erroneous cases. This adds the missing error handling. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-09-14GFS2: Whitespace fixesSteven Whitehouse
Reported-by: Daniel Walker <dwalker@fifo99.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-09-14block: use blkdev_issue_discard in blk_ioctl_discardChristoph Hellwig
blk_ioctl_discard duplicates large amounts of code from blkdev_issue_discard, the only difference between the two is that blkdev_issue_discard needs to send a barrier discard request and blk_ioctl_discard a non-barrier one, and blk_ioctl_discard needs to wait on the request. To facilitates this add a flags argument to blkdev_issue_discard to control both aspects of the behaviour. This will be very useful later on for using the waiting funcitonality for other callers. Based on an earlier patch from Matthew Wilcox <matthew@wil.cx>. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-14Seperate read and write statistics of in_flight requestsNikanth Karthikesan
Currently, there is a single in_flight counter measuring the number of requests in the request_queue. But some monitoring tools would like to know how many read requests and write requests are in progress. Split the current in_flight counter into two seperate counters for read and write. This information is exported as a sysfs attribute, as changing the currently available stat files would break the existing tools. Signed-off-by: Nikanth Karthikesan <knikanth@suse.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-11Merge git://git.linux-nfs.org/projects/trondmy/nfs-2.6Linus Torvalds
* git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (87 commits) NFSv4: Disallow 'mount -t nfs4 -overs=2' and 'mount -t nfs4 -overs=3' NFS: Allow the "nfs" file system type to support NFSv4 NFS: Move details of nfs4_get_sb() to a helper NFS: Refactor NFSv4 text-based mount option validation NFS: Mount option parser should detect missing "port=" NFS: out of date comment regarding O_EXCL above nfs3_proc_create() NFS: Handle a zero-length auth flavor list SUNRPC: Ensure that sunrpc gets initialised before nfs, lockd, etc... nfs: fix compile error in rpc_pipefs.h nfs: Remove reference to generic_osync_inode from a comment SUNRPC: cache must take a reference to the cache detail's module on open() NFS: Use the DNS resolver in the mount code. NFS: Add a dns resolver for use with NFSv4 referrals and migration SUNRPC: Fix a typo in cache_pipefs_files nfs: nfs4xdr: optimize low level decoding nfs: nfs4xdr: get rid of READ_BUF nfs: nfs4xdr: simplify decode_exchange_id by reusing decode_opaque_inline nfs: nfs4xdr: get rid of COPYMEM nfs: nfs4xdr: introduce decode_sessionid helper nfs: nfs4xdr: introduce decode_verifier helper ...
2009-09-11Merge branch 'sched-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (64 commits) sched: Fix sched::sched_stat_wait tracepoint field sched: Disable NEW_FAIR_SLEEPERS for now sched: Keep kthreads at default priority sched: Re-tune the scheduler latency defaults to decrease worst-case latencies sched: Turn off child_runs_first sched: Ensure that a child can't gain time over it's parent after fork() sched: enable SD_WAKE_IDLE sched: Deal with low-load in wake_affine() sched: Remove short cut from select_task_rq_fair() sched: Turn on SD_BALANCE_NEWIDLE sched: Clean up topology.h sched: Fix dynamic power-balancing crash sched: Remove reciprocal for cpu_power sched: Try to deal with low capacity, fix update_sd_power_savings_stats() sched: Try to deal with low capacity sched: Scale down cpu_power due to RT tasks sched: Implement dynamic cpu_power sched: Add smt_gain sched: Update the cpu_power sum during load-balance sched: Add SD_PREFER_SIBLING ...
2009-09-11Merge branch 'nfs-for-2.6.32'Trond Myklebust
2009-09-11Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6: (377 commits) ASoC: au1x: PSC-AC97 bugfixes ALSA: dummy - Increase MAX_PCM_SUBSTREAMS to 128 ALSA: dummy - Add debug proc file ALSA: Add const prefix to proc helper functions ALSA: Re-export snd_pcm_format_name() function ALSA: hda - Use auto model for HP laptops with ALC268 codec ALSA: cs46xx - Fix minimum period size ASoC: Fix WM835x Out4 capture enumeration ALSA: Remove unneeded ifdef from sound/core.h ALSA: Remove struct snd_monitor_file from public sound/core.h ASoC: Remove unuused hw_read_t sound: oxygen: work around MCE when changing volume ALSA: dummy - Fake buffer allocations ALSA: hda/realtek: Added support for CLEVO M540R subsystem, 6 channel + digital ASoC: fix pxa2xx-ac97.c breakage ALSA: dummy - Fix the timer calculation in systimer mode ALSA: dummy - Add more description ALSA: dummy - Better jiffies handling ALSA: dummy - Support high-res timer mode ALSA: Release v1.0.21 ...
2009-09-11Merge branch 'writeback' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds
* 'writeback' of git://git.kernel.dk/linux-2.6-block: writeback: check for registered bdi in flusher add and inode dirty writeback: add name to backing_dev_info writeback: add some debug inode list counters to bdi stats writeback: get rid of pdflush completely writeback: switch to per-bdi threads for flushing data writeback: move dirty inodes from super_block to backing_dev_info writeback: get rid of generic_sync_sb_inodes() export
2009-09-11Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: (57 commits) binfmt_elf: fix PT_INTERP bss handling TPM: Fixup boot probe timeout for tpm_tis driver sysfs: Add labeling support for sysfs LSM/SELinux: inode_{get,set,notify}secctx hooks to access LSM security context information. VFS: Factor out part of vfs_setxattr so it can be called from the SELinux hook for inode_setsecctx. KEYS: Add missing linux/tracehook.h #inclusions KEYS: Fix default security_session_to_parent() Security/SELinux: includecheck fix kernel/sysctl.c KEYS: security_cred_alloc_blank() should return int under all circumstances IMA: open new file for read KEYS: Add a keyctl to install a process's session keyring on its parent [try #6] KEYS: Extend TIF_NOTIFY_RESUME to (almost) all architectures [try #6] KEYS: Do some whitespace cleanups [try #6] KEYS: Make /proc/keys use keyid not numread as file position [try #6] KEYS: Add garbage collection for dead, revoked and expired keys. [try #6] KEYS: Flag dead keys to induce EKEYREVOKED [try #6] KEYS: Allow keyctl_revoke() on keys that have SETATTR but not WRITE perm [try #6] KEYS: Deal with dead-type keys appropriately [try #6] CRED: Add some configurable debugging [try #6] selinux: Support for the new TUN LSM hooks ...
2009-09-11splice: update mtime and atime on filesMiklos Szeredi
Splice should update the modification and access times on regular files just like read and write. Not updating mtime will confuse backup tools, etc... This patch only adds the time updates for regular files. For pipes and other special files that splice touches the need for updating the times is less clear. Let's discuss and fix that separately. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-11bio: first step in sanitizing the bio->bi_rw flag testingJens Axboe
Get rid of any functions that test for these bits and make callers use bio_rw_flagged() directly. Then it is at least directly apparent what variable and flag they check. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-11writeback: check for registered bdi in flusher add and inode dirtyJens Axboe
Also a debugging aid. We want to catch dirty inodes being added to backing devices that don't do writeback. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-11writeback: add name to backing_dev_infoJens Axboe
This enables us to track who does what and print info. Its main use is catching dirty inodes on the default_backing_dev_info, so we can fix that up. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-11writeback: get rid of pdflush completelyJens Axboe
It is now unused, so kill it off. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-11writeback: switch to per-bdi threads for flushing dataJens Axboe
This gets rid of pdflush for bdi writeout and kupdated style cleaning. pdflush writeout suffers from lack of locality and also requires more threads to handle the same workload, since it has to work in a non-blocking fashion against each queue. This also introduces lumpy behaviour and potential request starvation, since pdflush can be starved for queue access if others are accessing it. A sample ffsb workload that does random writes to files is about 8% faster here on a simple SATA drive during the benchmark phase. File layout also seems a LOT more smooth in vmstat: r b swpd free buff cache si so bi bo in cs us sy id wa 0 1 0 608848 2652 375372 0 0 0 71024 604 24 1 10 48 42 0 1 0 549644 2712 433736 0 0 0 60692 505 27 1 8 48 44 1 0 0 476928 2784 505192 0 0 4 29540 553 24 0 9 53 37 0 1 0 457972 2808 524008 0 0 0 54876 331 16 0 4 38 58 0 1 0 366128 2928 614284 0 0 4 92168 710 58 0 13 53 34 0 1 0 295092 3000 684140 0 0 0 62924 572 23 0 9 53 37 0 1 0 236592 3064 741704 0 0 4 58256 523 17 0 8 48 44 0 1 0 165608 3132 811464 0 0 0 57460 560 21 0 8 54 38 0 1 0 102952 3200 873164 0 0 4 74748 540 29 1 10 48 41 0 1 0 48604 3252 926472 0 0 0 53248 469 29 0 7 47 45 where vanilla tends to fluctuate a lot in the creation phase: r b swpd free buff cache si so bi bo in cs us sy id wa 1 1 0 678716 5792 303380 0 0 0 74064 565 50 1 11 52 36 1 0 0 662488 5864 319396 0 0 4 352 302 329 0 2 47 51 0 1 0 599312 5924 381468 0 0 0 78164 516 55 0 9 51 40 0 1 0 519952 6008 459516 0 0 4 78156 622 56 1 11 52 37 1 1 0 436640 6092 541632 0 0 0 82244 622 54 0 11 48 41 0 1 0 436640 6092 541660 0 0 0 8 152 39 0 0 51 49 0 1 0 332224 6200 644252 0 0 4 102800 728 46 1 13 49 36 1 0 0 274492 6260 701056 0 0 4 12328 459 49 0 7 50 43 0 1 0 211220 6324 763356 0 0 0 106940 515 37 1 10 51 39 1 0 0 160412 6376 813468 0 0 0 8224 415 43 0 6 49 45 1 1 0 85980 6452 886556 0 0 4 113516 575 39 1 11 54 34 0 2 0 85968 6452 886620 0 0 0 1640 158 211 0 0 46 54 A 10 disk test with btrfs performs 26% faster with per-bdi flushing. A SSD based writeback test on XFS performs over 20% better as well, with the throughput being very stable around 1GB/sec, where pdflush only manages 750MB/sec and fluctuates wildly while doing so. Random buffered writes to many files behave a lot better as well, as does random mmap'ed writes. A separate thread is added to sync the super blocks. In the long term, adding sync_supers_bdi() functionality could get rid of this thread again. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-11writeback: move dirty inodes from super_block to backing_dev_infoJens Axboe
This is a first step at introducing per-bdi flusher threads. We should have no change in behaviour, although sb_has_dirty_inodes() is now ridiculously expensive, as there's no easy way to answer that question. Not a huge problem, since it'll be deleted in subsequent patches. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-11writeback: get rid of generic_sync_sb_inodes() exportJens Axboe
This adds two new exported functions: - writeback_inodes_sb(), which only attempts to writeback dirty inodes on this super_block, for WB_SYNC_NONE writeout. - sync_inodes_sb(), which writes out all dirty inodes on this super_block and also waits for the IO to complete. Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-11Merge branch 'next' into for-linusJames Morris
2009-09-10Merge branch 'topic/soundcore-preclaim' into for-linusTakashi Iwai
* topic/soundcore-preclaim: sound: make OSS device number claiming optional and schedule its removal sound: request char-major-* module aliases for missing OSS devices chrdev: implement __[un]register_chrdev()
2009-09-10binfmt_elf: fix PT_INTERP bss handlingRoland McGrath
In fs/binfmt_elf.c, load_elf_interp() calls padzero() for .bss even if the PT_LOAD has no PROT_WRITE and no .bss. This generates EFAULT. Here is a small test case. (Yes, there are other, useful PT_INTERP which have only .text and no .data/.bss.) ----- ptinterp.S _start: .globl _start nop int3 ----- $ gcc -m32 -nostartfiles -nostdlib -o ptinterp ptinterp.S $ gcc -m32 -Wl,--dynamic-linker=ptinterp -o hello hello.c $ ./hello Segmentation fault # during execve() itself After applying the patch: $ ./hello Trace trap # user-mode execution after execve() finishes If the ELF headers are actually self-inconsistent, then dying is fine. But having no PROT_WRITE segment is perfectly normal and correct if there is no segment with p_memsz > p_filesz (i.e. bss). John Reiser suggested checking for PROT_WRITE in the bss logic. I think it makes most sense to simply apply the bss logic only when there is bss. This patch looks less trivial than it is due to some reindentation. It just moves the "if (last_bss > elf_bss) {" test up to include the partial-page bss logic as well as the more-pages bss logic. Reported-by: John Reiser <jreiser@bitwagon.com> Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>
2009-09-09Merge branch 'lookup-permissions-cleanup'Linus Torvalds
* lookup-permissions-cleanup: jffs2/jfs/xfs: switch over to 'check_acl' rather than 'permission()' ext[234]: move over to 'check_acl' permission model shmfs: use 'check_acl' instead of 'permission' Make 'check_acl()' a first-class filesystem op Simplify exec_permission_lite(), part 3 Simplify exec_permission_lite() further Simplify exec_permission_lite() logic Do not call 'ima_path_check()' for each path component
2009-09-09binfmt_elf: fix PT_INTERP bss handlingRoland McGrath
In fs/binfmt_elf.c, load_elf_interp() calls padzero() for .bss even if the PT_LOAD has no PROT_WRITE and no .bss. This generates EFAULT. Here is a small test case. (Yes, there are other, useful PT_INTERP which have only .text and no .data/.bss.) ----- ptinterp.S _start: .globl _start nop int3 ----- $ gcc -m32 -nostartfiles -nostdlib -o ptinterp ptinterp.S $ gcc -m32 -Wl,--dynamic-linker=ptinterp -o hello hello.c $ ./hello Segmentation fault # during execve() itself After applying the patch: $ ./hello Trace trap # user-mode execution after execve() finishes If the ELF headers are actually self-inconsistent, then dying is fine. But having no PROT_WRITE segment is perfectly normal and correct if there is no segment with p_memsz > p_filesz (i.e. bss). John Reiser suggested checking for PROT_WRITE in the bss logic. I think it makes most sense to simply apply the bss logic only when there is bss. This patch looks less trivial than it is due to some reindentation. It just moves the "if (last_bss > elf_bss) {" test up to include the partial-page bss logic as well as the more-pages bss logic. Reported-by: John Reiser <jreiser@bitwagon.com> Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-10sysfs: Add labeling support for sysfsDavid P. Quigley
This patch adds a setxattr handler to the file, directory, and symlink inode_operations structures for sysfs. The patch uses hooks introduced in the previous patch to handle the getting and setting of security information for the sysfs inodes. As was suggested by Eric Biederman the struct iattr in the sysfs_dirent structure has been replaced by a structure which contains the iattr, secdata and secdata length to allow the changes to persist in the event that the inode representing the sysfs_dirent is evicted. Because sysfs only stores this information when a change is made all the optional data is moved into one dynamically allocated field. This patch addresses an issue where SELinux was denying virtd access to the PCI configuration entries in sysfs. The lack of setxattr handlers for sysfs required that a single label be assigned to all entries in sysfs. Granting virtd access to every entry in sysfs is not an acceptable solution so fine grained labeling of sysfs is required such that individual entries can be labeled appropriately. [sds: Fixed compile-time warnings, coding style, and setting of inode security init flags.] Signed-off-by: David P. Quigley <dpquigl@tycho.nsa.gov> Signed-off-by: Stephen D. Smalley <sds@tycho.nsa.gov> Signed-off-by: James Morris <jmorris@namei.org>
2009-09-10VFS: Factor out part of vfs_setxattr so it can be called from the SELinux ↵David P. Quigley
hook for inode_setsecctx. This factors out the part of the vfs_setxattr function that performs the setting of the xattr and its notification. This is needed so the SELinux implementation of inode_setsecctx can handle the setting of the xattr while maintaining the proper separation of layers. Signed-off-by: David P. Quigley <dpquigl@tycho.nsa.gov> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>
2009-09-09GFS2: Remove unused sysfs fileSteven Whitehouse
The /sys/fs/gfs2/<fsname>/lock_module/id file has been unused for some time now, so we can remove it. We still accept the mount option though, as userspace still sends that. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-09-08NFSv4: Disallow 'mount -t nfs4 -overs=2' and 'mount -t nfs4 -overs=3'Trond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-09-08NFS: Allow the "nfs" file system type to support NFSv4Chuck Lever
When mounting an "nfs" type file system, recognize "v4," "vers=4," or "nfsvers=4" mount options, and convert the file system to "nfs4" under the covers. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> [trondmy: fixed up binary mount code so it sets the 'version' field too] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-09-08NFS: Move details of nfs4_get_sb() to a helperChuck Lever
Clean up: Refactor nfs4_get_sb() to allow its guts to be invoked by nfs_get_sb(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-09-08NFS: Refactor NFSv4 text-based mount option validationChuck Lever
Clean up: Refactor the part of nfs4_validate_mount_options() that handles text-based options, so we can call it from the NFSv2/v3 option validation function. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-09-08NFS: Mount option parser should detect missing "port="Chuck Lever
The meaning of not specifying the "port=" mount option is different for "-t nfs" and "-t nfs4" mounts. The default port value for NFSv2/v3 mounts is 0, but the default for NFSv4 mounts is 2049. To support "-t nfs -o vers=4", the mount option parser must detect when "port=" is missing so that the correct default port value can be set depending on which NFS version is requested. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-09-08NFS: out of date comment regarding O_EXCL above nfs3_proc_create()Harshula Jayasuriya
Hi Trond, Recently we were observing the behaviour difference between a 2.4.x and 2.6.x kernel with respect to O_EXCL. A comment from 2.4.x era, "For now, we don't implement O_EXCL." seems inaccurate in TOT. If so, here's a patch to remove the comment. This patch is against: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 Signed-off-by: Harshula Jayasuriya <harshula@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-09-08jffs2/jfs/xfs: switch over to 'check_acl' rather than 'permission()'Linus Torvalds
This avoids an indirect call in the VFS for each path component lookup. Well, at least as long as you own the directory in question, and the ACL check is unnecessary. Reviewed-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-08ext[234]: move over to 'check_acl' permission modelLinus Torvalds
Don't implement per-filesystem 'extX_permission()' functions that have to be called for every path component operation, and instead just expose the actual ACL checking so that the VFS layer can now do it for us. Reviewed-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-08Make 'check_acl()' a first-class filesystem opLinus Torvalds
This is stage one in flattening out the callchains for the common permission testing. Rather than have most filesystem implement their own inode->i_op->permission function that just calls back down to the VFS layers 'generic_permission()' with the per-filesystem ACL checking function, the filesystem can just expose its 'check_acl' function directly, and let the VFS layer do everything for it. This is all just preparatory - no filesystem actually enables this yet. Reviewed-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-08Simplify exec_permission_lite(), part 3Linus Torvalds
Don't call down to the generic inode_permission() function just to call the inode-specific permission function - just do it directly. The generic inode_permission() code does things like checking MAY_WRITE and devcgroup_inode_permission(), neither of which are relevant for the light pathname walk permission checks (we always do just MAY_EXEC, and the inode is never a special device). Reviewed-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-08Simplify exec_permission_lite() furtherLinus Torvalds
This function is only called for path components that are already known to be directories (they have a '->lookup' method). So don't bother doing that whole S_ISDIR() testing, the whole point of the 'lite()' version is that we know that we are looking at a directory component, and that we're only checking name lookup permission. Reviewed-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-08Simplify exec_permission_lite() logicLinus Torvalds
Instead of returning EAGAIN and having the caller do something special for that case, just do the special case directly. Reviewed-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-08Do not call 'ima_path_check()' for each path componentLinus Torvalds
Not only is that a supremely timing-critical path, but it's hopefully some day going to be lockless for the common case, and ima can't do that. Plus the integrity code doesn't even care about non-regular files, so it was always a total waste of time and effort. Acked-by: Serge Hallyn <serue@us.ibm.com> Acked-by: Mimi Zohar <zohar@us.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-08GFS2: Be extra careful about deallocating inodesSteven Whitehouse
There is a potential race in the inode deallocation code if two nodes try to deallocate the same inode at the same time. Most of the issue is solved by the iopen locking. There is still a small window which is not covered by the iopen lock. This patches fixes that and also makes the deallocation code more robust in the face of any errors in the rgrp bitmaps, or erroneous iopen callbacks from other nodes. This does introduce one extra disk read, but that is generally not an issue since its the same block that must be written to later in the deallocation process. The total disk accesses therefore stay the same, Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>