aboutsummaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2007-05-09NFS: Clean up NFSv4 XDR error messageChuck Lever
Make it more useful for debugging purposes. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-05-09NFS: NFS client underestimates how large an NFSv4 SETATTR reply can beChuck Lever
The maximum size of an NFSv4 SETATTR compound reply should include the GETATTR operation that we send. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-05-09NFS: Remove redundant check in nfs_check_verifier()Trond Myklebust
The check for nfs_attribute_timeout(dir) in nfs_check_verifier is redundant: nfs_lookup_revalidate() will already call nfs_revalidate_inode() on the parent dir when necessary. The only case where this is not done is the case of a negative dentry. Fix this case by moving up the revalidation code. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-05-09NFS: Fix a jiffie wraparound issueTrond Myklebust
dentry verifiers are always set to the parent directory's cache_change_attribute. There is no reason to be testing for anything other than equality when we're trying to find out if the dentry has been checked since the last time the directory was modified. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-05-09Merge git://git.infradead.org/mtd-2.6Linus Torvalds
* git://git.infradead.org/mtd-2.6: (21 commits) [MTD] [CHIPS] Remove MTD_OBSOLETE_CHIPS (jedec, amd_flash, sharp) [MTD] Delete allegedly obsolete "bank_size" field of mtd_info. [MTD] Remove unnecessary user space check from mtd.h. [MTD] [MAPS] Remove flash maps for no longer supported 405LP boards [MTD] [MAPS] Fix missing printk() parameter in physmap_of.c MTD driver [MTD] [NAND] platform NAND driver: add driver [MTD] [NAND] platform NAND driver: update header [JFFS2] Simplify and clean up jffs2_add_tn_to_tree() some more. [JFFS2] Remove another bogus optimisation in jffs2_add_tn_to_tree() [JFFS2] Remove broken insert_point optimisation in jffs2_add_tn_to_tree() [JFFS2] Remember to calculate overlap on nodes which replace older nodes [JFFS2] Don't advance c->wbuf_ofs to next eraseblock after wbuf flush [MTD] [NAND] at91_nand.c: CMDLINE_PARTS support [MTD] [NAND] Tidy up handling of page number in nand_block_bad() [MTD] block2mtd_paramline[] mustn't be __initdata [MTD] [NAND] Support multiple chips in CAFÉ driver [MTD] [NAND] Rename cafe.c to cafe_nand.c and remove the multi-obj magic [MTD] [NAND] Use rslib for CAFÉ ECC [RSLIB] Support non-canonical GF representations [JFFS2] Remove dead file histo_mips.h ...
2007-05-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivialLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (25 commits) sound: convert "sound" subdirectory to UTF-8 MAINTAINERS: Add cxacru website/mailing list include files: convert "include" subdirectory to UTF-8 general: convert "kernel" subdirectory to UTF-8 documentation: convert the Documentation directory to UTF-8 Convert the toplevel files CREDITS and MAINTAINERS to UTF-8. remove broken URLs from net drivers' output Magic number prefix consistency change to Documentation/magic-number.txt trivial: s/i_sem /i_mutex/ fix file specification in comments drivers/base/platform.c: fix small typo in doc misc doc and kconfig typos Remove obsolete fat_cvf help text Fix occurrences of "the the " Fix minor typoes in kernel/module.c Kconfig: Remove reference to external mqueue library Kconfig: A couple of grammatical fixes in arch/i386/Kconfig Correct comments in genrtc.c to refer to correct /proc file. Fix more "deprecated" spellos. Fix "deprecated" typoes. ... Fix trivial comment conflict in kernel/relay.c.
2007-05-09Add suspend-related notifications for CPU hotplugRafael J. Wysocki
Since nonboot CPUs are now disabled after tasks and devices have been frozen and the CPU hotplug infrastructure is used for this purpose, we need special CPU hotplug notifications that will help the CPU-hotplug-aware subsystems distinguish normal CPU hotplug events from CPU hotplug events related to a system-wide suspend or resume operation in progress. This patch introduces such notifications and causes them to be used during suspend and resume transitions. It also changes all of the CPU-hotplug-aware subsystems to take these notifications into consideration (for now they are handled in the same way as the corresponding "normal" ones). [oleg@tv-sign.ru: cleanups] Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Gautham R Shenoy <ego@in.ibm.com> Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09reiserfs: use zero_user_pageNate Diller
Use zero_user_page() instead of open-coding it. Signed-off-by: Nate Diller <nate.diller@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09ext3: use zero_user_pageNate Diller
Use zero_user_page() instead of open-coding it. Signed-off-by: Nate Diller <nate.diller@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09affs: use zero_user_pageNate Diller
Use zero_user_page() instead of open-coding it. Signed-off-by: Nate Diller <nate.diller@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09fs: convert core functions to zero_user_pageNate Diller
It's very common for file systems to need to zero part or all of a page, the simplist way is just to use kmap_atomic() and memset(). There's actually a library function in include/linux/highmem.h that does exactly that, but it's confusingly named memclear_highpage_flush(), which is descriptive of *how* it does the work rather than what the *purpose* is. So this patchset renames the function to zero_user_page(), and calls it from the various places that currently open code it. This first patch introduces the new function call, and converts all the core kernel callsites, both the open-coded ones and the old memclear_highpage_flush() ones. Following this patch is a series of conversions for each file system individually, per AKPM, and finally a patch deprecating the old call. The diffstat below shows the entire patchset. [akpm@linux-foundation.org: fix a few things] Signed-off-by: Nate Diller <nate.diller@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09knfsd: avoid Oops if buggy userspace performs confusing filehandle->dentry ↵NeilBrown
mapping When a lookup request arrives, nfsd uses information provided by userspace (mountd) to find the right filesystem. It then assumes that the same filehandle type as the incoming filehandle can be used to create an outgoing filehandle. However if mountd is buggy, or maybe just being creative, the filesystem may not support that filesystem type, and the kernel could oops, particularly if 'ex_uuid' is NULL but a FSID_UUID* filehandle type is used. So add some proper checking that the fsid version/type from the incoming filehandle is actually supportable, and ignore that information if it isn't supportable. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09knfsd: various nfsd xdr cleanupsNeilBrown
1/ decode_sattr and decode_sattr3 never return NULL, so remove several checks for that. ditto for xdr_decode_hyper. 2/ replace some open coded XDR_QUADLEN calls with calls to XDR_QUADLEN 3/ in decode_writeargs, simply an 'if' to use a single calculation. .page_len is the length of that part of the packet that did not fit in the first page (the head). So the length of the data part is the remainder of the head, plus page_len. 3/ other minor cleanups. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09knfsd: trivial makefile cleanupChristoph Hellwig
kbuild directly interprets <modulename>-y as objects to build into a module, no need to assign it to the old foo-objs variable. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09knfsd: avoid use of unitialised variables on error path when nfs exportsNeilBrown
We need to zero various parts of 'exp' before any 'goto out', otherwise when we go to free the contents... we die. Signed-off-by: Neil Brown <neilb@suse.de> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09RPC: add wrapper for svc_reserve to account for checksumJeff Layton
When the kernel calls svc_reserve to downsize the expected size of an RPC reply, it fails to account for the possibility of a checksum at the end of the packet. If a client mounts a NFSv2/3 with sec=krb5i/p, and does I/O then you'll generally see messages similar to this in the server's ring buffer: RPC request reserved 164 but used 208 While I was never able to verify it, I suspect that this problem is also the root cause of some oopses I've seen under these conditions: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=227726 This is probably also a problem for other sec= types and for NFSv4. The large reserved size for NFSv4 compound packets seems to generally paper over the problem, however. This patch adds a wrapper for svc_reserve that accounts for the possibility of a checksum. It also fixes up the appropriate callers of svc_reserve to call the wrapper. For now, it just uses a hardcoded value that I determined via testing. That value may need to be revised upward as things change, or we may want to eventually add a new auth_op that attempts to calculate this somehow. Unfortunately, there doesn't seem to be a good way to reliably determine the expected checksum length prior to actually calculating it, particularly with schemes like spkm3. Signed-off-by: Jeff Layton <jlayton@redhat.com> Acked-by: Neil Brown <neilb@suse.de> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Acked-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09nfsd/nfs4state: remove unnecessary daemonize callEric W. Biederman
Acked-by: Neil Brown <neilb@suse.de> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09The NFSv2/NFSv3 server does not handle zero length WRITE requests correctlyPeter Staubach
The NFSv2 and NFSv3 servers do not handle WRITE requests for 0 bytes correctly. The specifications indicate that the server should accept the request, but it should mostly turn into a no-op. Currently, the server will return an XDR decode error, which it should not. Attached is a patch which addresses this issue. It also adds some boundary checking to ensure that the request contains as much data as was requested to be written. It also correctly handles an NFSv3 request which requests to write more data than the server has stated that it is prepared to handle. Previously, there was some support which looked like it should work, but wasn't quite right. Signed-off-by: Peter Staubach <staubach@redhat.com> Acked-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09remove nfs4_acl_add_ace()Adrian Bunk
nfs4_acl_add_ace() can now be removed. Signed-off-by: Adrian Bunk <bunk@stusta.de> Acked-by: Neil Brown <neilb@cse.unsw.edu.au> Acked-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09unify flush_work/flush_work_keventd and rename it to cancel_work_syncOleg Nesterov
flush_work(wq, work) doesn't need the first parameter, we can use cwq->wq (this was possible from the very beginnig, I missed this). So we can unify flush_work_keventd and flush_work. Also, rename flush_work() to cancel_work_sync() and fix all callers. Perhaps this is not the best name, but "flush_work" is really bad. (akpm: this is why the earlier patches bypassed maintainers) Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Jeff Garzik <jeff@garzik.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Tejun Heo <htejun@gmail.com> Cc: Auke Kok <auke-jan.h.kok@intel.com>, Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09aio: use flush_work()Andrew Morton
Migrate AIO over to use flush_work(). Cc: "Maciej W. Rozycki" <macro@linux-mips.org> Cc: David Howells <dhowells@redhat.com> Cc: Zach Brown <zach.brown@oracle.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09AFS: implement basic file write supportDavid Howells
Implement support for writing to regular AFS files, including: (1) write (2) truncate (3) fsync, fdatasync (4) chmod, chown, chgrp, utime. AFS writeback attempts to batch writes into as chunks as large as it can manage up to the point that it writes back 65535 pages in one chunk or it meets a locked page. Furthermore, if a page has been written to using a particular key, then should another write to that page use some other key, the first write will be flushed before the second is allowed to take place. If the first write fails due to a security error, then the page will be scrapped and reread before the second write takes place. If a page is dirty and the callback on it is broken by the server, then the dirty data is not discarded (same behaviour as NFS). Shared-writable mappings are not supported by this patch. [akpm@linux-foundation.org: fix a bunch of warnings] Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09AFS: AFS fixupsDavid Howells
Make some miscellaneous changes to the AFS filesystem: (1) Assert RCU barriers on module exit to make sure RCU has finished with callbacks in this module. (2) Correctly handle the AFS server returning a zero-length read. (3) Split out data zapping calls into one function (afs_zap_data). (4) Rename some afs_file_*() functions to afs_*() where they apply to non-regular files too. (5) Be consistent about the presentation of volume ID:vnode ID in debugging output. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09fs: use path_walk in do_path_lookupJosef 'Jeff' Sipek
Since path_walk sets the total_link_count to 0 and calls link_path_walk, we can just call path_walk directly. Signed-off-by: Josef 'Jeff' Sipek <jsipek@cs.sunysb.edu> Acked-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09fs: fix indentation in do_path_lookupJosef 'Jeff' Sipek
Signed-off-by: Josef 'Jeff' Sipek <jsipek@cs.sunysb.edu> Acked-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09use simple_read_from_buffer() in fs/Akinobu Mita
Cleanup using simple_read_from_buffer() in binfmt_misc, configfs, and sysfs. Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: Joel Becker <joel.becker@oracle.com> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09fix file specification in commentsUwe Kleine-König
Many files include the filename at the beginning, serveral used a wrong one. Signed-off-by: Uwe Kleine-König <ukleinek@informatik.uni-freiburg.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-05-09Remove obsolete fat_cvf help textAlexander E. Patrakov
The text removed by the following patch refers to functionality that never worked, to non-existing documentation file, and to mount options marked as obsolete in the module. Signed-off-by: Alexander E. Patrakov <patrakov@ums.usu.ru> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-05-09Fix occurrences of "the the "Michael Opdenacker
Signed-off-by: Michael Opdenacker <michael@free-electrons.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-05-09Fix misspellings collected by members of KJ list.Robert P. J. Day
Fix the misspellings of "propogate", "writting" and (oh, the shame :-) "kenrel" in the source tree. Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-05-09Style fix in fs/select.cWANG Cong
Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-05-09fs/libfs.c: >80 columns line break fixRonni Nielsen
Signed-off-by: Ronni Nielsen <theronni@gmail.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-05-08smaps: only define clear_refs for CONFIG_MMUDavid Rientjes
/proc/pid/clear_refs is only defined in the CONFIG_MMU case, so make sure we don't have any references to clear_refs_smap() in generic procfs code. Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08Remove suid/sgid bits on [f]truncate()Linus Torvalds
.. to match what we do on write(). This way, people who write to files by using [f]truncate + writable mmap have the same semantics as if they were using the write() family of system calls. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08Merge git://oss.sgi.com:8090/xfs/xfs-2.6Linus Torvalds
* git://oss.sgi.com:8090/xfs/xfs-2.6: [XFS] Add lockdep support for XFS [XFS] Fix race in xfs_write() b/w dmapi callout and direct I/O checks. [XFS] Get rid of redundant "required" in msg. [XFS] Export via a function xfs_buftarg_list for use by kdb/xfsidbg. [XFS] Remove unused ilen variable and references. [XFS] Fix to prevent the notorious 'NULL files' problem after a crash. [XFS] Fix race condition in xfs_write(). [XFS] Fix uquota and oquota enforcement problems. [XFS] propogate return codes from flush routines [XFS] Fix quotaon syscall failures for group enforcement requests. [XFS] Invalidate quotacheck when mounting without a quota type. [XFS] reducing the number of random number functions. [XFS] remove more misc. unused args [XFS] the "aendp" arg to xfs_dir2_data_freescan is always NULL, remove it. [XFS] The last argument "lsn" of xfs_trans_commit() is always called with
2007-05-08Merge branch 'for-2.6.22' of git://git.kernel.dk/data/git/linux-2.6-blockLinus Torvalds
* 'for-2.6.22' of git://git.kernel.dk/data/git/linux-2.6-block: [PATCH] ll_rw_blk: fix missing bounce in blk_rq_map_kern() [PATCH] splice: always call into page_cache_readahead() [PATCH] splice(): fix interaction with readahead
2007-05-08Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6: JFS: Fix race waking up jfsIO kernel thread JFS: use __set_current_state() Copy i_flags to jfs inode flags on write JFS: document uid, gid, and umask mount options in jfs.txt
2007-05-08udf: possible null pointer dereference while load_partitionDmitriy Monakhov
sb_read may return NULL, let's explicitly check it. Signed-off-by: Dmitriy Monakhov <dmonakhov@openvz.org> Cc: Jan Kara <jack@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08udf: support files larger than 1GJan Kara
Make UDF work correctly for files larger than 1GB. As no extent can be longer than (1<<30)-blocksize bytes, we have to create several extents if a big hole is being created. As a side-effect, we now don't discard preallocated blocks when creating a hole. Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08udf: add assertionsJan Kara
Add a few assertions into udf_discard_prealloc() to check that the file is sane (mostly helps debugging further patches ;). Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08udf: use get_bh()Jan Kara
Make UDF use get_bh() instead of directly accessing b_count and use brelse() instead of udf_release_data() which does just brelse()... Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08UDF: introduce struct extent_positionJan Kara
Introduce a structure extent_position to store a position of an extent and the corresponding buffer_head in one place. Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08udf: use sector_t and loff_t for file offsetsJan Kara
Use sector_t and loff_t for file offsets in UDF filesystem. Otherwise an overflow may occur for long files. Also make inode_bmap() return offset in the extent in number of blocks instead of number of bytes - for most callers this is more convenient. Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08nfs: fix congestion control: use atomic_longsPeter Zijlstra
Change the atomic_t in struct nfs_server to atomic_long_t in anticipation of machines that can handle 8+TB of (4K) pages under writeback. However I suspect other things in NFS will start going *bang* by then. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08utimensat implementationUlrich Drepper
Implement utimensat(2) which is an extension to futimesat(2) in that it a) supports nano-second resolution for the timestamps b) allows to selectively ignore the atime/mtime value c) allows to selectively use the current time for either atime or mtime d) supports changing the atime/mtime of a symlink itself along the lines of the BSD lutimes(3) functions For this change the internally used do_utimes() functions was changed to accept a timespec time value and an additional flags parameter. Additionally the sys_utime function was changed to match compat_sys_utime which already use do_utimes instead of duplicating the work. Also, the completely missing futimensat() functionality is added. We have such a function in glibc but we have to resort to using /proc/self/fd/* which not everybody likes (chroot etc). Test application (the syscall number will need per-arch editing): #include <errno.h> #include <fcntl.h> #include <time.h> #include <sys/time.h> #include <stddef.h> #include <syscall.h> #define __NR_utimensat 280 #define UTIME_NOW ((1l << 30) - 1l) #define UTIME_OMIT ((1l << 30) - 2l) int main(void) { int status = 0; int fd = open("ttt", O_RDWR|O_CREAT|O_EXCL, 0666); if (fd == -1) error (1, errno, "failed to create test file \"ttt\""); struct stat64 st1; if (fstat64 (fd, &st1) != 0) error (1, errno, "fstat failed"); struct timespec t[2]; t[0].tv_sec = 0; t[0].tv_nsec = 0; t[1].tv_sec = 0; t[1].tv_nsec = 0; if (syscall(__NR_utimensat, AT_FDCWD, "ttt", t, 0) != 0) error (1, errno, "utimensat failed"); struct stat64 st2; if (fstat64 (fd, &st2) != 0) error (1, errno, "fstat failed"); if (st2.st_atim.tv_sec != 0 || st2.st_atim.tv_nsec != 0) { puts ("atim not reset to zero"); status = 1; } if (st2.st_mtim.tv_sec != 0 || st2.st_mtim.tv_nsec != 0) { puts ("mtim not reset to zero"); status = 1; } if (status != 0) goto out; t[0] = st1.st_atim; t[1].tv_sec = 0; t[1].tv_nsec = UTIME_OMIT; if (syscall(__NR_utimensat, AT_FDCWD, "ttt", t, 0) != 0) error (1, errno, "utimensat failed"); if (fstat64 (fd, &st2) != 0) error (1, errno, "fstat failed"); if (st2.st_atim.tv_sec != st1.st_atim.tv_sec || st2.st_atim.tv_nsec != st1.st_atim.tv_nsec) { puts ("atim not set"); status = 1; } if (st2.st_mtim.tv_sec != 0 || st2.st_mtim.tv_nsec != 0) { puts ("mtim changed from zero"); status = 1; } if (status != 0) goto out; t[0].tv_sec = 0; t[0].tv_nsec = UTIME_OMIT; t[1] = st1.st_mtim; if (syscall(__NR_utimensat, AT_FDCWD, "ttt", t, 0) != 0) error (1, errno, "utimensat failed"); if (fstat64 (fd, &st2) != 0) error (1, errno, "fstat failed"); if (st2.st_atim.tv_sec != st1.st_atim.tv_sec || st2.st_atim.tv_nsec != st1.st_atim.tv_nsec) { puts ("mtim changed from original time"); status = 1; } if (st2.st_mtim.tv_sec != st1.st_mtim.tv_sec || st2.st_mtim.tv_nsec != st1.st_mtim.tv_nsec) { puts ("mtim not set"); status = 1; } if (status != 0) goto out; sleep (2); t[0].tv_sec = 0; t[0].tv_nsec = UTIME_NOW; t[1].tv_sec = 0; t[1].tv_nsec = UTIME_NOW; if (syscall(__NR_utimensat, AT_FDCWD, "ttt", t, 0) != 0) error (1, errno, "utimensat failed"); if (fstat64 (fd, &st2) != 0) error (1, errno, "fstat failed"); struct timeval tv; gettimeofday(&tv,NULL); if (st2.st_atim.tv_sec <= st1.st_atim.tv_sec || st2.st_atim.tv_sec > tv.tv_sec) { puts ("atim not set to NOW"); status = 1; } if (st2.st_mtim.tv_sec <= st1.st_mtim.tv_sec || st2.st_mtim.tv_sec > tv.tv_sec) { puts ("mtim not set to NOW"); status = 1; } if (symlink ("ttt", "tttsym") != 0) error (1, errno, "cannot create symlink"); t[0].tv_sec = 0; t[0].tv_nsec = 0; t[1].tv_sec = 0; t[1].tv_nsec = 0; if (syscall(__NR_utimensat, AT_FDCWD, "tttsym", t, AT_SYMLINK_NOFOLLOW) != 0) error (1, errno, "utimensat failed"); if (lstat64 ("tttsym", &st2) != 0) error (1, errno, "lstat failed"); if (st2.st_atim.tv_sec != 0 || st2.st_atim.tv_nsec != 0) { puts ("symlink atim not reset to zero"); status = 1; } if (st2.st_mtim.tv_sec != 0 || st2.st_mtim.tv_nsec != 0) { puts ("symlink mtim not reset to zero"); status = 1; } if (status != 0) goto out; t[0].tv_sec = 1; t[0].tv_nsec = 0; t[1].tv_sec = 1; t[1].tv_nsec = 0; if (syscall(__NR_utimensat, fd, NULL, t, 0) != 0) error (1, errno, "utimensat failed"); if (fstat64 (fd, &st2) != 0) error (1, errno, "fstat failed"); if (st2.st_atim.tv_sec != 1 || st2.st_atim.tv_nsec != 0) { puts ("atim not reset to one"); status = 1; } if (st2.st_mtim.tv_sec != 1 || st2.st_mtim.tv_nsec != 0) { puts ("mtim not reset to one"); status = 1; } if (status == 0) puts ("all OK"); out: close (fd); unlink ("ttt"); unlink ("tttsym"); return status; } [akpm@linux-foundation.org: add missing i386 syscall table entry] Signed-off-by: Ulrich Drepper <drepper@redhat.com> Cc: Alexey Dobriyan <adobriyan@openvz.org> Cc: Michael Kerrisk <mtk-manpages@gmx.net> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08inode numbering: change libfs sb creation routines to avoid collisions with ↵Jeff Layton
their root inodes This patch makes it so that simple_fill_super and get_sb_pseudo assign their root inodes to be number 1. It also fixes up a couple of callers of simple_fill_super that were passing in files arrays that had an index at number 1, and adds a warning for any caller that sends in such an array. It would have been nice to have made it so that it wasn't possible to make such a collision, but some callers need to be able to control what inode number their entries get, so I think this is the best that can be done. Signed-off-by: Jeff Layton <jlayton@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08inode numbering: make static counters in new_inode and iunique be 32 bitsJeff Layton
The problems are: - on filesystems w/o permanent inode numbers, i_ino values can be larger than 32 bits, which can cause problems for some 32 bit userspace programs on a 64 bit kernel. We can't do anything for filesystems that have actual >32-bit inode numbers, but on filesystems that generate i_ino values on the fly, we should try to have them fit in 32 bits. We could trivially fix this by making the static counters in new_inode and iunique 32 bits, but... - many filesystems call new_inode and assume that the i_ino values they are given are unique. They are not guaranteed to be so, since the static counter can wrap. This problem is exacerbated by the fix for #1. - after allocating a new inode, some filesystems call iunique to try to get a unique i_ino value, but they don't actually add their inodes to the hashtable, and so they're still not guaranteed to be unique if that counter wraps. This patch set takes the simpler approach of simply using iunique and hashing the inodes afterward. Christoph H. previously mentioned that he thought that this approach may slow down lookups for filesystems that currently hash their inodes. The questions are: 1) how much would this slow down lookups for these filesystems? 2) is it enough to justify adding more infrastructure to avoid it? What might be best is to start with this approach and then only move to using IDR or some other scheme if these extra inodes in the hashtable prove to be problematic. I've done some cursory testing with this patch and the overhead of hashing and unhashing the inodes with pipefs is pretty low -- just a few seconds of system time added on to the creation and destruction of 10 million pipes (very similar to the overhead that the IDR approach would add). The hard thing to measure is what effect this has on other filesystems. I'm open to ways to try and gauge this. Again, I've only converted pipefs as an example. If this approach is acceptable then I'll start work on patches to convert other filesystems. With a pretty-much-worst-case microbenchmark provided by Eric Dumazet <dada1@cosmosbay.com>: hashing patch (pipebench): sys 1m15.329s sys 1m16.249s sys 1m17.169s unpatched (pipebench): sys 1m9.836s sys 1m12.541s sys 1m14.153s Which works out to 1.05642174294555027017. So ~5-6% slowdown. This patch: When a 32-bit program that was not compiled with large file offsets does a stat and gets a st_ino value back that won't fit in the 32 bit field, glibc (correctly) generates an EOVERFLOW error. We can't do anything about fs's with larger permanent inode numbers, but when we generate them on the fly, we ought to try and have them fit within a 32 bit field. This patch takes the first step toward this by making the static counters in these two functions be 32 bits. [jlayton@redhat.com: mention that it's only the case for 32bit, non-LFS stat] Signed-off-by: Jeff Layton <jlayton@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08Invalid return value of execve() resulting in oopsesAlexey Kuznetsov
When elf loader fails to map executable (due to memory shortage or because binary is malformed), it can return 0. Normally, this is invisible because process is killed with SIGKILL and it never returns to user space. But if exec() is called from kernel thread (hotplug, whatever) consequences are more interesting and vary depending on architecture. i386. Nothing especially interesting, execve() just returns with "success" :-) x86_64. Fake zero frame is used on way to caller, RSP/RIP are loaded with zeros, ergo... double fault. ia64. Similar to i386, but r32...r95 are corrupted. Sometimes it oopses due to return to zero PC, sometimes it sees NaT in rXX and oopses due to NaT consumption. Signed-off-by: Alexey Kuznetsov <alexey@openvz.org> Signed-off-by: Kirill Korotaev <dev@openvz.org> Signed-off-by: Pavel Emelianov <xemul@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08procfs: use simple_read_from_buffer()Akinobu Mita
Cleanup using simple_read_from_buffer() in procfs. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08Fix error handling in HDIO_GETGEO compat wrapperAndreas Schwab
Don't clobber error from sys_ioctl in HDIO_GETGEO compat wrapper. Signed-off-by: Andreas Schwab <schwab@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>