Kernel - My Linux kernel repository

Age	Commit message (Collapse)	Author
2006-12-15	[GFS2] Fix Kconfig	Steven Whitehouse
	Here is a patch to fix up the Kconfig so that we don't land up with problems when people disable the NET subsystem. Thanks for all the hints and suggestions that people have sent me regarding this. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Aleksandr Koltsoff <czr@iki.fi> Cc: Toralf Förster <toralf.foerster@gmx.de> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Adrian Bunk <bunk@stusta.de> Cc: Chris Zubrzycki <chris@middle--earth.org> Cc: Patrick Caulfield <pcaulfie@redhat.com>
2006-12-15	[DLM] fix compile warning	Patrick Caulfield
	This patch fixes a compile warning in lowcomms-tcp.c indicating that kmem_cache_t is deprecated. Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2006-12-13	DebugFS : file/directory removal fix	Mathieu Desnoyers
	Fix file and directory removal in debugfs. Add inotify support for file removal. The following scenario : create dir a create dir a/b cd a/b (some process goes in cwd a/b) rmdir a/b rmdir a fails due to the fact that "a" appears to be non empty. It is because the "b" dentry is not deleted from "a" and still in use. The same problem happens if "b" is a file. d_delete is nice enough to know when it needs to unhash and free the dentry if nothing else is using it or, if someone is using it, to remove it from the hash queues and wait for it to be deleted when it has no users. The nice side-effect of this fix is that it calls the file removal notification. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-12-13	DebugFS : more file/directory creation error handling	Mathieu Desnoyers
	Correct dentry count to handle creation errors. This patch puts a dput at the file creation instead of the file removal : lookup_one_len already returns a dentry with reference count of 1. Then, the dget() in simple_mknod increments it when the dentry is associated with a file. In a scenario where simple_create or simple_mkdir returns an error, this would lead to an unwanted increment of the reference counter, therefore making file removal impossible. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-12-13	DebugFS : file/directory creation error handling	Mathieu Desnoyers
	Fix error handling of file and directory creation in DebugFS. The error path should release the file system because no _remove will be called for this erroneous creation. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-12-13	DebugFS : coding style fixes	Mathieu Desnoyers
	Minor coding style fixes along the way : 80 cols and a white space. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-12-13	DebugFS : inotify create/mkdir support	Mathieu Desnoyers
	Add inotify create and mkdir events to DebugFS. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-12-13	[PATCH] getting rid of all casts of k[cmz]alloc() calls	Robert P. J. Day
	Run this: #!/bin/sh for f in $(grep -Erl "$[^$]\) k[cmz]alloc" ) ; do echo "De-casting $f..." perl -pi -e "s/ ?= ?$[^$]\) (k[cmz]alloc) \(/ = \1\(/" $f done And then go through and reinstate those cases where code is casting pointers to non-pointers. And then drop a few hunks which conflicted with outstanding work. Cc: Russell King <rmk@arm.linux.org.uk>, Ian Molton <spyro@f2s.com> Cc: Mikael Starvik <starvik@axis.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jeff Dike <jdike@addtoit.com> Cc: Greg KH <greg@kroah.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Paul Fulghum <paulkf@microgate.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Karsten Keil <kkeil@suse.de> Cc: Mauro Carvalho Chehab <mchehab@infradead.org> Cc: Jeff Garzik <jeff@garzik.org> Cc: James Bottomley <James.Bottomley@steeleye.com> Cc: Ian Kent <raven@themaw.net> Cc: Steven French <sfrench@us.ibm.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Neil Brown <neilb@cse.unsw.edu.au> Cc: Jaroslav Kysela <perex@suse.cz> Cc: Takashi Iwai <tiwai@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13	[PATCH] knfsd: Fix up some bit-rot in exp_export	NeilBrown
	The nfsservctl system call isn't used but recent nfs-utils releases for exporting filesystems, and consequently the code that is uses - exp_export - has suffered some bitrot. Particular: - some newly added fields in 'struct svc_export' are being initialised properly. - the return value is now always -ENOMEM ... This patch fixes both these problems. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13	[PATCH] knfsd: nfsd4: simplify filehandle check	J.Bruce Fields
	Kill another big "if" clause. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13	[PATCH] knfsd: nfsd4: simplify migration op check	J.Bruce Fields
	I'm not too fond of these big if conditions. Replace them by checks of a flag in the operation descriptor. To my eye this makes the code a bit more self-documenting, and makes the complicated part of the code (proc_compound) a little more compact. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13	[PATCH] knfsd: nfsd4: reorganize compound ops	J.Bruce Fields
	Define an op descriptor struct, use it to simplify nfsd4_proc_compound(). Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13	[PATCH] knfsd: nfsd4: make verify and nverify wrappers	J.Bruce Fields
	Make wrappers for verify and nverify, for consistency with other ops. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13	[PATCH] knfsd: nfsd4: don't inline nfsd4 compound op functions	J.Bruce Fields
	The inlining contributes to bloating the stack of nfsd4_compound, and I want to change the compound op functions to function pointers anyway. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13	[PATCH] knfsd: nfsd4: move replay_owner to cstate	J.Bruce Fields
	Tuck away the replay_owner in the cstate while we're at it. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13	[PATCH] knfsd: nfsd4: remove spurious replay_owner check	J.Bruce Fields
	OK, this is embarassing--I've even looked back at the history, and cannot for the life of me figure out why I added this check. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13	[PATCH] knfsd: nfsd4: pass saved and current fh together into nfsd4 operations	J.Bruce Fields
	Pass the saved and current filehandles together into all the nfsd4 compound operations. I want a unified interface to these operations so we can just call them by pointer and throw out the huge switch statement. Also I'll eventually want a structure like this--that holds the state used during compound processing--for deferral. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13	[PATCH] knfsd: nfsd: don't drop silently on upcall deferral	J.Bruce Fields
	To avoid tying up server threads when nfsd makes an upcall (to mountd, to get export options, to idmapd, for nfsv4 name<->id mapping, etc.), we temporarily "drop" the request and save enough information so that we can revisit it later. Certain failures during the deferral process can cause us to really drop the request and never revisit it. This is often less than ideal, and is unacceptable in the NFSv4 case--rfc 3530 forbids the server from dropping a request without also closing the connection. As a first step, we modify the deferral code to return -ETIMEDOUT (which is translated to nfserr_jukebox in the v3 and v4 cases, and remains a drop in the v2 case). Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13	[PATCH] knfsd: nfsd4: handling more nfsd_cross_mnt errors in nfsd4 readdir	J.Bruce Fields
	This patch on its own causes no change in behavior, since nfsd_cross_mnt() only returns -EAGAIN; but in the future I'd like it to also be able to return -ETIMEDOUT, so we may as well handle any possible error here. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13	[PATCH] knfsd: nfsd: simplify exp_pseudoroot	J.Bruce Fields
	Note there's no need for special handling of -EAGAIN here; nfserrno() does what we want already. So this is a pure cleanup with no change in functionality. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13	[PATCH] knfsd: nfsd: make exp_rootfh handle exp_parent errors	J.Bruce Fields
	Since exp_parent can fail by returning an error (-EAGAIN) in addition to by returning NULL, we should check for that case in exp_rootfh. (TODO: we should check that userland handles these errors too.) Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13	[PATCH] knfsd: nfsd4: clarify units of COMPOUND_SLACK_SPACE	J.Bruce Fields
	A comment here incorrectly states that "slack_space" is measured in words, not bytes. Remove the comment, and adjust a variable name and a few comments to clarify the situation. This is pure cleanup; there should be no change in functionality. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13	[PATCH] knfsd: nfsd4: remove a dprink from nfsd4_lock	J.Bruce Fields
	This dprintk is printing the wrong error now, but it's probably an unnecessary dprintk anyway; just remove it. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13	[PATCH] one more EXPORT_UNUSED_SYMBOL removal	Adrian Bunk
	Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13	[PATCH] update Tigran's email addresses	Tigran Aivazian
	As Adrian pointed out recently, there were still a couple of places where I should have fixed my email address. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13	[PATCH] ncpfs: ensure we free wdog_pid on parse_option or fill_inode failure	Eric W. Biederman
	This took a little refactoring but now errors are handled cleanly. When this code used pid_t values this wasn't necessary because you can't leak a pid_t. Thanks to Peter Vandrovec for spotting this. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Peter Vandrovec <vandrove@vc.cvut.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13	[PATCH] ncpfs: Use struct pid to track the userspace watchdog process	Eric W. Biederman
	This patch converts the tracking of the user space watchdog process from using a pid_t to use struct pid. This makes us safe from pid wrap around issues and prepares the way for the pid namespace. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Petr Vandrovec <VANDROVE@vc.cvut.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13	[PATCH] smbfs: Make conn_pid a struct pid	Eric W. Biederman
	smbfs keeps track of the user space server process in conn_pid. This converts that track to use a struct pid instead of pid_t. This keeps us safe from pid wrap around issues and prepares the way for the pid namespace. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13	[PATCH] lockd endianness annotations	Al Viro
	Annotated, all places switched to keeping status net-endian. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13	[PATCH] Fix numerous kcalloc() calls, convert to kzalloc()	Robert P. J. Day
	All kcalloc() calls of the form "kcalloc(1,...)" are converted to the equivalent kzalloc() calls, and a few kcalloc() calls with the incorrect ordering of the first two arguments are fixed. Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Cc: Jeff Garzik <jeff@garzik.org> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Adam Belay <ambx1@neo.rr.com> Cc: James Bottomley <James.Bottomley@steeleye.com> Cc: Greg KH <greg@kroah.com> Cc: Mark Fasheh <mark.fasheh@oracle.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13	[PATCH] Use activate_mm() in fs/aio.c:use_mm()	Jeremy Fitzhardinge
	activate_mm() is not the right thing to be using in use_mm(). It should be switch_mm(). On normal x86, they're synonymous, but for the Xen patches I'm adding a hook which assumes that activate_mm is only used the first time a new mm is used after creation (I have another hook for dealing with dup_mm). I think this use of activate_mm() is the only place where it could be used a second time on an mm. >From a quick look at the other architectures I think this is OK (most simply implement one in terms of the other), but some are doing some subtly different stuff between the two. Acked-by: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13	[PATCH] optimize o_direct on block devices	Chen, Kenneth W
	Implement block device specific .direct_IO method instead of going through generic direct_io_worker for block device. direct_io_worker() is fairly complex because it needs to handle O_DIRECT on file system, where it needs to perform block allocation, hole detection, extents file on write, and tons of other corner cases. The end result is that it takes tons of CPU time to submit an I/O. For block device, the block allocation is much simpler and a tight triple loop can be written to iterate each iovec and each page within the iovec in order to construct/prepare bio structure and then subsequently submit it to the block layer. This significantly speeds up O_D on block device. [akpm@osdl.org: small speedup] Signed-off-by: Ken Chen <kenneth.w.chen@intel.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Zach Brown <zach.brown@oracle.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13	[PATCH] ocfs2: relative atime support	Mark Fasheh
	Update ocfs2_should_update_atime() to understand the MNT_RELATIME flag and to test against mtime / ctime accordingly. [akpm@osdl.org: cleanups] Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Cc: Valerie Henson <val_henson@linux.intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13	[PATCH] relative atime	Valerie Henson
	Add "relatime" (relative atime) support. Relative atime only updates the atime if the previous atime is older than the mtime or ctime. Like noatime, but useful for applications like mutt that need to know when a file has been read since it was last modified. A corresponding patch against mount(8) is available at http://userweb.kernel.org/~akpm/mount-relative-atime.txt Signed-off-by: Valerie Henson <val_henson@linux.intel.com> Cc: Mark Fasheh <mark.fasheh@oracle.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Cc: Karel Zak <kzak@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13	[PATCH] touch_atime() cleanup	Andrew Morton
	Simplify touch_atime() layout. Cc: Valerie Henson <val_henson@linux.intel.com> Cc: Mark Fasheh <mark.fasheh@oracle.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13	[PATCH] constify pipe_buf_operations	Eric Dumazet
	- pipe/splice should use const pipe_buf_operations and file_operations - struct pipe_inode_info has an unused field "start" : get rid of it. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-12	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial	Linus Torvalds
	* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: Fix inotify maintainers entry Fix typo in new debug options. Jon needs a new shift key. fs: Convert kmalloc() + memset() to kzalloc() in fs/. configfs.h: Remove dead macro definitions. kconfig: Standardize "depends" -> "depends on" in Kconfig files e100: replace kmalloc with kcalloc um: replace kmalloc+memset with kzalloc fix typo in net/ipv4/ip_fragment.c include/linux/compiler.h: reject gcc 3 < gcc 3.2 Kconfig: fix spelling error in config KALLSYMS help text Remove duplicate "have to" in comment Fix small typo in drivers/serial/icom.c Use consistent casing in help message EXT{2,3,4}_FS: remove outdated part of the help text
2006-12-12	fs: Convert kmalloc() + memset() to kzalloc() in fs/.	Robert P. J. Day
	Convert the single available instance of kmalloc() + memset() to kzalloc() in the fs/ directory. Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-12-12	kconfig: Standardize "depends" -> "depends on" in Kconfig files	Robert P. J. Day
	Standardize the miniscule percentage of occurrences of "depends" in Kconfig files to "depends on", and update kconfig-language.txt to reflect that. Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-12-12	Merge branch 'upstream-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2: [patch 3/3] OCFS2 Configurable timeouts - Protocol changes [patch 2/3] OCFS2 Configurable timeouts [patch 1/3] OCFS2 - Expose struct o2nm_cluster ocfs2: Synchronize feature incompat flags in ocfs2_fs.h ocfs2: update mount option documentation ocfs2: local mounts
2006-12-12	EXT{2,3,4}_FS: remove outdated part of the help text	Jan Engelhardt
	Signed-off-by: Jan Engelhardt <jengelh@gmx.de> Acked-by: Dave Kleikamp <shaggy@austin.ibm.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-12-12	Merge branch 'for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6: JFS: Fix conflicting superblock flags
2006-12-11	[patch 3/3] OCFS2 Configurable timeouts - Protocol changes	Andrew Beekhof
	Modify the OCFS2 handshake to ensure essential timeouts are configured identically on all nodes. Only allow changes when there are no connected peers Improves the logic in o2net_advance_rx() which broke now that sizeof(struct o2net_handshake) is greater than sizeof(struct o2net_msg) Included is the field for userspace-heartbeat timeout to avoid the need for further protocol changes. Uses a global spinlock to ensure the decisions to update configfs entries are made on the correct value. The region covered by the spinlock when incrementing the counter is much larger as this is the more critical case. Small cleanup contributed by Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Beekhof <abeekhof@suse.de> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-11	Make SLES9 "get_kernel_version" work on the kernel binary again	Linus Torvalds
	As reported by Andy Whitcroft, at least the SLES9 initrd build process depends on getting the kernel version from the kernel binary. It does that by simply trawling the binary and looking for the signature of the "linux_banner" string (the string "Linux version " to be exact. Which is really broken in itself, but whatever..) That got broken when the string was changed to allow /proc/version to change the UTS release information dynamically, and "get_kernel_version" thus returned "%s" (see commit a2ee8649ba6d71416712e798276bf7c40b64e6e5: "[PATCH] Fix linux banner utsname information"). This just restores "linux_banner" as a static string, which should fix the version finding. And /proc/version simply uses a different string. To avoid wasting even that miniscule amount of memory, the early boot string should really be marked __initdata, but that just causes the same bug in SLES9 to re-appear, since it will then find other occurrences of "Linux version " first. Cc: Andy Whitcroft <apw@shadowen.org> Acked-by: Herbert Poetzl <herbert@13thfloor.at> Cc: Andi Kleen <ak@suse.de> Cc: Andrew Morton <akpm@osdl.org> Cc: Steve Fox <drfickle@us.ibm.com> Acked-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10	[PATCH] user of the jiffies rounding code: JBD	Arjan van de Ven
	This patch introduces a user: of the round_jiffies() function; the "5 second" ext3/jbd wakeup. While "every 5 seconds" doesn't sound as a problem, there can be many of these (and these timers do add up over all the kernel). The "5 second" wakeup isn't really timing sensitive; in addition even with rounding it'll still happen every 5 seconds (with the exception of the very first time, which is likely to be rounded up to somewhere closer to 6 seconds) Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10	[PATCH] fdtable: Implement new pagesize-based fdtable allocator	Vadim Lobanov
	This patch provides an improved fdtable allocation scheme, useful for expanding fdtable file descriptor entries. The main focus is on the fdarray, as its memory usage grows 128 times faster than that of an fdset. The allocation algorithm sizes the fdarray in such a way that its memory usage increases in easy page-sized chunks. The overall algorithm expands the allowed size in powers of two, in order to amortize the cost of invoking vmalloc() for larger allocation sizes. Namely, the following sizes for the fdarray are considered, and the smallest that accommodates the requested fd count is chosen: pagesize / 4 pagesize / 2 pagesize <- memory allocator switch point pagesize * 2 pagesize * 4 ...etc... Unlike the current implementation, this allocation scheme does not require a loop to compute the optimal fdarray size, and can be done in efficient straightline code. Furthermore, since the fdarray overflows the pagesize boundary long before any of the fdsets do, it makes sense to optimize run-time by allocating both fdsets in a single swoop. Even together, they will still be, by far, smaller than the fdarray. The fdtable->open_fds is now used as the anchor for the fdset memory allocation. Signed-off-by: Vadim Lobanov <vlobanov@speakeasy.net> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Dipankar Sarma <dipankar@in.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10	[PATCH] fdtable: Remove the free_files field	Vadim Lobanov
	An fdtable can either be embedded inside a files_struct or standalone (after being expanded). When an fdtable is being discarded after all RCU references to it have expired, we must either free it directly, in the standalone case, or free the files_struct it is contained within, in the embedded case. Currently the free_files field controls this behavior, but we can get rid of it entirely, as all the necessary information is already recorded. We can distinguish embedded and standalone fdtables using max_fds, and if it is embedded we can divine the relevant files_struct using container_of(). Signed-off-by: Vadim Lobanov <vlobanov@speakeasy.net> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Dipankar Sarma <dipankar@in.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10	[PATCH] fdtable: Make fdarray and fdsets equal in size	Vadim Lobanov
	Currently, each fdtable supports three dynamically-sized arrays of data: the fdarray and two fdsets. The code allows the number of fds supported by the fdarray (fdtable->max_fds) to differ from the number of fds supported by each of the fdsets (fdtable->max_fdset). In practice, it is wasteful for these two sizes to differ: whenever we hit a limit on the smaller-capacity structure, we will reallocate the entire fdtable and all the dynamic arrays within it, so any delta in the memory used by the larger-capacity structure will never be touched at all. Rather than hogging this excess, we shouldn't even allocate it in the first place, and keep the capacities of the fdarray and the fdsets equal. This patch removes fdtable->max_fdset. As an added bonus, most of the supporting code becomes simpler. Signed-off-by: Vadim Lobanov <vlobanov@speakeasy.net> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Dipankar Sarma <dipankar@in.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10	[PATCH] dio: lock refcount operations	Zach Brown
	The wait_for_more_bios() function name was poorly chosen. While looking to clean it up it I noticed that the dio struct refcounting between the bio completion and dio submission paths was racey. The bio submission path was simply freeing the dio struct if atomic_dec_and_test() indicated that it dropped the final reference. The aio bio completion path was dereferencing its dio struct pointer after dropping its reference based on the remaining number of references. These two paths could race and result in the aio bio completion path dereferencing a freed dio, though this was not observed in the wild. This moves the refcount under the bio lock so that bio completion can drop its reference and decide to wake all in one atomic step. Once testing and waking is locked dio_await_one() can test its sleeping condition and mark itself uninterruptible under the lock. It gets simpler and wait_for_more_bios() disappears. The addition of the interrupt masking spin lock acquiry in dio_bio_submit() looks alarming. This lock acquiry existed in that path before the recent dio completion patch set. We shouldn't expect significant performance regression from returning to the behaviour that existed before the completion clean up work. This passed 4k block ext3 O_DIRECT fsx and aio-stress on an SMP machine. Signed-off-by: Zach Brown <zach.brown@oracle.com> Cc: Badari Pulavarty <pbadari@us.ibm.com> Cc: Suparna Bhattacharya <suparna@in.ibm.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: <xfs-masters@oss.sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10	[PATCH] dio: only call aio_complete() after returning -EIOCBQUEUED	Zach Brown
	The only time it is safe to call aio_complete() is when the ->ki_retry function returns -EIOCBQUEUED to the AIO core. direct_io_worker() has historically done this by relying on its caller to translate positive return codes into -EIOCBQUEUED for the aio case. It did this by trying to keep conditionals in sync. direct_io_worker() knew when finished_one_bio() was going to call aio_complete(). It would reverse the test and wait and free the dio in the cases it thought that finished_one_bio() wasn't going to. Not surprisingly, it ended up getting it wrong. 'ret' could be a negative errno from the submission path but it failed to communicate this to finished_one_bio(). direct_io_worker() would return < 0, it's callers wouldn't raise -EIOCBQUEUED, and aio_complete() would be called. In the future finished_one_bio()'s tests wouldn't reflect this and aio_complete() would be called for a second time which can manifest as an oops. The previous cleanups have whittled the sync and async completion paths down to the point where we can collapse them and clearly reassert the invariant that we must only call aio_complete() after returning -EIOCBQUEUED. direct_io_worker() will only return -EIOCBQUEUED when it is not the last to drop the dio refcount and the aio bio completion path will only call aio_complete() when it is the last to drop the dio refcount. direct_io_worker() can ensure that it is the last to drop the reference count by waiting for bios to drain. It does this for sync ops, of course, and for partial dio writes that must fall back to buffered and for aio ops that saw errors during submission. This means that operations that end up waiting, even if they were issued as aio ops, will not call aio_complete() from dio. Instead we return the return code of the operation and let the aio core call aio_complete(). This is purposely done to fix a bug where AIO DIO file extensions would call aio_complete() before their callers have a chance to update i_size. Now that direct_io_worker() is explicitly returning -EIOCBQUEUED its callers no longer have to translate for it. XFS needs to be careful not to free resources that will be used during AIO completion if -EIOCBQUEUED is returned. We maintain the previous behaviour of trying to write fs metadata for O_SYNC aio+dio writes. Signed-off-by: Zach Brown <zach.brown@oracle.com> Cc: Badari Pulavarty <pbadari@us.ibm.com> Cc: Suparna Bhattacharya <suparna@in.ibm.com> Acked-by: Jeff Moyer <jmoyer@redhat.com> Cc: <xfs-masters@oss.sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>