Kernel - My Linux kernel repository

Age	Commit message (Collapse)	Author
2008-12-10	KSYM_SYMBOL_LEN fixes	Hugh Dickins
	Miles Lane tailing /sys files hit a BUG which Pekka Enberg has tracked to my 966c8c12dc9e77f931e2281ba25d2f0244b06949 sprint_symbol(): use less stack exposing a bug in slub's list_locations() - kallsyms_lookup() writes a 0 to namebuf[KSYM_NAME_LEN-1], but that was beyond the end of page provided. The 100 slop which list_locations() allows at end of page looks roughly enough for all the other stuff it might print after the symbol before it checks again: break out KSYM_SYMBOL_LEN earlier than before. Latencytop and ftrace and are using KSYM_NAME_LEN buffers where they need KSYM_SYMBOL_LEN buffers, and vmallocinfo a 2*KSYM_NAME_LEN buffer where it wants a KSYM_SYMBOL_LEN buffer: fix those before anyone copies them. [akpm@linux-foundation.org: ftrace.h needs module.h] Signed-off-by: Hugh Dickins <hugh@veritas.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc Miles Lane <miles.lane@gmail.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Steven Rostedt <srostedt@redhat.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-10	inotify: fix IN_ONESHOT unmount event watcher	Dmitri Monakhov
	On umount two event will be dispatched to watcher: 1: inotify_dev_queue_event(.., IN_UNMOUNT,..) 2: remove_watch(watch, dev) ->inotify_dev_queue_event(.., IN_IGNORED, ..) But if watcher has IN_ONESHOT bit set then the watcher will be released inside first event. Which result in accessing invalid object later. IMHO it is not pure regression. This bug wasn't triggered while initial inotify interface testing phase because of another bug in IN_ONESHOT handling logic :) commit ac74c00e499ed276a965e5b5600667d5dc04a84a Author: Ulisses Furquim <ulissesf@gmail.com> Date: Fri Feb 8 04:18:16 2008 -0800 inotify: fix check for one-shot watches before destroying them As the IN_ONESHOT bit is never set when an event is sent we must check it in the watch's mask and not in the event's mask. TESTCASE: mkdir mnt mount -ttmpfs none mnt mkdir mnt/d ./inotify mnt/d& umount mnt ## << lockup or crash here TESTSOURCE: /* gcc -oinotify inotify.c / #include <stdio.h> #include <stdlib.h> #include <sys/inotify.h> int main(int argc, char argv) { char buf[1024]; struct inotify_event ie; char p; int i; ssize_t l; p = argv[1]; i = inotify_init(); inotify_add_watch(i, p, ~0); l = read(i, buf, sizeof(buf)); printf("read %d bytes\n", l); ie = (struct inotify_event ) buf; printf("event mask: %d\n", ie->mask); return 0; } Signed-off-by: Dmitri Monakhov <dmonakhov@openvz.org> Cc: John McCutchan <ttb@tentacle.dhs.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Robert Love <rlove@google.com> Cc: Ulisses Furquim <ulissesf@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-10	pagemap: fix 32-bit pagemap regression	Matt Mackall
	The large pages fix from bcf8039ed45 broke 32-bit pagemap by pulling the pagemap entry code out into a function with the wrong return type. Pagemap entries are 64 bits on all systems and unsigned long is only 32 bits on 32-bit systems. Signed-off-by: Matt Mackall <mpm@selenic.com> Reported-by: Doug Graham <dgraham@nortel.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Dave Hansen <dave@linux.vnet.ibm.com> Cc: <stable@kernel.org> [2.6.26.x, 2.6.27.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-10	revert "percpu_counter: new function percpu_counter_sum_and_set"	Andrew Morton
	Revert commit e8ced39d5e8911c662d4d69a342b9d053eaaac4e Author: Mingming Cao <cmm@us.ibm.com> Date: Fri Jul 11 19:27:31 2008 -0400 percpu_counter: new function percpu_counter_sum_and_set As described in revert "percpu counter: clean up percpu_counter_sum_and_set()" the new percpu_counter_sum_and_set() is racy against updates to the cpu-local accumulators on other CPUs. Revert that change. This means that ext4 will be slow again. But correct. Reported-by: Eric Dumazet <dada1@cosmosbay.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mingming Cao <cmm@us.ibm.com> Cc: <linux-ext4@vger.kernel.org> Cc: <stable@kernel.org> [2.6.27.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-10	revert "percpu counter: clean up percpu_counter_sum_and_set()"	Andrew Morton
	Revert commit 1f7c14c62ce63805f9574664a6c6de3633d4a354 Author: Mingming Cao <cmm@us.ibm.com> Date: Thu Oct 9 12:50:59 2008 -0400 percpu counter: clean up percpu_counter_sum_and_set() Before this patch we had the following: percpu_counter_sum(): return the percpu_counter's value percpu_counter_sum_and_set(): return the percpu_counter's value, copying that value into the central value and zeroing the per-cpu counters before returning. After this patch, percpu_counter_sum_and_set() has gone, and percpu_counter_sum() gets the old percpu_counter_sum_and_set() functionality. Problem is, as Eric points out, the old percpu_counter_sum_and_set() functionality was racy and wrong. It zeroes out counters on "other" cpus, without holding any locks which will prevent races agaist updates from those other CPUS. This patch reverts 1f7c14c62ce63805f9574664a6c6de3633d4a354. This means that percpu_counter_sum_and_set() still has the race, but percpu_counter_sum() does not. Note that this is not a simple revert - ext4 has since started using percpu_counter_sum() for its dirty_blocks counter as well. Note that this revert patch changes percpu_counter_sum() semantics. Before the patch, a call to percpu_counter_sum() will bring the counter's central counter mostly up-to-date, so a following percpu_counter_read() will return a close value. After this patch, a call to percpu_counter_sum() will leave the counter's central accumulator unaltered, so a subsequent call to percpu_counter_read() can now return a significantly inaccurate result. If there is any code in the tree which was introduced after e8ced39d5e8911c662d4d69a342b9d053eaaac4e was merged, and which depends upon the new percpu_counter_sum() semantics, that code will break. Reported-by: Eric Dumazet <dada1@cosmosbay.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mingming Cao <cmm@us.ibm.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-10	Btrfs: Delete csum items when freeing extents	Chris Mason
	This finishes off the new checksumming code by removing csum items for extents that are no longer in use. The trick is doing it without racing because a single csum item may hold csums for more than one extent. Extra checks are added to btrfs_csum_file_blocks to make sure that we are using the correct csum item after dropping locks. A new btrfs_split_item is added to split a single csum item so it can be split without dropping the leaf lock. This is used to remove csum bytes from the middle of an item. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-12-09	tracehook: exec double-reporting fix	Roland McGrath
	The patch 6341c39 "tracehook: exec" introduced a small regression in 2.6.27 regarding binfmt_misc exec event reporting. Since the reporting is now done in the common search_binary_handler() function, an exec of a misc binary will result in two (or possibly multiple) exec events being reported, instead of just a single one, because the misc handler contains a recursive call to search_binary_handler. To add to the confusion, if PTRACE_O_TRACEEXEC is not active, the multiple SIGTRAP signals will in fact cause only a single ptrace intercept, as the signals are not queued. However, if PTRACE_O_TRACEEXEC is on, the debugger will actually see multiple ptrace intercepts (PTRACE_EVENT_EXEC). The test program included below demonstrates the problem. This change fixes the bug by calling tracehook_report_exec() only in the outermost search_binary_handler() call (bprm->recursion_depth == 0). The additional change to restore bprm->recursion_depth after each binfmt load_binary call is actually superfluous for this bug, since we test the value saved on entry to search_binary_handler(). But it keeps the use of of the depth count to its most obvious expected meaning. Depending on what binfmt handlers do in certain cases, there could have been false-positive tests for recursion limits before this change. /* Test program using PTRACE_O_TRACEEXEC. This forks and exec's the first argument with the rest of the arguments, while ptrace'ing. It expects to see one PTRACE_EVENT_EXEC stop and then a successful exit, with no other signals or events in between. Test for kernel doing two PTRACE_EVENT_EXEC stops for a binfmt_misc exec: $ gcc -g traceexec.c -o traceexec $ sudo sh -c 'echo :test:M::foobar::/bin/cat: > /proc/sys/fs/binfmt_misc/register' $ echo 'foobar test' > ./foobar $ chmod +x ./foobar $ ./traceexec ./foobar; echo $? ==> good <== foobar test 0 $ ==> bad <== foobar test unexpected status 0x4057f != 0 3 $ / #include <stdio.h> #include <sys/types.h> #include <sys/wait.h> #include <sys/ptrace.h> #include <unistd.h> #include <signal.h> #include <stdlib.h> static void wait_for (pid_t child, int expect) { int status; pid_t p = wait (&status); if (p != child) { perror ("wait"); exit (2); } if (status != expect) { fprintf (stderr, "unexpected status %#x != %#x\n", status, expect); exit (3); } } int main (int argc, char argv) { pid_t child = fork (); if (child < 0) { perror ("fork"); return 127; } else if (child == 0) { ptrace (PTRACE_TRACEME); raise (SIGUSR1); execv (argv[1], &argv[1]); perror ("execve"); _exit (127); } wait_for (child, W_STOPCODE (SIGUSR1)); if (ptrace (PTRACE_SETOPTIONS, child, 0L, (void ) (long) PTRACE_O_TRACEEXEC) != 0) { perror ("PTRACE_SETOPTIONS"); return 4; } if (ptrace (PTRACE_CONT, child, 0L, 0L) != 0) { perror ("PTRACE_CONT"); return 5; } wait_for (child, W_STOPCODE (SIGTRAP \| (PTRACE_EVENT_EXEC << 8))); if (ptrace (PTRACE_CONT, child, 0L, 0L) != 0) { perror ("PTRACE_CONT"); return 6; } wait_for (child, W_EXITCODE (0, 0)); return 0; } Reported-by: Arnd Bergmann <arnd@arndb.de> CC: Ulrich Weigand <ulrich.weigand@de.ibm.com> Signed-off-by: Roland McGrath <roland@redhat.com>
2008-12-08	EXPORTFS: handle NULL returns from fh_to_dentry()/fh_to_parent()	J. Bruce Fields
	While 440037287c5 "[PATCH] switch all filesystems over to d_obtain_alias" removed some cases where fh_to_dentry() and fh_to_parent() could return NULL, there are still a few NULL returns left in individual filesystems. Thus it was a mistake for that commit to remove the handling of NULL returns in the callers. Revert those parts of 440037287c5 which removed the NULL handling. (We could, alternatively, modify all implementations to return -ESTALE instead of NULL, but that proves to require fixing a number of filesystems, and in some cases it's arguably more natural to return NULL.) Thanks to David for original patch and Linus, Christoph, and Hugh for review. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Cc: David Howells <dhowells@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-08	Btrfs: Fix compressed checksum fsync log copies	Chris Mason
	The fsync logging code makes sure to onl copy the relevant checksum for each extent based on the file extent pointers it finds. But for compressed extents, it needs to copy the checksum for the entire extent. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-12-08	Btrfs: Add inode sequence number for NFS and reserved space in a few structs	Chris Mason
	This adds a sequence number to the btrfs inode that is increased on every update. NFS will be able to use that to detect when an inode has changed, without relying on inaccurate time fields. While we're here, this also: Puts reserved space into the super block and inode Adds a log root transid to the super so we can pick the newest super based on the fsync log as well as the main transaction ID. For now the log root transid is always zero, but that'll get fixed. Adds a starting offset to the dev_item. This will let us do better alignment calculations if we know the start of a partition on the disk. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-12-08	Btrfs: Use map_private_extent_buffer during generic_bin_search	Chris Mason
	It is possible that generic_bin_search will be called on a tree block that has not been locked. This happens because cache_block_block skips locking on the tree blocks. Since the tree block isn't locked, we aren't allowed to change the extent_buffer->map_token field. Using map_private_extent_buffer avoids any changes to the internal extent buffer fields. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-12-08	Btrfs: superblock duplication	Yan Zheng
	This patch implements superblock duplication. Superblocks are stored at offset 16K, 64M and 256G on every devices. Spaces used by superblocks are preserved by the allocator, which uses a reverse mapping function to find the logical addresses that correspond to superblocks. Thank you, Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
2008-12-08	Btrfs: move data checksumming into a dedicated tree	Chris Mason
	Btrfs stores checksums for each data block. Until now, they have been stored in the subvolume trees, indexed by the inode that is referencing the data block. This means that when we read the inode, we've probably read in at least some checksums as well. But, this has a few problems: * The checksums are indexed by logical offset in the file. When compression is on, this means we have to do the expensive checksumming on the uncompressed data. It would be faster if we could checksum the compressed data instead. * If we implement encryption, we'll be checksumming the plain text and storing that on disk. This is significantly less secure. * For either compression or encryption, we have to get the plain text back before we can verify the checksum as correct. This makes the raid layer balancing and extent moving much more expensive. * It makes the front end caching code more complex, as we have touch the subvolume and inodes as we cache extents. * There is potentitally one copy of the checksum in each subvolume referencing an extent. The solution used here is to store the extent checksums in a dedicated tree. This allows us to index the checksums by phyiscal extent start and length. It means: * The checksum is against the data stored on disk, after any compression or encryption is done. * The checksum is stored in a central location, and can be verified without following back references, or reading inodes. This makes compression significantly faster by reducing the amount of data that needs to be checksummed. It will also allow much faster raid management code in general. The checksums are indexed by a key with a fixed objectid (a magic value in ctree.h) and offset set to the starting byte of the extent. This allows us to copy the checksum items into the fsync log tree directly (or any other tree), without having to invent a second format for them. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-12-05	Fix a race condition in FASYNC handling	Jonathan Corbet
	Changeset a238b790d5f99c7832f9b73ac8847025815b85f7 (Call fasync() functions without the BKL) introduced a race which could leave file->f_flags in a state inconsistent with what the underlying driver/filesystem believes. Revert that change, and also fix the same races in ioctl_fioasync() and ioctl_fionbio(). This is a minimal, short-term fix; the real fix will not involve the BKL. Reported-by: Oleg Nesterov <oleg@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@kernel.org Signed-off-by: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-04	Merge branch 'for-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/viro/bdev * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/bdev: [PATCH] fix bogus argument of blkdev_put() in pktcdvd [PATCH 2/2] documnt FMODE_ constants [PATCH 1/2] kill FMODE_NDELAY_NOW [PATCH] clean up blkdev_get a little bit [PATCH] Fix block dev compat ioctl handling [PATCH] kill obsolete temporary comment in swsusp_close()
2008-12-05	[XFS] Fix hang after disallowed rename across directory quota domains	Dave Chinner
	When project quota is active and is being used for directory tree quota control, we disallow rename outside the current directory tree. This requires a check to be made after all the inodes involved in the rename are locked. We fail to unlock the inodes correctly if we disallow the rename when the target is outside the current directory tree. This results in a hang on the next access to the inodes involved in failed rename. Reported-by: Arkadiusz Miskiewicz <arekm@maven.pl> Signed-off-by: Dave Chinner <david@fromorbit.com> Tested-by: Arkadiusz Miskiewicz <arekm@maven.pl> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-12-04	[PATCH 1/2] kill FMODE_NDELAY_NOW	Christoph Hellwig
	Update FMODE_NDELAY before each ioctl call so that we can kill the magic FMODE_NDELAY_NOW. It would be even better to do this directly in setfl(), but for that we'd need to have FMODE_NDELAY for all files, not just block special files. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-12-04	[PATCH] clean up blkdev_get a little bit	Christoph Hellwig
	The way the bd_claim for the FMODE_EXCL case is implemented is rather confusing. Clean it up to the most logical style. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-12-03	Merge branch 'for-2.6.28' of git://linux-nfs.org/~bfields/linux	Linus Torvalds
	* 'for-2.6.28' of git://linux-nfs.org/~bfields/linux: NLM: client-side nlm_lookup_host() should avoid matching on srcaddr nfsd: use of unitialized list head on error exit in nfs4recover.c Add a reference to sunrpc in svc_addsock nfsd: clean up grace period on early exit
2008-12-02	Merge branch 'linux-next' of git://git.infradead.org/ubifs-2.6	Linus Torvalds
	* 'linux-next' of git://git.infradead.org/ubifs-2.6: UBIFS: pre-allocate bulk-read buffer UBIFS: do not allocate too much UBIFS: do not print scary memory allocation warnings UBIFS: allow for gaps when dirtying the LPT UBIFS: fix compilation warnings MAINTAINERS: change UBI/UBIFS git tree URLs UBIFS: endian handling fixes and annotations UBIFS: remove printk
2008-12-02	Btrfs: Fix sparse endian warnings in struct-funcs.c	Chris Mason
	The btrfs macros to access individual struct members on disk were sending the same variable to functions that expected different types of endianness. This fix explicitly creates a variable of the correct type instead of abusing a single variable for mixed purposes. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-12-02	Btrfs: rev the disk format for the inode compat and csum selection changes	Chris Mason
	Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-12-02	Btrfs: delete unused function: btrfs_invalidate_dcache_root	Chris Mason
	Snapshot and subvolume creation no longer need this helper. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-12-02	Btrfs: add support for multiple csum algorithms	Josef Bacik
	This patch gives us the space we will need in order to have different csum algorithims at some point in the future. We save the csum algorithim type in the superblock, and use those instead of define's. Signed-off-by: Josef Bacik <jbacik@redhat.com>
2008-12-02	Btrfs: fix panic on error during mount	Josef Bacik
	This needs to be applied on top of my previous patches, but is needed for more than just my new stuff. We're going to the wrong label when we have an error, we try to stop the workers, but they are started below all of this code. This fixes it so we go to the right error label and not panic when we fail one of these cases. Signed-off-by: Josef Bacik <jbacik@redhat.com>
2008-12-02	Btrfs: add support for compat flags to btrfs	Josef Bacik
	This adds the necessary disk format for handling compatibility flags in the future to handle disk format changes. We have a compat_flags, compat_ro_flags and incompat_flags set for the super block. Compat flags will be to hold the features that are compatible with older versions of btrfs, compat_ro flags have features that are compatible with older versions of btrfs if the fs is mounted read only, and incompat_flags has features that are incompatible with older versions of btrfs. This also axes the compat_flags field for the inode and just makes the flags field a 64bit field, and changes the root item flags field to 64bit. Signed-off-by: Josef Bacik <jbacik@redhat.com>
2008-12-02	Btrfs: btrfs: pass void __user * to btrfs_ioctl_clone_range	Christoph Hellwig
	Cleans the code up a little and also avoids a sparse warning due to the incorrect cast in the current version of the code. Signed-off-by: Christoph Hellwig <hch@lst.de>
2008-12-02	Btrfs: clean up btrfs_ioctl a little bit	Christoph Hellwig
	Provide a void __user *argp pointer so that we can avoid duplicating the cast for various sub-command calls. Signed-off-by: Christoph Hellwig <hch@lst.de>
2008-12-02	Btrfs: corret fmode_t annotations	Christoph Hellwig
	Make sure to propagate fmode_t properly and use the right constants for it. Signed-off-by: Christoph Hellwig <hch@lst.de>
2008-12-02	Btrfs: fix shadowed variable declarations	Christoph Hellwig
	Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-12-02	Btrfs: make things static and include the right headers	Christoph Hellwig
	Shut up various sparse warnings about symbols that should be either static or have their declarations in scope. Signed-off-by: Christoph Hellwig <hch@lst.de>
2008-12-02	Btrfs: remove unneeded btrfs_start_delalloc_inodes call	Sage Weil
	It is called by btrfs_sync_fs. Signed-off-by: Sage Weil <sage@newdream.net>
2008-12-02	Btrfs: remove unneeded total_trans	Sage Weil
	Remove unneeded debugging sanity check. It gets corrupted anyway when multiple btrfs file systems are mounted, throwing bad warnings along the way. Signed-off-by: Sage Weil <sage@newdream.net>
2008-12-02	Btrfs: sparse lock verification annotations for wait_on_state	Christoph Hellwig
	Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-12-01	ntfs: don't fool kernel-doc	Randy Dunlap
	kernel-doc handles macros now (it has for quite some time), so change the ntfs_debug() macro's kernel-doc to be just before the macro instead of before a phony function prototype. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Anton Altaparmakov <aia21@cantab.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-01	epoll: introduce resource usage limits	Davide Libenzi
	It has been thought that the per-user file descriptors limit would also limit the resources that a normal user can request via the epoll interface. Vegard Nossum reported a very simple program (a modified version attached) that can make a normal user to request a pretty large amount of kernel memory, well within the its maximum number of fds. To solve such problem, default limits are now imposed, and /proc based configuration has been introduced. A new directory has been created, named /proc/sys/fs/epoll/ and inside there, there are two configuration points: max_user_instances = Maximum number of devices - per user max_user_watches = Maximum number of "watched" fds - per user The current default for "max_user_watches" limits the memory used by epoll to store "watches", to 1/32 of the amount of the low RAM. As example, a 256MB 32bit machine, will have "max_user_watches" set to roughly 90000. That should be enough to not break existing heavy epoll users. The default value for "max_user_instances" is set to 128, that should be enough too. This also changes the userspace, because a new error code can now come out from EPOLL_CTL_ADD (-ENOSPC). The EMFILE from epoll_create() was already listed, so that should be ok. [akpm@linux-foundation.org: use get_current_user()] Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: <stable@kernel.org> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Reported-by: Vegard Nossum <vegardno@ifi.uio.no> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-01	Btrfs: Fix cow semantic in run_delalloc_nocow()	Liu Hui
	The file preallocation code reversed the logic to force nodatacow. This fixes it.
2008-12-01	ocfs2: fix regression in ocfs2_read_blocks_sync()	Mark Fasheh
	We're panicing in ocfs2_read_blocks_sync() if a jbd-managed buffer is seen. At first glance, this seems ok but in reality it can happen. My test case was to just run 'exorcist'. A struct inode is being pushed out of memory but is then re-read at a later time, before the buffer has been checkpointed by jbd. This causes a BUG to be hit in ocfs2_read_blocks_sync(). Reviewed-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-12-01	ocfs2: fix return value set in init_dlmfs_fs()	Coly Li
	In init_dlmfs_fs(), if calling kmem_cache_create() failed, the code will use return value from calling bdi_init(). The correct behavior should be set status as -ENOMEM before going to "bail:". Signed-off-by: Coly Li <coyli@suse.de> Acked-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-12-01	ocfs2: fix wake_up in unlock_ast	David Teigland
	In ocfs2_unlock_ast(), call wake_up() on lockres before releasing the spin lock on it. As soon as the spin lock is released, the lockres can be freed. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-12-01	ocfs2: initialize stack_user lvbptr	David Teigland
	The locking_state dump, ocfs2_dlm_seq_show, reads the lvb on locks where it has not yet been initialized by a lock call. Signed-off-by: David Teigland <teigland@redhat.com> Acked-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-12-01	ocfs2: comments typo fix	Coly Li
	This patch fixes two typos in comments of ocfs2. Signed-off-by: Coly Li <coyli@suse.de> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-11-30	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6	Linus Torvalds
	* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: [CIFS] fix regression in cifs_write_begin/cifs_write_end
2008-11-27	udf: Fix BUG_ON() in destroy_inode()	Jan Kara
	udf_clear_inode() can leave behind buffers on mapping's i_private list (when we truncated preallocation). Call invalidate_inode_buffers() so that the list is properly cleaned-up before we return from udf_clear_inode(). This is ugly and suggest that we should cleanup preallocation earlier than in clear_inode() but currently there's no such call available since drop_inode() is called under inode lock and thus is unusable for disk operations. Signed-off-by: Jan Kara <jack@suse.cz>
2008-11-26	[CIFS] fix regression in cifs_write_begin/cifs_write_end	Jeff Layton
	The conversion to write_begin/write_end interfaces had a bug where we were passing a bad parameter to cifs_readpage_worker. Rather than passing the page offset of the start of the write, we needed to pass the offset of the beginning of the page. This was reliably showing up as data corruption in the fsx-linux test from LTP. It also became evident that this code was occasionally doing unnecessary read calls. Optimize those away by using the PG_checked flag to indicate that the unwritten part of the page has been initialized. CC: Nick Piggin <npiggin@suse.de> Acked-by: Dave Kleikamp <shaggy@us.ibm.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2008-11-24	NLM: client-side nlm_lookup_host() should avoid matching on srcaddr	Chuck Lever
	Since commit c98451bd, the loop in nlm_lookup_host() unconditionally compares the host's h_srcaddr field to the incoming source address. For client-side nlm_host entries, both are always AF_UNSPEC, so this check is unnecessary. Since commit 781b61a6, which added support for AF_INET6 addresses to nlm_cmp_addr(), nlm_cmp_addr() now returns FALSE for AF_UNSPEC addresses, which causes nlm_lookup_host() to create a fresh nlm_host entry every time it is called on the client. These extra entries will eventually expire once the server is unmounted, so the impact of this regression, introduced with lockd IPv6 support in 2.6.28, should be minor. We could fix this by adding an arm in nlm_cmp_addr() for AF_UNSPEC addresses, but really, nlm_lookup_host() shouldn't be matching on the srcaddr field for client-side nlm_host lookups. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-11-24	nfsd: use of unitialized list head on error exit in nfs4recover.c	J. Bruce Fields
	Thanks to Matthew Dodd for this bug report: A file label issue while running SELinux in MLS mode provoked the following bug, which is a result of use before init on a 'struct list_head'. In nfsd4_list_rec_dir() if the call to dentry_open() fails the 'goto out' skips INIT_LIST_HEAD() which results in the normally improbable case where list_entry() returns NULL. Trace follows. NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory SELinux: Context unconfined_t:object_r:var_lib_nfs_t:s0 is not valid (left unmapped). type=1400 audit(1227298063.609:282): avc: denied { read } for pid=1890 comm="rpc.nfsd" name="v4recovery" dev=dm-0 ino=148726 scontext=system_u:system_r:nfsd_t:s0-s15:c0.c1023 tcontext=system_u:object_r:unlabeled_t:s15:c0.c1023 tclass=dir BUG: unable to handle kernel NULL pointer dereference at 00000004 IP: [<c050894e>] list_del+0x6/0x60 pde = 0d9ce067 pte = 00000000 Oops: 0000 [#1] SMP Modules linked in: nfsd lockd nfs_acl auth_rpcgss exportfs autofs4 sunrpc ipv6 dm_multipath scsi_dh ppdev parport_pc sg parport floppy ata_piix pata_acpi ata_generic libata pcnet32 i2c_piix4 mii pcspkr i2c_core dm_snapshot dm_zero dm_mirror dm_log dm_mod BusLogic sd_mod scsi_mod crc_t10dif ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode] Pid: 1890, comm: rpc.nfsd Not tainted (2.6.27.5-37.fc9.i686 #1) EIP: 0060:[<c050894e>] EFLAGS: 00010217 CPU: 0 EIP is at list_del+0x6/0x60 EAX: 00000000 EBX: 00000000 ECX: 00000000 EDX: cd99e480 ESI: cf9caed8 EDI: 00000000 EBP: cf9caebc ESP: cf9caeb8 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 Process rpc.nfsd (pid: 1890, ti=cf9ca000 task=cf4de580 task.ti=cf9ca000) Stack: 00000000 cf9caef0 d0a9f139 c0496d04 d0a9f217 fffffff3 00000000 00000000 00000000 00000000 cf32b220 00000000 00000008 00000801 cf9caefc d0a9f193 00000000 cf9caf08 d0a9b6ea 00000000 cf9caf1c d0a874f2 cf9c3004 00000008 Call Trace: [<d0a9f139>] ? nfsd4_list_rec_dir+0xf3/0x13a [nfsd] [<c0496d04>] ? do_path_lookup+0x12d/0x175 [<d0a9f217>] ? load_recdir+0x0/0x26 [nfsd] [<d0a9f193>] ? nfsd4_recdir_load+0x13/0x34 [nfsd] [<d0a9b6ea>] ? nfs4_state_start+0x2a/0xc5 [nfsd] [<d0a874f2>] ? nfsd_svc+0x51/0xff [nfsd] [<d0a87f2d>] ? write_svc+0x0/0x1e [nfsd] [<d0a87f48>] ? write_svc+0x1b/0x1e [nfsd] [<d0a87854>] ? nfsctl_transaction_write+0x3a/0x61 [nfsd] [<c04b6a4e>] ? sys_nfsservctl+0x116/0x154 [<c04975c1>] ? putname+0x24/0x2f [<c04975c1>] ? putname+0x24/0x2f [<c048d49f>] ? do_sys_open+0xad/0xb7 [<c048d337>] ? filp_close+0x50/0x5a [<c048d4eb>] ? sys_open+0x1e/0x26 [<c0403cca>] ? syscall_call+0x7/0xb [<c064007b>] ? init_cyrix+0x185/0x490 ======================= Code: 75 e1 8b 53 08 8d 4b 04 8d 46 04 e8 75 00 00 00 8b 53 10 8d 4b 0c 8d 46 0c e8 67 00 00 00 5b 5e 5f 5d c3 90 90 55 89 e5 53 89 c3 <8b> 40 04 8b 00 39 d8 74 16 50 53 68 3e d6 6f c0 6a 30 68 78 d6 EIP: [<c050894e>] list_del+0x6/0x60 SS:ESP 0068:cf9caeb8 ---[ end trace a89c4ad091c4ad53 ]--- Cc: Matthew N. Dodd <Matthew.Dodd@spart.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-11-24	nfsd: clean up grace period on early exit	J. Bruce Fields
	If nfsd was shut down before the grace period ended, we could end up with a freed object still on grace_list. Thanks to Jeff Moyer for reporting the resulting list corruption warnings. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Tested-by: Jeff Moyer <jmoyer@redhat.com>
2008-11-21	UBIFS: pre-allocate bulk-read buffer	Artem Bityutskiy
	To avoid memory allocation failure during bulk-read, pre-allocate a bulk-read buffer, so that if there is only one bulk-reader at a time, it would just use the pre-allocated buffer and would not do any memory allocation. However, if there are more than 1 bulk- reader, then only one reader would use the pre-allocated buffer, while the other reader would allocate the buffer for itself. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2008-11-21	UBIFS: do not allocate too much	Artem Bityutskiy
	Bulk-read allocates 128KiB or more using kmalloc. The allocation starts failing often when the memory gets fragmented. UBIFS still works fine in this case, because it falls-back to standard (non-optimized) read method, though. This patch teaches bulk-read to allocate exactly the amount of memory it needs, instead of allocating 128KiB every time. This patch is also a preparation to the further fix where we'll have a pre-allocated bulk-read buffer as well. For example, now the @bu object is prepared in 'ubifs_bulk_read()', so we could path either pre-allocated or allocated information to 'ubifs_do_bulk_read()' later. Or teaching 'ubifs_do_bulk_read()' not to allocate 'bu->buf' if it is already there. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>