aboutsummaryrefslogtreecommitdiff
path: root/fs/ext3
AgeCommit message (Collapse)Author
2009-01-04fs: symlink write_begin allocation context fixNick Piggin
With the write_begin/write_end aops, page_symlink was broken because it could no longer pass a GFP_NOFS type mask into the point where the allocations happened. They are done in write_begin, which would always assume that the filesystem can be entered from reclaim. This bug could cause filesystem deadlocks. The funny thing with having a gfp_t mask there is that it doesn't really allow the caller to arbitrarily tinker with the context in which it can be called. It couldn't ever be GFP_ATOMIC, for example, because it needs to take the page lock. The only thing any callers care about is __GFP_FS anyway, so turn that into a single flag. Add a new flag for write_begin, AOP_FLAG_NOFS. Filesystems can now act on this flag in their write_begin function. Change __grab_cache_page to accept a nofs argument as well, to honour that flag (while we're there, change the name to grab_cache_page_write_begin which is more instructive and does away with random leading underscores). This is really a more flexible way to go in the end anyway -- if a filesystem happens to want any extra allocations aside from the pagecache ones in ints write_begin function, it may now use GFP_KERNEL (rather than GFP_NOFS) for common case allocations (eg. ocfs2_alloc_write_ctxt, for a random example). [kosaki.motohiro@jp.fujitsu.com: fix ubifs] [kosaki.motohiro@jp.fujitsu.com: fix fuse] Signed-off-by: Nick Piggin <npiggin@suse.de> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: <stable@kernel.org> [2.6.28.x] Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> [ Cleaned up the calling convention: just pass in the AOP flags untouched to the grab_cache_page_write_begin() function. That just simplifies everybody, and may even allow future expansion of the logic. - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-31nfsd race fixes: ext3Al Viro
ext3 analog of the previous patch Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-12-31ext3: ensure fast symlinks are NUL-terminatedDuane Griffin
Ensure fast symlink targets are NUL-terminated, even if corrupted on-disk. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Stephen Tweedie <sct@redhat.com> Cc: linux-ext4@vger.kernel.org Signed-off-by: Duane Griffin <duaneg@dghda.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-11-14Merge branch 'master' into nextJames Morris
Conflicts: security/keys/internal.h security/keys/process_keys.c security/keys/request_key.c Fixed conflicts above by using the non 'tsk' versions. Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14CRED: Wrap task credential accesses in the Ext3 filesystemDavid Howells
Wrap access to task credentials so that they can be separated more easily from the task_struct during the introduction of COW creds. Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id(). Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more sense to use RCU directly rather than a convenient wrapper; these will be addressed by later patches. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: adilger@sun.com Cc: linux-ext4@vger.kernel.org Signed-off-by: James Morris <jmorris@namei.org>
2008-11-12ext3: Clean up outdated and incorrect comment for ext3_write_super()Theodore Tso
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-06ext3: wait on all pending commits in ext3_sync_fsArthur Jones
In ext3_sync_fs, we only wait for a commit to finish if we started it, but there may be one already in progress which will not be synced. In the case of a data=ordered umount with pending long symlinks which are delayed due to a long list of other I/O on the backing block device, this causes the buffer associated with the long symlinks to not be moved to the inode dirty list in the second phase of fsync_super. Then, before they can be dirtied again, kjournald exits, seeing the UMOUNT flag and the dirty pages are never written to the backing block device, causing long symlink corruption and exposing new or previously freed block data to userspace. This can be reproduced with a script created by Eric Sandeen <sandeen@redhat.com>: #!/bin/bash umount /mnt/test2 mount /dev/sdb4 /mnt/test2 rm -f /mnt/test2/* dd if=/dev/zero of=/mnt/test2/bigfile bs=1M count=512 touch /mnt/test2/thisisveryveryveryveryveryveryveryveryveryveryveryveryveryveryveryverylongfilename ln -s /mnt/test2/thisisveryveryveryveryveryveryveryveryveryveryveryveryveryveryveryverylongfilename /mnt/test2/link umount /mnt/test2 mount /dev/sdb4 /mnt/test2 ls /mnt/test2/ umount /mnt/test2 To ensure all commits are synced, we flush all journal commits now when sync_fs'ing ext3. Signed-off-by: Arthur Jones <ajones@riverbed.com> Cc: Eric Sandeen <sandeen@redhat.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: <linux-ext4@vger.kernel.org> Cc: <stable@kernel.org> [2.6.everything] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-27ext3: fix a bug accessing freed memory in ext3_abortHidehiro Kawai
Vegard Nossum reported a bug which accesses freed memory (found via kmemcheck). When journal has been aborted, ext3_put_super() calls ext3_abort() after freeing the journal_t object, and then ext3_abort() accesses it. This patch fix it. Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-10-25ext3: Fix duplicate entries returned from getdents() system callTheodore Ts'o
Fix a regression caused by commit 6a897cf4, "ext3: fix ext3_dx_readdir hash collision handling", where deleting files in a large directory (requiring more than one getdents system call), results in some filenames being returned twice. This was caused by a failure to update info->curr_hash and info->curr_minor_hash, so that if the directory had gotten modified since the last getdents() system call (as would be the case if the user is running "rm -r" or "git clean"), a directory entry would get returned twice to the userspace. This patch fixes the bug reported by Markus Trippelsdorf at: http://bugzilla.kernel.org/show_bug.cgi?id=11844 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Tested-by: Markus Trippelsdorf <markus@trippelsdorf.de>
2008-10-23ext3 quota support: fix compile failureLinus Torvalds
This one was due to a merge error: we added a use of nd.path in commit 2d7c820e56ce83b23daee9eb5343730fb309418e ("ext3: add checks for errors from jbd"), and concurrently we got rid of 'nd' and used a naked 'path' in commit 8264613def2e5c4f12bc3167713090fd172e6055 ("[PATCH] switch quota_on-related stuff to kern_path()"). That all merged cleanly, but it didn't actually _work_. This should fix it. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/viro/bdevLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/viro/bdev: (66 commits) [PATCH] kill the rest of struct file propagation in block ioctls [PATCH] get rid of struct file use in blkdev_ioctl() BLKBSZSET [PATCH] get rid of blkdev_locked_ioctl() [PATCH] get rid of blkdev_driver_ioctl() [PATCH] sanitize blkdev_get() and friends [PATCH] remember mode of reiserfs journal [PATCH] propagate mode through swsusp_close() [PATCH] propagate mode through open_bdev_excl/close_bdev_excl [PATCH] pass fmode_t to blkdev_put() [PATCH] kill the unused bsize on the send side of /dev/loop [PATCH] trim file propagation in block/compat_ioctl.c [PATCH] end of methods switch: remove the old ones [PATCH] switch sr [PATCH] switch sd [PATCH] switch ide-scsi [PATCH] switch tape_block [PATCH] switch dcssblk [PATCH] switch dasd [PATCH] switch mtd_blkdevs [PATCH] switch mmc ...
2008-10-23Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (46 commits) [PATCH] fs: add a sanity check in d_free [PATCH] i_version: remount support [patch] vfs: make security_inode_setattr() calling consistent [patch 1/3] FS_MBCACHE: don't needlessly make it built-in [PATCH] move executable checking into ->permission() [PATCH] fs/dcache.c: update comment of d_validate() [RFC PATCH] touch_mnt_namespace when the mount flags change [PATCH] reiserfs: add missing llseek method [PATCH] fix ->llseek for more directories [PATCH vfs-2.6 6/6] vfs: add LOOKUP_RENAME_TARGET intent [PATCH vfs-2.6 5/6] vfs: remove LOOKUP_PARENT from non LOOKUP_PARENT lookup [PATCH vfs-2.6 4/6] vfs: remove unnecessary fsnotify_d_instantiate() [PATCH vfs-2.6 3/6] vfs: add __d_instantiate() helper [PATCH vfs-2.6 2/6] vfs: add d_ancestor() [PATCH vfs-2.6 1/6] vfs: replace parent == dentry->d_parent by IS_ROOT() [PATCH] get rid of on-stack dentry in udf [PATCH 2/2] anondev: switch to IDA [PATCH 1/2] anondev: init IDR statically [JFFS2] Use d_splice_alias() not d_add() in jffs2_lookup() [PATCH] Optimise NFS readdir hack slightly. ...
2008-10-23ext3: add checks for errors from jbdHidehiro Kawai
If the journal has aborted due to a checkpointing failure, we have to keep the contents of the journal space. Otherwise, the filesystem will lose uncheckpointed metadata completely and become inconsistent. To avoid this, we need to keep needs_recovery flag if checkpoint has failed. With this patch, ext3_put_super() detects a checkpointing failure from the return value of journal_destroy(), then it invokes ext3_abort() to make the filesystem read only and keep needs_recovery flag. Errors from journal_flush() are also handled by this patch in some places. Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Cc: Jan Kara <jack@ucw.cz> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-23[PATCH] get rid of on-stack fake dentry in ext3_get_parent()Al Viro
Better pass parent and qstr to ext3_find_entry() explicitly than use such kludges, especially since the stack footprint is nasty enough and we have every chance to be deep in call chain. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-10-23[PATCH] switch all filesystems over to d_obtain_aliasChristoph Hellwig
Switch all users of d_alloc_anon to d_obtain_alias. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-10-23[PATCH] switch quota_on-related stuff to kern_path()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-10-21[PATCH] pass fmode_t to blkdev_put()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-10-20fs/Kconfig: move ext2, ext3, ext4, JBD, JBD2 outAlexey Dobriyan
Use fs/*/Kconfig more, which is good because everything related to one filesystem is in one place and fs/Kconfig is quite fat. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-20ext3: avoid printk floods in the face of directory corruptionEric Sandeen
A very large directory with many read failures (either due to storage problems, or due to invalid size & blocks from corruption) will generate a printk storm as the filesystem continues to try to read all the blocks. This flood of messages can tie up the box until it is complete - which may be a very long time, especially for very large corrupted values. This is fixed by only reporting the corruption once each time we try to read the directory. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Eugene Teo <eugeneteo@kernel.sg> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-20ext3: truncate block allocated on a failed ext3_write_beginAneesh Kumar K.V
For blocksize < pagesize we need to remove blocks that got allocated in block_write_begin() if we fail with ENOSPC for later blocks. block_write_begin() internally does this if it allocated page locally. This makes sure we don't have blocks outside inode.i_size during ENOSPC. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-20ext3: fix ext3_dx_readdir hash collision handlingEugene Dashevsky
This fixes a bug where readdir() would return a directory entry twice if there was a hash collision in an hash tree indexed directory. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Eugene Dashevsky <eugene@ibrix.com> Signed-off-by: Mike Snitzer <msnitzer@ibrix.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-20ext3: add an option to control error handling on file dataHidehiro Kawai
If the journal doesn't abort when it gets an IO error in file data blocks, the file data corruption will spread silently. Because most of applications and commands do buffered writes without fsync(), they don't notice the IO error. It's scary for mission critical systems. On the other hand, if the journal aborts whenever it gets an IO error in file data blocks, the system will easily become inoperable. So this patch introduces a filesystem option to determine whether it aborts the journal or just call printk() when it gets an IO error in file data. If you mount a ext3 fs with data_err=abort option, it aborts on file data write error. If you mount it with data_err=ignore, it doesn't abort, just call printk(). data_err=ignore is the default. Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Cc: Jan Kara <jack@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-20ext3: fix ext3 block reservation early ENOSPC issueMingming Cao
We could run into ENOSPC error on ext3, even when there is free blocks on the filesystem. The problem is triggered in the case the goal block group has 0 free blocks , and the rest block groups are skipped due to the check of "free_blocks < windowsz/2". Current code could fall back to non reservation allocation to prevent early ENOSPC after examing all the block groups with reservation on , but this code was bypassed if the reservation window is turned off already, which is true in this case. This patch fixed two issues: 1) We don't need to turn off block reservation if the goal block group has 0 free blocks left and continue search for the rest of block groups. Current code the intention is to turn off the block reservation if the goal allocation group has a few (some) free blocks left (not enough for make the desired reservation window),to try to allocation in the goal block group, to get better locality. But if the goal blocks have 0 free blocks, it should leave the block reservation on, and continues search for the next block groups,rather than turn off block reservation completely. 2) we don't need to check the window size if the block reservation is off. The problem was originally found and fixed in ext4. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-20ext3: don't try to resize if there are no reserved gdt blocks leftJosef Bacik
When trying to resize a ext3 fs and you run out of reserved gdt blocks, you get an error that doesn't actually tell you what went wrong, it just says that the gdb it picked is not correct, which is the case since you don't have any reserved gdt blocks left. This patch adds a check to make sure you have reserved gdt blocks to use, and if not prints out a more relevant error. Signed-off-by: Josef Bacik <jbacik@redhat.com> Cc: <linux-ext4@vger.kernel.org> Cc: Andreas Dilger <adilger@sun.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-13vfs: Use const for kernel parser tableSteven Whitehouse
This is a much better version of a previous patch to make the parser tables constant. Rather than changing the typedef, we put the "const" in all the various places where its required, allowing the __initconst exception for nfsroot which was the cause of the previous trouble. This was posted for review some time ago and I believe its been in -mm since then. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Alexander Viro <aviro@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-03generic block based fiemap implementationJosef Bacik
Any block based fs (this patch includes ext3) just has to declare its own fiemap() function and then call this generic function with its own get_block_t. This works well for block based filesystems that will map multiple contiguous blocks at one time, but will work for filesystems that only map one block at a time, you will just end up with an "extent" for each block. One gotcha is this will not play nicely where there is hole+data after the EOF. This function will assume its hit the end of the data as soon as it hits a hole after the EOF, so if there is any data past that it will not pick that up. AFAIK no block based fs does this anyway, but its in the comments of the function anyway just in case. Signed-off-by: Josef Bacik <jbacik@redhat.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: linux-fsdevel@vger.kernel.org
2008-08-01[PATCH] fix races and leaks in vfs_quota_on() usersAl Viro
* new helper: vfs_quota_on_path(); equivalent of vfs_quota_on() sans the pathname resolution. * callers of vfs_quota_on() that do their own pathname resolution and checks based on it are switched to vfs_quota_on_path(); that way we avoid the races. * reiserfs leaked dentry/vfsmount references on several failure exits. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-07-28vfs: pagecache usage optimization for pagesize!=blocksizeHisashi Hifumi
When we read some part of a file through pagecache, if there is a pagecache of corresponding index but this page is not uptodate, read IO is issued and this page will be uptodate. I think this is good for pagesize == blocksize environment but there is room for improvement on pagesize != blocksize environment. Because in this case a page can have multiple buffers and even if a page is not uptodate, some buffers can be uptodate. So I suggest that when all buffers which correspond to a part of a file that we want to read are uptodate, use this pagecache and copy data from this pagecache to user buffer even if a page is not uptodate. This can reduce read IO and improve system throughput. I wrote a benchmark program and got result number with this program. This benchmark do: 1: mount and open a test file. 2: create a 512MB file. 3: close a file and umount. 4: mount and again open a test file. 5: pwrite randomly 300000 times on a test file. offset is aligned by IO size(1024bytes). 6: measure time of preading randomly 100000 times on a test file. The result was: 2.6.26 330 sec 2.6.26-patched 226 sec Arch:i386 Filesystem:ext3 Blocksize:1024 bytes Memory: 1GB On ext3/4, a file is written through buffer/block. So random read/write mixed workloads or random read after random write workloads are optimized with this patch under pagesize != blocksize environment. This test result showed this. The benchmark program is as follows: #include <stdio.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include <unistd.h> #include <time.h> #include <stdlib.h> #include <string.h> #include <sys/mount.h> #define LEN 1024 #define LOOP 1024*512 /* 512MB */ main(void) { unsigned long i, offset, filesize; int fd; char buf[LEN]; time_t t1, t2; if (mount("/dev/sda1", "/root/test1/", "ext3", 0, 0) < 0) { perror("cannot mount\n"); exit(1); } memset(buf, 0, LEN); fd = open("/root/test1/testfile", O_CREAT|O_RDWR|O_TRUNC); if (fd < 0) { perror("cannot open file\n"); exit(1); } for (i = 0; i < LOOP; i++) write(fd, buf, LEN); close(fd); if (umount("/root/test1/") < 0) { perror("cannot umount\n"); exit(1); } if (mount("/dev/sda1", "/root/test1/", "ext3", 0, 0) < 0) { perror("cannot mount\n"); exit(1); } fd = open("/root/test1/testfile", O_RDWR); if (fd < 0) { perror("cannot open file\n"); exit(1); } filesize = LEN * LOOP; for (i = 0; i < 300000; i++){ offset = (random() % filesize) & (~(LEN - 1)); pwrite(fd, buf, LEN, offset); } printf("start test\n"); time(&t1); for (i = 0; i < 100000; i++){ offset = (random() % filesize) & (~(LEN - 1)); pread(fd, buf, LEN, offset); } time(&t2); printf("%ld sec\n", t2-t1); close(fd); if (umount("/root/test1/") < 0) { perror("cannot umount\n"); exit(1); } } Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Christoph Hellwig <hch@infradead.org> Cc: Jan Kara <jack@ucw.cz> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26[PATCH] sanitize ->permission() prototypeAl Viro
* kill nameidata * argument; map the 3 bits in ->flags anybody cares about to new MAY_... ones and pass with the mask. * kill redundant gfs2_iop_permission() * sanitize ecryptfs_permission() * fix remaining places where ->permission() instances might barf on new MAY_... found in mask. The obvious next target in that direction is permission(9) folded fix for nfs_permission() breakage from Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-07-26SL*B: drop kmem cache argument from constructorAlexey Dobriyan
Kmem cache passed to constructor is only needed for constructors that are themselves multiplexeres. Nobody uses this "feature", nor does anybody uses passed kmem cache in non-trivial way, so pass only pointer to object. Non-trivial places are: arch/powerpc/mm/init_64.c arch/powerpc/mm/hugetlbpage.c This is flag day, yes. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Christoph Lameter <cl@linux-foundation.org> Cc: Jon Tollefson <kniht@linux.vnet.ibm.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Matt Mackall <mpm@selenic.com> [akpm@linux-foundation.org: fix arch/powerpc/mm/hugetlbpage.c] [akpm@linux-foundation.org: fix mm/slab.c] [akpm@linux-foundation.org: fix ubifs] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25ext3: validate directory entry data before useDuane Griffin
ext3_dx_find_entry uses ext3_next_entry without verifying that the entry is valid. If its rec_len == 0 this causes an infinite loop. Refactor the loop to check the validity of entries before checking whether they match and moving onto the next one. There are other uses of ext3_next_entry in this file which also look problematic. They should be reviewed and fixed if/when we have a test-case that triggers them. This patch fixes the first case (image hdb.25.softlockup.gz) reported in http://bugzilla.kernel.org/show_bug.cgi?id=10882. Signed-off-by: Duane Griffin <duaneg@dghda.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25ext3: kill 2 useless magic numbersLi Zefan
dx_root_limit() will never return 20, and I can't figure out what 20 stands for. This function has never changed since htree directory indexing was merged. Similar for dx_node_limit() and the magic 22. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Andreas Dilger <adilger@sun.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25ext3: handle deleting corrupted indirect blocksDuane Griffin
While freeing indirect blocks we attach a journal head to the parent buffer head, free the blocks, then journal the parent. If the indirect block list is corrupted and points to the parent the journal head will be detached when the block is cleared, causing an OOPS. Check for that explicitly and handle it gracefully. This patch fixes the third case (image hdb.20000057.nullderef.gz) reported in http://bugzilla.kernel.org/show_bug.cgi?id=10882. Immediately above the change, in the ext3_free_data function, we call ext3_clear_blocks to clear the indirect blocks in this parent block. If one of those blocks happens to actually be the parent block it will clear b_private / BH_JBD. I did the check at the end rather than earlier as it seemed more elegant. I don't think there should be much practical difference, although it is possible the FS may not be quite so badly corrupted if we did it the other way (and didn't clear the block at all). To be honest, I'm not convinced there aren't other similar failure modes lurking in this code, although I couldn't find any with a quick review. [akpm@linux-foundation.org: fix printk warning] Signed-off-by: Duane Griffin <duaneg@dghda.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25ext3: don't read inode block if the buffer has a write errorHidehiro Kawai
A transient I/O error can corrupt inode data. Here is the scenario: (1) update inode_A at the block_B (2) pdflush writes out new inode_A to the filesystem, but it results in write I/O error, at this point, BH_Uptodate flag of the buffer for block_B is cleared and BH_Write_EIO is set (3) create new inode_C which located at block_B, and __ext3_get_inode_loc() tries to read on-disk block_B because the buffer is not uptodate (4) if it can read on-disk block_B successfully, inode_A is overwritten by old data This patch makes __ext3_get_inode_loc() not read the inode block if the buffer has BH_Write_EIO flag. In this case, the buffer should have the latest information, so setting the uptodate flag to the buffer (this avoids WARN_ON_ONCE() in mark_buffer_dirty().) According to this change, we would need to test BH_Write_EIO flag for the error checking. Currently nobody checks write I/O errors on metadata buffers, but it will be done in other patches I'm working on. Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Cc: sugita <yumiko.sugita.yf@hitachi.com> Cc: Satoshi OSHIMA <satoshi.oshima.fk@hitachi.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Jan Kara <jack@ucw.cz> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25ext3: handle corrupted orphan list at mountDuane Griffin
If the orphan node list includes valid, untruncatable nodes with nlink > 0 the ext3_orphan_cleanup loop which attempts to delete them will not do so, causing it to loop forever. Fix by checking for such nodes in the ext3_orphan_get function. This patch fixes the second case (image hdb.20000009.softlockup.gz) reported in http://bugzilla.kernel.org/show_bug.cgi?id=10882. [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: printk warning fix] Signed-off-by: Duane Griffin <duaneg@dghda.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25ext3: remove double definitions of xattr macrosShen Feng
remove the definitions of macros: XATTR_TRUSTED_PREFIX XATTR_USER_PREFIX since they are defined in linux/xattr.h Signed-off-by: Shen Feng <shen@cn.fujitsu.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25ext3: improve some code in rb tree part of dir.cShen Feng
- remove unnecessary code in free_rb_tree_fname - rename free_rb_tree_fname to ext3_htree_create_dir_info since it and ext3_htree_free_dir_info are a pair - replace kmalloc with kzalloc in ext3_htree_free_dir_info Signed-off-by: Shen Feng <shen@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25ext3: correct mount option parsing to detect when quota options can be changedJan Kara
We should not allow user to change quota mount options when quota is just suspended. I would make mount options and internal quota state inconsistent. Also we should not allow user to change quota format when quota is turned on. On the other hand we can just silently ignore when some option is set to the value it already has (mount does this on remount). Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25ext3: fix typos in messages and comments (journalled -> journaled)Jan Kara
Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25ext3: fix synchronization of quota files in journal=data modeJan Kara
In journal=data mode, it is not enough to do write_inode_now as done in vfs_quota_on() to write all data to their final location (which is needed for quota_read to work correctly). Calling journal_flush() does its job. Reported-by: Nick <gentuu@gmail.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-04ext3: add missing unlock to error path in ext3_quota_write()Jan Kara
When write in ext3_quota_write() fails, we have to properly release i_mutex. One error path has been missing the unlock... Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-06-06ext3: fix online resize bugJosef Bacik
There is a bug when we are trying to verify that the reserve inode's double indirect blocks point back to the primary gdt blocks. The fix is obvious, we need to mod the gdb count by the addr's per block. You can verify this with the following test case dd if=/dev/zero of=disk1 seek=1024 count=1 bs=100M losetup /dev/loop1 disk1 pvcreate /dev/loop1 vgcreate loopvg1 /dev/loop1 lvcreate -l 100%VG loopvg1 -n looplv1 mkfs.ext3 -J size=64 -b 1024 /dev/loopvg1/looplv1 mount /dev/loopvg1/looplv1 /mnt/loop dd if=/dev/zero of=disk2 seek=1024 count=1 bs=50M losetup /dev/loop2 disk2 pvcreate /dev/loop2 vgextend loopvg1 /dev/loop2 lvextend -l 100%VG /dev/loopvg1/looplv1 resize2fs /dev/loopvg1/looplv1 without this patch the resize2fs fails, with it the resize2fs succeeds. Signed-off-by: Josef Bacik <jbacik@redhat.com> Acked-by: Andreas Dilger <adilger@sun.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-14ext3/4: fix uninitialized bs in ext3/4_xattr_set_handle()Tiger Yang
This fix the uninitialized bs when we try to replace a xattr entry in ibody with the new value which require more than free space. This situation only happens we format ext3/4 with inode size more than 128 and we have put xattr entries both in ibody and block. The consequences about this bug is we will lost the xattr block which pointed by i_file_acl with all xattr entires in it. We will alloc a new xattr block and put that large value entry in it. The old xattr block will become orphan block. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Cc: <linux-ext4@vger.kernel.org> Cc: Andreas Gruenbacher <agruen@suse.de> Acked-by: Andreas Dilger <adilger@sun.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29ext3: fix test ext_generic_write_end() copied return valueRoel Kluin
'copied' is unsigned, whereas 'ret2' is not. The test (copied < 0) fails Signed-off-by: Roel Kluin <12o3l@tiscali.nl> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-04-28ext3: replace remaining __FUNCTION__ occurrencesHarvey Harrison
__FUNCTION__ is gcc-specific, use __func__ Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28ext3: fix mount messages when quota disabledJan Kara
When quota is disabled, we should not print 'journaled quota not supported' when user tried to mount non-journaled quota. Also fix typo in the message. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28ext3: retry block allocation if new blocks are allocated from system zoneAneesh Kumar K.V
If the block allocator gets blocks out of system zone ext3 calls ext3_error. But if the file system is mounted with errors=continue retry block allocation. We need to mark the system zone blocks as in use to make sure retry don't pick them again System zone is the block range mapping block bitmap, inode bitmap and inode table. [akpm@linux-foundation.org: fix typo in comment] Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28ext3: fix hang on umount with quotas when journal is abortedJan Kara
Call dquot_drop() from ext3_dquot_drop() even if we fail to start a transaction. Otherwise we never get to dropping references to quota structures from the inode and umount will hang indefinitely. Thanks to Payphone LIOU for spotting the problem. Signed-off-by: Jan Kara <jack@suse.cz> Cc: Payphone LIOU <lioupayphone@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28ext3: fix update of mtime and ctime on renameJan Kara
Make ext3 update mtime and ctime of the directory into which we move file even if the directory entry already exists. Signed-off-by: Jan Kara <jack@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28fs/ext3: use BUG_ONJulia Lawall
if (...) BUG(); should be replaced with BUG_ON(...) when the test has no side-effects to allow a definition of BUG_ON that drops the code completely. The semantic patch that makes this change is as follows: (http://www.emn.fr/x-info/coccinelle/) // <smpl> @ disable unlikely @ expression E,f; @@ ( if (<... f(...) ...>) { BUG(); } | - if (unlikely(E)) { BUG(); } + BUG_ON(E); ) @@ expression E,f; @@ ( if (<... f(...) ...>) { BUG(); } | - if (E) { BUG(); } + BUG_ON(E); ) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>