path: root/fs/ext4/mballoc.c
Age | Commit message | Author
2008-07-14 | ext4: delayed allocation ENOSPC handling | Mingming Cao
This patch does block reservation for delayed allocation, to avoid ENOSPC later at page flush time. Blocks (data and metadata) are reserved at da_write_begin() time, the freeblocks counter is updated at that point, and the number of reserved blocks is stored in a per-inode counter. At writepage time, the unused reserved metadata blocks are returned. At unlink/truncate time, reserved blocks are properly released. Updated fix from Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> to fix the old allocator's block reservation accounting with delalloc, add a lock to guard the counters, and also fix the reservation for metadata blocks. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2008-07-11 | ext4: fix online resize with mballoc | Frederic Bohe
Update group infos when updating a group's descriptor. Add group infos when adding a group's descriptor. Refresh cache pages used by mb_alloc when changes occur. This will probably need modifications once META_BG resizing is allowed. Signed-off-by: Frederic Bohe <frederic.bohe@bull.net> Signed-off-by: Mingming Cao <cmm@us.ibm.com>
2008-07-11 | ext4: mballoc avoid use root reserved blocks for non root allocation | Mingming Cao
The mballoc allocation path was missing the check for blocks reserved for root users. Add an ext4_has_free_blocks() check before allocation. Also modify ext4_has_free_blocks() to support multiple-block allocation requests. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
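The check described in this commit can be illustrated with a small userspace model. This is only a sketch under stated assumptions: the function name `has_free_blocks_sketch`, its parameter list, and the boolean `privileged` flag are hypothetical and do not reproduce the kernel's ext4_has_free_blocks() signature. It shows the idea of refusing a multi-block request from an unprivileged caller when granting it would eat into the root-reserved pool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical userspace model of a free-space check that honours blocks
 * reserved for root. The parameter list is an assumption for illustration;
 * it is not the signature of the kernel's ext4_has_free_blocks().
 */
bool has_free_blocks_sketch(uint64_t free_blocks, uint64_t root_reserved,
                            uint64_t nblocks, bool privileged)
{
    assert(nblocks > 0);

    /* Not enough free blocks for anyone. */
    if (free_blocks < nblocks)
        return false;

    /* Privileged callers may dip into the reserved pool. */
    if (privileged)
        return true;

    /* Unprivileged callers must leave the reserved pool untouched. */
    return free_blocks - nblocks >= root_reserved;
}
```

Comparing `free_blocks < nblocks` first keeps the later subtraction from wrapping around, since all quantities are unsigned.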
2008-07-11 | ext4: cleanup block allocator | Aneesh Kumar K.V
Move the code related to block allocation to a single function and add helper functions to differentiate allocation for data and metadata blocks. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-07-11 | ext4: remove quota allocation when ext4_mb_new_blocks fails | Shen Feng
Quota allocation is not removed when the kmem_cache_alloc call in ext4_mb_new_blocks fails. Also make sure the allocation context is freed on the error path. Signed-off-by: Shen Feng <shen@cn.fujitsu.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-07-11 | ext4: New inode allocation for FLEX_BG meta-data groups. | Jose R. Santos
This patch mostly controls the way inodes are allocated in order to make ialloc aware of flex_bg block group grouping. It achieves this by bypassing the Orlov allocator when block group meta-data are packed together through mke2fs. Since the impact on the block allocator is minimal, this patch should have little or no effect on other block allocation algorithms. By controlling the inode allocation, it can basically control where the initial search for new blocks begins and thus indirectly manipulate the block allocator. This allocator favors data and meta-data locality so the disk will gradually be filled from block group zero upward. This helps improve performance by reducing seek time. Since the group of inode tables within one flex_bg is treated as one giant inode table, uninitialized block groups would not need to partially initialize as many inode tables as with Orlov, which would help fsck time as the filesystem usage goes up. Signed-off-by: Jose R. Santos <jrs@us.ibm.com> Signed-off-by: Valerie Clement <valerie.clement@bull.net> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-07-13 | ext4: fix error processing in mb_free_blocks | Shen Feng
The error processing of the return value of mb_free_blocks is meaningless because it only returns 0. This fix includes:
- make mb_free_blocks return void
- remove the error processing part in callers
- unlock group before calling ext4_error in mb_free_blocks
Signed-off-by: Shen Feng <shen@cn.fujitsu.com> Cc: Mingming Cao <cmm@us.ibm.com> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2008-07-13 | ext4: error proc entry creation when the fs/ext4 is not correctly created | Shen Feng
When the directory fs/ext4 is not correctly created under proc, the entry under this directory should not be created. Signed-off-by: Shen Feng <shen@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2008-07-11 | ext4: Rename read_block_bitmap() to ext4_read_block_bitmap() | Theodore Ts'o
Since this is a non-static function, make it ext4-specific to avoid potential conflicts with other filesystems. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-07-11 | ext4: miscellaneous error checks and coding cleanups for mballoc | Shen Feng
- ext4_mb_seq_history_open(): check if sbi->s_mb_history is NULL
- ext4_mb_history_init(): replace kmalloc and memset with kzalloc
- ext4_mb_init_backend(): remove memset since kzalloc is used
- ext4_mb_init(): the return value of ext4_mb_init_backend is int, but i is unsigned, replace it with a new int variable.
Signed-off-by: Shen Feng <shen@cn.fujitsu.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-07-11 | ext4: add error processing when calling ext4_mb_init_cache in mballoc | Shen Feng
Add error processing for ext4_mb_load_buddy when it calls ext4_mb_init_cache. Signed-off-by: Shen Feng <shen@cn.fujitsu.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-07-11 | ext4: Fix ext4_mb_init_cache return error | Mingming Cao
ext4_mb_init_cache() incorrectly always returns EIO on success. This causes the caller of ext4_mb_init_cache() to fail when it checks the return value. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-07-11 | ext4: switch to seq_files | Alexey Dobriyan
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2008-07-11 | ext4: start searching for the right extent from the goal group. | Aneesh Kumar K.V
With mballoc we search for the best extent using different criteria. We should always start from the goal group when we begin searching with a new criterion. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-07-11 | ext4: Fix mb_find_next_bit not to return larger than max | Aneesh Kumar K.V
Some architectures implement ext4_find_next_bit and ext4_find_next_zero_bit in such a way that they return greater than max for some input values. Make sure mb_find_next_bit and mb_find_next_zero_bit return the right values. On 2.6.25 we have include/asm-x86/bitops_32.h

    static inline unsigned find_first_bit(const unsigned long *addr, unsigned size)
    {
            unsigned x = 0;

            while (x < size) {
                    unsigned long val = *addr++;
                    if (val)
                            return __ffs(val) + x;
                    x += (sizeof(*addr)<<3);
            }
            return x;
    }

This can return a value greater than size. Reported and fixed here for lustre https://bugzilla.lustre.org/show_bug.cgi?id=15932 https://bugzilla.lustre.org/attachment.cgi?id=17205 Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
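The overshoot described in this commit can be reproduced in userspace. The sketch below models the word-at-a-time loop quoted in the commit message and then wraps it in a clamping helper. The names `raw_find_first_bit`, `clamped_find_first_bit`, and `lowest_set_bit` are hypothetical, and the clamp is an assumption about the general shape of the fix rather than a copy of the mb_find_next_bit code.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_WORD ((unsigned)(sizeof(unsigned long) * CHAR_BIT))

/* Index of the lowest set bit; val must be non-zero. */
unsigned lowest_set_bit(unsigned long val)
{
    unsigned n = 0;

    assert(val != 0);
    while (!(val & 1UL)) {
        val >>= 1;
        n++;
    }
    return n;
}

/*
 * Word-at-a-time search modelled on the loop quoted in the commit message.
 * It can return more than `size`: either a whole-word multiple when no bit
 * is set, or the index of a bit that lies past `size` in the last word.
 */
unsigned raw_find_first_bit(const unsigned long *addr, unsigned size)
{
    unsigned x = 0;

    assert(addr != NULL);
    while (x < size) {
        unsigned long val = *addr++;
        if (val)
            return lowest_set_bit(val) + x;
        x += BITS_PER_WORD;
    }
    return x;
}

/* Wrapper that never reports a position beyond `size`. */
unsigned clamped_find_first_bit(const unsigned long *addr, unsigned size)
{
    unsigned ret = raw_find_first_bit(addr, size);

    return ret > size ? size : ret;
}
```

With an all-zero word and `size` of 10, the raw search returns the word width while the wrapper returns 10, which callers can treat as "not found".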
2008-06-05 | ext4: Fix use of uninitialized data with debug enabled. | Aneesh Kumar K.V
Fix use of uninitialized data with debug enabled. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-05-15 | ext4: Retry block allocation if new blocks are allocated from system zone. | Aneesh Kumar K.V
If the block allocator gets blocks out of the system zone, ext4 calls ext4_error. But if the file system is mounted with errors=continue, retry the block allocation. We need to mark the system zone blocks as in use to make sure the retry doesn't pick them again. The system zone is the block range mapping the block bitmap, inode bitmap and inode table. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-05-13 | ext4: mballoc fix mb_normalize_request algorithm for 1KB block size filesystems | Valerie Clement
In case of inode preallocation, the number of blocks to allocate depends on the file size and it is calculated in ext4_mb_normalize_request(). Each group in the filesystem is then checked to find one that can be used for allocation; this is done in ext4_mb_good_group(). When a file bigger than 4MB is created, the requested number of blocks to preallocate, calculated by ext4_mb_normalize_request, is 4096. However for a filesystem with 1KB block size, the maximum size of the block buddies used by the multiblock allocator is 2048, so none of the groups in the filesystem satisfies the search criteria in ext4_mb_good_group(). Scanning all the filesystem groups impacts performance. This was demonstrated by using a freshly created, 70GB, 1k block filesystem, with caches dropped right before the test via /proc/sys/vm/drop_caches, and with the filesystem mounted with nodelalloc and with nodelalloc,nomballoc. Writing an 8 megabyte file using "dd if=/dev/zero of=/mnt/test/fo bs=8k count=1k conv=fsync" took 35.5091 seconds (236kB/s) with nodelalloc, and 0.233754 seconds (35.9 MB/s) with the nodelalloc,nomballoc options. With a 1TB partition, it took several minutes to write 8MB! This patch modifies the algorithm in ext4_mb_normalize_group_request to calculate the number of blocks to allocate by taking into account the maximum size of free block chunks handled by the multiblock allocator. It has also been tested for filesystems with 2KB and 4KB block sizes to ensure that those cases don't regress. Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Valerie Clement <valerie.clement@bull.net> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
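The mismatch this commit describes (a 4096-block request against a 2048-block maximum buddy chunk) comes down to a request that no group can ever satisfy. A minimal sketch of the constraint follows. The helper name `cap_prealloc_request` is hypothetical and the plain "take the minimum" rule is an assumption made for illustration; the actual patch changes the normalization algorithm itself.

```c
#include <assert.h>

/*
 * Cap a preallocation request at the largest free chunk the buddy data can
 * describe. Both values are in blocks. max_chunk is supplied by the caller
 * (2048 in the 1KB-block case from the commit message).
 */
unsigned int cap_prealloc_request(unsigned int requested, unsigned int max_chunk)
{
    assert(max_chunk > 0);
    return requested > max_chunk ? max_chunk : requested;
}
```

For the numbers quoted in the commit, `cap_prealloc_request(4096, 2048)` yields 2048, a size that a group with a free maximum-order chunk can actually match.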
2008-05-13 | Fix misuses of bdevname() | Jean Delvare
bdevname() fills the buffer that it is given as a parameter, so calling strcpy() or snprintf() on the returned value is redundant (and probably not guaranteed to work - I don't think strcpy and snprintf support overlapping buffers.) Signed-off-by: Jean Delvare <khali@linux-fr.org> Cc: Stephen Tweedie <sct@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 | ext4: fix hot spins in mballoc after err_freebuddy and err_freemeta | Roel Kluin
In ext4_mb_init_backend() 'i' is of type ext4_group_t. Since it is unsigned, i >= 0 is always true, so fix the hot spins after err_freebuddy: and err_freemeta: and prevent decrements when zero. Signed-off-by: Roel Kluin <12o3l@tiscali.nl> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
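The pitfall in this commit is a general C one: a countdown loop guarded by `i >= 0` never terminates when `i` is unsigned. The sketch below shows a countdown that is safe for unsigned counters. The type `group_t` is a stand-in for ext4_group_t and `release_slots` is a hypothetical helper, not code from ext4_mb_init_backend().

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int group_t; /* stand-in for the unsigned ext4_group_t */

/*
 * Clear slots [0, count) in reverse order and return how many were cleared.
 * A loop written as `for (i = count - 1; i >= 0; i--)` would spin forever
 * here, because an unsigned i wraps around instead of going negative.
 */
unsigned release_slots(int *slots, group_t count)
{
    unsigned released = 0;
    group_t i = count;

    assert(slots != NULL || count == 0);

    /* Test before decrementing, so i is never decremented at zero. */
    while (i > 0) {
        i--;
        slots[i] = 0;
        released++;
    }
    return released;
}
```

Checking `i > 0` before the decrement also covers the `count == 0` case without touching the array.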
2008-04-29 | ext4: Move mballoc headers/structures to a seperate header file mballoc.h | Mingming Cao
Move function and structure definitions out of mballoc.c and put them in a new header file, mballoc.h. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-04-29 | ext4: cleanup for compiling mballoc with verification and debugging #defines | Solofo Ramangalahy
This patch allows compiling mballoc with:

    #define AGGRESSIVE_CHECK
    #define DOUBLE_CHECK
    #define MB_DEBUG

It fixes:

Compilation errors:

    fs/ext4/mballoc.c: In function '__mb_check_buddy':
    fs/ext4/mballoc.c:605: error: 'struct ext4_prealloc_space' has no member named 'group_list'
    fs/ext4/mballoc.c:606: error: 'struct ext4_prealloc_space' has no member named 'pstart'
    fs/ext4/mballoc.c:608: error: 'struct ext4_prealloc_space' has no member named 'len'

Compilation warnings:

    fs/ext4/mballoc.c: In function 'ext4_mb_normalize_group_request':
    fs/ext4/mballoc.c:2863: warning: format '%lu' expects type 'long unsigned int', but argument 3 has type 'int'
    fs/ext4/mballoc.c: In function 'ext4_mb_use_inode_pa':
    fs/ext4/mballoc.c:3103: warning: format '%lu' expects type 'long unsigned int', but argument 3 has type 'int'

Sparse check:

    fs/ext4/mballoc.c:3818:2: warning: context imbalance in 'ext4_mb_show_ac' - different lock contexts for basic block

Signed-off-by: Solofo Ramangalahy <Solofo.Ramangalahy@bull.net> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-04-29 | ext4: Don't do GFP_NOFS allocations after taking ext4_lock_group | Aneesh Kumar K.V
We can't do GFP_NOFS allocation after taking ext4_lock_group.

    BUG: sleeping function called from invalid context at mm/slab.c:3054
    in_atomic():1, irqs_disabled():0
    1 lock held by vi/2426:
     #0: (&ei->i_data_sem){----}, at: [<c01cf665>] ext4_release_file+0x23/0x66
    Pid: 2426, comm: vi Not tainted 2.6.25-rc7 #24
     [<c011a3dc>] __might_sleep+0xbe/0xc5
     [<c01620c9>] kmem_cache_alloc+0x22/0xa6
     [<c01e382a>] ext4_mb_release_inode_pa+0x73/0x1b3
     [<c01e6adf>] ext4_mb_discard_inode_preallocations+0x22d/0x2d4
     [<c013000a>] ? param_set_ushort+0x32/0x39
     [<c01ceba1>] ext4_discard_reservation+0x27/0x6a
     [<c01cf66c>] ext4_release_file+0x2a/0x66
     [<c0165bd6>] __fput+0xae/0x155
     [<c0165e46>] fput+0x17/0x19
     [<c0163756>] filp_close+0x50/0x5a
     [<c01647c0>] sys_close+0x71/0xad
     [<c0104aba>] sysenter_past_esp+0x5f/0xa5

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-04-29 | ext4: move headers out of include/linux | Christoph Hellwig
Move ext4 headers out of include/linux. This is just the trivial move; there are more things that could be done later. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-04-17 | ext4: replace remaining __FUNCTION__ occurrences | Harvey Harrison
__FUNCTION__ is gcc-specific, use __func__ Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-04-17 | ext4: remove extra define of ext4_new_blocks_old from mballoc.c | Mingming Cao
The function prototype of ext4_new_blocks_old() is defined in ext4_fs.h, so we don't need the extra function prototype in mballoc.c Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-04-17 | ext4: le*_add_cpu conversion | Marcin Slusarz
Replace all:

    little_endian_variable = cpu_to_leX(leX_to_cpu(little_endian_variable) + expression_in_cpu_byteorder);

with:

    leX_add_cpu(&little_endian_variable, expression_in_cpu_byteorder);

Generated with a semantic patch. Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: linux-ext4@vger.kernel.org Cc: sct@redhat.com Cc: Andrew Morton <akpm@linux-foundation.org> Cc: adilger@clusterfs.com Cc: Mingming Cao <cmm@us.ibm.com>
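What the conversion buys can be seen in a userspace model of the 32-bit case. This is a sketch, assuming a byte-array representation so that it behaves the same on any host. The type `le32_sketch` and the three `*_sketch` functions are hypothetical stand-ins and are not the kernel's __le32, cpu_to_le32(), le32_to_cpu(), or le32_add_cpu().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for __le32: four bytes, least significant first. */
typedef struct {
    uint8_t b[4];
} le32_sketch;

/* Encode a CPU-order value as little-endian bytes. */
le32_sketch cpu_to_le32_sketch(uint32_t v)
{
    le32_sketch out;

    out.b[0] = (uint8_t)(v & 0xff);
    out.b[1] = (uint8_t)((v >> 8) & 0xff);
    out.b[2] = (uint8_t)((v >> 16) & 0xff);
    out.b[3] = (uint8_t)((v >> 24) & 0xff);
    return out;
}

/* Decode little-endian bytes back into a CPU-order value. */
uint32_t le32_to_cpu_sketch(le32_sketch v)
{
    return (uint32_t)v.b[0] | ((uint32_t)v.b[1] << 8) |
           ((uint32_t)v.b[2] << 16) | ((uint32_t)v.b[3] << 24);
}

/* Add a CPU-order value to a little-endian variable in place. */
void le32_add_cpu_sketch(le32_sketch *var, uint32_t val)
{
    assert(var != NULL);
    *var = cpu_to_le32_sketch(le32_to_cpu_sketch(*var) + val);
}
```

The helper hides the decode, add, encode round trip behind one call, so a call site cannot accidentally add to the raw little-endian representation.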
2008-04-17 | ext4: Convert list_for_each_rcu() to list_for_each_entry_rcu() | Aneesh Kumar K.V
The list_for_each_entry_rcu() primitive should be used instead of list_for_each_rcu(), as the former is easier to use and provides better type safety. http://groups.google.com/group/linux.kernel/browse_thread/thread/45749c83451cebeb/0633a65759ce7713?lnk=raot Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Roel Kluin <12o3l@tiscali.nl> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-04-29 | ext4: reduce mballoc stack usage with noinline_for_stack | Eric Sandeen
mballoc.c is a whole lot of static functions, which gcc seems to really like to inline. With the changes below, on x86, I can at least get from:

    432 ext4_mb_new_blocks
    240 ext4_mb_free_blocks
    208 ext4_mb_discard_group_preallocations
    188 ext4_mb_seq_groups_show
    164 ext4_mb_init_cache
    152 ext4_mb_release_inode_pa
    136 ext4_mb_seq_history_show
    ...

to

    220 ext4_mb_free_blocks
    188 ext4_mb_seq_groups_show
    176 ext4_mb_regular_allocator
    164 ext4_mb_init_cache
    156 ext4_mb_new_blocks
    152 ext4_mb_release_inode_pa
    136 ext4_mb_seq_history_show
    124 ext4_mb_release_group_pa
    ...

which still has some big functions in there, but not 432 bytes! Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-04-29 | ext4: use non-racy method for proc entries creation | Denis V. Lunev
Use proc_create()/proc_create_data() to make sure that ->proc_fops and ->data are set up before gluing the PDE to the main tree. Signed-off-by: Denis V. Lunev <den@openvz.org> Cc: <linux-ext4@vger.kernel.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 | proc: remove proc_root_fs | Alexey Dobriyan
Use creation by full path instead: "fs/foo". Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-23 | ext4: ext4_find_next_zero_bit needs an aligned address on some arch | Aneesh Kumar K.V
ext4_find_next_zero_bit and ext4_find_next_bit need a long-aligned address on x86_64. Add mb_find_next_zero_bit and mb_find_next_bit and use them in mballoc. Fix: https://bugzilla.redhat.com/show_bug.cgi?id=433286 Eric Sandeen debugged the problem and suggested the fix. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-02-15 | ext4: Don't claim block from group which has corrupt bitmap | Aneesh Kumar K.V
In ext4_mb_complex_scan_group, if the length of the newly found extent is greater than the total free blocks counted in the group info, break without claiming the block. Document the different ext4_error usages, explaining the state with which we continue if we mount with errors=continue. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-02-15 | ext4: Fix kernel BUG at fs/ext4/mballoc.c:910! | Valerie Clement
With the flex_bg feature enabled, a large file creation oopses the kernel. The BUG_ON is: BUG_ON(len >= EXT4_BLOCKS_PER_GROUP(sb)); As the allocation of the bitmaps and the inode table can be done outside the block group with flex_bg, it becomes possible to allocate up to EXT4_BLOCKS_PER_GROUP blocks in a group. This patch fixes the oops. Signed-off-by: Valerie Clement <valerie.clement@bull.net> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-02-10 | ext4: Don't panic in case of corrupt bitmap | Aneesh Kumar K.V
The multiblock allocator calls BUG_ON in many cases if the free and used block counts obtained by looking at the bitmap differ from what the allocator internally accounted for. Use ext4_error in such cases and don't panic the system. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-02-10 | ext4: allocate struct ext4_allocation_context from a kmem cache | Eric Sandeen
struct ext4_allocation_context is rather large, and this bloats the stack of many functions which use it. Allocating it from a named slab cache will alleviate this. For example, with this change (on top of the noinline patch sent earlier):

    -ext4_mb_new_blocks 200
    +ext4_mb_new_blocks 40
    -ext4_mb_free_blocks 344
    +ext4_mb_free_blocks 168
    -ext4_mb_release_inode_pa 216
    +ext4_mb_release_inode_pa 40
    -ext4_mb_release_group_pa 192
    +ext4_mb_release_group_pa 24

Most of these stack-allocated structs are actually used only for mballoc history; and in those cases often a smaller struct would do. So changing that may be another way around it, at least for those functions, if preferred. For now, in those cases where the ac is only for history, an allocation failure simply skips the history recording, and does not cause any other failures. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2008-02-10 | ext4: Fix null bh pointer dereference in mballoc | Aneesh Kumar K.V
Reported by Adrian Bunk <bunk@kernel.org>: The Coverity checker spotted the following NULL dereference:

    static int ext4_mb_mark_diskspace_used
    {
    ...
            if (!bitmap_bh)
                    goto out_err;
    ...
    out_err:
            sb->s_dirt = 1;
            put_bh(bitmap_bh);
    ...

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com>
2008-01-29 | ext4: Add multi block allocator for ext4 | Alex Tomas
Signed-off-by: Alex Tomas <alex@clusterfs.com> Signed-off-by: Andreas Dilger <adilger@clusterfs.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>