aboutsummaryrefslogtreecommitdiff
path: root/fs/ext4/ext4.h
AgeCommit message (Collapse)Author
2010-01-15ext4: Drop EXT4_GET_BLOCKS_UPDATE_RESERVE_SPACE flagAneesh Kumar K.V
We should update reserve space if it is delalloc buffer and that is indicated by EXT4_GET_BLOCKS_DELALLOC_RESERVE flag. So use EXT4_GET_BLOCKS_DELALLOC_RESERVE in place of EXT4_GET_BLOCKS_UPDATE_RESERVE_SPACE Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
2010-01-25ext4: Fix quota accounting error with fallocateAneesh Kumar K.V
When we fallocate a region of the file which we had recently written, and which is still in the page cache marked as delayed allocated blocks we need to make sure we don't do the quota update on writepage path. This is because the needed quota updated would have already be done by fallocate. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
2010-01-01ext4: Calculate metadata requirements more accuratelyTheodore Ts'o
In the past, ext4_calc_metadata_amount(), and its sub-functions ext4_ext_calc_metadata_amount() and ext4_indirect_calc_metadata_amount() badly over-estimated the number of metadata blocks that might be required for delayed allocation blocks. This didn't matter as much when functions which managed the reserved metadata blocks were more aggressive about dropping reserved metadata blocks as delayed allocation blocks were written, but unfortunately they were too aggressive. This was fixed in commit 0637c6f, but as a result the over-estimation by ext4_calc_metadata_amount() would lead to reserving 2-3 times the number of pending delayed allocation blocks as potentially required metadata blocks. So if there are 1 megabytes of blocks which have been not yet been allocation, up to 3 megabytes of space would get reserved out of the user's quota and from the file system free space pool until all of the inode's data blocks have been allocated. This commit addresses this problem by much more accurately estimating the number of metadata blocks that will be required. It will still somewhat over-estimate the number of blocks needed, since it must make a worst case estimate not knowing which physical blocks will be needed, but it is much more accurate than before. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-12-23ext4: Convert to generic reserved quota's space management.Dmitry Monakhov
This patch also fixes write vs chown race condition. Acked-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Jan Kara <jack@suse.cz>
2009-12-08ext4: Wait for proper transaction commit on fsyncJan Kara
We cannot rely on buffer dirty bits during fsync because pdflush can come before fsync is called and clear dirty bits without forcing a transaction commit. What we do is that we track which transaction has last changed the inode and which transaction last changed allocation and force it to disk on fsync. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-11-23ext4: call ext4_forget() from ext4_free_blocks()Theodore Ts'o
Add the facility for ext4_forget() to be called from ext4_free_blocks(). This simplifies the code in a large number of places, and centralizes most of the work of calling ext4_forget() into a single place. Also fix a bug in the extents migration code; it wasn't calling ext4_forget() when releasing the indirect blocks during the conversion. As a result, if the system cashed during or shortly after the extents migration, and the released indirect blocks get reused as data blocks, the journal replay would corrupt the data blocks. With this new patch, fixing this bug was as simple as adding the EXT4_FREE_BLOCKS_FORGET flags to the call to ext4_free_blocks(). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
2009-11-22ext4: fold ext4_free_blocks() and ext4_mb_free_blocks()Theodore Ts'o
ext4_mb_free_blocks() is only called by ext4_free_blocks(), and the latter function doesn't really do much. So merge the two functions together, such that ext4_free_blocks() is now found in fs/ext4/mballoc.c. This saves about 200 bytes of compiled text space. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-11-22ext4: move ext4_forget() to ext4_jbd2.cTheodore Ts'o
The ext4_forget() function better belongs in ext4_jbd2.c. This will allow us to do some cleanup of the ext4_journal_revoke() and ext4_journal_forget() functions, as well as giving us better error reporting since we can report the caller of ext4_forget() when things go wrong. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-11-19ext4: make trim/discard optional (and off by default)Eric Sandeen
It is anticipated that when sb_issue_discard starts doing real work on trim-capable devices, we may see issues. Make this mount-time optional, and default it to off until we know that things are working out OK. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-11-10ext4: skip conversion of uninit extents after direct IO if there isn't anyMingming
At the end of direct I/O operation, ext4_ext_direct_IO() always called ext4_convert_unwritten_extents(), regardless of whether there were any unwritten extents involved in the I/O or not. This commit adds a state flag so that ext4_ext_direct_IO() only calls ext4_convert_unwritten_extents() when necessary. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-11-02Revert "ext4: Remove journal_checksum mount option and enable it by default"Linus Torvalds
This reverts commit d0646f7b636d067d715fab52a2ba9c6f0f46b0d7, as requested by Eric Sandeen. It can basically cause an ext4 filesystem to miss recovery (and thus get mounted with errors) if the journal checksum does not match. Quoth Eric: "My hand-wavy hunch about what is happening is that we're finding a bad checksum on the last partially-written transaction, which is not surprising, but if we have a wrapped log and we're doing the initial scan for head/tail, and we abort scanning on that bad checksum, then we are essentially running an unrecovered filesystem. But that's hand-wavy and I need to go look at the code. We lived without journal checksums on by default until now, and at this point they're doing more harm than good, so we should revert the default-changing commit until we can fix it and do some good power-fail testing with the fixes in place." See http://bugzilla.kernel.org/show_bug.cgi?id=14354 for all the gory details. Requested-by: Eric Sandeen <sandeen@redhat.com> Cc: Theodore Tso <tytso@mit.edu> Cc: Alexey Fisher <bug-track@fisher-privat.net> Cc: Maxim Levitsky <maximlevitsky@gmail.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Mathias Burén <mathias.buren@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-30ext4: Fix time encoding with extra epoch bitsTheodore Ts'o
"Looking at ext4.h, I think the setting of extra time fields forgets to mask the epoch bits so the epoch part overwrites nsec part. The second change is only for coherency (2 -> EXT4_EPOCH_BITS)." Thanks to Damien Guibouret for pointing out this problem. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-09-30ext4: Use tracepoints for mb_history trace fileTheodore Ts'o
The /proc/fs/ext4/<dev>/mb_history was maintained manually, and had a number of problems: it required a largish amount of memory to be allocated for each ext4 filesystem, and the s_mb_history_lock introduced a CPU contention problem. By ripping out the mb_history code and replacing it with ftrace tracepoints, and we get more functionality: timestamps, event filtering, the ability to correlate mballoc history with other ext4 tracepoints, etc. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-09-28ext4: async direct IO for holes and fallocate supportMingming Cao
For async direct IO that covers holes or fallocate, the end_io callback function now queued the convertion work on workqueue but don't flush the work rightaway as it might take too long to afford. But when fsync is called after all the data is completed, user expects the metadata also being updated before fsync returns. Thus we need to flush the conversion work when fsync() is called. This patch keep track of a listed of completed async direct io that has a work queued on workqueue. When fsync() is called, it will go through the list and do the conversion. Signed-off-by: Mingming Cao <cmm@us.ibm.com>
2009-09-28ext4: Use end_io callback to avoid direct I/O fallback to buffered I/OMingming Cao
Currently the DIO VFS code passes create = 0 when writing to the middle of file. It does this to avoid block allocation for holes, so as not to expose stale data out when there is a parallel buffered read (which does not hold the i_mutex lock). Direct I/O writes into holes falls back to buffered IO for this reason. Since preallocated extents are treated as holes when doing a get_block() look up (buffer is not mapped), direct IO over fallocate also falls back to buffered IO. Thus ext4 actually silently falls back to buffered IO in above two cases, which is undesirable. To fix this, this patch creates unitialized extents when a direct I/O write into holes in sparse files, and registering an end_io callback which converts the uninitialized extent to an initialized extent after the I/O is completed. Singed-Off-By: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-09-28ext4: Split uninitialized extents for direct I/OMingming Cao
When writing into an unitialized extent via direct I/O, and the direct I/O doesn't exactly cover the unitialized extent, split the extent into uninitialized and initialized extents before submitting the I/O. This avoids needing to deal with an ENOSPC error in the end_io callback that gets used for direct I/O. When the IO is complete, the written extent will be marked as initialized. Singed-Off-By: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-09-29ext4: Adjust ext4_da_writepages() to write out larger contiguous chunksTheodore Ts'o
Work around problems in the writeback code to force out writebacks in larger chunks than just 4mb, which is just too small. This also works around limitations in the ext4 block allocator, which can't allocate more than 2048 blocks at a time. So we need to defeat the round-robin characteristics of the writeback code and try to write out as many blocks in one inode before allowing the writeback code to move on to another inode. We add a a new per-filesystem tunable, max_writeback_mb_bump, which caps this to a default of 128mb per inode. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-09-17ext4: replace MAX_DEFRAG_SIZE with EXT_MAX_BLOCKEric Sandeen
There's no reason to redefine the maximum allowable offset in an extent-based file just for defrag; EXT_MAX_BLOCK already does this. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-09-17ext4: store EXT4_EXT_MIGRATE in i_state instead of i_flagsTheodore Ts'o
EXT4_EXT_MIGRATE is only intended to be used for an in-memory flag, and the hex value assigned to it collides with FS_DIRECTIO_FL (which is also stored in i_flags). There's no reason for the EXT4_EXT_MIGRATE bit to be stored in i_flags, so we switch it to use i_state instead. Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-09-16ext4: limit block allocations for indirect-block files to < 2^32Eric Sandeen
Today, the ext4 allocator will happily allocate blocks past 2^32 for indirect-block files, which results in the block numbers getting truncated, and corruption ensues. This patch limits such allocations to < 2^32, and adds BUG_ONs if we do get blocks larger than that. This should address RH Bug 519471, ext4 bitmap allocator must limit blocks to < 2^32 * ext4_find_goal() is modified to choose a goal < UINT_MAX, so that our starting point is in an acceptable range. * ext4_xattr_block_set() is modified such that the goal block is < UINT_MAX, as above. * ext4_mb_regular_allocator() is modified so that the group search does not continue into groups which are too high * ext4_mb_use_preallocated() has a check that we don't use preallocated space which is too far out * ext4_alloc_blocks() and ext4_xattr_block_set() add some BUG_ONs No attempt has been made to limit inode locations to < 2^32, so we may wind up with blocks far from their inodes. Doing this much already will lead to some odd ENOSPC issues when the "lower 32" gets full, and further restricting inodes could make that even weirder. For high inodes, choosing a goal of the original, % UINT_MAX, may be a bit odd, but then we're in an odd situation anyway, and I don't know of a better heuristic. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-09-05ext4: Remove journal_checksum mount option and enable it by defaultTheodore Ts'o
There's no real cost for the journal checksum feature, and we should make sure it is enabled all the time. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-08-31ext4: Add new tracepoint: trace_ext4_da_write_pages()Theodore Ts'o
Add a new tracepoint which shows the pages that will be written using write_cache_pages() by ext4_da_writepages(). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-08-25ext4: use ext4_grpblk_t more extensivelyEric Sandeen
unsigned short is potentially too small to track blocks within a group; today it is safe due to restrictions in e2fsprogs but we have _lo / _hi bits for group blocks with the intent to go up to 32 bits, so clean this up now. There are many more places where we use unsigned/int/unsigned int to contain a group block but this should at least fix all the short types. I added a few comments to the struct ext4_group_info definition as well. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-08-17ext4: open-code ext4_mb_update_group_infoEric Sandeen
ext4_mb_update_group_info is only called in one place, and it's extremely simple. There's no reason to have it in a separate function in a separate file as far as I can tell, it just obfuscates what's really going on. Perhaps it was intended to keep the grp->bb_* manipulation local to mballoc.c but we're already accessing other grp-> fields in balloc.c directly so this seems ok. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-08-17ext4: Fix possible deadlock between ext4_truncate() and ext4_get_blocks()Jan Kara
During truncate we are sometimes forced to start a new transaction as the amount of blocks to be journaled is both quite large and hard to predict. So far we restarted a transaction while holding i_data_sem and that violates lock ordering because i_data_sem ranks below a transaction start (and it can lead to a real deadlock with ext4_get_blocks() mapping blocks in some page while having a transaction open). We fix the problem by dropping the i_data_sem before restarting the transaction and acquire it afterwards. It's slightly subtle that this works: 1) By the time ext4_truncate() is called, all the page cache for the truncated part of the file is dropped so get_block() should not be called on it (we only have to invalidate extent cache after we reacquire i_data_sem because some extent from not-truncated part could extend also into the part we are going to truncate). 2) Writes, migrate or defrag hold i_mutex so they are stopped for all the time of the truncate. This bug has been found and analyzed by Theodore Tso <tytso@mit.edu>. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-09-18ext4: Avoid group preallocation for closed filesTheodore Ts'o
Currently the group preallocation code tries to find a large (512) free block from which to do per-cpu group allocation for small files. The problem with this scheme is that it leaves the filesystem horribly fragmented. In the worst case, if the filesystem is unmounted and remounted (after a system shutdown, for example) we forget the fact that wee were using a particular (now-partially filled) 512 block extent. So the next time we try to allocate space for a small file, we will find *another* completely free 512 block chunk to allocate small files. Given that there are 32,768 blocks in a block group, after 64 iterations of "mount, write one 4k file in a directory, unmount", the block group will have 64 files, each separated by 511 blocks, and the block group will no longer have any free 512 completely free chunks of blocks for group preallocation space. So if we try to allocate blocks for a file that has been closed, such that we know the final size of the file, and the filesystem is not busy, avoid using group preallocation. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-08-09ext4: Fix bugs in mballoc's stream allocation modeTheodore Ts'o
The logic around sbi->s_mb_last_group and sbi->s_mb_last_start was all screwed up. These fields were getting unconditionally all the time, set even when stream allocation had not taken place, and if they were being used when the file was smaller than s_mb_stream_request, which is when the allocation should _not_ be doing stream allocation. Fix this by determining whether or not we stream allocation should take place once, in ext4_mb_group_or_file(), and setting a flag which gets used in ext4_mb_regular_allocator() and ext4_mb_use_best_found(). This simplifies the code and assures that we are consistently using (or not using) the stream allocation logic. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-08-09ext4: Display the mballoc flags in mb_history in hex instead of decimalTheodore Ts'o
Displaying the flags in base 16 makes it easier to see which flags have been set. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-07-13ext4: naturally align struct ext4_allocation_requestEric Sandeen
As Ted noted, the ext4_allocation_request isn't well aligned. Looking at it with pahole we're wasting space on 64-bit arches: struct ext4_allocation_request { struct inode * inode; /* 0 8 */ ext4_lblk_t logical; /* 8 4 */ /* XXX 4 bytes hole, try to pack */ ext4_fsblk_t goal; /* 16 8 */ ext4_lblk_t lleft; /* 24 4 */ /* XXX 4 bytes hole, try to pack */ ext4_fsblk_t pleft; /* 32 8 */ ext4_lblk_t lright; /* 40 4 */ /* XXX 4 bytes hole, try to pack */ ext4_fsblk_t pright; /* 48 8 */ unsigned int len; /* 56 4 */ unsigned int flags; /* 60 4 */ /* --- cacheline 1 boundary (64 bytes) --- */ /* size: 64, cachelines: 1, members: 9 */ /* sum members: 52, holes: 3, sum holes: 12 */ }; Grouping 32-bit members together closes these holes and shrinks the structure by 12 bytes. which is important since ext4 can get on the hairy edge of stack overruns. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-06-24switch ext4 to inode->i_aclAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-06-15ext4: Fix 64-bit block type problem on 32-bit platformsTheodore Ts'o
The function ext4_mb_free_blocks() was using an "unsigned long" to pass a block number; this will cause 64-bit block numbers to get truncated on x86 and other 32-bit platforms. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Eric Sandeen <sandeen@redhat.com>
2009-06-13ext4: teach the inode allocator to use a goal inode numberAndreas Dilger
Enhance the inode allocator to take a goal inode number as a paremeter; if it is specified, it takes precedence over Orlov or parent directory inode allocation algorithms. The extents migration function uses the goal inode number so that the extent trees allocated the migration function use the correct flex_bg. In the future, the goal inode functionality will also be used to allocate an adjacent inode for the extended attributes. Also, for testing purposes the goal inode number can be specified via /sys/fs/{dev}/inode_goal. This can be useful for testing inode allocation beyond 2^32 blocks on very large filesystems. Signed-off-by: Andreas Dilger <adilger@sun.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-06-13ext4: Use a hash of the topdir directory name for the Orlov parent groupTheodore Ts'o
Instead of using a random number to determine the goal parent grop for the Orlov top directories, use a hash of the directory name. This allows for repeatable results when trying to benchmark filesystem layout algorithms. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-06-13ext4: move the abort flag from s_mount_opts to s_mount_flagsTheodore Ts'o
We're running out of space in the mount options word, and EXT4_MOUNT_ABORT isn't really a mount option, but a run-time flag. So move it to become EXT4_MF_FS_ABORTED in s_mount_flags. Also remove bogus ext2_fs.h / ext4.h simultaneous #include protection, which can never happen. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-06-13ext4: update the s_last_mounted field in the superblockTheodore Ts'o
This field can be very helpful when a system administrator is trying to sort through large numbers of block devices or filesystem images. What is stored in this field can be ambiguous if multiple filesystem namespaces are in play; what we store in practice is the mountpoint interpreted by the process's namespace which first opens a file in the filesystem. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-06-13ext4: change s_mount_opt to be an unsigned intTheodore Ts'o
We can only fit 32 options in s_mount_opt because an unsigned long is 32-bits on a x86 machine. So use an unsigned int to save space on 64-bit platforms. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-06-17ext4: online defrag -- Add EXT4_IOC_MOVE_EXT ioctlAkira Fujita
The EXT4_IOC_MOVE_EXT exchanges the blocks between orig_fd and donor_fd, and then write the file data of orig_fd to donor_fd. ext4_mext_move_extent() is the main fucntion of ext4 online defrag, and this patch includes all functions related to ext4 online defrag. Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com> Signed-off-by: Takashi Sato <t-sato@yk.jp.nec.com> Signed-off-by: Kazuya Mio <k-mio@sx.jp.nec.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-06-04ext4: Change all super.c messages to print the deviceEric Sandeen
This patch changes ext4 super.c to include the device name with all warning/error messages, by using a new utility function ext4_msg. It's a rather large patch, but very mechanic. I left debug printks alone. This is a straightforward port of a patch which Andi Kleen did for ext3. Cc: Andi Kleen <ak@linux.intel.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-06-09ext4: Get rid of EXTEND_DISKSIZE flag of ext4_get_blocks_handle()Jan Kara
Get rid of EXTEND_DISKSIZE flag of ext4_get_blocks_handle(). This seems to be a relict from some old days and setting disksize in this function does not make much sense. Currently it was set only by ext4_getblk(). Since the parameter has some effect only if create == 1, it is easy to check by grepping through the sources that the three callers which end up calling ext4_getblk() with create == 1 (ext4_append, ext4_quota_write, ext4_mkdir) do the right thing and set disksize themselves. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-05-17ext4: Add a comprehensive block validity check to ext4_get_blocks()Theodore Ts'o
To catch filesystem bugs or corruption which could lead to the filesystem getting severly damaged, this patch adds a facility for tracking all of the filesystem metadata blocks by contiguous regions in a red-black tree. This allows quick searching of the tree to locate extents which might overlap with filesystem metadata blocks. This facility is also used by the multi-block allocator to assure that it is not allocating blocks out of the system zone, as well as by the routines used when reading indirect blocks and extents information from disk to make sure their contents are valid. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-05-14ext4: Clean up ext4_get_blocks() so it does not depend on bh_result->b_stateTheodore Ts'o
The ext4_get_blocks() function was depending on the value of bh_result->b_state as an input parameter to decide whether or not update the delalloc accounting statistics by calling ext4_da_update_reserve_space(). We now use a separate flag, EXT4_GET_BLOCKS_UPDATE_RESERVE_SPACE, to requests this update, so that all callers of ext4_get_blocks() can clear map_bh.b_state before calling ext4_get_blocks() without worrying about any consistency issues. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-05-14ext4: Define a new set of flags for ext4_get_blocks()Theodore Ts'o
The functions ext4_get_blocks(), ext4_ext_get_blocks(), and ext4_ind_get_blocks() used an ad-hoc set of integer variables used as boolean flags passed in as arguments. Use a single flags parameter and a setandard set of bitfield flags instead. This saves space on the call stack, and it also makes the code a bit more understandable. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-05-14ext4: Rename ext4_get_blocks_wrap() to be ext4_get_blocks()Theodore Ts'o
Another function rename for clarity's sake. The _wrap prefix simply confuses people, and didn't add much people trying to follow the code paths. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-05-15ext4: Fix spinlock assertions on UP systemsVincent Minet
On UP systems without DEBUG_SPINLOCK, ext4_is_group_locked always fails which triggers a BUG_ON() call. This patch fixes it by using assert_spin_locked instead. Signed-off-by: Vincent Minet <vincent@vincent-minet.net> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-05-02ext4: Convert ext4_lock_group to use sb_bgl_lockAneesh Kumar K.V
We have sb_bgl_lock() and ext4_group_info.bb_state bit spinlock to protech group information. The later is only used within mballoc code. Consolidate them to use sb_bgl_lock(). This makes the mballoc.c code much simpler and also avoid confusion with two locks protecting same info. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-05-01ext4: Move fs/ext4/group.h into ext4.hTheodore Ts'o
Move the function prototypes in group.h into ext4.h so they are all defined in one place. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-05-01ext4: Move fs/ext4/namei.h into ext4.hTheodore Ts'o
The fs/ext4/namei.h header file had only a single function declaration, and should have never been a standalone file. Move it into ext4.h, where should have been from the beginning. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-05-03ext4: Move the ext4_sb.h header file into ext4.hTheodore Ts'o
There is no longer a reason for a separate ext4_sb.h header file, so move it into ext4.h just to make life easier for developers to find the relevant data structures and typedefs. Should also speed up compiles slightly, too. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-05-01ext4: Move the ext4_i.h header file into ext4.hTheodore Ts'o
There is no longer a reason for a separate ext4_i.h header file, so move it into ext4.h just to make life easier for developers to find the relevant data structures and typedefs. Should also speed up compiles slightly, too. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-05-01ext4: Avoid races caused by on-line resizing and SMP memory reorderingTheodore Ts'o
Ext4's on-line resizing adds a new block group and then, only at the last step adjusts s_groups_count. However, it's possible on SMP systems that another CPU could see the updated the s_group_count and not see the newly initialized data structures for the just-added block group. For this reason, it's important to insert a SMP read barrier after reading s_groups_count and before reading any (for example) the new block group descriptors allowed by the increased value of s_groups_count. Unfortunately, we rather blatently violate this locking protocol documented in fs/ext4/resize.c. Fortunately, (1) on-line resizes happen relatively rarely, and (2) it seems rare that the filesystem code will immediately try to use just-added block group before any memory ordering issues resolve themselves. So apparently problems here are relatively hard to hit, since ext3 has been vulnerable to the same issue for years with no one apparently complaining. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>