aboutsummaryrefslogtreecommitdiff
path: root/fs/gfs2/ops_address.c
AgeCommit message (Collapse)Author
2009-03-24GFS2: Pagecache usage optimization on GFS2Hisashi Hifumi
I introduced "is_partially_uptodate" aops for GFS2. A page can have multiple buffers and even if a page is not uptodate, some buffers can be uptodate on pagesize != blocksize environment. This aops checks that all buffers which correspond to a part of a file that we want to read are uptodate. If so, we do not have to issue actual read IO to HDD even if a page is not uptodate because the portion we want to read are uptodate. "block_is_partially_uptodate" function is already used by ext2/3/4. With the following patch random read/write mixed workloads or random read after random write workloads can be optimized and we can get performance improvement. I did a performance test using the sysbench. #sysbench --num-threads=16 --max-requests=200000 --test=fileio --file-num=1 --file-block-size=8K --file-total-size=2G --file-test-mode=rndrw --file-fsync-freq=0 --file-rw-ratio=1 run -2.6.29-rc6 Test execution summary: total time: 202.6389s total number of events: 200000 total time taken by event execution: 2580.0480 per-request statistics: min: 0.0000s avg: 0.0129s max: 49.5852s approx. 95 percentile: 0.0462s -2.6.29-rc6-patched Test execution summary: total time: 177.8639s total number of events: 200000 total time taken by event execution: 2419.0199 per-request statistics: min: 0.0000s avg: 0.0121s max: 52.4306s approx. 95 percentile: 0.0444s arch: ia64 pagesize: 16k blocksize: 4k Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-03-24GFS2: Merge lock_dlm module into GFS2Steven Whitehouse
This is the big patch that I've been working on for some time now. There are many reasons for wanting to make this change such as: o Reducing overhead by eliminating duplicated fields between structures o Simplifcation of the code (reduces the code size by a fair bit) o The locking interface is now the DLM interface itself as proposed some time ago. o Fewer lookups of glocks when processing replies from the DLM o Fewer memory allocations/deallocations for each glock o Scope to do further optimisations in the future (but this patch is more than big enough for now!) Please note that (a) this patch relates to the lock_dlm module and not the DLM itself, that is still a separate module; and (b) that we retain the ability to build GFS2 as a standalone single node filesystem with out requiring the DLM. This patch needs a lot of testing, hence my keeping it I restarted my -git tree after the last merge window. That way, this has the maximum exposure before its merged. This is (modulo a few minor bug fixes) the same patch that I've been posting on and off the the last three months and its passed a number of different tests so far. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-03-24GFS2: change gfs2_quota_scan into a shrinkerAbhijith Das
Deallocation of gfs2_quota_data objects now happens on-demand through a shrinker instead of routinely deallocating through the quotad daemon. Signed-off-by: Abhijith Das <adas@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-01-07GFS2: Set GFP_NOFS when allocating page on writeSteven Whitehouse
We need to ensure that we always set GFP_NOFS in this one particular case when allocating pages for write. Reported-by: Fabio M. Di Nitto <fdinitto@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-01-05GFS2: Streamline alloc calculations for writesSteven Whitehouse
This patch removes some unused code, and make the calculation of the number of blocks required conditional in order to reduce the number of times this (potentially expensive) calculation is done. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-01-05GFS2: Move i_size from gfs2_dinode_host and rename it to i_disksizeSteven Whitehouse
This patch moved the i_size field from the gfs2_dinode_host and following the ext3 convention renames it i_disksize. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-01-05GFS2: Fix up jdata writepage/delete_inodeSteven Whitehouse
There is a bug in writepage and delete_inode which allows jdata files to invalidate pages from the address space without being in a transaction at the time. This causes problems in case the pages are in the journal. This patch fixes that case and prevents the resulting oops. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2009-01-04fs: symlink write_begin allocation context fixNick Piggin
With the write_begin/write_end aops, page_symlink was broken because it could no longer pass a GFP_NOFS type mask into the point where the allocations happened. They are done in write_begin, which would always assume that the filesystem can be entered from reclaim. This bug could cause filesystem deadlocks. The funny thing with having a gfp_t mask there is that it doesn't really allow the caller to arbitrarily tinker with the context in which it can be called. It couldn't ever be GFP_ATOMIC, for example, because it needs to take the page lock. The only thing any callers care about is __GFP_FS anyway, so turn that into a single flag. Add a new flag for write_begin, AOP_FLAG_NOFS. Filesystems can now act on this flag in their write_begin function. Change __grab_cache_page to accept a nofs argument as well, to honour that flag (while we're there, change the name to grab_cache_page_write_begin which is more instructive and does away with random leading underscores). This is really a more flexible way to go in the end anyway -- if a filesystem happens to want any extra allocations aside from the pagecache ones in ints write_begin function, it may now use GFP_KERNEL (rather than GFP_NOFS) for common case allocations (eg. ocfs2_alloc_write_ctxt, for a random example). [kosaki.motohiro@jp.fujitsu.com: fix ubifs] [kosaki.motohiro@jp.fujitsu.com: fix fuse] Signed-off-by: Nick Piggin <npiggin@suse.de> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: <stable@kernel.org> [2.6.28.x] Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> [ Cleaned up the calling convention: just pass in the AOP flags untouched to the grab_cache_page_write_begin() function. That just simplifies everybody, and may even allow future expansion of the logic. - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-09-18GFS2: high time to take some time over atimeSteven Whitehouse
Until now, we've used the same scheme as GFS1 for atime. This has failed since atime is a per vfsmnt flag, not a per fs flag and as such the "noatime" flag was not getting passed down to the filesystems. This patch removes all the "special casing" around atime updates and we simply use the VFS's atime code. The net result is that GFS2 will now support all the same atime related mount options of any other filesystem on a per-vfsmnt basis. We do lose the "lazy atime" updates, but we gain "relatime". We could add lazy atime to the VFS at a later date, if there is a requirement for that variant still - I suspect relatime will be enough. Also we lose about 100 lines of code after this patch has been applied, and I have a suspicion that it will speed things up a bit, even when atime is "on". So it seems like a nice clean up as well. From a user perspective, everything stays the same except the loss of the per-fs atime quantum tweekable (ought to be per-vfsmnt at the very least, and to be honest I don't think anybody ever used it) and that a number of options which were ignored before now work correctly. Please let me know if you've got any comments. I'm pushing this out early so that you can all see what my plans are. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-09-15GFS2: Direct IO write at end of file errorBob Peterson
This patch fixes a problem whereby a direct_io write doesn't fall back to buffered write properly at end of file. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-06-27[GFS2] Revise readpage lockingSteven Whitehouse
The previous attempt to fix the locking in readpage failed due to the use of a "try lock" which resulted in occasional high cpu usage during testing (due to repeated tries) and also it did not resolve all the ordering problems wrt the transaction lock (although it did solve all the inode lock ordering problems). This patch avoids the problem by unlocking the page and getting the locks in the correct order. This means that we have to retest the page to ensure that it hasn't changed when we relock the page. This now passes the tests which were previously failing. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-06-27[GFS2] Clean up the glock coreSteven Whitehouse
This patch implements a number of cleanups to the core of the GFS2 glock code. As a result a lot of code is removed. It looks like a really big change, but actually a large part of this patch is either removing or moving existing code. There are some new bits too though, such as the new run_queue() function which is considerably streamlined. Highlights of this patch include: o Fixes a cluster coherency bug during SH -> EX lock conversions o Removes the "glmutex" code in favour of a single bit lock o Removes the ->go_xmote_bh() for inodes since it was duplicating ->go_lock() o We now only use the ->lm_lock() function for both locks and unlocks (i.e. unlock is a lock with target mode LM_ST_UNLOCKED) o The fast path is considerably shortly, giving performance gains especially with lock_nolock o The glock_workqueue is now used for all the callbacks from the DLM which allows us to simplify the lock_dlm module (see following patch) o The way is now open to make further changes such as eliminating the two threads (gfs2_glockd and gfs2_scand) in favour of a more efficient scheme. This patch has undergone extensive testing with various test suites so it should be pretty stable by now. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Bob Peterson <rpeterso@redhat.com>
2008-04-28mm: remove nopageNick Piggin
Nothing in the tree uses nopage any more. Remove support for it in the core mm code and documentation (and a few stray references to it in comments). Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-31[GFS2] Streamline quota lock/check for no-quota caseSteven Whitehouse
This patch streamlines the quota checking in the "no quota" case by making the check inline in the calling function, thus reducing the number of function calls. Eventually we might be able to remove the checks from the gfs2_quota_lock() and gfs2_quota_check() functions, but currently we can't as there are a very few places in the code which need to call these functions directly still. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Abhijith Das <adas@redhat.com>
2008-03-31[GFS2] gfs2_adjust_quota has broken unstuffing codeAbhijith Das
This patch combines the 2 patches in bug 434736 to correct the lock ordering in the unstuffing of the quota inode in gfs2_adjust_quota and adjusting the number of revokes in gfs2_write_jdata_pagevec Signed-off-by: Abhijith Das <adas@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31[GFS2] possible null pointer dereference fixupCyrill Gorcunov
gfs2_alloc_get may fail so we have to check it to prevent NULL pointer dereference. Signed-off-by: Cyrill Gorcunov <gorcunov@gamil.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31[GFS2] Fix a page lock / glock deadlockSteven Whitehouse
We've previously been using a "try lock" in readpage on the basis that it would prevent deadlocks due to the inverted lock ordering (our normal lock ordering is glock first and then page lock). Unfortunately tests have shown that this isn't enough. If the glock has a demote request queued such that run_queue() in the glock code tries to do a demote when its called under readpage then it will try and write out all the dirty pages which requires locking them. This then deadlocks with the page locked by readpage. The solution is to always require two calls into readpage. The first unlocks the page, gets the glock and returns AOP_TRUNCATED_PAGE, the second does the actual readpage and unlocks the glock & page as required. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-03-31[GFS2] Misc fixupsBob Peterson
This patch contains two small fixups that didn't fit elsewhere. They are: (1) get rid of temp variable in find_metapath. (2) Remove vestigial "ret" variable from gfs2_writepage_common. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-02-05Pagecache zeroing: zero_user_segment, zero_user_segments and zero_userChristoph Lameter
Simplify page cache zeroing of segments of pages through 3 functions zero_user_segments(page, start1, end1, start2, end2) Zeros two segments of the page. It takes the position where to start and end the zeroing which avoids length calculations and makes code clearer. zero_user_segment(page, start, end) Same for a single segment. zero_user(page, start, length) Length variant for the case where we know the length. We remove the zero_user_page macro. Issues: 1. Its a macro. Inline functions are preferable. 2. The KM_USER0 macro is only defined for HIGHMEM. Having to treat this special case everywhere makes the code needlessly complex. The parameter for zeroing is always KM_USER0 except in one single case that we open code. Avoiding KM_USER0 makes a lot of code not having to be dealing with the special casing for HIGHMEM anymore. Dealing with kmap is only necessary for HIGHMEM configurations. In those configurations we use KM_USER0 like we do for a series of other functions defined in highmem.h. Since KM_USER0 is depends on HIGHMEM the existing zero_user_page function could not be a macro. zero_user_* functions introduced here can be be inline because that constant is not used when these functions are called. Also extract the flushing of the caches to be outside of the kmap. [akpm@linux-foundation.org: fix nfs and ntfs build] [akpm@linux-foundation.org: fix ntfs build some more] Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Steven French <sfrench@us.ibm.com> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Cc: <linux-ext4@vger.kernel.org> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Anton Altaparmakov <aia21@cantab.net> Cc: Mark Fasheh <mark.fasheh@oracle.com> Cc: David Chinner <dgc@sgi.com> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Cc: Steven French <sfrench@us.ibm.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-01-25[GFS2] Reduce inode size by moving i_alloc out of lineSteven Whitehouse
It is possible to reduce the size of GFS2 inodes by taking the i_alloc structure out of the gfs2_inode. This patch allocates the i_alloc structure whenever its needed, and frees it afterward. This decreases the amount of low memory we use at the expense of requiring a memory allocation for each page or partial page that we write. A quick test with postmark shows that the overhead is not measurable and I also note that OCFS2 use the same approach. In the future I'd like to solve the problem by shrinking down the size of the members of the i_alloc structure, but for now, this reduces the immediate problem of using too much low-memory on x86 and doesn't add too much overhead. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25[GFS2] Fix problems relating to execution of files on GFS2Steven Whitehouse
This patch fixes a couple of problems which affected the execution of files on GFS2. The first is that there was a corner case where inodes were not always uptodate at the point at which permissions checks were being carried out, this was resulting in refusal of execute permission, but only on the first lookup, subsequent requests worked correctly. The second was a problem relating to incorrect updating of file sizes which was introduced with the write_begin/end code for GFS2 a little while back. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Abhijith Das <adas@redhat.com>
2008-01-25[GFS2] Allow page migration for writeback and ordered pagesSteven Whitehouse
To improve performance on NUMA, we use the VM's standard page migration for writeback and ordered pages. Probably we could also do the same for journaled data, but that would need a careful audit of the code, so will be the subject of a later patch. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25[GFS2] Remove function gfs2_get_blockBob Peterson
This patch is just a cleanup. Function gfs2_get_block() just calls function gfs2_block_map reversing the last two parameters. By reversing the parameters, gfs2_block_map() may be called directly and function gfs2_get_block may be eliminated altogether. Since this function is done for every block operation, this streamlines the code and makes it a little bit more efficient. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25[GFS2] Use correct include file in ops_address.cSteven Whitehouse
Something changed in the upstream kernel, and it needs this one-liner to allow ops_address.c to build. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25[GFS2] Don't hold page lock when starting transactionSteven Whitehouse
This is an addendum to the new AOPs work which moves the point at which we take the page lock so that we don't get it until the last possible moment. This resolves a conflict between starting transactions and the page lock. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25[GFS2] Add writepages for GFS2 jdataSteven Whitehouse
This patch resolves a lock ordering issue where we had been getting a transaction lock in the wrong order with respect to the page lock. By using writepages rather than just writepage, it is then possible to start a transaction before locking the page, and thus matching the locking order elsewhere in the code. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25[GFS2] Split gfs2_writepage into three casesSteven Whitehouse
This patch splits gfs2_writepage into separate functions for each of the three cases: writeback, ordered and journalled. As a result it becomes a lot easier to see what each one is doing. The common code is moved into gfs2_writepage_common. This fixes a performance bug where we were doing more work than strictly required in the ordered write case. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25[GFS2] Introduce gfs2_set_aops()Steven Whitehouse
Just like ext3 we now have three sets of address space operations to cover the cases of writeback, ordered and journalled data writes. This means that the individual operations can now become less complicated as we are able to remove some of the tests for file data mode from the code. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25[GFS2] Add gfs2_is_writeback()Steven Whitehouse
This adds a function "gfs2_is_writeback()" along the lines of the existing "gfs2_is_jdata()" in order to clean up the code and make the various tests for the inode mode more obvious. It also fixes the PageChecked() logic where we were resetting the flag too early in the case of an error path. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25[GFS2] Remove useless i_cache from inodesSteven Whitehouse
The i_cache was designed to keep references to the indirect blocks used during block mapping so that they didn't have to be looked up continually. The idea failed because there are too many places where the i_cache needs to be freed, and this has in the past been the cause of many bugs. In addition there was no performance benefit being gained since the disk blocks in question were cached anyway. So this patch removes it in order to simplify the code to prepare for other changes which would otherwise have had to add further support for this feature. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25[GFS2] Use ->page_mkwrite() for mmap()Steven Whitehouse
This cleans up the mmap() code path for GFS2 by implementing the page_mkwrite function for GFS2. We are thus able to use the generic filemap_fault function for our ->fault() implementation. This now means that shared writable mappings will be much more efficiently shared across the cluster if there is a reasonable proportion of read activity (the greater proportion, the better). As a side effect, it also reduces the size of the code, removes special cases from readpage and readpages, and makes the code path easier to follow. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2008-01-25[GFS2] Clean up internal read functionSteven Whitehouse
As requested by Christoph, this patch cleans up GFS2's internal read function so that it no longer uses the do_generic_mapping_read function. This function is obsolete and GFS2 is the last user of it. As a side effect the internal read code gets smaller and easier to read and gfs2_readpage is split into two. One function has the locking and the other function has the rest of the logic. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Christoph Hellwig <hch@infradead.org>
2007-10-16gfs2: convert to new aopsSteven Whitehouse
Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-10[GFS2] Don't try to remove buffers that don't existSteven Whitehouse
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-10-10[GFS2] Data corruption fixWendy Cheng
* GFS2 has been using i_cache array to store its indirect meta blocks. Its flush routine doesn't correctly clean up all the entries. The problem would show while multiple nodes do simultaneous writes to the same file. Upon glock exclusive lock transfer, if the file is a sparse file with large file size where the indirect meta blocks span multiple array entries with "zero" entries in between. The flush routine prematurely stops the flushing that leaves old (stale) entries around. This leads to several nasty issues, including data corruption. * Fix gfs2_get_block_noalloc checking to correctly return EIO upon unmapped buffer. Signed-off-by: Wendy Cheng <wcheng@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-10-10[GFS2] Clean up journaled data writingSteven Whitehouse
This patch cleans up the code for writing journaled data into the log. It also removes the need to allocate a small "tag" structure for each block written into the log. Instead we just keep count of the outstanding I/O so that we can be sure that its all been written at the correct time. Another result of this patch is that a number of ll_rw_block() calls have become submit_bh() calls, closing some races at the same time. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-10-10[GFS2] Clean up ordered write codeSteven Whitehouse
The following patch removes the ordered write processing from databuf_lo_before_commit() and moves it to log.c. This has the effect of greatly simplyfying databuf_lo_before_commit() and well as potentially making the ordered write code more efficient. As a side effect of this, its now possible to remove ordered buffers from the ordered buffer list at any time, so we now make use of this in invalidatepage and releasepage to ensure timely release of these buffers. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-10-10[GFS2] Clean up invalidatepage/releasepageSteven Whitehouse
This patch fixes some bugs relating to journaled data files by cleaning up the gfs2_invalidatepage() and gfs2_releasepage() functions. We now never block during gfs2_releasepage(), instead we always either release or refuse to release depending on the status of the buffers. This fixes Red Hat bugzillas #248969 and #252392. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Bob Peterson <rpeterso@redhat.com>
2007-08-14[GFS2] Fix incorrect error path in prepare_write()Steven Whitehouse
The error path in prepare_write() was incorrect in the (very rare) event that the transaction fails to start. The following prevents a NULL pointer dereference, Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-19mm: merge populate and nopage into fault (fixes nonlinear)Nick Piggin
Nonlinear mappings are (AFAIKS) simply a virtual memory concept that encodes the virtual address -> file offset differently from linear mappings. ->populate is a layering violation because the filesystem/pagecache code should need to know anything about the virtual memory mapping. The hitch here is that the ->nopage handler didn't pass down enough information (ie. pgoff). But it is more logical to pass pgoff rather than have the ->nopage function calculate it itself anyway (because that's a similar layering violation). Having the populate handler install the pte itself is likewise a nasty thing to be doing. This patch introduces a new fault handler that replaces ->nopage and ->populate and (later) ->nopfn. Most of the old mechanism is still in place so there is a lot of duplication and nice cleanups that can be removed if everyone switches over. The rationale for doing this in the first place is that nonlinear mappings are subject to the pagefault vs invalidate/truncate race too, and it seemed stupid to duplicate the synchronisation logic rather than just consolidate the two. After this patch, MAP_NONBLOCK no longer sets up ptes for pages present in pagecache. Seems like a fringe functionality anyway. NOPAGE_REFAULT is removed. This should be implemented with ->fault, and no users have hit mainline yet. [akpm@linux-foundation.org: cleanup] [randy.dunlap@oracle.com: doc. fixes for readahead] [akpm@linux-foundation.org: build fix] Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Mark Fasheh <mark.fasheh@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-09[GFS2] Use zero_user_page() in stuffed_readpage()Steven Whitehouse
As suggested by Robert P. J. Day <rpjday@mindspring.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Robert P. J. Day <rpjday@mindspring.com>
2007-07-09[GFS2] Journaled file write/unstuff bugRobert Peterson
This patch is for bugzilla bug 283162, which uncovered a number of bugs pertaining to writing to files that have the journaled bit on. These bugs happen most often when writing to the meta_fs because the files are always journaled. So operations like gfs2_grow were particularly vulnerable, although many of the problems could be recreated with normal files after setting the journaled bit on. The problems fixed are: -GFS2 wasn't ever writing unstuffed journaled data blocks to their in-place location on disk. Now it does. -If you unmounted too quickly after doing IO to a journaled file, GFS2 was crashing because you would discard a buffer whose bufdata was still on the active items list. GFS2 now deals with this gracefully. -GFS2 was losing track of the bufdata for journaled data blocks, and it wasn't getting freed, causing an error when you tried to unmount the module. GFS2 now frees all the bufdata structures. -There was a memory corruption occurring because GFS2 wrote twice as many log entries for journaled buffers. -It was occasionally trying to write journal headers in buffers that weren't currently mapped. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09[GFS2] fix jdata issuesBenjamin Marzinski
This is a patch for the first three issues of RHBZ #238162 The first issue is that when you allocate a new page for a file, it will not start off uptodate. This makes sense, since you haven't written anything to that part of the file yet. Unfortunately, gfs2_pin() checks to make sure that the buffers are uptodate. The solution to this is to mark the buffers uptodate in gfs2_commit_write(), after they have been zeroed out and have the data written into them. I'm pretty confident with this fix, although it's not completely obvious that there is no problem with marking the buffers uptodate here. The second issue is simply that you can try to pin a data buffer that is already on the incore log, and thus, already pinned. This patch checks to see if this buffer is already on the log, and exits databuf_lo_add() if it is, just like buf_lo_add() does. The third issue is that gfs2_log_flush() doesn't do it's block accounting correctly. Both metadata and journaled data are logged, but gfs2_log_flush() only compares the number of metadata blocks with the number of blocks to commit to the ondisk journal. This patch also counts the journaled data blocks. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09[GFS2] Clean up inode number handlingSteven Whitehouse
This patch cleans up the inode number handling code. The main difference is that instead of looking up the inodes using a struct gfs2_inum_host we now use just the no_addr member of this structure. The tests relating to no_formal_ino can then be done by the calling code. This has advantages in that we want to do different things in different code paths if the no_formal_ino doesn't match. In the NFS patch we want to return -ESTALE, but in the ->lookup() path, its a bug in the fs if the no_formal_ino doesn't match and thus we can withdraw in this case. In order to later fix bz #201012, we need to be able to look up an inode without knowing no_formal_ino, as the only information that is known to us is the on-disk location of the inode in question. This patch will also help us to fix bz #236099 at a later date by cleaning up a lot of the code in that area. There are no user visible changes as a result of this patch and there are no changes to the on-disk format either. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09[GFS2] Addendum patch 2 for gfs2_growRobert Peterson
This addendum patch 2 corrects three things: 1. It fixes a stupid mistake in the previous addendum that broke gfs2. Ref: https://www.redhat.com/archives/cluster-devel/2007-May/msg00162.html 2. It fixes a problem that Dave Teigland pointed out regarding the external declarations in ops_address.h being in the wrong place. 3. It recasts a couple more %llu printks to (unsigned long long) as requested by Steve Whitehouse. I would have loved to put this all in one revised patch, but there was a rush to get some patches for RHEL5. Therefore, the previous patches were applied to the git tree "as is" and therefore, I'm posting another addendum. Sorry. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09[GFS2] Kernel changes to support new gfs2_grow command (part 2)Robert Peterson
To avoid code redundancy, I separated out the operational "guts" into a new function called read_rindex_entry. Then I made two functions: the closer-to-original gfs2_ri_update (without the special condition checks) and gfs2_ri_update_special that's designed with that condition in mind. (I don't like the name, but if you have a suggestion, I'm all ears). Oh, and there's an added benefit: we don't need all the ugly gotos anymore. ;) This patch has been tested with gfs2_fsck_hellfire (which runs for three and a half hours, btw). Signed-off-By: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-07-09[GFS2] kernel changes to support new gfs2_grow commandRobert Peterson
This is another revision of my gfs2 kernel patch that allows gfs2_grow to function properly. Steve Whitehouse expressed some concerns about the previous patch and I restructured it based on his comments. The previous patch was doing the statfs_change at file close time, under its own transaction. The current patch does the statfs_change inside the gfs2_commit_write function, which keeps it under the umbrella of the inode transaction. I can't call ri_update to re-read the rindex file during the transaction because the transaction may have outstanding unwritten buffers attached to the rgrps that would be otherwise blown away. So instead, I created a new function, gfs2_ri_total, that will re-read the rindex file just to total the file system space for the sake of the statfs_change. The ri_update will happen later, when gfs2 realizes the version number has changed, as it happened before my patch. Since the statfs_change is happening at write_commit time and there may be multiple writes to the rindex file for one grow operation. So one consequence of this restructuring is that instead of getting one kernel message to indicate the change, you may see several. For example, before when you did a gfs2_grow, you'd get a single message like: GFS2: File system extended by 247876 blocks (968MB) Now you get something like: GFS2: File system extended by 207896 blocks (812MB) GFS2: File system extended by 39980 blocks (156MB) This version has also been successfully run against the hours-long "gfs2_fsck_hellfire" test that does several gfs2_grow and gfs2_fsck while interjecting file system damage. It does this repeatedly under a variety Resource Group conditions. Signed-off-By: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-05-01[GFS2] Patch to fix mmap of stuffed filesSteven Whitehouse
If a stuffed file is mmaped and a page fault is generated at some offset above the initial page, we need to create a zero page to hang the buffer heads off before we can unstuff the file. This is a fix for bz #236087 Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-05-01[GFS2] Fix bz 231380, unlock page before dequeing glocks in gfs2_commit_writeJosef Whiter
If we are writing a file, and in the middle of writing the file another node attempts to get a shared lock on that file (by doing a du for example) the process doing the writing will hang waiting on lock_page. The reason for this is because when we have waiters on a exclusive glock, we will go through and flush out all dirty pages associated with that inode and release the lock. The problem is that when we flush the dirty pages, we could hit a page that we have locked durring the generic_file_buffered_write part of this operation. This patch unlocks the page before we go to dequeue the lock and locks it immediatly afterwards, since generic_file_buffered_write needs the page locked when the commit_write is completed. This patch resolves the problem, however if somebody sees a better way to do this please don't hesistate to yell. Signed-off-by: Josef Whiter <jwhiter@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-03-07[GFS2] fix hangup when multiple processes are trying to write to the same fileJosef Whiter
This fixes a problem I encountered while running bonnie++. When you have one thread that opens a file and starts to write to it, and then another thread that tries to open and write to the same file, the second thread will loop forever trying to grab the inode lock for that inode. Basically we come in through generic_buffered_file_write, which calls gfs2_prepare_write, which then attempts to grab the glock. Because we don't own the lock, gfs2_prepare_write gets GLR_TRYFAILED, which returns AOP_TRUNCATED_PAGE to generic_buffered_file_write. At this point generic_buffered_file_write loops around again and immediately retries the prepare_write. This means that the second process never gets off of the processor in order to allow the process that holds the lock to finish its work and let go of the lock. This patch makes gfs2_glock_nq schedule() if it gets back a GLR_TRYFAILED, which resolves this problem. Signed-off-by: Josef Whiter <jwhiter@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>