Kernel - My Linux kernel repository

Age	Commit message (Collapse)	Author
2008-07-16	[PATCH] ocfs2: fix oops in mmap_truncate testing	Coly Li
	This patch fixes a mmap_truncate bug which was found by ocfs2 test suite. In an ocfs2 cluster more than 1 node, run program mmap_truncate, which races mmap writes and truncates from multiple processes. While the test is running, a stat from another node forces writeout, causing an oops in ocfs2_get_block() because it sees a buffer to write which isn't allocated. This patch fixed the bug by clear dirty and uptodate bits in buffer, leave the buffer unmapped and return. Fix is suggested by Mark Fasheh, and I code up the patch. Signed-off-by: Coly Li <coyli@suse.de> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-04-18	fs/ocfs2/aops.c: test for IS_ERR rather than 0	Julia Lawall
	The function ocfs2_start_trans always returns either a valid pointer or a value made with ERR_PTR, so its result should be tested with IS_ERR, not with a test for 0. Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2008-03-03	[PATCH] fs/ocfs2/aops.c: Correct use of ! and &	Julia Lawall
	In commit e6bafba5b4765a5a252f1b8d31cbf6d2459da337, a bug was fixed that involved converting !x & y to !(x & y). The code below shows the same pattern, and thus should perhaps be fixed in the same way. This is not tested and clearly changes the semantics, so it is only something to consider. Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-02-05	Pagecache zeroing: zero_user_segment, zero_user_segments and zero_user	Christoph Lameter
	Simplify page cache zeroing of segments of pages through 3 functions zero_user_segments(page, start1, end1, start2, end2) Zeros two segments of the page. It takes the position where to start and end the zeroing which avoids length calculations and makes code clearer. zero_user_segment(page, start, end) Same for a single segment. zero_user(page, start, length) Length variant for the case where we know the length. We remove the zero_user_page macro. Issues: 1. Its a macro. Inline functions are preferable. 2. The KM_USER0 macro is only defined for HIGHMEM. Having to treat this special case everywhere makes the code needlessly complex. The parameter for zeroing is always KM_USER0 except in one single case that we open code. Avoiding KM_USER0 makes a lot of code not having to be dealing with the special casing for HIGHMEM anymore. Dealing with kmap is only necessary for HIGHMEM configurations. In those configurations we use KM_USER0 like we do for a series of other functions defined in highmem.h. Since KM_USER0 is depends on HIGHMEM the existing zero_user_page function could not be a macro. zero_user_* functions introduced here can be be inline because that constant is not used when these functions are called. Also extract the flushing of the caches to be outside of the kmap. [akpm@linux-foundation.org: fix nfs and ntfs build] [akpm@linux-foundation.org: fix ntfs build some more] Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Steven French <sfrench@us.ibm.com> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Cc: <linux-ext4@vger.kernel.org> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Anton Altaparmakov <aia21@cantab.net> Cc: Mark Fasheh <mark.fasheh@oracle.com> Cc: David Chinner <dgc@sgi.com> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Cc: Steven French <sfrench@us.ibm.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-01-25	ocfs2: Safer read_inline_data()	Jan Kara
	In ocfs2_read_inline_data() we should store file size in loff_t. Although the file size should fit in 32 bits we cannot be sure in case filesystem is corrupted. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25	ocfs2: Readpages support	Mark Fasheh
	Add ->readpages support to Ocfs2. This is rather trivial - all it required is a small update to ocfs2_get_block (for mapping full extents via b_size) and an ocfs2_readpages() function which partially mirrors ocfs2_readpage(). Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25	ocfs2: Rename ocfs2_meta_[un]lock	Mark Fasheh
	Call this the "inode_lock" now, since it covers both data and meta data. This patch makes no functional changes. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2008-01-25	ocfs2: Remove data locks	Mark Fasheh
	The meta lock now covers both meta data and data, so this just removes the now-redundant data lock. Combining locks saves us a round of lock mastery per inode and one less lock to ping between nodes during read/write. We don't lose much - since meta locks were always held before a data lock (and at the same level) ordered writeout mode (the default) ensured that flushing for the meta data lock also pushed out data anyways. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-11-27	ocfs2: Fix comparison in ocfs2_size_fits_inline_data()	Mark Fasheh
	This was causing us to prematurely push out inline data by one byte. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-11-06	ocfs2: fix write() performance regression	Mark Fasheh
	On file systems which don't support sparse files, Ocfs2_map_page_blocks() was reading blocks on appending writes. This caused write performance to suffer dramatically. Fix this by detecting an appending write on a nonsparse fs and skipping the read. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-10-16	ocfs2: convert to new aops	Nick Piggin
	Plug ocfs2 into the ->write_begin and ->write_end aops. A bunch of custom code is now gone - the iovec iteration stuff during write and the ocfs2 splice write actor. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-12	ocfs2: Write support for inline data	Mark Fasheh
	This fixes up write, truncate, mmap, and RESVSP/UNRESVP to understand inline inode data. For the most part, the changes to the core write code can be relied on to do the heavy lifting. Any code calling ocfs2_write_begin (including shared writeable mmap) can count on it doing the right thing with respect to growing inline data to an extent tree. Size reducing truncates, including UNRESVP can simply zero that portion of the inode block being removed. Size increasing truncatesm, including RESVP have to be a little bit smarter and grow the inode to an extent tree if necessary. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Reviewed-by: Joel Becker <joel.becker@oracle.com>
2007-10-12	ocfs2: Read support for inline data	Mark Fasheh
	This hooks up ocfs2_readpage() to populate a page with data from an inode block. Direct IO reads from inline data are modified to fall back to buffered I/O. Appropriate checks are also placed in the extent map code to avoid reading an extent list when inline data might be stored. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Reviewed-by: Joel Becker <joel.becker@oracle.com>
2007-10-12	ocfs2: Small refactor of truncate zeroing code	Mark Fasheh
	We'll want to reuse most of this when pushing inline data back out to an extent. Keeping this part as a seperate patch helps to keep the upcoming changes for write support uncluttered. The core portion of ocfs2_zero_cluster_pages() responsible for making sure a page is mapped and properly dirtied is abstracted out into it's own function, ocfs2_map_and_dirty_page(). Actual functionality doesn't change, though zeroing becomes optional. We also turn part of ocfs2_free_write_ctxt() into a common function for unlocking and freeing a page array. This operation is very common (and uniform) for Ocfs2 cluster sizes greater than page size, so it makes sense to keep the code in one place. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Reviewed-by: Joel Becker <joel.becker@oracle.com>
2007-10-12	ocfs2: move nonsparse hole-filling into ocfs2_write_begin()	Mark Fasheh
	By doing this, we can remove any higher level logic which has to have knowledge of btree functionality - any callers of ocfs2_write_begin() can now expect it to do anything necessary to prepare the inode for new data. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Reviewed-by: Joel Becker <joel.becker@oracle.com>
2007-09-20	ocfs2: Don't double set write parameters	Mark Fasheh
	The target page offsets were being incorrectly set a second time in ocfs2_prepare_page_for_write(), which was causing problems on a 16k page size kernel. Additionally, ocfs2_write_failure() was incorrectly using those parameters instead of the parameters for the individual page being cleaned up. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-09-20	ocfs2: Fix pos/len passed to ocfs2_write_cluster	Mark Fasheh
	This was broken for file systems whose cluster size is greater than page size. Pos needs to be incremented as we loop through the descriptors, and len needs to be capped to the size of a single cluster. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-09-11	[PATCH] ocfs2: Fix a wrong cluster calculation.	tao.ma@oracle.com
	In ocfs2_alloc_write_write_ctxt, the written clusters length is calculated by the byte length only. This may cause some problems if we start to write at some position in the end of one cluster and last to a second cluster while the "len" is smaller than a cluster size. In that case, we have to write 2 clusters actually. So we have to take the start position into consideration also. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-19	mm: merge populate and nopage into fault (fixes nonlinear)	Nick Piggin
	Nonlinear mappings are (AFAIKS) simply a virtual memory concept that encodes the virtual address -> file offset differently from linear mappings. ->populate is a layering violation because the filesystem/pagecache code should need to know anything about the virtual memory mapping. The hitch here is that the ->nopage handler didn't pass down enough information (ie. pgoff). But it is more logical to pass pgoff rather than have the ->nopage function calculate it itself anyway (because that's a similar layering violation). Having the populate handler install the pte itself is likewise a nasty thing to be doing. This patch introduces a new fault handler that replaces ->nopage and ->populate and (later) ->nopfn. Most of the old mechanism is still in place so there is a lot of duplication and nice cleanups that can be removed if everyone switches over. The rationale for doing this in the first place is that nonlinear mappings are subject to the pagefault vs invalidate/truncate race too, and it seemed stupid to duplicate the synchronisation logic rather than just consolidate the two. After this patch, MAP_NONBLOCK no longer sets up ptes for pages present in pagecache. Seems like a fringe functionality anyway. NOPAGE_REFAULT is removed. This should be implemented with ->fault, and no users have hit mainline yet. [akpm@linux-foundation.org: cleanup] [randy.dunlap@oracle.com: doc. fixes for readahead] [akpm@linux-foundation.org: build fix] Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Mark Fasheh <mark.fasheh@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-10	[PATCH] ocfs2: zero_user_page conversion	Eric Sandeen
	Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10	ocfs2: Support creation of unwritten extents	Mark Fasheh
	This can now be trivially supported with re-use of our existing extend code. ocfs2_allocate_unwritten_extents() takes a start offset and a byte length and iterates over the inode, adding extents (marked as unwritten) until len is reached. Existing extents are skipped over. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10	ocfs2: support writing of unwritten extents	Mark Fasheh
	Update the write code to detect when the user is asking to write to an unwritten extent. Like writing to a hole, we must zero the region between the write and the cluster boundaries. Most of the existing cluster zeroing logic can be re-used with some additional checks for the unwritten flag on extent records. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10	ocfs2: small cleanup of ocfs2_write_begin_nolock()	Mark Fasheh
	We can easily seperate out the write descriptor setup and manipulation into helper functions. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10	ocfs2: plug truncate into cached dealloc routines	Mark Fasheh
	Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10	ocfs2: harden buffer check during mapping of page blocks	Mark Fasheh
	We don't want to submit buffer_new blocks for read i/o. This actually won't happen right now because those requests during an allocating write are all nicely aligned. It's probably a good idea to provide an explicit check though. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10	ocfs2: shared writeable mmap	Mark Fasheh
	Implement cluster consistent shared writeable mappings using the ->page_mkwrite() callback. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10	ocfs2: factor out write aops into nolock variants	Mark Fasheh
	ocfs2_mkwrite() will want this so that it can add some mmap specific checks before asking for a write. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10	ocfs2: rework ocfs2_buffered_write_cluster()	Mark Fasheh
	Use some ideas from the new-aops patch series and turn ocfs2_buffered_write_cluster() into a 2 stage operation with the caller copying data in between. The code now understands multiple cluster writes as a result of having to deal with a full page write for greater than 4k pages. This sets us up to easily call into the write path during ->page_mkwrite(). Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-06-06	ocfs2: Fix invalid assertion during write on 64k pages	Mark Fasheh
	The write path code intends to bug if a math error (or unhandled case) results in a write outside of the current cluster boundaries. The actual BUG_ON() statements however are incorrect, leading to a crash on kernels with 64k page size. Fix those by checking against the right variables. Also, move the assertions higher up within the functions so that they trip before the code starts to mark buffers. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-05-25	[PATCH] ocfs2: use zero_user_page	Nate Diller
	Use zero_user_page() instead of open-coding it. Signed-off-by: Nate Diller <nate.diller@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-05-25	ocfs2: trylock in ocfs2_readpage()	Mark Fasheh
	Similarly to the page lock / cluster lock inversion in ocfs2_readpage, we can deadlock on ip_alloc_sem. We can down_read_trylock() instead and just return AOP_TRUNCATED_PAGE if the operation fails. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-05-02	ocfs2: Force use of GFP_NOFS in ocfs2_write()	Mark Fasheh
	We can otherwise recurse into the file system. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-05-02	ocfs2: fix sparse warnings in fs/ocfs2	Mark Fasheh
	None of these are actually harmful, but the noise makes looking for real problems difficult. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-05-02	[PATCH] fs/ocfs2/: make 3 functions static	Adrian Bunk
	This patch makes the following needlessly global functions static: - aops.c: ocfs2_write_data_page() - dlmglue.c: ocfs2_dump_meta_lvb_info() - file.c: ocfs2_set_inode_size() Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26	ocfs2: Remember rw lock level during direct io	Mark Fasheh
	Cluster locking might have been redone because a direct write won't complete, so this needs to be reflected in the iocb. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26	ocfs2: Fix up i_blocks calculation to know about holes	Mark Fasheh
	Older file systems which didn't support holes did a dumb calculation of i_blocks based on i_size. This is no longer accurate, so fix things up to take actual allocation into account. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26	ocfs2: Fix extent lookup to return true size of holes	Mark Fasheh
	Initially, we had wired things to return a size '1' of holes. Cook up a small amount of code to find the next extent and calculate the number of clusters between the virtual offset and the next allocated extent. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26	ocfs2: Read from an unwritten extent returns zeros	Mark Fasheh
	Return an optional extent flags field from our lookup functions and wire up callers to treat unwritten regions as holes for the purpose of returning zeros to the user. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26	ocfs2: Use own splice write actor	Mark Fasheh
	We need to fill holes during a splice write. Provide our own splice write actor which can call ocfs2_file_buffered_write() with a splice-specific callback. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26	ocfs2: zero tail of sparse files on truncate	Mark Fasheh
	Since we don't zero on extend anymore, truncate needs to be fixed up to zero the part of a file between i_size and and end of it's cluster. Otherwise a subsequent extend could expose bad data. This introduced a new helper, which can be used in ocfs2_write(). Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26	ocfs2: Teach ocfs2_get_block() about holes	Mark Fasheh
	ocfs2_get_block() didn't understand sparse files, fix that. Also remove some code that isn't really useful anymore. We can fix up ocfs2_direct_IO_get_blocks() at the same time. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26	ocfs2: remove ocfs2_prepare_write() and ocfs2_commit_write()	Mark Fasheh
	These are no longer used, and can't handle file systems with sparse file allocation. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26	ocfs2: teach ocfs2_file_aio_write() about sparse files	Mark Fasheh
	Unfortunately, ocfs2 can no longer make use of generic_file_aio_write_nlock() because allocating writes will require zeroing of pages adjacent to the I/O for cluster sizes greater than page size. Implement a custom file write here, which can order page locks for zeroing. This also has the advantage that cluster locks can easily be ordered outside of the page locks. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-04-26	ocfs2: temporarily remove extent map caching	Mark Fasheh
	The code in extent_map.c is not prepared to deal with a subtree being rotated between lookups. This can happen when filling holes in sparse files. Instead of a lengthy patch to update the code (which would likely lose the benefit of caching subtree roots), we remove most of the algorithms and implement a simple path based lookup. A less ambitious extent caching scheme will be added in a later patch. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-03-14	ocfs2: add some missing address space callbacks	Joel Becker
	Under load, OCFS2 would crash in invalidate_inode_pages2_range() because invalidate_complete_page2() was unable to invalidate a page. It would appear that JBD is holding on to the page. ext3 has a specific ->releasepage() handler to cover this case. Steal ext3's ->releasepage(), ->invalidatepage(), and ->migratepage(), as they appear completely appropriate for OCFS2. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-28	ocfs2: Allow direct I/O read past end of file	Mark Fasheh
	ocfs2_direct_IO_get_blocks() was incorrectly returning -EIO for a direct I/O read whose start block was past the end of the file allocation tree. Fix things so that we return a hole instead. do_direct_IO() will then notice that the range start is past eof and return a short read. While there, remove the unused vbo_max variable. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-08	[PATCH] struct path: convert ocfs2	Josef Sipek
	Signed-off-by: Josef Sipek <jsipek@fsl.cs.sunysb.edu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-01	ocfs2: Remove struct ocfs2_journal_handle in favor of handle_t	Mark Fasheh
	This is mostly a search and replace as ocfs2_journal_handle is now no more than a container for a handle_t pointer. ocfs2_commit_trans() becomes very straight forward, and we remove some out of date comments / code. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-01	ocfs2: remove handle argument to ocfs2_start_trans()	Mark Fasheh
	All callers either pass in NULL directly, or a local variable that is already set to NULL. The internals of ocfs2_start_trans() get a nice cleanup as a result. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2006-12-01	ocfs2: pass ocfs2_super * into ocfs2_commit_trans()	Mark Fasheh
	This sets us up to remove handle->journal. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>