aboutsummaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2009-12-11Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfsLinus Torvalds
* 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: Fix error return for fallocate() on XFS xfs: cleanup dmapi macros in the umount path xfs: remove incorrect sparse annotation for xfs_iget_cache_miss xfs: kill the STATIC_INLINE macro xfs: uninline xfs_get_extsz_hint xfs: rename xfs_attr_fetch to xfs_attr_get_int xfs: simplify xfs_buf_get / xfs_buf_read interfaces xfs: remove IO_ISAIO xfs: Wrapped journal record corruption on read at recovery xfs: cleanup data end I/O handlers xfs: use WRITE_SYNC_PLUG for synchronous writeout xfs: reset the i_iolock lock class in the reclaim path xfs: I/O completion handlers must use NOFS allocations xfs: fix mmap_sem/iolock inversion in xfs_free_eofblocks xfs: simplify inode teardown
2009-12-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6: (27 commits) Driver core: fix race in dev_driver_string Driver Core: Early platform driver buffer sysfs: sysfs_setattr remove unnecessary permission check. sysfs: Factor out sysfs_rename from sysfs_rename_dir and sysfs_move_dir sysfs: Propagate renames to the vfs on demand sysfs: Gut sysfs_addrm_start and sysfs_addrm_finish sysfs: In sysfs_chmod_file lazily propagate the mode change. sysfs: Implement sysfs_getattr & sysfs_permission sysfs: Nicely indent sysfs_symlink_inode_operations sysfs: Update s_iattr on link and unlink. sysfs: Fix locking and factor out sysfs_sd_setattr sysfs: Simplify iattr time assignments sysfs: Simplify sysfs_chmod_file semantics sysfs: Use dentry_ops instead of directly playing with the dcache sysfs: Rename sysfs_d_iput to sysfs_dentry_iput sysfs: Update sysfs_setxattr so it updates secdata under the sysfs_mutex debugfs: fix create mutex racy fops and private data Driver core: Don't remove kobjects in device_shutdown. firmware_class: make request_firmware_nowait more useful Driver-Core: devtmpfs - set root directory mode to 0755 ...
2009-12-11xfs: Fix error return for fallocate() on XFSJason Gunthorpe
Noticed that through glibc fallocate would return 28 rather than -1 and errno = 28 for ENOSPC. The xfs routines uses XFS_ERROR format positive return error codes while the syscalls use negative return codes. Fixup the two cases in xfs_vn_fallocate syscall to convert to negative. Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Alex Elder <aelder@sgi.com>
2009-12-11xfs: cleanup dmapi macros in the umount pathChristoph Hellwig
Stop the flag saving as we never mangle those in the unmount path, and hide all the weird arguents to the dmapi code inside the XFS_SEND_PREUNMOUNT / XFS_SEND_UNMOUNT macros. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2009-12-11xfs: remove incorrect sparse annotation for xfs_iget_cache_missChristoph Hellwig
xfs_iget_cache_miss does not get called with the pag_ici_lock held, so the __releases annotation is incorrect. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2009-12-11xfs: kill the STATIC_INLINE macroChristoph Hellwig
Remove our own STATIC_INLINE macro. For small function inside implementation files just use STATIC and let gcc inline it, and for those in headers do the normal static inline - they are all small enough to be inlined for debug builds, too. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2009-12-11xfs: uninline xfs_get_extsz_hintChristoph Hellwig
This function is too large to efficiently be inlined. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2009-12-11xfs: rename xfs_attr_fetch to xfs_attr_get_intChristoph Hellwig
Using a totally different name for the low-level get operation does not fit the _int convention used in the rest of the attr code, so rename it. While we're at it also fix the prototype to use the normal convention and mark it static as it's never used outside of xfs_attr.c. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Alex Elder <aelder@sgi.com>
2009-12-11xfs: simplify xfs_buf_get / xfs_buf_read interfacesChristoph Hellwig
Currently the low-level buffer cache interfaces are highly confusing as we have a _flags variant of each that does actually respect the flags, and one without _flags which has a flags argument that gets ignored and overriden with a default set. Given that very few places use the default arguments get rid of the duplication and convert all callers to pass the flags explicitly. Also remove the now confusing _flags postfix. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2009-12-11xfs: remove IO_ISAIOChristoph Hellwig
We set the IO_ISAIO flag for all read/write I/O since early Linux 2.6.x. Remove it as it has lost it's purpose long ago. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Alex Elder <aelder@sgi.com>
2009-12-11xfs: Wrapped journal record corruption on read at recoveryAndy Poling
Summary of problem: If a journal record wraps at the physical end of the journal, it has to be read in two parts in xlog_do_recovery_pass(): a read at the physical end and a read at the physical beginning. If xlog_bread() has to re-align the first read, the second read request does not take that re-alignment into account. If the first read was re-aligned, the second read over-writes the end of the data from the first read, effectively corrupting it. This can happen either when reading the record header or reading the record data. The first sanity check in xlog_recover_process_data() is to check for a valid clientid, so that is the error reported. Summary of fix: If there was a first read at the physical end, XFS_BUF_PTR() returns where the data was requested to begin. Conversely, because it is the result of xlog_align(), offset indicates where the requested data for the first read actually begins - whether or not xlog_bread() has re-aligned it. Using offset as the base for the calculation of where to place the second read data ensures that it will be correctly placed immediately following the data from the first read instead of sometimes over-writing the end of it. The attached patch has resolved the reported problem of occasional inability to recover the journal (reporting "bad clientid"). Signed-off-by: Andy Poling <andy@realbig.com> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2009-12-11xfs: cleanup data end I/O handlersChristoph Hellwig
Currently we have different end I/O handlers for read vs the different types of write I/O. But they are all very similar so we could just use one with a few conditionals and reduce code size a lot. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2009-12-11xfs: use WRITE_SYNC_PLUG for synchronous writeoutChristoph Hellwig
The VM and I/O schedulers now expect us to use WRITE_SYNC_PLUG for synchronous writeout. Right now I can't see any changes in performance numbers with this, but we're getting some beating for not using it, and the knowledge definitely could help the block code to make better decisions. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2009-12-11xfs: reset the i_iolock lock class in the reclaim pathChristoph Hellwig
The iolock is used for protecting reads, writes and block truncates against each other. We have two classes of callers, the first one is induced by a file operation and requires a reference to the inode be held and not dropped after the operation is done: - xfs_vm_vmap, xfs_vn_fallocate, xfs_read, xfs_write, xfs_splice_read, xfs_splice_write and xfs_setattr are all implementations of VFS methods that require a live inode - xfs_getbmap and xfs_swap_extents are ioctl subcommand for which the same is true - xfs_truncate_file is only called on quota inodes just returned from xfs_iget - xfs_sync_inode_data does the lock just after an igrab() - xfs_filestream_associate and xfs_filestream_new_ag take the iolock on the parent inode of an inode which by VFS rules must be referenced And we have various calls to truncate blocks past EOF or the whole file when dropping the last reference to an inode. Unfortunately lockdep complains when we do memory allocations that can recurse into the filesystem in the first class because the second class happens to take the same lock. To avoid this re-init the iolock in the beginning of xfs_fs_clear_inode to get a new lock class. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2009-12-11xfs: I/O completion handlers must use NOFS allocationsChristoph Hellwig
When completing I/O requests we must not allow the memory allocator to recurse into the filesystem, as we might deadlock on waiting for the I/O completion otherwise. The only thing currently allocating normal GFP_KERNEL memory is the allocation of the transaction structure for the unwritten extent conversion. Add a memflags argument to _xfs_trans_alloc to allow controlling the allocator behaviour. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Thomas Neumann <tneumann@users.sourceforge.net> Tested-by: Thomas Neumann <tneumann@users.sourceforge.net> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2009-12-11xfs: fix mmap_sem/iolock inversion in xfs_free_eofblocksChristoph Hellwig
When xfs_free_eofblocks is called from ->release the VM might already hold the mmap_sem, but in the write path we take the iolock before taking the mmap_sem in the generic write code. Switch xfs_free_eofblocks to only trylock the iolock if called from ->release and skip trimming the prellocated blocks in that case. We'll still free them later on the final iput. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2009-12-11xfs: simplify inode teardownChristoph Hellwig
Currently the reclaim code for the case where we don't reclaim the final reclaim is overly complicated. We know that the inode is clean but instead of just directly reclaiming the clean inode we go through the whole process of marking the inode reclaimable just to directly reclaim it from the calling context. Besides being overly complicated this introduces a race where iget could recycle an inode between marked reclaimable and actually being reclaimed leading to panics. This patch gets rid of the existing reclaim path, and replaces it with a simple call to xfs_ireclaim if the inode was clean. While we're at it we also use the slightly more lax xfs_inode_clean check we'd use later to determine if we need to flush the inode here. Finally get rid of xfs_reclaim function and place the remaining small bits of reclaim code directly into xfs_fs_destroy_inode. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Patrick Schreurs <patrick@news-service.com> Reported-by: Tommy van Leeuwen <tommy@news-service.com> Tested-by: Patrick Schreurs <patrick@news-service.com> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2009-12-11Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: (49 commits) nilfs2: separate wait function from nilfs_segctor_write nilfs2: add iterator for segment buffers nilfs2: hide nilfs_write_info struct in segment buffer code nilfs2: relocate io status variables to segment buffer nilfs2: do not return io error for bio allocation failure nilfs2: use list_splice_tail or list_splice_tail_init nilfs2: replace mark_inode_dirty as nilfs_mark_inode_dirty nilfs2: delete mark_inode_dirty in nilfs_delete_entry nilfs2: delete mark_inode_dirty in nilfs_commit_chunk nilfs2: change return type of nilfs_commit_chunk nilfs2: split nilfs_unlink as nilfs_do_unlink and nilfs_unlink nilfs2: delete redundant mark_inode_dirty nilfs2: expand inode_inc_link_count and inode_dec_link_count nilfs2: delete mark_inode_dirty from nilfs_set_link nilfs2: delete mark_inode_dirty in nilfs_new_inode nilfs2: add norecovery mount option nilfs2: add helper to get if volume is in a valid state nilfs2: move recovery completion into load_nilfs function nilfs2: apply readahead for recovery on mount nilfs2: clean up get/put function of a segment usage ...
2009-12-11sysfs: sysfs_setattr remove unnecessary permission check.Eric W. Biederman
inode_change_ok already clears the SGID bit when necessary so there is no reason for sysfs_setattr to carry code to do the same, and it is good to kill the extra copy because when I moved the code last in certain corner cases the code will look at the wrong gid. Acked-by: Serge Hallyn <serue@us.ibm.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11sysfs: Factor out sysfs_rename from sysfs_rename_dir and sysfs_move_dirEric W. Biederman
These two functions do 90% of the same work and it doesn't significantly obfuscate the function to allow both the parent dir and the name to change at the same time. So merge them together to simplify maintenance, and increase testing. Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11sysfs: Propagate renames to the vfs on demandEric W. Biederman
By teaching sysfs_revalidate to hide a dentry for a sysfs_dirent if the sysfs_dirent has been renamed, and by teaching sysfs_lookup to return the original dentry if the sysfs dirent has been renamed. I can show the results of renames correctly without having to update the dcache during the directory rename. This massively simplifies the rename logic allowing a lot of weird sysfs special cases to be removed along with a lot of now unnecesary helper code. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11sysfs: Gut sysfs_addrm_start and sysfs_addrm_finishEric W. Biederman
With lazy inode updates and dentry operations bringing everything into sync on demand there is no longer any need to immediately update the vfs or grab i_mutex to protect those updates as we make changes to sysfs. Acked-by: Serge Hallyn <serue@us.ibm.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11sysfs: In sysfs_chmod_file lazily propagate the mode change.Eric W. Biederman
Now that sysfs_getattr and sysfs_permission refresh the vfs inode there is no need to immediatly push the mode change into the vfs cache. Reducing the amount of work needed and simplifying the locking. Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11sysfs: Implement sysfs_getattr & sysfs_permissionEric W. Biederman
With the implementation of sysfs_getattr and sysfs_permission sysfs becomes able to lazily propogate inode attribute changes from the sysfs_dirents to the vfs inodes. This paves the way for deleting significant chunks of now unnecessary code. While doing this we did not reference sysfs_setattr from sysfs_symlink_inode_operations so I added along with sysfs_getattr and sysfs_permission. Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11sysfs: Nicely indent sysfs_symlink_inode_operationsEric W. Biederman
Lining up the functions in sysfs_symlink_inode_operations follows the pattern in the rest of sysfs and makes things slightly more readable. Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11sysfs: Update s_iattr on link and unlink.Eric W. Biederman
Currently sysfs updates the timestamps on the vfs directory inode when we create or remove a directory entry but doesn't update the cached copy on the sysfs_dirent, fix that oversight. Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11sysfs: Fix locking and factor out sysfs_sd_setattrEric W. Biederman
Cleanly separate the work that is specific to setting the attributes of a sysfs_dirent from what is needed to update the attributes of a vfs inode. Additionally grab the sysfs_mutex to keep any nasties from surprising us when updating the sysfs_dirent. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11sysfs: Simplify iattr time assignmentsEric W. Biederman
The granularity of sysfs time when we keep it is 1 ns. Which when passed to timestamp_trunc results in a nop. So remove the unnecessary function call making sysfs_setattr slightly easier to read. Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11sysfs: Simplify sysfs_chmod_file semanticsEric W. Biederman
Currently every caller of sysfs_chmod_file happens at either file creation time to set a non-default mode or in response to a specific user requested space change in policy. Making timestamps of when the chmod happens and notification of a file changing mode uninteresting. Remove the unnecessary time stamp and filesystem change notification, and removes the last of the explicit inotify and donitfy support from sysfs. Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11sysfs: Use dentry_ops instead of directly playing with the dcacheEric W. Biederman
Calling d_drop unconditionally when a sysfs_dirent is deleted has the potential to leak mounts, so instead implement dentry delete and revalidate operations that cause sysfs dentries to be removed at the appropriate time. Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11sysfs: Rename sysfs_d_iput to sysfs_dentry_iputEric W. Biederman
Using dentry instead of d in the function name is what several other filesystems are doing and it seems to be a more readable convention. Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11sysfs: Update sysfs_setxattr so it updates secdata under the sysfs_mutexEric W. Biederman
The sysfs_mutex is required to ensure updates are and will remain atomic with respect to other inode iattr updates, that do not happen through the filesystem. Acked-by: Serge Hallyn <serue@us.ibm.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11debugfs: fix create mutex racy fops and private dataMathieu Desnoyers
Setting fops and private data outside of the mutex at debugfs file creation introduces a race where the files can be opened with the wrong file operations and private data. It is easy to trigger with a process waiting on file creation notification. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: stable <stable@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-11sysfs: mark a locally-only used function staticStefan Richter
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Acked-by: David P. Quigley <dpquigl@tycho.nsa.gov> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-12-10Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm: dlm: always use GFP_NOFS
2009-12-10Merge branch 'for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (47 commits) ext4: Fix potential fiemap deadlock (mmap_sem vs. i_data_sem) ext4: Do not override ext2 or ext3 if built they are built as modules jbd2: Export jbd2_log_start_commit to fix ext4 build ext4: Fix insufficient checks in EXT4_IOC_MOVE_EXT ext4: Wait for proper transaction commit on fsync ext4: fix incorrect block reservation on quota transfer. ext4: quota macros cleanup ext4: ext4_get_reserved_space() must return bytes instead of blocks ext4: remove blocks from inode prealloc list on failure ext4: wait for log to commit when umounting ext4: Avoid data / filesystem corruption when write fails to copy data ext4: Use ext4 file system driver for ext2/ext3 file system mounts ext4: Return the PTR_ERR of the correct pointer in setup_new_group_blocks() jbd2: Add ENOMEM checking in and for jbd2_journal_write_metadata_buffer() ext4: remove unused parameter wbc from __ext4_journalled_writepage() ext4: remove encountered_congestion trace ext4: move_extent_per_page() cleanup ext4: initialize moved_len before calling ext4_move_extents() ext4: Fix double-free of blocks with EXT4_IOC_MOVE_EXT ext4: use ext4_data_block_valid() in ext4_free_blocks() ...
2009-12-10Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osdLinus Torvalds
* 'for-linus' of git://git.open-osd.org/linux-open-osd: exofs: Multi-device mirror support exofs: Move all operations to an io_engine exofs: move osd.c to ios.c exofs: statfs blocks is sectors not FS blocks exofs: Prints on mount and unmout exofs: refactor exofs_i_info initialization into common helper exofs: dbg-print less exofs: More sane debug print trivial: some small fixes in exofs documentation
2009-12-10Merge git://git.infradead.org/ubifs-2.6Linus Torvalds
* git://git.infradead.org/ubifs-2.6: UBIFS: fix return code in check_leaf UBI: flush wl before clearing update marker MAINTAINERS: change e-mail of Artem Bityutskiy UBIFS: remove manual O_SYNC handling UBIFS: support mounting of UBI volume character devices UBI: Add ubi_open_volume_path
2009-12-10exofs: Multi-device mirror supportBoaz Harrosh
This patch changes on-disk format, it is accompanied with a parallel patch to mkfs.exofs that enables multi-device capabilities. After this patch, old exofs will refuse to mount a new formatted FS and new exofs will refuse an old format. This is done by moving the magic field offset inside the FSCB. A new FSCB *version* field was added. In the future, exofs will refuse to mount unmatched FSCB version. To up-grade or down-grade an exofs one must use mkfs.exofs --upgrade option before mounting. Introduced, a new object that contains a *device-table*. This object contains the default *data-map* and a linear array of devices information, which identifies the devices used in the filesystem. This object is only written to offline by mkfs.exofs. This is why it is kept separate from the FSCB, since the later is written to while mounted. Same partition number, same object number is used on all devices only the device varies. * define the new format, then load the device table on mount time make sure every thing is supported. * Change I/O engine to now support Mirror IO, .i.e write same data to multiple devices, read from a random device to spread the read-load from multiple clients (TODO: stripe read) Implementation notes: A few points introduced in previous patch should be mentioned here: * Special care was made so absolutlly all operation that have any chance of failing are done before any osd-request is executed. This is to minimize the need for a data consistency recovery, to only real IO errors. * Each IO state has a kref. It starts at 1, any osd-request executed will increment the kref, finally when all are executed the first ref is dropped. At IO-done, each request completion decrements the kref, the last one to return executes the internal _last_io() routine. _last_io() will call the registered io_state_done. On sync mode a caller does not supply a done method, indicating a synchronous request, the caller is put to sleep and a special io_state_done is registered that will awaken the caller. Though also in sync mode all operations are executed in parallel. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2009-12-10exofs: Move all operations to an io_engineBoaz Harrosh
In anticipation for multi-device operations, we separate osd operations into an abstract I/O API. Currently only one device is used but later when adding more devices, we will drive all devices in parallel according to a "data_map" that describes how data is arranged on multiple devices. The file system level operates, like before, as if there is one object (inode-number) and an i_size. The io engine will split this to the same object-number but on multiple device. At first we introduce Mirror (raid 1) layout. But at the final outcome we intend to fully implement the pNFS-Objects data-map, including raid 0,4,5,6 over mirrored devices, over multiple device-groups. And more. See: http://tools.ietf.org/html/draft-ietf-nfsv4-pnfs-obj-12 * Define an io_state based API for accessing osd storage devices in an abstract way. Usage: First a caller allocates an io state with: exofs_get_io_state(struct exofs_sb_info *sbi, struct exofs_io_state** ios); Then calles one of: exofs_sbi_create(struct exofs_io_state *ios); exofs_sbi_remove(struct exofs_io_state *ios); exofs_sbi_write(struct exofs_io_state *ios); exofs_sbi_read(struct exofs_io_state *ios); exofs_oi_truncate(struct exofs_i_info *oi, u64 new_len); And when done exofs_put_io_state(struct exofs_io_state *ios); * Convert all source files to use this new API * Convert from bio_alloc to bio_kmalloc * In io engine we make use of the now fixed osd_req_decode_sense There are no functional changes or on disk additions after this patch. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2009-12-10exofs: move osd.c to ios.cBoaz Harrosh
If I do a "git mv" together with a massive code change and commit in one patch, git looses the rename and records a delete/new instead. This is bad because I want a rename recorded so later rebased/cherry-picked patches to the old name will work. Also the --follow is lost. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2009-12-10exofs: statfs blocks is sectors not FS blocksBoaz Harrosh
Even though exofs has a 4k block size, statfs blocks is in sectors (512 bytes). Also if target returns 0 for capacity then make it ULLONG_MAX. df does not like zero-size filesystems Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2009-12-10exofs: Prints on mount and unmoutBoaz Harrosh
It is important to print in the logs when a filesystem was mounted and eventually unmounted. Print the osd-device's osd_name and pid the FS was mounted/unmounted on. TODO: How to also print the namespace path the filesystem was mounted on? Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2009-12-10exofs: refactor exofs_i_info initialization into common helperBoaz Harrosh
There are two places that initialize inodes: exofs_iget() and exofs_new_inode() As more members of exofs_i_info that need initialization are added this code will grow. (soon) Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2009-12-10exofs: dbg-print lessBoaz Harrosh
Iner-loops printing is converted to EXOFS_DBG2 which is #defined to nothing. It is now almost bareable to just leave debug-on. Every operation is printed once, with most relevant info (I hope). Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2009-12-10exofs: More sane debug printBoaz Harrosh
debug prints should be somewhat useful without actually reading the source code Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2009-12-09Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (42 commits) tree-wide: fix misspelling of "definition" in comments reiserfs: fix misspelling of "journaled" doc: Fix a typo in slub.txt. inotify: remove superfluous return code check hdlc: spelling fix in find_pvc() comment doc: fix regulator docs cut-and-pasteism mtd: Fix comment in Kconfig doc: Fix IRQ chip docs tree-wide: fix assorted typos all over the place drivers/ata/libata-sff.c: comment spelling fixes fix typos/grammos in Documentation/edac.txt sysctl: add missing comments fs/debugfs/inode.c: fix comment typos sgivwfb: Make use of ARRAY_SIZE. sky2: fix sky2_link_down copy/paste comment error tree-wide: fix typos "couter" -> "counter" tree-wide: fix typos "offest" -> "offset" fix kerneldoc for set_irq_msi() spidev: fix double "of of" in comment comment typo fix: sybsystem -> subsystem ...
2009-12-09ext4: Fix potential fiemap deadlock (mmap_sem vs. i_data_sem)Theodore Ts'o
Fix the following potential circular locking dependency between mm->mmap_sem and ei->i_data_sem: ======================================================= [ INFO: possible circular locking dependency detected ] 2.6.32-04115-gec044c5 #37 ------------------------------------------------------- ureadahead/1855 is trying to acquire lock: (&mm->mmap_sem){++++++}, at: [<ffffffff81107224>] might_fault+0x5c/0xac but task is already holding lock: (&ei->i_data_sem){++++..}, at: [<ffffffff811be1fd>] ext4_fiemap+0x11b/0x159 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&ei->i_data_sem){++++..}: [<ffffffff81099bfa>] __lock_acquire+0xb67/0xd0f [<ffffffff81099e7e>] lock_acquire+0xdc/0x102 [<ffffffff81516633>] down_read+0x51/0x84 [<ffffffff811a2414>] ext4_get_blocks+0x50/0x2a5 [<ffffffff811a3453>] ext4_get_block+0xab/0xef [<ffffffff81154f39>] do_mpage_readpage+0x198/0x48d [<ffffffff81155360>] mpage_readpages+0xd0/0x114 [<ffffffff811a104b>] ext4_readpages+0x1d/0x1f [<ffffffff810f8644>] __do_page_cache_readahead+0x12f/0x1bc [<ffffffff810f86f2>] ra_submit+0x21/0x25 [<ffffffff810f0cfd>] filemap_fault+0x19f/0x32c [<ffffffff81107b97>] __do_fault+0x55/0x3a2 [<ffffffff81109db0>] handle_mm_fault+0x327/0x734 [<ffffffff8151aaa9>] do_page_fault+0x292/0x2aa [<ffffffff81518205>] page_fault+0x25/0x30 [<ffffffff812a34d8>] clear_user+0x38/0x3c [<ffffffff81167e16>] padzero+0x20/0x31 [<ffffffff81168b47>] load_elf_binary+0x8bc/0x17ed [<ffffffff81130e95>] search_binary_handler+0xc2/0x259 [<ffffffff81166d64>] load_script+0x1b8/0x1cc [<ffffffff81130e95>] search_binary_handler+0xc2/0x259 [<ffffffff8113255f>] do_execve+0x1ce/0x2cf [<ffffffff81027494>] sys_execve+0x43/0x5a [<ffffffff8102918a>] stub_execve+0x6a/0xc0 -> #0 (&mm->mmap_sem){++++++}: [<ffffffff81099aa4>] __lock_acquire+0xa11/0xd0f [<ffffffff81099e7e>] lock_acquire+0xdc/0x102 [<ffffffff81107251>] might_fault+0x89/0xac [<ffffffff81139382>] fiemap_fill_next_extent+0x95/0xda [<ffffffff811bcb43>] ext4_ext_fiemap_cb+0x138/0x157 [<ffffffff811be069>] ext4_ext_walk_space+0x178/0x1f1 [<ffffffff811be21e>] ext4_fiemap+0x13c/0x159 [<ffffffff811390e6>] do_vfs_ioctl+0x348/0x4d6 [<ffffffff811392ca>] sys_ioctl+0x56/0x79 [<ffffffff81028cb2>] system_call_fastpath+0x16/0x1b other info that might help us debug this: 1 lock held by ureadahead/1855: #0: (&ei->i_data_sem){++++..}, at: [<ffffffff811be1fd>] ext4_fiemap+0x11b/0x159 stack backtrace: Pid: 1855, comm: ureadahead Not tainted 2.6.32-04115-gec044c5 #37 Call Trace: [<ffffffff81098c70>] print_circular_bug+0xa8/0xb7 [<ffffffff81099aa4>] __lock_acquire+0xa11/0xd0f [<ffffffff8102f229>] ? sched_clock+0x9/0xd [<ffffffff81099e7e>] lock_acquire+0xdc/0x102 [<ffffffff81107224>] ? might_fault+0x5c/0xac [<ffffffff81107251>] might_fault+0x89/0xac [<ffffffff81107224>] ? might_fault+0x5c/0xac [<ffffffff81124b44>] ? __kmalloc+0x13b/0x18c [<ffffffff81139382>] fiemap_fill_next_extent+0x95/0xda [<ffffffff811bcb43>] ext4_ext_fiemap_cb+0x138/0x157 [<ffffffff811bca0b>] ? ext4_ext_fiemap_cb+0x0/0x157 [<ffffffff811be069>] ext4_ext_walk_space+0x178/0x1f1 [<ffffffff811be21e>] ext4_fiemap+0x13c/0x159 [<ffffffff81107224>] ? might_fault+0x5c/0xac [<ffffffff811390e6>] do_vfs_ioctl+0x348/0x4d6 [<ffffffff8129f6d0>] ? __up_read+0x8d/0x95 [<ffffffff81517fb5>] ? retint_swapgs+0x13/0x1b [<ffffffff811392ca>] sys_ioctl+0x56/0x79 [<ffffffff81028cb2>] system_call_fastpath+0x16/0x1b Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-12-09ext4: Do not override ext2 or ext3 if built they are built as modulesTheodore Ts'o
The CONFIG_EXT4_USE_FOR_EXT23 option must not try to take over the ext2 or ext3 file systems if the those file system drivers are configured to be built as mdoules. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-12-09jbd2: Export jbd2_log_start_commit to fix ext4 buildTheodore Ts'o
This fixes: ERROR: "jbd2_log_start_commit" [fs/ext4/ext4.ko] undefined! Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>