aboutsummaryrefslogtreecommitdiff
path: root/fs/btrfs
AgeCommit message (Collapse)Author
2008-09-25Btrfs: Add a per-inode csum mutex to avoid races creating csum itemsChris Mason
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25Btrfs: Change find_extent_buffer to use TestSetPageLockedChris Mason
This makes it possible for callers to check for extent_buffers in cache without deadlocking against any btree locks held. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25Btrfs: Add btree locking to the tree defragmentation codeChris Mason
The online btree defragger is simplified and rewritten to use standard btree searches instead of a walk up / down mechanism. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25Btrfs: Replace the transaction work queue with kthreadsChris Mason
This creates one kthread for commits and one kthread for deleting old snapshots. All the work queues are removed. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25Add btrfs_end_transaction_throttle to force writers to wait for pending commitsChris Mason
The existing throttle mechanism was often not sufficient to prevent new writers from coming in and making a given transaction run forever. This adds an explicit wait at the end of most operations so they will allow the current transaction to close. There is no wait inside file_write, inode updates, or cow filling, all which have different deadlock possibilities. This is a temporary measure until better asynchronous commit support is added. This code leads to stalls as it waits for data=ordered writeback, and it really needs to be fixed. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25Btrfs: Fix snapshot deletion to release the alloc_mutex much more often.Chris Mason
This lowers the impact of snapshot deletion on the rest of the FS. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25Btrfs: Add a skip_locking parameter to struct path, and make various funcs ↵Chris Mason
honor it Allocations may need to read in block groups from the extent allocation tree, which will require a tree search and take locks on the extent allocation tree. But, those locks might already be held in other places, leading to deadlocks. Since the alloc_mutex serializes everything right now, it is safe to skip the btree locking while caching block groups. A better fix will be to either create a recursive lock or find a way to back off existing locks while caching block groups. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25Fix btrfs_next_leaf to check for new items after dropping locksChris Mason
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25Fix btrfs_del_ordered_inode to allow forcing the drop during unlinksChris Mason
This allows us to delete an unlinked inode with dirty pages from the list instead of forcing commit to write these out before deleting the inode. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25Drop locks in btrfs_search_slot when reading a tree block.Chris Mason
One lock per btree block can make for significant congestion if everyone has to wait for IO at the high levels of the btree. This drops locks held by a path when doing reads during a tree search. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25Btrfs: Replace the big fs_mutex with a collection of other locksChris Mason
Extent alloctions are still protected by a large alloc_mutex. Objectid allocations are covered by a objectid mutex Other btree operations are protected by a lock on individual btree nodes Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25Btrfs: Start btree concurrency work.Chris Mason
The allocation trees and the chunk trees are serialized via their own dedicated mutexes. This means allocation location is still not very fine grained. The main FS btree is protected by locks on each block in the btree. Locks are taken top / down, and as processing finishes on a given level of the tree, the lock is released after locking the lower level. The end result of a search is now a path where only the lowest level is locked. Releasing or freeing the path drops any locks held. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25Btrfs: Add a thread pool just for submit_bioChris Mason
If a bio submission is after a lock holder waiting for the bio on the work queue, it is possible to deadlock. Move the bios into their own pool. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25BTRFS_IOC_TRANS_START should be privileguedChristoph Hellwig
As mentioned in the comment next to it btrfs_ioctl_trans_start can do bad damage to filesystems and thus should be limited to privilegued users. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25Btrfs: split out ioctl.cChristoph Hellwig
Split the ioctl handling out of inode.c into a file of it's own. Also fix up checkpatch.pl warnings for the moved code. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25Btrfs: kerneldoc comments for extent_map.cChristoph Hellwig
Add kerneldoc comments for all exported functions. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25Btrfs: Add a mount option to control worker thread pool sizeChris Mason
mount -o thread_pool_size changes the default, which is min(num_cpus + 2, 8). Larger thread pools would make more sense on very large disk arrays. This mount option controls the max size of each thread pool. There are multiple thread pools, so the total worker count will be larger than the mount option. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25Btrfs: Worker thread optimizationsChris Mason
This changes the worker thread pool to maintain a list of idle threads, avoiding a complex search for a good thread to wake up. Threads have two states: idle - we try to reuse the last thread used in hopes of improving the batching ratios busy - each time a new work item is added to a busy task, the task is rotated to the end of the line. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25Btrfs: Add backport for the kthread work on kernels older than 2.6.20Chris Mason
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25Btrfs: Fix mount -o max_inline=0Chris Mason
max_inline=0 used to force the max_inline size to one sector instead. Now it properly disables inline data items, while still being able to read any that happen to exist on disk. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25Btrfs: Add async worker threads for pre and post IO checksummingChris Mason
Btrfs has been using workqueues to spread the checksumming load across other CPUs in the system. But, workqueues only schedule work on the same CPU that queued the work, giving them a limited benefit for systems with higher CPU counts. This code adds a generic facility to schedule work with pools of kthreads, and changes the bio submission code to queue bios up. The queueing is important to make sure large numbers of procs on the system don't turn streaming workloads into random workloads by sending IO down concurrently. The end result of all of this is much higher performance (and CPU usage) when doing checksumming on large machines. Two worker pools are created, one for writes and one for endio processing. The two could deadlock if we tried to service both from a single pool. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25btrfs: allow scanning multiple devices during mountChristoph Hellwig
Allows to specify one or multiple device=/dev/foo options during mount so that ioctls on the control device can be avoided. Especially useful when trying to mount a multi-device setup as root. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25btrfs: sanity mount option parsing and early mount codeChristoph Hellwig
Also adds lots of comments to describe what's going on here. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25btrfs: fix strange indentation in lookup_extent_mappingChristoph Hellwig
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25btrfs: tiny makefile cleanupChristoph Hellwig
use normal kbuild syntax to build acl.o conditinally and remove comment out lines. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25Btrfs: transaction ioctlsSage Weil
These ioctls let a user application hold a transaction open while it performs a series of operations. A final ioctl does a sync on the fs (closing the current transaction). This is the main requirement for Ceph's OSD to be able to keep the data it's storing in a btrfs volume consistent, and AFAICS it works just fine. The application would do something like fd = ::open("some/file", O_RDONLY); ::ioctl(fd, BTRFS_IOC_TRANS_START); /* do a bunch of stuff */ ::ioctl(fd, BTRFS_IOC_TRANS_END); or just ::close(fd); And to ensure it commits to disk, ::ioctl(fd, BTRFS_IOC_SYNC); When a transaction is held open, the trans_handle is attached to the struct file (via private_data) so that it will get cleaned up if the process dies unexpectedly. A held transaction is also ended on fsync() to avoid a deadlock. A misbehaving application could also deliberately hold a transaction open, effectively locking up the FS, so it may make sense to restrict something like this to root or something. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25Btrfs: Dislable acl xattr handlersYan
The acl code is not yet complete, and the xattr handlers are causing problems for cp -p on some distros. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25Btrfs: bdi_init and bdi_destroy come with 2.6.23Jan Engelhardt
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25btrfsctl -A error code fixupLinda Knippers
Send the error back to userland if the ioctl fails Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25Btrfs: Invalidate dcache entry after creating snapshot andSven Wegener
We need to invalidate an existing dcache entry after creating a new snapshot or subvolume, because a negative dache entry will stop us from accessing the new snapshot or subvolume. --- ctree.h | 23 +++++++++++++++++++++++ inode.c | 4 ++++ transaction.c | 4 ++++ 3 files changed, 31 insertions(+) Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25Btrfs: Fix race in running_transaction checksChris Mason
When a new transaction was started, the code would incorrectly set the pointer in fs_info before all the data structures were setup. fsync heavy workloads hit races on the setup of the ordered inode spinlock Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25btrfs delete ordered inode handling fixMingming
Use btrfs_release_file instead of a put_inode call Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25Btrfs: Always use the async submission queue for checksummed writesChris Mason
This avoids IO stalls and poorly ordered IO from inline writers mixing in with the async submission queue Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25Btrfs: Allocator fix variety packChris Mason
* Force chunk allocation when find_free_extent has to do a full scan * Record the max key at the start of defrag so it doesn't run forever * Block groups might not be contiguous, make a forward search for the next block group in extent-tree.c * Get rid of extra checks for total fs size * Fix relocate_one_reference to avoid relocating the same file data block twice when referenced by an older transaction * Use the open device count when allocating chunks so that we don't try to allocate from devices that don't exist Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25Btrfs: Use kzalloc on the fs_devices allocationChris Mason
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25Btrfs: Handle transid == 0 while opening devicesChris Mason
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25Btrfs: Enable btree balancing on old kernels againChris Mason
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25Btrfs: Change the congestion functions to meter the number of async submits ↵Chris Mason
as well The async submit workqueue was absorbing too many requests, leading to long stalls where the async submitters were stalling. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25Fix corners in writepage and btrfs_truncate_pageChris Mason
The extent_io writepage calls needed an extra check for discarding pages that started on th last byte in the file. btrfs_truncate_page needed checks to make sure the page was still part of the file after reading it, and most importantly, needed to wait for all IO to the page to finish before freeing the corresponding extents on disk. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25Fix btrfs_open_devices to deal with changes since the scan ioctlsChris Mason
Devices can change after the scan ioctls are done, and btrfs_open_devices needs to be able to verify them as they are opened and used by the FS. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25Btrfs: Add mount -o degraded to allow mounts to continue with missing devicesChris Mason
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25Btrfs: Handle write errors on raid1 and raid10Chris Mason
When duplicate copies exist, writes are allowed to fail to one of those copies. This changeset includes a few changes that allow the FS to continue even when some IOs fail. It also adds verification of the parent generation number for btree blocks. This generation is stored in the pointer to a block, and it ensures that missed writes to are detected. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25Btrfs: Pass down the expected generation number when reading tree blocksChris Mason
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25Btrfs: Don't do btree balance_dirty_pages on old kernels, it stalls foreverChris Mason
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25Btrfs: Chunk relocation fine tuning, and add a few printks to show progressChris Mason
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25Btrfs: A number of nodatacow fixesChris Mason
Once part of a delalloc request fails the cow checks, just cow the entire range It is possible for the back references to all be from the same root, but still have snapshots against an extent. The checks are now more strict, forcing cow any time there are multiple refs against the data extent. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25Btrfs: Only open block devices once during mount -o subvol=Chris Mason
btrfs_open_devices needed a check to see if the device was already open. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25Btrfs: Update nodatacow mode to support cloned single files and resizingChris Mason
Before, nodatacow only checked to make sure multiple roots didn't have references on a single extent. This check makes sure that multiple inodes don't have references. nodatacow needed an extra check to see if the block group was currently readonly. This way cows forced by the chunk relocation code are honored. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25Btrfs: Properly find the root for snapshotted blocks during chunk relocationChris Mason
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2008-09-25Btrfs: Add support for online device removalChris Mason
This required a few structural changes to the code that manages bdev pointers: The VFS super block now gets an anon-bdev instead of a pointer to the lowest bdev. This allows us to avoid swapping the super block bdev pointer around at run time. The code to read in the super block no longer goes through the extent buffer interface. Things got ugly keeping the mapping constant. Signed-off-by: Chris Mason <chris.mason@oracle.com>