aboutsummaryrefslogtreecommitdiff
path: root/drivers/md/raid10.c
AgeCommit message (Collapse)Author
2006-10-21[PATCH] md: fix calculation of ->degraded for multipath and raid10NeilBrown
Two less-used md personalities have bugs in the calculation of ->degraded (the extent to which the array is degraded). Signed-off-by: Neil Brown <neilb@suse.de> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-03[PATCH] md: define ->congested_fn for raid1, raid10, and multipathNeilBrown
raid1, raid10 and multipath don't report their 'congested' status through bdi_*_congested, but should. This patch adds the appropriate functions which just check the 'congested' status of all active members (with appropriate locking). raid1 read_balance should be modified to prefer devices where bdi_read_congested returns false. Then we could use the '&' branch rather than the '|' branch. However that should would need some benchmarking first to make sure it is actually a good idea. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-03[PATCH] md: Improve locking around error handlingNeilBrown
The error handling routines don't use proper locking, and so two concurrent errors could trigger a problem. So: - use test-and-set and test-and-clear to synchonise the In_sync bits with the ->degraded count - use the spinlock to protect updates to the degraded count (could use an atomic_t but that would be a bigger change in code, and isn't really justified) - remove un-necessary locking in raid5 Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-03[PATCH] md: remove 'working_disks' from raid10 stateNeilBrown
It isn't needed as mddev->degraded contains equivalent info. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-03[PATCH] md: replace magic numbers in sb_dirty with well defined bit flagsNeilBrown
Instead of magic numbers (0,1,2,3) in sb_dirty, we have some flags instead: MD_CHANGE_DEVS Some device state has changed requiring superblock update on all devices. MD_CHANGE_CLEAN The array has transitions from 'clean' to 'dirty' or back, requiring a superblock update on active devices, but possibly not on spares MD_CHANGE_PENDING A superblock update is underway. We wait for an update to complete by waiting for all flags to be clear. A flag can be set at any time, even during an update, without risk that the change will be lost. Stop exporting md_update_sb - isn't needed. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-10-03[PATCH] md: factor out part of raid10d into a separate function.NeilBrown
raid10d has toooo many nested block, so take the fix_read_error functionality out into a separate function. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-10[PATCH] md: include sector number in messages about corrected read errorsNeilBrown
This is generally useful, but particularly helps see if it is the same sector that always needs correcting, or different ones. [akpm@osdl.org: fix printk warnings] Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26[PATCH] md: Calculate correct array size for raid10 in new offset modeNeilBrown
The size calculation made assumtion which the new offset mode didn't follow. This gets the size right in all cases. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26[PATCH] md: support stripe/offset mode in raid10NeilBrown
The "industry standard" DDF format allows for a stripe/offset layout where data is duplicated on different stripes. e.g. A B C D D A B C E F G H H E F G (columns are drives, rows are stripes, LETTERS are chunks of data). This is similar to raid10's 'far' mode, but not quite the same. So enhance 'far' mode with a 'far/offset' option which follows the layout of DDFs stripe/offset. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26[PATCH] md: allow checkpoint of recovery with version-1 superblockNeilBrown
For a while we have had checkpointing of resync. The version-1 superblock allows recovery to be checkpointed as well, and this patch implements that. Due to early carelessness we need to add a feature flag to signal that the recovery_offset field is in use, otherwise older kernels would assume that a partially recovered array is in fact fully recovered. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-26[PATCH] md: remove arbitrary limit on chunk sizeNeilBrown
The largest chunk size the code can support without substantial surgery is 2^30 bytes, so make that the limit instead of an arbitrary 4Meg. Some day, the 'chunksize' should change to a sector-shift instead of a byte-count. Then no limit would be needed. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-05-01[PATCH] md: Fixed refcounting/locking when attempting read error correction ↵NeilBrown
in raid10 We need to hold a reference to rdevs while reading and writing to attempt to correct read errors. This reference must be taken under an rcu lock. Signed-off-by: Neil Brown <neilb@suse.de> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-05-01[PATCH] md: Avoid oops when attempting to fix read errors on raid10NeilBrown
We should add to the counter for the rdev *after* checking if the rdev is NULL!!! Signed-off-by: Neil Brown <neilb@suse.de> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-04-02BUG_ON() Conversion in md/raid10.cEric Sesterhenn
this changes if() BUG(); constructs to BUG_ON() which is cleaner and can better optimized away Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-02-03[PATCH] md: Assorted little md fixesNeilBrown
- version-1 superblock + The default_bitmap_offset is in sectors, not bytes. + the 'size' field in the superblock is in sectors, not KB - raid0_run should return a negative number on error, not '1' - raid10_read_balance should not return a valid 'disk' number if ->rdev turned out to be NULL - kmem_cache_destroy doesn't like being passed a NULL. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-14[PATCH] Unlinline a bunch of other functionsArjan van de Ven
Remove the "inline" keyword from a bunch of big functions in the kernel with the goal of shrinking it by 30kb to 40kb Signed-off-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Jeff Garzik <jgarzik@pobox.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] md: count corrected read errors per driveNeilBrown
Store this total in superblock (As appropriate), and make it available to userspace via sysfs. Signed-off-by: Neil Brown <neilb@suse.de> Acked-by: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] md: allow array level to be set textually via sysfsNeilBrown
Signed-off-by: Neil Brown <neilb@suse.de> Acked-by: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] md: fix typo in commentNeilBrown
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] md: define and use safe_put_page for mdNeilBrown
md sometimes call put_page on NULL pointers (treating it like kfree). This is not safe, so define and use a 'safe_put_page' which checks for NULL. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] md: fix possible problem in raid1/raid10 error overwritingNeilBrown
The code to overwrite/reread for addressing read errors in raid1/raid10 currently assumes that the read will not alter the buffer which could be used to write to the next device. This is not a safe assumption to make. So we split the loops into a overwrite loop and a separate re-read loop, so that the writing is complete before reading is attempted. Cc: Paul Clements <paul.clements@steeleye.com> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] md: remove personality numbering from mdNeilBrown
md supports multiple different RAID level, each being implemented by a 'personality' (which is often in a separate module). These personalities have fairly artificial 'numbers'. The numbers are use to: 1- provide an index into an array where the various personalities are recorded 2- identify the module (via an alias) which implements are particular personality. Neither of these uses really justify the existence of personality numbers. The array can be replaced by a linked list which is searched (array lookup only happens very rarely). Module identification can be done using an alias based on level rather than 'personality' number. The current 'raid5' modules support two level (4 and 5) but only one personality. This slight awkwardness (which was handled in the mapping from level to personality) can be better handled by allowing raid5 to register 2 personalities. With this change in place, the core md module does not need to have an exhaustive list of all possible personalities, so other personalities can be added independently. This patch also moves the check for chunksize being non-zero into the ->run routines for the personalities that need it, rather than having it in core-md. This has a side effect of allowing 'faulty' and 'linear' not to have a chunk-size set. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] md: break out of a loop that doesn't need to run to completionNeilBrown
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] md: convert md to use kzalloc throughoutNeilBrown
Replace multiple kmalloc/memset pairs with kzalloc calls. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] md: clean up 'page' related names in mdNeilBrown
Substitute: page_cache_get -> get_page page_cache_release -> put_page PAGE_CACHE_SHIFT -> PAGE_SHIFT PAGE_CACHE_SIZE -> PAGE_SIZE PAGE_CACHE_MASK -> PAGE_MASK __free_page -> put_page because we aren't using the page cache, we are just using pages. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] md: raid10 read-error handling - resync and read-onlyNeilBrown
Add in correct read-error handling for resync and read-only situations. When read-only, we don't over-write, so we need to mark the failed drive in the r10_bio so we don't re-try it. During resync, we always read all blocks, so if there is a read error, we simply over-write it with the good block that we found (assuming we found one). Note that the recovery case still isn't handled in an interesting way. There is nothing useful to do for the 2-copies case. If there are 3 or more copies, then we could try reading from one of the non-missing copies, but this is a bit complicated and very rarely would be used, so I'm leaving it for now. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] md: auto-correct correctable read errors in raid10NeilBrown
Largely just a cross-port from raid1. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] md: support check-without-repair of raid10 arraysNeilBrown
Also keep count on the number of errors found. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] md: write intent bitmap support for raid10NeilBrown
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-06[PATCH] md: improve raid10 "IO Barrier" conceptNeilBrown
raid10 needs to put up a barrier to new requests while it does resync or other background recovery. The code for this is currently open-coded, slighty obscure by its use of two waitqueues, and not documented. This patch gathers all the related code into 4 functions, and includes a comment which (hopefully) explains what is happening. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-28[PATCH] md: improve read speed to raid10 arrays using 'far copies'NeilBrown
raid10 has two different layouts. One uses near-copies (so multiple copies of a block are at the same or similar offsets of different devices) and the other uses far-copies (so multiple copies of a block are stored a greatly different offsets on different devices). The point of far-copies is that it allows the first section (normally first half) to be layed out in normal raid0 style, and thus provide raid0 sequential read performance. Unfortunately, the read balancing in raid10 makes some poor decisions for far-copies arrays and you don't get the desired performance. So turn off that bad bit of read_balance for far-copies arrays. With this patch, read speed of an 'f2' array is comparable with a raid0 with the same number of devices, though write speed is ofcourse still very slow. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09[PATCH] md: convert 'faulty' and 'in_sync' fields to bits in 'flags' fieldNeilBrown
This has the advantage of removing the confusion caused by 'rdev_t' and 'mddev_t' both having 'in_sync' fields. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-09[PATCH] md: provide proper rcu_dereference / rcu_assign_pointer annotations ↵Suzanne Wood
in md Acked-by: <paulmck@us.ibm.com> Signed-off-by: Suzanne Wood <suzannew@cs.pdx.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-11-01[BLOCK] Unify the seperate read/write io stat fields into arraysJens Axboe
Instead of having ->read_sectors and ->write_sectors, combine the two into ->sectors[2] and similar for the other fields. This saves a branch several places in the io path, since we don't have to care for what the actual io direction is. On my x86-64 box, that's 200 bytes less text in just the core (not counting the various drivers). Signed-off-by: Jens Axboe <axboe@suse.de>
2005-10-08[PATCH] gfp flags annotations - part 1Al Viro
- added typedef unsigned int __nocast gfp_t; - replaced __nocast uses for gfp flags with gfp_t - it gives exactly the same warnings as far as sparse is concerned, doesn't change generated code (from gcc point of view we replaced unsigned int with typedef) and documents what's going on far better. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09[PATCH] md: fix BUG when raid10 rebuilds without enough drivesNeilBrown
This shouldn't be a BUG. We should cope. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09[PATCH] md: fix raid10 assembly when too many devices are missingNeilBrown
If you try to assemble an array with too many missing devices, raid10 will now reject the attempt, instead of allowing it. Also check when hot-adding a drive and refuse the hot-add if the array is beyond hope. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09[PATCH] md: fail IO request to md that require a barrier.NeilBrown
md does not yet support BIO_RW_BARRIER, so be honest about it and fail (-EOPNOTSUPP) any such requests. Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-09[PATCH] md: fix minor error in raid10 read-balancing calculation.NeilBrown
'this_sector' is a virtual (array) address while 'head_position' is a physical (device) address, so substraction doesn't make any sense. devs[slot].addr should be used instead of this_sector. However, this patch doesn't make much practical different to the read balancing due to the effects of later code. Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21[PATCH] md: remove unneeded NULL checks before kfreeJesper Juhl
This patch removes some unneeded checks of pointers being NULL before calling kfree() on them. kfree() handles NULL pointers just fine, checking first is pointless. Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21[PATCH] md: fix deadlock due to md thread processing delayed requests.NeilBrown
Before completing a 'write' the md superblock might need to be updated. This is best done by the md_thread. The current code schedules this up and queues the write request for later handling by the md_thread. However some personalities (Raid5/raid6) will deadlock if the md_thread tries to submit requests to its own array. So this patch changes things so the processes submitting the request waits for the superblock to be written and then submits the request itself. This fixes a recently-created deadlock in raid5/raid6 Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21[PATCH] md: improve the interface to sync_requestNeilBrown
1/ change the return value (which is number-of-sectors synced) from 'int' to 'sector_t'. The number of sectors is usually easily small enough to fit in an int, but if resync needs to abort, it may want to return the total number of remaining sectors, which could be large. Also errors cannot be returned as negative numbers now, so use 0 instead 2/ Add a 'skipped' return parameter to allow the array to report that it skipped the sectors. This allows md to take this into account in the speed calculations. Currently there is no important skipping, but the bitmap-based-resync that is coming will use this. Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21[PATCH] md: improve locking on 'safemode' and move superblock writesNeilBrown
When md marks the superblock dirty before a write, it calls generic_make_request (to write the superblock) from within generic_make_request (to write the first dirty block), which could cause problems later. With this patch, the superblock write is always done by the helper thread, and write request are delayed until that write completes. Also, the locking around marking the array dirty and writing the superblock is improved to avoid possible races. Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21[PATCH] md: merge md_enter_safemode into md_check_recoveryNeilBrown
md_enter_safemode checks if it is time to mark the md superblock as 'clean'. i.e. if all writes have completed and a suitable delay has passed. This is currently called from md_handle_safemode which in-turn is called (almost) every time md_check_recovery is called, and from the end of md_do_sync which causes the mddev->thread to run, which will always call md_check_recovery as well. So it doesn't need to be a separate function and fits quite well into md_check_recovery. The "almost" is because multipathd calls md_check_recovery but not md_handle_safemode. This is OK because the code from md_enter_safemode is a no-op if mddev->safemode == 0, which it always is for a multipathd (providing we don't allow it to be set to 2 on a signal...) Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-17[PATCH] md: set the unplug_fn and issue_flush_fn for md devices *after* ↵NeilBrown
committed to creation We we set the too early, they may still be in place and possibly get called even though the array didn't get set up properly. Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-01[PATCH] Change synchronize_kernel to _rcu and _schedPaul E. McKenney
This patch changes calls to synchronize_kernel(), deprecated in the earlier "Deprecate synchronize_kernel, GPL replacement" patch to instead call the new synchronize_rcu() and synchronize_sched() APIs. Signed-off-by: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-04-16Linux-2.6.12-rc2Linus Torvalds
Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!