aboutsummaryrefslogtreecommitdiff
path: root/block/cfq-iosched.c
AgeCommit message (Collapse)Author
2009-11-26cfq: Make use of service count to estimate the rb_key offsetGui Jianfeng
For the moment, different workload cfq queues are put into different service trees. But CFQ still uses "busy_queues" to estimate rb_key offset when inserting a cfq queue into a service tree. I think this isn't appropriate, and it should make use of service tree count to do this estimation. This patch is for for-2.6.33 branch. Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-11-11block: jiffies fixesRandy Dunlap
Use HZ-independent calculation of milliseconds. Add jiffies.h where it was missing since functions or macros from it are used. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-11-08cfq-iosched: fix next_rq computationCorrado Zoccolo
Cfq has a bug in computation of next_rq, that affects transition between multiple sequential request streams in a single queue (e.g.: two sequential buffered writers of the same priority), causing the alternation between the two streams for a transient period. 8,0 1 18737 0.260400660 5312 D W 141653311 + 256 8,0 1 20839 0.273239461 5400 D W 141653567 + 256 8,0 1 20841 0.276343885 5394 D W 142803919 + 256 8,0 1 20843 0.279490878 5394 D W 141668927 + 256 8,0 1 20845 0.292459993 5400 D W 142804175 + 256 8,0 1 20847 0.295537247 5400 D W 141668671 + 256 8,0 1 20849 0.298656337 5400 D W 142804431 + 256 8,0 1 20851 0.311481148 5394 D W 141668415 + 256 8,0 1 20853 0.314421305 5394 D W 142804687 + 256 8,0 1 20855 0.318960112 5400 D W 142804943 + 256 The fix makes sure that the next_rq is computed from the last dispatched request, and not affected by merging. 8,0 1 37776 4.305161306 0 D W 141738087 + 256 8,0 1 37778 4.308298091 0 D W 141738343 + 256 8,0 1 37780 4.312885190 0 D W 141738599 + 256 8,0 1 37782 4.315933291 0 D W 141738855 + 256 8,0 1 37784 4.319064459 0 D W 141739111 + 256 8,0 1 37786 4.331918431 5672 D W 142803007 + 256 8,0 1 37788 4.334930332 5672 D W 142803263 + 256 8,0 1 37790 4.337902723 5672 D W 142803519 + 256 8,0 1 37792 4.342359774 5672 D W 142803775 + 256 8,0 1 37794 4.345318286 0 D W 142804031 + 256 Signed-off-by: Corrado Zoccolo <czoccolo@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-11-04cfq-iosched: get rid of the coop_preempt flagJens Axboe
We need to rework this logic post the cooperating cfq_queue merging, for now just get rid of it and Jeff Moyer will fix the fall out. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-11-03cfq-iosched: fix merge errorJens Axboe
We ended up with testing the same condition twice, pretty pointless. Remove that first if. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-11-03Merge branch 'for-linus' into for-2.6.33Jens Axboe
Conflicts: block/cfq-iosched.c Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-11-03Merge branch 'cfq-2.6.33' into for-2.6.33Jens Axboe
2009-11-03cfq-iosched: limit coop preemptionShaohua Li
CFQ has an optimization for cooperated applications. if several io-context have close requests, they will get boost. But the optimization get abused. Considering thread a, b, which work on one file. a reads sectors s, s+2, s+4, ...; b reads sectors s+1, s+3, s +5, ... Both a and b are sequential read, so they can open idle window. a reads a sector s and goes to idle window and wakeup b. b reads sector s+1, since in current implementation, cfq_should_preempt() thinks a and b are cooperators, b will preempt a. b then reads sector s+1 and goes to idle window and wakeup a. for the same reason, a will preempt b and reads s+2. a and b will continue the circle. The circle will be very long, and a and b will occupy whole disk queue. Other applications will nearly have no chance to run. Fix this limiting coop preempt until a queue is scheduled normally again. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Acked-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-11-03cfq-iosched: fix bad return value cfq_should_preempt()Jens Axboe
Commit a6151c3a5c8e1ff5a28450bc8d6a99a2a0add0a7 inadvertently reversed a preempt condition check, potentially causing a performance regression. Make the meta check correct again. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-11-02cfq-iosched: simplify prio-unboost codeCorrado Zoccolo
Eliminate redundant checks. Signed-off-by: Corrado Zoccolo <czoccolo@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-28cfq-iosched: fix style issue in cfq_get_avg_queues()Jens Axboe
Line breaks and bad brace placement. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-28cfq-iosched: fairness for sync no-idle queuesCorrado Zoccolo
Currently no-idle queues in cfq are not serviced fairly: even if they can only dispatch a small number of requests at a time, they have to compete with idling queues to be serviced, experiencing large latencies. We should notice, instead, that no-idle queues are the ones that would benefit most from having low latency, in fact they are any of: * processes with large think times (e.g. interactive ones like file managers) * seeky (e.g. programs faulting in their code at startup) * or marked as no-idle from upper levels, to improve latencies of those requests. This patch improves the fairness and latency for those queues, by: * separating sync idle, sync no-idle and async queues in separate service_trees, for each priority * service all no-idle queues together * and idling when the last no-idle queue has been serviced, to anticipate for more no-idle work * the timeslices allotted for idle and no-idle service_trees are computed proportionally to the number of processes in each set. Servicing all no-idle queues together should have a performance boost for NCQ-capable drives, without compromising fairness. Signed-off-by: Corrado Zoccolo <czoccolo@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-28cfq-iosched: enable idling for last queue on priority classCorrado Zoccolo
cfq can disable idling for queues in various circumstances. When workloads of different priorities are competing, if the higher priority queue has idling disabled, lower priority queues may steal its disk share. For example, in a scenario with an RT process performing seeky reads vs a BE process performing sequential reads, on an NCQ enabled hardware, with low_latency unset, the RT process will dispatch only the few pending requests every full slice of service for the BE process. The patch solves this issue by always performing idle on the last queue at a given priority class > idle. If the same process, or one that can pre-empt it (so at the same priority or higher), submits a new request within the idle window, the lower priority queue won't dispatch, saving the disk bandwidth for higher priority ones. Note: this doesn't touch the non_rotational + NCQ case (no hardware to test if this is a benefit in that case). Signed-off-by: Corrado Zoccolo <czoccolo@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-28cfq-iosched: reimplement priorities using different service treesCorrado Zoccolo
We use different service trees for different priority classes. This allows a simplification in the service tree insertion code, that no longer has to consider priority while walking the tree. Signed-off-by: Corrado Zoccolo <czoccolo@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-28cfq-iosched: preparation to handle multiple service treesCorrado Zoccolo
We embed a pointer to the service tree in each queue, to handle multiple service trees easily. Service trees are enriched with a counter. cfq_add_rq_rb is invoked after putting the rq in the fifo, to ensure that all fields in rq are properly initialized. Signed-off-by: Corrado Zoccolo <czoccolo@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-28cfq-iosched: adapt slice to number of processes doing I/OCorrado Zoccolo
When the number of processes performing I/O concurrently increases, a fixed time slice per process will cause large latencies. This patch, if low_latency mode is enabled, will scale the time slice assigned to each process according to a 300ms target latency. In order to keep fairness among processes: * The number of active processes is computed using a special form of running average, that quickly follows sudden increases (to keep latency low), and decrease slowly (to have fairness in spite of rapid decreases of this value). To safeguard sequential bandwidth, we impose a minimum time slice (computed using 2*cfq_slice_idle as base, adjusted according to priority and async-ness). Signed-off-by: Corrado Zoccolo <czoccolo@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-27cfq-iosched: improve hw_tag detectionShaohua Li
If active queue hasn't enough requests and idle window opens, cfq will not dispatch sufficient requests to hardware. In such situation, current code will zero hw_tag. But this is because cfq doesn't dispatch enough requests instead of hardware queue doesn't work. Don't zero hw_tag in such case. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-26cfq: break apart merged cfqqs if they stop cooperatingJeff Moyer
cfq_queues are merged if they are issuing requests within the mean seek distance of one another. This patch detects when the coopearting stops and breaks the queues back up. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-26cfq: change the meaning of the cfqq_coop flagJeff Moyer
The flag used to indicate that a cfqq was allowed to jump ahead in the scheduling order due to submitting a request close to the queue that just executed. Since closely cooperating queues are now merged, the flag holds little meaning. Change it to indicate that multiple queues were merged. This will later be used to allow the breaking up of merged queues when they are no longer cooperating. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-26cfq: merge cooperating cfq_queuesJeff Moyer
When cooperating cfq_queues are detected currently, they are allowed to skip ahead in the scheduling order. It is much more efficient to automatically share the cfq_queue data structure between cooperating processes. Performance of the read-test2 benchmark (which is written to emulate the dump(8) utility) went from 12MB/s to 90MB/s on my SATA disk. NFS servers with multiple nfsd threads also saw performance increases. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-26cfq: calculate the seek_mean per cfq_queue not per cfq_io_contextJeff Moyer
async cfq_queue's are already shared between processes within the same priority, and forthcoming patches will change the mapping of cic to sync cfq_queue from 1:1 to 1:N. So, calculate the seekiness of a process based on the cfq_queue instead of the cfq_io_context. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-08cfq-iosched: avoid probable slice overrun when idlingCorrado Zoccolo
If the average think time is larger than the remaining time slice for any given queue, don't allow it to idle. A succesful idle also means that we need to dispatch and complete a request, so if we don't even have time left for the idle process, we would overrun the slice in any case. Signed-off-by: Corrado Zoccolo <czoccolo@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-07cfq-iosched: apply bool value where we return 0/1Jens Axboe
Saves 16 bytes of text, woohoo. But the more important point is that it makes the code more readable when returning bool for 0/1 cases. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-07cfq-iosched: fix think time allowed for seekersCorrado Zoccolo
CFQ enables idle only for processes that think less than the allowed idle time. Since idle time is lower for seeky queues, we should use the correct value in the comparison. Signed-off-by: Corrado Zoccolo <czoccolo@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-06cfq-iosched: fix the slice residual signJens Axboe
We should subtract the slice residual from the rb tree key, since a negative residual count indicates that the cfqq overran its slice the last time. Hence we want to add the overrun time, to position it a bit further away in the service tree. Reported-by: Corrado Zoccolo <czoccolo@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-06cfq-iosched: abstract out the 'may this cfqq dispatch' logicJens Axboe
Makes the whole thing easier to read, cfq_dispatch_requests() was a bit messy before. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-05block: get rid of kblock_schedule_delayed_work()Jens Axboe
It was briefly introduced to allow CFQ to to delayed scheduling, but we ended up removing that feature again. So lets kill the function and export, and just switch CFQ back to the normal work schedule since it is now passing in a '0' delay from all call sites. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-05cfq-iosched: fix possible problem with jiffies wraparoundCorrado Zoccolo
The RR service tree is indexed by a key that is relative to current jiffies. This can cause problems on jiffies wraparound. The patch fixes it using time_before comparison, and changing the add_front path to use a relative number, too. Signed-off-by: Corrado Zoccolo <czoccolo@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-05cfq-iosched: fix issue with rq-rq merging and fifo list orderingJens Axboe
cfq uses rq->start_time as the fifo indicator, but that field may get modified prior to cfq doing it's fifo list adjustment when a request gets merged with another request. This can cause the fifo list to become unordered. Reported-by: Corrado Zoccolo <czoccolo@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-04cfq-iosched: don't delay async queue if it hasn't dispatched at allJens Axboe
We cannot delay for the first dispatch of the async queue if it hasn't dispatched at all, since that could present a local user DoS attack vector using an app that just did slow timed sync reads while filling memory. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-03cfq-iosched: use assigned slice sync value, not defaultJens Axboe
We should use the sysfs modified slice sync value, in case it differs from the default. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-03cfq-iosched: rename 'desktop' sysfs entry to 'low_latency'Jens Axboe
Don't think that's necessarily a perfect description of what this option fiddles with, but it's probably better than 'desktop'. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-03cfq-iosched: implement slower async initiate and queue ramp upJens Axboe
This slowly ramps up the async queue depth based on the time passed since the sync IO, and doesn't allow async at all until a sync slice period has passed. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-03cfq-iosched: delay async IO dispatch, if sync IO was just doneVivek Goyal
o Do not allow more than max_dispatch requests from an async queue, if some sync request has finished recently. This is in the hope that sync activity is still going on in the system and we might receive a sync request soon. Most likely from a sync queue which finished a request and we did not enable idling on it. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-02cfq-iosched: add a knob for desktop interactivenessJens Axboe
This is basically identical to what Vivek Goyal posted, but combined into one and labelled 'desktop' instead of 'fairness'. The goal is to continue to improve on the latency side of things as it relates to interactiveness, keeping the questionable bits under this sysfs tunable so it would be easy for throughput-only people to turn off. Apart from adding the interactive sysfs knob, it also adds the behavioural change of allowing slice idling even if the hardware does tagged command queuing. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-15Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (46 commits) powerpc64: convert to dynamic percpu allocator sparc64: use embedding percpu first chunk allocator percpu: kill lpage first chunk allocator x86,percpu: use embedding for 64bit NUMA and page for 32bit NUMA percpu: update embedding first chunk allocator to handle sparse units percpu: use group information to allocate vmap areas sparsely vmalloc: implement pcpu_get_vm_areas() vmalloc: separate out insert_vmalloc_vm() percpu: add chunk->base_addr percpu: add pcpu_unit_offsets[] percpu: introduce pcpu_alloc_info and pcpu_group_info percpu: move pcpu_lpage_build_unit_map() and pcpul_lpage_dump_cfg() upward percpu: add @align to pcpu_fc_alloc_fn_t percpu: make @dyn_size mandatory for pcpu_setup_first_chunk() percpu: drop @static_size from first chunk allocators percpu: generalize first chunk allocator selection percpu: build first chunk allocators selectively percpu: rename 4k first chunk allocator to page percpu: improve boot messages percpu: fix pcpu_reclaim() locking ... Fix trivial conflict as by Tejun Heo in kernel/sched.c
2009-09-14cfq: choose a new next_req when a request is dispatchedJeff Moyer
This patch addresses http://bugzilla.kernel.org/show_bug.cgi?id=13401, a regression introduced in 2.6.30. From the bug report: Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-11cfq: fix the log message after dispatched a requestShan Wei
The blktrace tools can show process id when cfq dispatched a request, using cfq_log_cfqq() instead of cfq_log(). Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-11cfq-iosched: get rid of must_alloc flagJens Axboe
It's not currently used, as pointed out by Gui Jianfeng <guijianfeng@cn.fujitsu.com>. We already check the wait_request flag to allow an idling queue priority allocation access, so we don't need this extra flag. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-11bio: first step in sanitizing the bio->bi_rw flag testingJens Axboe
Get rid of any functions that test for these bits and make callers use bio_rw_flagged() directly. Then it is at least directly apparent what variable and flag they check. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-11cfq-iosched: no need to keep track of busy_rt_queuesVivek Goyal
o Get rid of busy_rt_queues infrastructure. Looks like it is redundant. o Once an RT queue gets request it will preempt any of the BE or IDLE queues immediately. Otherwise this queue will be put on service tree and scheduler will anyway select this queue before any of the BE or IDLE queue. Hence looks like there is no need to keep track of how many busy RT queues are currently on service tree. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-11cfq-iosched: drain device queue before switching to a sync queueJens Axboe
To lessen the impact of async IO on sync IO, let the device drain of any async IO in progress when switching to a sync cfqq that has idling enabled. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-08-14Merge branch 'percpu-for-linus' into percpu-for-nextTejun Heo
Conflicts: arch/sparc/kernel/smp_64.c arch/x86/kernel/cpu/perf_counter.c arch/x86/kernel/setup_percpu.c drivers/cpufreq/cpufreq_ondemand.c mm/percpu.c Conflicts in core and arch percpu codes are mostly from commit ed78e1e078dd44249f88b1dd8c76dafb39567161 which substituted many num_possible_cpus() with nr_cpu_ids. As for-next branch has moved all the first chunk allocators into mm/percpu.c, the changes are moved from arch code to mm/percpu.c. Signed-off-by: Tejun Heo <tj@kernel.org>
2009-07-10cfq-iosched: reset oom_cfqq in cfq_set_request()Vivek Goyal
In case memory is scarce, we now default to oom_cfqq. Once memory is available again, we should allocate a new cfqq and stop using oom_cfqq for a particular io context. Once a new request comes in, check if we are using oom_cfqq, and if yes, try to allocate a new cfqq. Tested the patch by forcing the use of oom_cfqq and upon next request thread realized that it was using oom_cfqq and it allocated a new cfqq. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-07-04Merge branch 'master' into for-nextTejun Heo
Pull linus#master to merge PER_CPU_DEF_ATTRIBUTES and alpha build fix changes. As alpha in percpu tree uses 'weak' attribute instead of inline assembly, there's no need for __used attribute. Conflicts: arch/alpha/include/asm/percpu.h arch/mn10300/kernel/vmlinux.lds.S include/linux/percpu-defs.h
2009-07-01cfq-iosched: remove redundant check for NULL cfqq in cfq_set_request()Shan Wei
With the changes for falling back to an oom_cfqq, we never fail to find/allocate a queue in cfq_get_queue(). So remove the check. Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-07-01cfq-iosched: get rid of the need for __GFP_NOFAIL in cfq_find_alloc_queue()Jens Axboe
Setup an emergency fallback cfqq that we allocate at IO scheduler init time. If the slab allocation fails in cfq_find_alloc_queue(), we'll just punt IO to that cfqq instead. This ensures that cfq_find_alloc_queue() never fails without having to ensure free memory. On cfqq lookup, always try to allocate a new cfqq if the given cfq io context has the oom_cfqq assigned. This ensures that we only temporarily punt to this shared queue. Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-07-01cfq-iosched: move cfqq initialization out of cfq_find_alloc_queue()Jens Axboe
We're going to be needing that init code outside of that function to get rid of the __GFP_NOFAIL in cfqq allocation. Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-06-24percpu: clean up percpu variable definitionsTejun Heo
Percpu variable definition is about to be updated such that all percpu symbols including the static ones must be unique. Update percpu variable definitions accordingly. * as,cfq: rename ioc_count uniquely * cpufreq: rename cpu_dbs_info uniquely * xen: move nesting_count out of xen_evtchn_do_upcall() and rename it * mm: move ratelimits out of balance_dirty_pages_ratelimited_nr() and rename it * ipv4,6: rename cookie_scratch uniquely * x86 perf_counter: rename prev_left to pmc_prev_left, irq_entry to pmc_irq_entry and nmi_entry to pmc_nmi_entry * perf_counter: rename disable_count to perf_disable_count * ftrace: rename test_event_disable to ftrace_test_event_disable * kmemleak: rename test_pointer to kmemleak_test_pointer * mce: rename next_interval to mce_next_interval [ Impact: percpu usage cleanups, no duplicate static percpu var names ] Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Dave Jones <davej@redhat.com> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: linux-mm <linux-mm@kvack.org> Cc: David S. Miller <davem@davemloft.net> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Steven Rostedt <srostedt@redhat.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Andi Kleen <andi@firstfloor.org>
2009-06-16cfq: remove extraneous '\n' in blktrace outputJeff Moyer
I noticed a blank line in blktrace output. This patch fixes that. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>