aboutsummaryrefslogtreecommitdiff
path: root/kernel/sched_fair.c
AgeCommit message (Collapse)Author
2007-10-24sched: isolate SMP balancing code a bit morePeter Williams
At the moment, a lot of load balancing code that is irrelevant to non SMP systems gets included during non SMP builds. This patch addresses this issue and reduces the binary size on non SMP systems: text data bss dec hex filename 10983 28 1192 12203 2fab sched.o.before 10739 28 1192 11959 2eb7 sched.o.after Signed-off-by: Peter Williams <pwil3058@bigpond.net.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-24sched: reduce balance-tasks overheadPeter Williams
At the moment, balance_tasks() provides low level functionality for both move_tasks() and move_one_task() (indirectly) via the load_balance() function (in the sched_class interface) which also provides dual functionality. This dual functionality complicates the interfaces and internal mechanisms and makes the run time overhead of operations that are called with two run queue locks held. This patch addresses this issue and reduces the overhead of these operations. Signed-off-by: Peter Williams <pwil3058@bigpond.net.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-17sched: fix new task startup crashSrivatsa Vaddagiri
Child task may be added on a different cpu that the one on which parent is running. In which case, task_new_fair() should check whether the new born task's parent entity should be added as well on the cfs_rq. Patch below fixes the problem in task_new_fair. This could fix the put_prev_task_fair() crashes reported. Reported-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> Reported-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-15sched: reintroduce cache-hot affinityIngo Molnar
reintroduce a simplified version of cache-hot/cold scheduling affinity. This improves performance with certain SMP workloads, such as sysbench. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-15sched: speed up context-switches a bitIngo Molnar
speed up context-switches a bit by not clearing p->exec_start. (as a side-effect, this also makes p->exec_start a universal timestamp available to cache-hot estimations.) Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-15sched: do not wakeup-preempt with SCHED_BATCH tasksIngo Molnar
do not wakeup-preempt with SCHED_BATCH tasks, their preemption is batched too, driven by the tick. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-15sched: update commentIngo Molnar
update comment: clarify time-slices and remove obsolete tuning detail. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-15sched: prevent wakeup over-schedulingMike Galbraith
Prevent wakeup over-scheduling. Once a task has been preempted by a task of the same or lower priority, it becomes ineligible for repeated preemption by same until it has been ticked, or slept. Instead, the task is marked for preemption at the next tick. Tasks of higher priority still preempt immediately. Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-15sched: disable forced preemption by defaultPeter Zijlstra
Implement feature bit to disable forced preemption. This way it can be checked whether a workload is overscheduling or not. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-15sched: fix group scheduling for SCHED_BATCHDmitry Adamushko
The following patch (sched: disable sleeper_fairness on SCHED_BATCH) seems to break GROUP_SCHED. Although, it may be 'oops'-less due to the possibility of 'p' being always a valid address. Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-15sched: disable sleeper_fairness on SCHED_BATCHPeter Zijlstra
disable sleeper fairness for batch tasks - they are about batch processing after all. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-15sched: another wakeup_granularity fixPeter Zijlstra
unit mis-match: wakeup_gran was used against a vruntime Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-15sched: fix: move the CPU check into ->task_new_fair()Ingo Molnar
noticed by Peter Zijlstra: fix: move the CPU check into ->task_new_fair(), this way we can call place_entity() and get child ->vruntime right at initial wakeup time. (without this there can be large latencies) Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
2007-10-15sched: cleanup: function prototype cleanupsIngo Molnar
noticed by Thomas Gleixner: cleanup: function prototype cleanups - move into single line wherever possible. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-15sched: cleanup: rename SCHED_FEAT_USE_TREE_AVG to SCHED_FEAT_TREE_AVGIngo Molnar
cleanup: rename SCHED_FEAT_USE_TREE_AVG to SCHED_FEAT_TREE_AVG, to make SCHED_FEAT_ names more consistent. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-15sched: cleanup, make dequeue_entity() and update_stats_wait_end() similarDmitry Adamushko
make dequeue_entity() / enqueue_entity() and update_stats_dequeue() / update_stats_enqueue() look similar, structure-wise. zero effect, functionality-wise: text data bss dec hex filename 34550 3026 100 37676 932c sched.o.before 34550 3026 100 37676 932c sched.o.after Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-15sched: cleanup, remove calc_weighted()Dmitry Adamushko
remove obsolete code -- calc_weighted() Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-15sched: uninline schedulerAlexey Dobriyan
* save ~300 bytes * activate_idle_task() was moved to avoid a warning bloat-o-meter output: add/remove: 6/0 grow/shrink: 0/16 up/down: 438/-733 (-295) <=== function old new delta __enqueue_entity - 165 +165 finish_task_switch - 110 +110 update_curr_rt - 79 +79 __load_balance_iterator - 32 +32 __task_rq_unlock - 28 +28 find_process_by_pid - 24 +24 do_sched_setscheduler 133 123 -10 sys_sched_rr_get_interval 176 165 -11 sys_sched_getparam 156 145 -11 normalize_rt_tasks 482 470 -12 sched_getaffinity 112 99 -13 sys_sched_getscheduler 86 72 -14 sched_setaffinity 226 212 -14 sched_setscheduler 666 642 -24 load_balance_start_fair 33 9 -24 load_balance_next_fair 33 9 -24 dequeue_task_rt 133 67 -66 put_prev_task_rt 97 28 -69 schedule_tail 133 50 -83 schedule 682 594 -88 enqueue_entity 499 366 -133 task_new_fair 317 180 -137 Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-15sched: tweak wakeup granularityIngo Molnar
tweak wakeup granularity. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-15sched: fix __pick_next_entity()Dmitry Adamushko
The thing is that __pick_next_entity() must never be called when first_fair(cfs_rq) == NULL. It wouldn't be a problem, should 'run_node' be the very first field of 'struct sched_entity' (and it's the second). The 'nr_running != 0' check is _not_ enough, due to the fact that 'current' is not within the tree. Generic paths are ok (e.g. schedule() as put_prev_task() is called previously)... I'm more worried about e.g. migration_call() -> CPU_DEAD_FROZEN -> migrate_dead_tasks()... if 'current' == rq->idle, no problems.. if it's one of the SCHED_NORMAL tasks (or imagine, some other use-cases in the future -- i.e. we should not make outer world dependent on internal details of sched_fair class) -- it may be "Houston, we've got a problem" case. it's +16 bytes to the ".text". Another variant is to make 'run_node' the first data member of 'struct sched_entity' but an additional check (se ! = NULL) is still needed in pick_next_entity(). Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-15sched: vslice fixups for non-0 nice levelsIngo Molnar
Make vslice accurate wrt nice levels, and add some comments while we're at it. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-15sched: mark scheduling classes as constIngo Molnar
mark scheduling classes as const. The speeds up the code a bit and shrinks it: text data bss dec hex filename 40027 4018 292 44337 ad31 sched.o.before 40190 3842 292 44324 ad24 sched.o.after Signed-off-by: Ingo Molnar <mingo@elte.hu> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-15sched: group scheduler, fix latencySrivatsa Vaddagiri
There is a possibility that because of task of a group moving from one cpu to another, it may gain more cpu time that desired. See http://marc.info/?l=linux-kernel&m=119073197730334 for details. This is an attempt to fix that problem. Basically it simulates dequeue of higher level entities as if they are going to sleep. Similarly it simulate wakeup of higher level entities as if they are waking up from sleep. Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-15sched: group scheduler, fix bloatSrivatsa Vaddagiri
Recent fix to check_preempt_wakeup() to check for preemption at higher levels caused a size bloat for !CONFIG_FAIR_GROUP_SCHED. Fix the problem. 42277 10598 320 53195 cfcb kernel/sched.o-before_this_patch 42216 10598 320 53134 cf8e kernel/sched.o-after_this_patch Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-15sched: cleanup, remove stale commentIngo Molnar
cleanup, remove stale comment. Signed-off-by: Ingo Molnar <mingo@elte.hu> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-15sched: speed up and simplify vslice calculationsPeter Zijlstra
speed up and simplify vslice calculations. [ From: Mike Galbraith <efault@gmx.de>: build fix ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-15sched: clean up min_vruntime usePeter Zijlstra
clean up min_vruntime use. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-15sched: yield fixDmitry Adamushko
fix yield bugs due to the current-not-in-rbtree changes: the task is not in the rbtree so rbtree-removal is a no-no. [ From: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>: build fix. ] also, nice code size reduction: kernel/sched.o: text data bss dec hex filename 38323 3506 24 41853 a37d sched.o.before 38236 3506 24 41766 a326 sched.o.after Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-15sched: group scheduler wakeup latency fixSrivatsa Vaddagiri
group scheduler wakeup latency fix: when checking for preemption we must check cross-group too, not just intra-group. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-15sched: remove set_leftmost()Ingo Molnar
Lee Schermerhorn noticed that set_leftmost() contains dead code, remove this. Reported-by: Lee Schermerhorn <Lee.Schermerhorn@hp.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-15sched: max_vruntime() simplificationPeter Zijlstra
max_vruntime() simplification. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
2007-10-15sched: fix sign check error in place_entity()Ingo Molnar
fix sign check error in place_entity() - we'd get excessive latencies due to negatives being converted to large u64's. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
2007-10-15sched: undo some of the recent changesIngo Molnar
undo some of the recent changes that are not needed after all, such as last_min_vruntime. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
2007-10-15sched: remove last_min_vruntime effectIngo Molnar
remove last_min_vruntime use - prepare to remove it. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
2007-10-15sched: entity_key() fixIngo Molnar
entity_key() fix - we'd occasionally end up with a 0 vruntime in the !initial case. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
2007-10-15sched debug: check spreadPeter Zijlstra
debug feature: check how well we schedule within a reasonable vruntime 'spread' range. (note that CPU overload can increase the spread, so this is not a hard condition, but normal loads should be within the spread.) Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
2007-10-15sched: add vslicePeter Zijlstra
add vslice: the load-dependent "virtual slice" a task should run ideally, so that the observed latency stays within the sched_latency window. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-15sched: remove unneeded tunablesIngo Molnar
remove unneeded tunables. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-15sched: clean up code under CONFIG_FAIR_GROUP_SCHEDSrivatsa Vaddagiri
With the view of supporting user-id based fair scheduling (and not just container-based fair scheduling), this patch renames several functions and makes them independent of whether they are being used for container or user-id based fair scheduling. Also fix a problem reported by KAMEZAWA Hiroyuki (wrt allocating less-sized array for tg->cfs_rq[] and tf->se[]). Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-15sched: print &rq->cfs statsSrivatsa Vaddagiri
- Print &rq->cfs statistics as well (useful for group scheduling) Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-15sched: fix minor bug in yieldSrivatsa Vaddagiri
- fix a minor bug in yield (seen for CONFIG_FAIR_GROUP_SCHED), group scheduling would skew when yield was called. Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-15sched: revert recent removal of set_curr_task()Srivatsa Vaddagiri
Revert removal of set_curr_task. Use put_prev_task/set_curr_task when changing groups/policies Signed-off-by: Srivatsa Vaddagiri < vatsa@linux.vnet.ibm.com> Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
2007-10-15sched: kernel/sched_fair.c whitespace cleanupsIngo Molnar
some trivial whitespace cleanups. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-15sched: rework enqueue/dequeue_entity() to get rid of set_curr_task()Dmitry Adamushko
rework enqueue/dequeue_entity() to get rid of sched_class::set_curr_task(). This simplifies sched_setscheduler(), rt_mutex_setprio() and sched_move_tasks(). text data bss dec hex filename 24330 2734 20 27084 69cc sched.o.before 24233 2730 20 26983 6967 sched.o.after Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com> Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-15sched: simplify sched_class::yield_task()Dmitry Adamushko
the 'p' (task_struct) parameter in the sched_class :: yield_task() is redundant as the caller is always the 'current'. Get rid of it. text data bss dec hex filename 24341 2734 20 27095 69d7 sched.o.before 24330 2734 20 27084 69cc sched.o.after Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-15sched: optimize task_new_fair()Dmitry Adamushko
due to the fact that we no longer keep the 'current' within the tree, dequeue/enqueue_entity() is useless for the 'current' in task_new_fair(). We are about to reschedule and sched_class->put_prev_task() will put the 'current' back into the tree, based on its new key. text data bss dec hex filename 24388 2734 20 27142 6a06 sched.o.before 24341 2734 20 27095 69d7 sched.o.after Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-15sched: do not keep current in the tree and get rid of sched_entity::fair_keyDmitry Adamushko
Get rid of 'sched_entity::fair_key'. As a side effect, 'current' is not kept withing the tree for SCHED_NORMAL/BATCH tasks anymore. This simplifies some parts of code (e.g. entity_tick() and yield_task_fair()) and also somewhat optimizes them (e.g. a single update_curr() now vs. dequeue/enqueue() before in entity_tick()). Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-15sched: sched_setscheduler() fixDmitry Adamushko
Fix a problem in the 'sched-group' patch for !CONFIG_FAIR_GROUP_SCHED. description: sched_setscheduler() { ... if (task_running()) p->sched_class->put_prev_entity(); [ this one sets up cfs_rq->curr to NULL ] ... if (task_running) p->sched_class->set_curr_task(); [ and this one is a _NOP_ (empty) for !CONFIG_FAIR_GROUP_SCHED ] As a result, the task continues to run with cfs_rq->curr == NULL... no crashes (due to checks for !NULL in place) but e.g. update_curr() effectively becomes a NOP... i.e. runtime statistics for this task is not accounted untill it's rescheduled anew. Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-15sched: group-scheduler coreSrivatsa Vaddagiri
Add interface to control cpu bandwidth allocation to task-groups. (not yet configurable, due to missing CONFIG_CONTAINERS) Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
2007-10-15sched: better min_vruntime trackingPeter Zijlstra
Better min_vruntime tracking: update it every time 'curr' is updated - not just when a task is enqueued into the tree. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Mike Galbraith <efault@gmx.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>