aboutsummaryrefslogtreecommitdiff
path: root/arch/x86/include/asm/topology.h
AgeCommit message (Collapse)Author
2009-12-15hugetlb: add generic definition of NUMA_NO_NODELee Schermerhorn
Move definition of NUMA_NO_NODE from ia64 and x86_64 arch specific headers to generic header 'linux/numa.h' for use in generic code. NUMA_NO_NODE replaces bare '-1' where it's used in this series to indicate "no node id specified". Ultimately, it can be used to replace the -1 elsewhere where it is used similarly. Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Reviewed-by: Andi Kleen <andi@firstfloor.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Nishanth Aravamudan <nacc@us.ibm.com> Cc: Adam Litke <agl@us.ibm.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: Eric Whitney <eric.whitney@hp.com> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-11-03sched: Disable SD_PREFER_LOCAL at node levelMike Galbraith
Yanmin Zhang reported that SD_PREFER_LOCAL induces an order of magnitude increase in select_task_rq_fair() overhead while running heavy wakeup benchmarks (tbench and vmark). Since SD_BALANCE_WAKE is off at node level, turn SD_PREFER_LOCAL off as well pending further investigation. Reported-by: Zhang, Yanmin <yanmin_zhang@linux.intel.com> Signed-off-by: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-14sched: Disable SD_PREFER_LOCAL for MC/CPU domainsPeter Zijlstra
Yanmin reported that both tbench and hackbench were significantly hurt by trying to keep tasks local on these domains, esp on small cache machines. So disable it in order to promote spreading outside of the cache domains. Reported-by: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> CC: Mike Galbraith <efault@gmx.de> LKML-Reference: <1255083400.8802.15.camel@laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-24x86: Remove redundant non-NUMA topology functionsRusty Russell
arch/x86/include/asm/topology.h declares inline fns cpu_to_node and cpumask_of_node for !NUMA, even though they are then declared as macros by asm-generic/topology.h, which is #included just below. The macros (which are the same) end up being used; these functions are just confusing. Noticed-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Cc: "Greg Kroah-Hartman" <gregkh@suse.de> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Tejun Heo <tj@kernel.org> LKML-Reference: <200909241748.45629.rusty@rustcorp.com.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-16sched: Disable wakeup balancingPeter Zijlstra
Sysbench thinks SD_BALANCE_WAKE is too agressive and kbuild doesn't really mind too much, SD_BALANCE_NEWIDLE picks up most of the slack. On a dual socket, quad core, dual thread nehalem system: sysbench (--num_threads=16): SD_BALANCE_WAKE-: 13982 tx/s SD_BALANCE_WAKE+: 15688 tx/s kbuild (-j16): SD_BALANCE_WAKE-: 47.648295846 seconds time elapsed ( +- 0.312% ) SD_BALANCE_WAKE+: 47.608607360 seconds time elapsed ( +- 0.026% ) (same within noise) Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-15sched: Reduce forkexec_idxPeter Zijlstra
If we're looking to place a new task, we might as well find the idlest position _now_, not 1 tick ago. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-15sched: Improve latencies and throughputMike Galbraith
Make the idle balancer more agressive, to improve a x264 encoding workload provided by Jason Garrett-Glaser: NEXT_BUDDY NO_LB_BIAS encoded 600 frames, 252.82 fps, 22096.60 kb/s encoded 600 frames, 250.69 fps, 22096.60 kb/s encoded 600 frames, 245.76 fps, 22096.60 kb/s NO_NEXT_BUDDY LB_BIAS encoded 600 frames, 344.44 fps, 22096.60 kb/s encoded 600 frames, 346.66 fps, 22096.60 kb/s encoded 600 frames, 352.59 fps, 22096.60 kb/s NO_NEXT_BUDDY NO_LB_BIAS encoded 600 frames, 425.75 fps, 22096.60 kb/s encoded 600 frames, 425.45 fps, 22096.60 kb/s encoded 600 frames, 422.49 fps, 22096.60 kb/s Peter pointed out that this is better done via newidle_idx, not via LB_BIAS, newidle balancing should look for where there is load _now_, not where there was load 2 ticks ago. Worst-case latencies are improved as well as no buddies means less vruntime spread. (as per prior lkml discussions) This change improves kbuild-peak parallelism as well. Reported-by: Jason Garrett-Glaser <darkshikari@gmail.com> Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1253011667.9128.16.camel@marge.simson.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-15sched: Tweak wake_idxPeter Zijlstra
When merging select_task_rq_fair() and sched_balance_self() we lost the use of wake_idx, restore that and set them to 0 to make wake balancing more aggressive. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-15sched: Merge select_task_rq_fair() and sched_balance_self()Peter Zijlstra
The problem with wake_idle() is that is doesn't respect things like cpu_power, which means it doesn't deal well with SMT nor the recent RT interaction. To cure this, it needs to do what sched_balance_self() does, which leads to the possibility of merging select_task_rq_fair() and sched_balance_self(). Modify sched_balance_self() to: - update_shares() when walking up the domain tree, (it only called it for the top domain, but it should have done this anyway), which allows us to remove this ugly bit from try_to_wake_up(). - do wake_affine() on the smallest domain that contains both this (the waking) and the prev (the wakee) cpu for WAKE invocations. Then use the top-down balance steps it had to replace wake_idle(). This leads to the dissapearance of SD_WAKE_BALANCE and SD_WAKE_IDLE_FAR, with SD_WAKE_IDLE replaced with SD_BALANCE_WAKE. SD_WAKE_AFFINE needs SD_BALANCE_WAKE to be effective. Touch all topology bits to replace the old with new SD flags -- platforms might need re-tuning, enabling SD_BALANCE_WAKE conditionally on a NUMA distance seems like a good additional feature, magny-core and small nehalem systems would want this enabled, systems with slow interconnects would not. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-07sched: enable SD_WAKE_IDLEPeter Zijlstra
Now that SD_WAKE_IDLE doesn't make pipe-test suck anymore, enable it by default for MC, CPU and NUMA domains. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-04sched: Turn on SD_BALANCE_NEWIDLEIngo Molnar
Start the re-tuning of the balancer by turning on newidle. It improves hackbench performance and parallelism on a 4x4 box. The "perf stat --repeat 10" measurements give us: domain0 domain1 ....................................... -SD_BALANCE_NEWIDLE -SD_BALANCE_NEWIDLE: 2041.273208 task-clock-msecs # 9.354 CPUs ( +- 0.363% ) +SD_BALANCE_NEWIDLE -SD_BALANCE_NEWIDLE: 2086.326925 task-clock-msecs # 11.934 CPUs ( +- 0.301% ) +SD_BALANCE_NEWIDLE +SD_BALANCE_NEWIDLE: 2115.289791 task-clock-msecs # 12.158 CPUs ( +- 0.263% ) Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Gautham R Shenoy <ego@in.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-04sched: Clean up topology.hIngo Molnar
Re-organize the flag settings so that it's visible at a glance which sched-domains flags are set and which not. With the new balancer code we'll need to re-tune these details anyway, so make it cleaner to make fewer mistakes down the road ;-) Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Gautham R Shenoy <ego@in.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-11sched: Don't export sched_mc_power_savings on multi-socket single core systemVaidyanathan Srinivasan
Fix to prevent sched_mc_power_saving from being exported through sysfs for multi-scoket single core system. Max cores should be always greater than one (1). My earlier patch that introduced fix for not exporting 'sched_mc_power_saving' on laptops broke it on multi-socket single core system. This fix addresses issue on both laptop and multi-socket single core system. Below are the Test results: 1. Single socket - multi-core Before Patch: Does not export 'sched_mc_power_saving' After Patch: Does not export 'sched_mc_power_saving' Result: Pass 2. Multi Socket - single core Before Patch: exports 'sched_mc_power_saving' After Patch: Does not export 'sched_mc_power_saving' Result: Pass 3. Multi Socket - Multi core Before Patch: exports 'sched_mc_power_saving' After Patch: exports 'sched_mc_power_saving' [ Impact: make the sched_mc_power_saving control available more consistently ] Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Cc: Suresh B Siddha <suresh.b.siddha@intel.com> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20090511143914.GB4853@dirshya.in.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-22x86/PCI: set_pci_bus_resources_arch_default cleanupsYinghai Lu
Rename set_pci_bus_resources_arch_default to x86_pci_root_bus_res_quirks, move the weak version from common.c to i386.c, and before calling, make sure it's a root bus. Reviewed-by: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-03-31Merge branch 'cpumask-for-linus' of ↵Rusty Russell
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip Conflicts: arch/x86/include/asm/topology.h drivers/oprofile/buffer_sync.c (Both cases: changed in Linus' tree, removed in Ingo's).
2009-03-30cpumask: remove node_to_first_cpuRusty Russell
Everyone defines it, and only one person uses it (arch/mips/sgi-ip27/ip27-nmi.c). So just open code it there. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: linux-mips@linux-mips.org
2009-03-13cpumask: remove x86 cpumask_t uses.Rusty Russell
Impact: cleanup We are removing cpumask_t in favour of struct cpumask: mainly as a marker of what code is now CONFIG_CPUMASK_OFFSTACK-safe. The only non-trivial change here is vector_allocation_domain(): explicitly clear the mask and set the first word, rather than using assignment. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-03-13cpumask: use new cpumask functions throughout x86Rusty Russell
Impact: cleanup 1) &cpu_online_map -> cpu_online_mask 2) first_cpu/next_cpu_nr -> cpumask_first/cpumask_next 3) cpu_*_map manipulation -> init_cpu_* / set_cpu_* Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-03-13cpumask: convert node_to_cpumask_map[] to cpumask_var_tRusty Russell
Impact: reduce kernel memory usage when CONFIG_CPUMASK_OFFSTACK=y Straightforward conversion: done for 32 and 64 bit kernels. node_to_cpumask_map is now a cpumask_var_t array. 64-bit used to be a dynamic cpumask_t array, and 32-bit used to be a static cpumask_t array. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-03-13x86: unify 32 and 64-bit node_to_cpumask_mapRusty Russell
Impact: cleanup We take the 64-bit code and use it on 32-bit as well. The new file is called mm/numa.c. In a minor cleanup, we use cpu_none_mask instead of declaring a local cpu_mask_none. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-03-13cpumask: remove x86's node_to_cpumask now everyone uses cpumask_of_nodeRusty Russell
Impact: cleanup Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-03-13cpumask: x86: convert cpu_sibling_map/cpu_core_map to cpumask_var_tRusty Russell
Impact: reduce per-cpu size for CONFIG_CPUMASK_OFFSTACK=y In most places it's cleaner to use the accessors cpu_sibling_mask() and cpu_core_mask() wrappers which already exist. I couldn't avoid cleaning up the access in oprofile, either. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-03-13cpumask: remove obsolete topology_core_siblings and ↵Rusty Russell
topology_thread_siblings: x86 Impact: cleanup There were replaced by topology_core_cpumask and topology_thread_cpumask. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-03-13cpumask: remove cpu_coregroup_map: x86Rusty Russell
Impact: cleanup cpu_coregroup_mask is the New Hotness. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-03-13cpumask: remove the now-obsoleted pcibus_to_cpumask(): x86Rusty Russell
Impact: reduce stack usage for large NR_CPUS cpumask_of_pcibus() is the new version. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-01-27x86: move 64-bit NUMA codeBrian Gerst
Impact: Code movement, no functional change. Move the 64-bit NUMA code from setup_percpu.c to numa_64.c Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2009-01-21Merge branch 'cpus4096' into core/percpuIngo Molnar
Conflicts: arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c arch/x86/kernel/tlb_32.c Merge it here because both the cpumask changes and the ongoing percpu work is touching the TLB code. The percpu changes take precedence, as they eliminate tlb_32.c altogether. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-19x86-64: Move nodenumber from PDA to per-cpu.Brian Gerst
tj: * s/nodenumber/node_number/ * removed now unused pda variable from pda_init() Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2009-01-16x86: make early_per_cpu() a lvalue and use itTejun Heo
Make early_per_cpu() a lvalue as per_cpu() is and use it where applicable. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-15x86: fix build warning when CONFIG_NUMA not defined.Mike Travis
Impact: fix build warning The macro cpu_to_node did not reference it's argument, and instead simply returned a 0. This causes a "unused variable" warning if it's the only reference in a function (show_cache_disable). Replace it with the more correct inline function. Signed-off-by: Mike Travis <travis@sgi.com>
2009-01-03Merge branch 'master' of ↵Mike Travis
git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-cpumask into merge-rr-cpumask Conflicts: arch/x86/kernel/io_apic.c kernel/rcuclassic.c kernel/sched.c kernel/time/tick-sched.c Signed-off-by: Mike Travis <travis@sgi.com> [ mingo@elte.hu: backmerged typo fix for io_apic.c ] Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-26cpumask: cpu_coregroup_mask(): x86Rusty Russell
Impact: New API Like cpu_coregroup_map, but returns a (const) pointer. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Mike Travis <travis@sgi.com> Cc: Ingo Molnar <mingo@redhat.com>
2008-12-26cpumask: x86: Introduce cpumask_of_{node,pcibus} to replace ↵Rusty Russell
{node,pcibus}_to_cpumask Impact: New APIs The old node_to_cpumask/node_to_pcibus returned a cpumask_t: these return a pointer to a struct cpumask. Part of removing cpumasks from the stack. Also makes __pcibus_to_node take a const pointer. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Ingo Molnar <mingo@elte.hu>
2008-12-16x86: Introduce topology_core_cpumask()/topology_thread_cpumask()Mike Travis
Impact: new API The old topology_core_siblings() and topology_thread_siblings() return a cpumask_t; these new ones return a (const) struct cpumask *. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Mike Travis <travis@sgi.com>
2008-12-01sched: don't export sched_mc_power_savings in laptopsMahesh Salgaonkar
Impact: do not expose a control that has no effect Fix to prevent sched_mc_power_saving from being exported through sysfs on single-socket systems. (Say multicore single socket (Laptop)) CPU core map of the boot cpu should be equal to possible number of cpus for single socket system. This fix has been developed at FOSS.in kernel workout. Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-05sched: re-tune balancingIngo Molnar
Impact: improve wakeup affinity on NUMA systems, tweak SMP systems Given the fixes+tweaks to the wakeup-buddy code, re-tweak the domain balancing defaults on NUMA and SMP systems. Turn on SD_WAKE_AFFINE which was off on x86 NUMA - there's no reason why we would not want to have wakeup affinity across nodes as well. (we already do this in the standard NUMA template.) lat_ctx on a NUMA box is particularly happy about this change: before: | phoenix:~/l> ./lat_ctx -s 0 2 | "size=0k ovr=2.60 | 2 5.70 after: | phoenix:~/l> ./lat_ctx -s 0 2 | "size=0k ovr=2.65 | 2 2.07 a 2.75x speedup. pipe-test is similarly happy about it too: | phoenix:~/sched-tests> ./pipe-test | 18.26 usecs/loop. | 14.70 usecs/loop. | 14.38 usecs/loop. | 10.55 usecs/loop. # +WAKE_AFFINE on domain0+domain1 | 8.63 usecs/loop. | 8.59 usecs/loop. | 9.03 usecs/loop. | 8.94 usecs/loop. | 8.96 usecs/loop. | 8.63 usecs/loop. Also: - disable SD_BALANCE_NEWIDLE on NUMA and SMP domains (keep it for siblings) - enable SD_WAKE_BALANCE on SMP domains Sysbench+postgresql improves all around the board, quite significantly: .28-rc3-11474e2c .28-rc3-11474e2c-tune ------------------------------------------------- 1: 571 688 +17.08% 2: 1236 1206 -2.55% 4: 2381 2642 +9.89% 8: 4958 5164 +3.99% 16: 9580 9574 -0.07% 32: 7128 8118 +12.20% 64: 7342 8266 +11.18% 128: 7342 8064 +8.95% 256: 7519 7884 +4.62% 512: 7350 7731 +4.93% ------------------------------------------------- SUM: 55412 59341 +6.62% So it's a win both for the runup portion, the peak area and the tail. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-22x86: Fix ASM_X86__ header guardsH. Peter Anvin
Change header guards named "ASM_X86__*" to "_ASM_X86_*" since: a. the double underscore is ugly and pointless. b. no leading underscore violates namespace constraints. Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-10-22x86, um: ... and asm-x86 moveAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: H. Peter Anvin <hpa@zytor.com>