aboutsummaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2008-08-11lockdep: rename map_[acquire|release]() => lock_map_[acquire|release]()Ingo Molnar
the names were too generic: drivers/uio/uio.c:87: error: expected identifier or '(' before 'do' drivers/uio/uio.c:87: error: expected identifier or '(' before 'while' drivers/uio/uio.c:113: error: 'map_release' undeclared here (not in a function) Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-11lockdep: handle chains involving classes defined in modulesRabin Vincent
Solve this by marking the classes as unused and not printing information about the unused classes. Reported-by: Eric Sesterhenn <snakebyte@gmx.de> Signed-off-by: Rabin Vincent <rabin@rab.in> Acked-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-11lockdep: spin_lock_nest_lock()Peter Zijlstra
Expose the new lock protection lock. This can be used to annotate places where we take multiple locks of the same class and avoid deadlocks by always taking another (top-level) lock first. NOTE: we're still bound to the MAX_LOCK_DEPTH (48) limit. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-11lockdep: lock protection locksPeter Zijlstra
On Fri, 2008-08-01 at 16:26 -0700, Linus Torvalds wrote: > On Fri, 1 Aug 2008, David Miller wrote: > > > > Taking more than a few locks of the same class at once is bad > > news and it's better to find an alternative method. > > It's not always wrong. > > If you can guarantee that anybody that takes more than one lock of a > particular class will always take a single top-level lock _first_, then > that's all good. You can obviously screw up and take the same lock _twice_ > (which will deadlock), but at least you cannot get into ABBA situations. > > So maybe the right thing to do is to just teach lockdep about "lock > protection locks". That would have solved the multi-queue issues for > networking too - all the actual network drivers would still have taken > just their single queue lock, but the one case that needs to take all of > them would have taken a separate top-level lock first. > > Never mind that the multi-queue locks were always taken in the same order: > it's never wrong to just have some top-level serialization, and anybody > who needs to take <n> locks might as well do <n+1>, because they sure as > hell aren't going to be on _any_ fastpaths. > > So the simplest solution really sounds like just teaching lockdep about > that one special case. It's not "nesting" exactly, although it's obviously > related to it. Do as Linus suggested. The lock protection lock is called nest_lock. Note that we still have the MAX_LOCK_DEPTH (48) limit to consider, so anything that spills that it still up shit creek. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-11lockdep: map_acquirePeter Zijlstra
Most the free-standing lock_acquire() usages look remarkably similar, sweep them into a new helper. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-11lockdep: shrink held_lock structureDave Jones
struct held_lock { u64 prev_chain_key; /* 0 8 */ struct lock_class * class; /* 8 8 */ long unsigned int acquire_ip; /* 16 8 */ struct lockdep_map * instance; /* 24 8 */ int irq_context; /* 32 4 */ int trylock; /* 36 4 */ int read; /* 40 4 */ int check; /* 44 4 */ int hardirqs_off; /* 48 4 */ /* size: 56, cachelines: 1 */ /* padding: 4 */ /* last cacheline: 56 bytes */ }; struct held_lock { u64 prev_chain_key; /* 0 8 */ long unsigned int acquire_ip; /* 8 8 */ struct lockdep_map * instance; /* 16 8 */ unsigned int class_idx:11; /* 24:21 4 */ unsigned int irq_context:2; /* 24:19 4 */ unsigned int trylock:1; /* 24:18 4 */ unsigned int read:2; /* 24:16 4 */ unsigned int check:2; /* 24:14 4 */ unsigned int hardirqs_off:1; /* 24:13 4 */ /* size: 32, cachelines: 1 */ /* padding: 4 */ /* bit_padding: 13 bits */ /* last cacheline: 32 bytes */ }; [mingo@elte.hu: shrunk hlock->class too] [peterz@infradead.org: fixup bit sizes] Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
2008-08-11lockdep: re-annotate scheduler runqueuesPeter Zijlstra
Instead of using a per-rq lock class, use the regular nesting operations. However, take extra care with double_lock_balance() as it can release the already held rq->lock (and therefore change its nesting class). So what can happen is: spin_lock(rq->lock); // this rq subclass 0 double_lock_balance(rq, other_rq); // release rq // acquire other_rq->lock subclass 0 // acquire rq->lock subclass 1 spin_unlock(other_rq->lock); leaving you with rq->lock in subclass 1 So a subsequent double_lock_balance() call can try to nest a subclass 1 lock while already holding a subclass 1 lock. Fix this by introducing double_unlock_balance() which releases the other rq's lock, but also re-sets the subclass for this rq's lock to 0. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-11lockdep: lock_set_subclass - reset a held lock's subclassPeter Zijlstra
this can be used to reset a held lock's subclass, for arbitrary-depth iterated data structures such as trees or lists which have per-node locks. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-11Merge branch 'linus' into sched/clockIngo Molnar
2008-08-11sched_clock: delay using sched_clock()Peter Zijlstra
Some arch's can't handle sched_clock() being called too early - delay this until sched_clock_init() has been called. Reported-by: Bill Gatliff <bgat@billgatliff.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Tested-by: Nishanth Aravamudan <nacc@us.ibm.com> CC: Russell King - ARM Linux <linux@arm.linux.org.uk> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-08-07DMA: make dma-coherent.c documentation kdoc-friendlyDmitry Baryshkov
Spotted by Randy. Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Dmitry Baryshkov <dbaryshkov@gmail.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2008-08-05pm_qos: spelling fixesRichard Hughes
A documentation cleanup patch. With a minor tweak to clarify units for kbs. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: mark gross <mgross@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-05dma: fix order calculation in dma_mark_declared_memory_occupied()Jan Beulich
get_order() takes byte-sized input, not a page-granular one. Irrespective of this fix I'm inclined to believe that this doesn't work right anyway - bitmap_allocate_region() has an implicit assumption of 'pos' being suitable for 'order', which this function doesn't seem to enforce (and since it's being called with a byte-granular value there's no reason to believe that the callers would make sure device_addr is passed accordingly - it's also not documented that way). Signed-off-by: Jan Beulich <jbeulich@novell.com> Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Dmitry Baryshkov <dbaryshkov@gmail.com> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-05genirq: better warning on irqchip->set_type() failureDavid Brownell
While I'm glad to finally see the hole fixed whereby passing an invalid IRQ trigger type to request_irq() would be ignored, the current diagnostic isn't quite useful. Fixed by also listing the trigger type which was rejected. Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Acked-by: Uwe Kleine-König <Uwe.Kleine-Koenig@digi.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-05semaphore: __down_common: use signal_pending_state()Oleg Nesterov
Change __down_common() to use signal_pending_state() instead of open coding. The changes in kernel/semaphore.o are just artifacts, the state checks are optimized away. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Ingo Molnar <mingo@elte.hu> Cc: Matthew Wilcox <matthew@wil.cx> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-05relay: fix "full buffer with exactly full last subbuffer" accounting problemTom Zanussi
In relay's current read implementation, if the buffer is completely full but hasn't triggered the buffer-full condition (i.e. the last write didn't cross the subbuffer boundary) and the last subbuffer is exactly full, the subbuffer accounting code erroneously finds nothing available. This patch fixes the problem. Signed-off-by: Tom Zanussi <tzanussi@gmail.com> Cc: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Mathieu Desnoyers <compudj@krystal.dyndns.org> Cc: Andrea Righi <righi.andrea@gmail.com> Cc: <stable@kernel.org> [2.6.25.x, 2.6.26.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-04Merge branch 'audit.b56' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current * 'audit.b56' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current: Re: [PATCH] Fix the kernel panic of audit_filter_task when key field is set
2008-08-04__sched_setscheduler: don't do any policy checks when not "user"Jeremy Fitzhardinge
The "user" parameter to __sched_setscheduler indicates whether the change is being done on behalf of a user process or not. If not, we shouldn't apply any permissions checks, so don't call security_task_setscheduler(). Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org> Tested-by: Steve Wise <swise@opengridcomputing.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-04Re: [PATCH] Fix the kernel panic of audit_filter_task when key field is setzhangxiliang
Sorry, I miss a blank between if and "(". And I add "unlikely" to check "ctx" in audit_match_perm() and audit_match_filetype(). This is a new patch for it. Signed-off-by: Zhang Xiliang <zhangxiliang@cn.fujitsu.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-08-01tracehook: fix exit_signal=0 caseRoland McGrath
My commit 2b2a1ff64afbadac842bbc58c5166962cf4f7664 introduced a regression (sorry about that) for the odd case of exit_signal=0 (e.g. clone_flags=0). This is not a normal use, but it's used by a case in the glibc test suite. Dying with exit_signal=0 sends no signal, but it's supposed to wake up a parent's blocked wait*() calls (unlike the delayed_group_leader case). This fixes tracehook_notify_death() and its caller to distinguish a "signal 0" wakeup from the delayed_group_leader case (with no wakeup). Signed-off-by: Roland McGrath <roland@redhat.com> Tested-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-01Merge branch 'core-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: locking: fix mutex @key parameter kernel-doc notation
2008-08-01Merge branch 'for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb: kgdb: fix gdb serial thread queries kgdb: fix kgdb_validate_break_address to perform a mem write kgdb: remove the requirement for CONFIG_FRAME_POINTER
2008-08-01[PATCH] Fix the bug of using AUDIT_STATUS_RATE_LIMIT when set fail, no error ↵zhangxiliang
output. When the "status_get->mask" is "AUDIT_STATUS_RATE_LIMIT || AUDIT_STATUS_BACKLOG_LIMIT". If "audit_set_rate_limit" fails and "audit_set_backlog_limit" succeeds, the "err" value will be greater than or equal to 0. It will miss the failure of rate set. Signed-off-by: Zhang Xiliang <zhangxiliang@cn.fujitsu.com> Acked-by: Eric Paris <eparis@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-08-01[PATCH] Fix the kernel panic of audit_filter_task when key field is setzhangxiliang
When calling audit_filter_task(), it calls audit_filter_rules() with audit_context is NULL. If the key field is set, the result in audit_filter_rules() will be set to 1 and ctx->filterkey will be set to key. But the ctx is NULL in this condition, so kernel will panic. Signed-off-by: Zhang Xiliang <zhangxiliang@cn.fujitsu.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-08-01Re: [PATCH] the loginuid field should be output in all AUDIT_CONFIG_CHANGE ↵zhangxiliang
audit messages > shouldn't these be using the "audit_get_loginuid(current)" and if we > are going to output loginuid we also should be outputting sessionid Thanks for your detailed explanation. I have made a new patch for outputing "loginuid" and "sessionid" by audit_get_loginuid(current) and audit_get_sessionid(current). If there are some deficiencies, please give me your indication. Signed-off-by: Zhang Xiliang <zhangxiliang@cn.fujitsu.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-08-01kernel/audit.c control character detection is off-by-oneVesa-Matti J Kari
Hello, According to my understanding there is an off-by-one bug in the function: audit_string_contains_control() in: kernel/audit.c Patch is included. I do not know from how many places the function is called from, but for example, SELinux Access Vector Cache tries to log untrusted filenames via call path: avc_audit() audit_log_untrustedstring() audit_log_n_untrustedstring() audit_string_contains_control() If audit_string_contains_control() detects control characters, then the string is hex-encoded. But the hex=0x7f dec=127, DEL-character, is not detected. I guess this could have at least some minor security implications, since a user can create a filename with 0x7f in it, causing logged filename to possibly look different when someone reads it on the terminal. Signed-off-by: Vesa-Matti Kari <vmkari@cc.helsinki.fi> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-08-01[PATCH] Audit: Collect signal info when SIGUSR2 is sent to auditdEric Paris
Makes the kernel audit subsystem collect information about the sending process when that process sends SIGUSR2 to the userspace audit daemon. SIGUSR2 is a new interesting signal to auditd telling auditd that it should try to start logging to disk again and the error condition which caused it to stop logging to disk (usually out of space) has been rectified. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-08-01kgdb: fix gdb serial thread queriesJason Wessel
The command "info threads" did not work correctly with kgdb. It would result in a silent kernel hang if used. This patach addresses several problems. - Fix use of deprecated NR_CPUS - Fix kgdb to not walk linearly through the pid space - Correctly implement shadow pids - Change the threads per query to a #define - Fix kgdb_hex2long to work with negated values The threads 0 and -1 are reserved to represent the current task. That means that CPU 0 will start with a shadow thread id of -2, and CPU 1 will have a shadow thread id of -3, etc... From the debugger you can switch to a shadow thread to see what one of the other cpus was doing, however it is not possible to execute run control operations on any other cpu execept the cpu executing the kgdb_handle_exception(). Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2008-08-01kgdb: fix kgdb_validate_break_address to perform a mem writeJason Wessel
A regression to the kgdb core was found in the case of using the CONFIG_DEBUG_RODATA kernel option. When this option is on, a breakpoint cannot be written into any readonly memory page. When an external debugger requests a breakpoint to get set, the kgdb_validate_break_address() was only checking to see if the address to place the breakpoint was readable and lacked a write check. This patch changes the validate routine to try reading (via the breakpoint set request) and also to try immediately writing the break point. If either fails, an error is correctly returned and the debugger behaves correctly. Then an end user can make the descision to use hardware breakpoints. Also update the documentation to reflect that using CONFIG_DEBUG_RODATA will inhibit the use of software breakpoints. Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2008-08-01lockdep: change scheduler annotationPeter Zijlstra
While thinking about David's graph walk lockdep patch it _finally_ dawned on me that there is no reason we have a lock class per cpu ... Sorry for being dense :-/ The below changes the annotation from a lock class per cpu, to a single nested lock, as the scheduler never holds more that 2 rq locks at a time anyway. If there was code requiring holding all rq locks this would not work and the original annotation would be the only option, but that not being the case, this is a much lighter one. Compiles and boots on a 2-way x86_64. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: David Miller <davem@davemloft.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-31lockdep: fix combinatorial explosion in lock subgraph traversalDavid Miller
When we traverse the graph, either forwards or backwards, we are interested in whether a certain property exists somewhere in a node reachable in the graph. Therefore it is never necessary to traverse through a node more than once to get a correct answer to the given query. Take advantage of this property using a global ID counter so that we need not clear all the markers in all the lock_class entries before doing a traversal. A new ID is choosen when we start to traverse, and we continue through a lock_class only if it's ID hasn't been marked with the new value yet. This short-circuiting is essential especially for high CPU count systems. The scheduler has a runqueue per cpu, and needs to take two runqueue locks at a time, which leads to long chains of backwards and forwards subgraphs from these runqueue lock nodes. Without the short-circuit implemented here, a graph traversal on a runqueue lock can take up to (1 << (N - 1)) checks on a system with N cpus. For anything more than 16 cpus or so, lockdep will eventually bring the machine to a complete standstill. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-31Merge branch 'linus' into sched/urgentIngo Molnar
2008-07-31sched clock: couple local and remote clocksIngo Molnar
When taking the time of a remote CPU, use the opportunity to couple (sync) the clocks to each other. (in a monotonic way) Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Mike Galbraith <efault@gmx.de>
2008-07-31sched clock: simplify __update_sched_clock()Ingo Molnar
- return the current clock instead of letting callers fetch it from scd->clock Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Mike Galbraith <efault@gmx.de>
2008-07-31sched: eliminate scd->prev_rawIngo Molnar
eliminate prev_raw and use tick_raw instead. It's enough to base the current time on the scheduler tick timestamp alone - the monotonicity and maximum checks will prevent any damage. Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Mike Galbraith <efault@gmx.de>
2008-07-31sched clock: clean up sched_clock_cpu()Ingo Molnar
- simplify the remote clock rebasing Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Mike Galbraith <efault@gmx.de>
2008-07-31sched clock: revert various sched_clock() changesIngo Molnar
Found an interactivity problem on a quad core test-system - simple CPU loops would occasionally delay the system un an unacceptable way. After much debugging with Peter Zijlstra it turned out that the problem is caused by the string of sched_clock() changes - they caused the CPU clock to jump backwards a bit - which confuses the scheduler arithmetics. (which is unsigned for performance reasons) So revert: # c300ba2: sched_clock: and multiplier for TSC to gtod drift # c0c8773: sched_clock: only update deltas with local reads. # af52a90: sched_clock: stop maximum check on NO HZ # f7cce27: sched_clock: widen the max and min time This solves the interactivity problems. Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Mike Galbraith <efault@gmx.de>
2008-07-30sched: make scheduler sysfs attributes sysdev class devicesAndi Kleen
They are really class devices, but were incorrectly declared. This leads to crashes with the recent changes that makes non normal sysdevs use a different prototype. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Pierre Ossman <drzeus-list@drzeus.cx> Cc: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-30workqueues: add comments to __create_workqueue_key()Oleg Nesterov
Dmitry Adamushko pointed out that the error handling in __create_workqueue_key() is not clear, add the comment. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Dmitry Adamushko <dmitry.adamushko@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-30printk: fix comment for printk ratelimitingUwe Kleine-König
The comment assumed the burst to be one and the ratelimit used to be named printk_ratelimit_jiffies. Signed-off-by: Uwe Kleine-König <Uwe.Kleine-Koenig@digi.com> Cc: Dave Young <hidave.darkstar@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-30markers: fix markers read barrier for multiple probesMathieu Desnoyers
Paul pointed out two incorrect read barriers in the marker handler code in the path where multiple probes are connected. Those are ordering reads of "ptype" (single or multi probe marker), "multi" array pointer, and "multi" array data access. It should be ordered like this : read ptype smp_rmb() read multi array pointer smp_read_barrier_depends() access data referenced by multi array pointer The code with a single probe connected (optimized case, does not have to allocate an array) has correct memory ordering. It applies to kernel 2.6.26.x, 2.6.25.x and linux-next. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: <stable@kernel.org> [2.6.25.x, 2.6.26.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-30cpuset: clean up cpuset hierarchy traversal codeLi Zefan
Use cpuset.stack_list rather than kfifo, so we avoid memory allocation for kfifo. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Paul Menage <menage@google.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-30cpuset: fix wrong calculation of relax domain levelLi Zefan
When multiple cpusets are overlapping in their 'cpus' and hence they form a single sched domain, the largest sched_relax_domain_level among those should be used. But when top_cpuset's sched_load_balance is set, its sched_relax_domain_level is used regardless other sub-cpusets'. This patch fixes it by walking the cpuset hierarchy to find the largest sched_relax_domain_level. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Paul Menage <menage@google.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-30cpuset: speed up sched domain partitionLai Jiangshan
All child cpusets contain a subset of the parent's cpus, so we can skip them when partitioning sched domains. This decreases 'csa' greately for cpusets with multi-level hierarchy. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Paul Menage <menage@google.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-30cpuset: a bit cleanup for scan_for_empty_cpusets()Li Zefan
clean up hierarchy traversal code Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Paul Menage <menage@google.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Paul Jackson <pj@sgi.com> Cc: Cliff Wickman <cpw@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-30cgroup: uninline cgroup_has_css_refs()Li Zefan
It's not small enough, and has 2 call sites. text data bss dec hex filename 12813 1676 4832 19321 4b79 cgroup.o.orig 12775 1676 4832 19283 4b53 cgroup.o Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Paul Menage <menage@google.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-30cgroup: remove duplicate code in allocate_cg_link()Li Zefan
- just call free_cg_links() in allocate_cg_links() - the list will get initialized in allocate_cg_links(), so don't init it twice Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Paul Menage <menage@google.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-30cgroup: fix possible memory leakLi Zefan
There's a leak if copy_from_user() returns failure. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Paul Menage <menage@google.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-30resource: add resource_size()Magnus Damm
Avoid one-off errors by introducing a resource_size() function. Signed-off-by: Magnus Damm <damm@igel.co.jp> Cc: Ben Dooks <ben-linux@fluff.org> Cc: Jean Delvare <khali@linux-fr.org> Cc: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-30Merge branch 'sched/urgent' into sched/clockIngo Molnar