Age | Commit message (Collapse) | Author |
|
* git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-kgdb:
kgdb: always use icache flush for sw breakpoints
kgdb: fix SMP NMI kgdb_handle_exception exit race
kgdb: documentation fixes
kgdb: allow static kgdbts boot configuration
kgdb: add documentation
kgdb: Kconfig fix
kgdb: add kgdb internal test suite
kgdb: fix several kgdb regressions
kgdb: kgdboc pl011 I/O module
kgdb: fix optional arch functions and probe_kernel_*
kgdb: add x86 HW breakpoints
kgdb: print breakpoint removed on exception
kgdb: clocksource watchdog
kgdb: fix NMI hangs
kgdb: fix kgdboc dynamic module configuration
kgdb: document parameters
x86: kgdb support
consoles: polling support, kgdboc
kgdb: core
uaccess: add probe_kernel_write()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/willy/misc
* 'semaphore' of git://git.kernel.org/pub/scm/linux/kernel/git/willy/misc:
Remove DEBUG_SEMAPHORE from Kconfig
Improve semaphore documentation
Simplify semaphore implementation
Add down_timeout and change ACPI to use it
Introduce down_killable()
Generic semaphore implementation
Add semaphore.h to kernel_lock.c
Fix quota.h includes
|
|
* 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6: (36 commits)
[S390] Remove code duplication from monreader / dcssblk.
[S390] kernel: show last breaking-event-address on oops
[S390] lowcore: Change type of lowcores softirq_pending to __u32.
[S390] zcrypt: Comments and kernel-doc cleanup
[S390] uaccess: Always access the correct address space.
[S390] Fix a lot of sparse warnings.
[S390] Convert s390 to GENERIC_CLOCKEVENTS.
[S390] genirq/clockevents: move irq affinity prototypes/inlines to interrupt.h
[S390] Convert monitor calls to function calls.
[S390] qdio (new feature): enhancing info-retrieval from QDIO-adapters
[S390] replace remaining __FUNCTION__ occurrences
[S390] remove redundant display of free swap space in show_mem()
[S390] qdio: remove outdated developerworks link.
[S390] Add debug_register_mode() function to debug feature API
[S390] crypto: use more descriptive function names for init/exit routines.
[S390] switch sched_clock to store-clock-extended.
[S390] zcrypt: add support for large random numbers
[S390] hw_random: allow rng_dev_read() to return hardware errors.
[S390] Vertical cpu management.
[S390] cpu topology support for s390.
...
|
|
This breaks out the ptrace handling from get_signal_to_deliver into a
new subroutine. The actual code there doesn't change, and it gets
inlined into nearly identical compiled code. This makes the function
substantially shorter and thus easier to read, and it nicely isolates
the ptrace magic.
Signed-off-by: Roland McGrath <roland@redhat.com>
Acked-by: Kyle McMartin <kyle@mcmartin.ca>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
When I ran a test program to fork mass processes and at the same time
'cat /cgroup/tasks', I got the following oops:
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:72!
invalid opcode: 0000 [#1] SMP
Pid: 4178, comm: a.out Not tainted (2.6.25-rc9 #72)
...
Call Trace:
[<c044a5f9>] ? cgroup_exit+0x55/0x94
[<c0427acf>] ? do_exit+0x217/0x5ba
[<c0427ed7>] ? do_group_exit+0.65/0x7c
[<c0427efd>] ? sys_exit_group+0xf/0x11
[<c0404842>] ? syscall_call+0x7/0xb
[<c05e0000>] ? init_cyrix+0x2fa/0x479
...
EIP: [<c04df671>] list_del+0x35/0x53 SS:ESP 0068:ebc7df4
---[ end trace caffb7332252612b ]---
Fixing recursive fault but reboot is needed!
After digging into the code and debugging, I finlly found out a race
situation:
do_exit()
->cgroup_exit()
->if (!list_empty(&tsk->cg_list))
list_del(&tsk->cg_list);
cgroup_iter_start()
->cgroup_enable_task_cg_list()
->list_add(&tsk->cg_list, ..);
In this case the list won't be deleted though the process has exited.
We got two bug reports in the past, which seem to be the same bug as
this one:
http://lkml.org/lkml/2008/3/5/332
http://lkml.org/lkml/2007/10/17/224
Actually sometimes I got oops on list_del, sometimes oops on list_add.
And I can change my test program a bit to trigger other oops.
The patch has been tested both on x86_32 and x86_64.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
On the ppc 4xx architecture the instruction cache must be flushed as
well as the data cache. This patch just makes it generic for all
architectures where CACHE_FLUSH_IS_SAFE is set to 1.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Fix the problem of protecting the kgdb handle_exception exit
which had an NMI race condition, while trying to restore
normal system operation.
There was a small window after the master processor sets cpu_in_debug
to zero but before it has set kgdb_active to zero where a
non-master processor in an SMP system could receive an NMI and
re-enter the kgdb_wait() loop.
As long as the master processor sets the cpu_in_debug before sending
the cpu roundup the cpu_in_debug variable can also be used to guard
against the race condition.
The kgdb_wait() function no longer needs to check
kgdb_active because it is done in the arch specific code
and handled along with the nmi traps at the low level.
This also allows kgdb_wait() to exit correctly if it was
entered for some unknown reason due to a spurious NMI that
could not be handled by the arch specific code.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
kgdb core fixes:
- Check to see that mm->mmap_cache is not null before calling
flush_cache_range(), else on arch=ARM it will cause a fatal
fault.
- Breakpoints should only be restored if they are in the BP_ACTIVE
state.
- Fix a typo in comments to "kgdb_register_io_module"
x86 kgdb fixes:
- Fix the x86 arch handler such that on a kill or detach that the
appropriate cleanup on the single stepping flags gets run.
- Add in the DIE_NMIWATCHDOG call for x86_64
- Touch the nmi watchdog before returning the system to normal
operation after performing any kind of kgdb operation, else
the possibility exists to trigger the watchdog.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Fix two regressions dealing with the kgdb core.
1) kgdb_skipexception and kgdb_post_primary_code are optional
functions that are only required on archs that need special exception
fixups.
2) The kernel address space scope must be set on any probe_kernel_*
function or archs such as ARCH=arm will not allow access to the kernel
memory space. As an example, it is required to allow the full kernel
address space is when you the kernel debugger to inspect a system
call.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Add HW breakpoints into the arch specific portion of x86 kgdb. In the
current x86 kernel.org kernels HW breakpoints are changed out in lazy
fashion because there is no infrastructure around changing them when
changing to a kernel task or entering the kernel mode via a system
call. This lazy approach means that if a user process uses HW
breakpoints the kgdb will loose out. This is an acceptable trade off
because the developer debugging the kernel is assumed to know what is
going on system wide and would be aware of this trade off.
There is a minor bug fix to the kgdb core so as to correctly call the
hw breakpoint functions with a valid value from the enum.
There is also a minor change to the x86_64 startup code when using
early HW breakpoints. When the debugger is connected, the cpu startup
code must not zero out the HW breakpoint registers or you cannot hit
the breakpoints you are interested in, in the first place.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
If kgdb does remove a breakpoint that had a problem on the recursion
check, it should also print the address of the breakpoint.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
In order to not trip the clocksource watchdog, kgdb must touch the
clocksource watchdog on the return to normal system run state.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
kgdb core code. Handles the protocol and the arch details.
[ mingo@elte.hu: heavily modified, simplified and cleaned up. ]
[ xemul@openvz.org: use find_task_by_pid_ns ]
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jan Kiszka <jan.kiszka@web.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Move documentation from semaphore.h to semaphore.c as requested by
Andrew Morton. Also reformat to kernel-doc style and add some more
notes about the implementation.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
|
|
By removing the negative values of 'count' and relying on the wait_list to
indicate whether we have any waiters, we can simplify the implementation
by removing the protection against an unlikely race condition. Thanks to
David Howells for his suggestions.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
|
|
ACPI currently emulates a timeout for semaphores with calls to
down_trylock and sleep. This produces horrible behaviour in terms of
fairness and excessive wakeups. Now that we have a unified semaphore
implementation, adding a real down_trylock is almost trivial.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
|
|
down_killable() is the functional counterpart of mutex_lock_killable.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
|
|
Semaphores are no longer performance-critical, so a generic C
implementation is better for maintainability, debuggability and
extensibility. Thanks to Peter Zijlstra for fixing the lockdep
warning. Thanks to Harvey Harrison for pointing out that the
unlikely() was unnecessary.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
|
|
> Generic code is not supposed to include irq.h. Replace this include
> by linux/hardirq.h instead and add/replace an include of linux/irq.h
> in asm header files where necessary.
> This change should only matter for architectures that make use of
> GENERIC_CLOCKEVENTS.
> Architectures in question are mips, x86, arm, sh, powerpc, uml and sparc64.
>
> I did some cross compile tests for mips, x86_64, arm, powerpc and sparc64.
> This patch fixes also build breakages caused by the include replacement in
> tick-common.h.
I generally dislike adding optional linux/* includes in asm/* includes -
I'm nervous about this causing include loops.
However, there's a separate point to be discussed here.
That is, what interfaces are expected of every architecture in the kernel.
If generic code wants to be able to set the affinity of interrupts, then
that needs to become part of the interfaces listed in linux/interrupt.h
rather than linux/irq.h.
So what I suggest is this approach instead (against Linus' tree of a
couple of days ago) - we move irq_set_affinity() and irq_can_set_affinity()
to linux/interrupt.h, change the linux/irq.h includes to linux/interrupt.h
and include asm/irq_regs.h where needed (asm/irq_regs.h is supposed to be
rarely used include since not much touches the stacked parent context
registers.)
Build tested on ARM PXA family kernels and ARM's Realview platform
kernels which both use genirq.
[ tglx@linutronix.de: add GENERIC_HARDIRQ dependencies ]
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
|
|
When I cleaned up printk() and split up the printk locking logic in
commit 266c2e0abeca649fa6667a1a427ad1da507c6375 ("Make printk() console
semaphore accesses sensible") I had incorrectly moved the call to
have_callable_console() outside of the console semaphore.
That was buggy. The console semaphore protects the console_drivers list
that is used by have_callable_console().
Thanks go to Bongani Hlope who saw this as a hang on shutdown and reboot
and bisected the bug to the right commit, and tested this patch. See
http://lkml.org/lkml/2008/4/11/315
Bisected-and-tested-by: Bongani Hlope <bonganilinux@mweb.co.za>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
revert "sched: fix fair sleepers" (e22ecef1d2658ba54ed7d3fdb5d60829fb434c23),
because it is causing audio skipping, see:
http://bugzilla.kernel.org/show_bug.cgi?id=10428
the patch is correct and the real cause of the skipping is not
understood (tracing makes it go away), but time has run out so we'll
revert it and re-try in 2.6.26.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Extend the /proc/<pid>/cgroup file to include the appropriate hierarchy ID on
each line.
Currently this ID isn't really needed since a hierarchy can be completely
identified by the set of subsystems bound to it, but this is likely to change
in the near future in order to support stateless subsystems and
merging/rebinding of subsystems. Getting this change into 2.6.25 reduces the
need for an API change later.
Signed-off-by: Paul Menage <menage@google.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The prevent_tail_call() macro works around the problem of the compiler
clobbering argument words on the stack, which for asmlinkage functions
is the caller's (user's) struct pt_regs. The tail/sibling-call
optimization is not the only way that the compiler can decide to use
stack argument words as scratch space, which we have to prevent.
Other optimizations can do it too.
Until we have new compiler support to make "asmlinkage" binding on the
compiler's own use of the stack argument frame, we have work around all
the manifestations of this issue that crop up.
More cases seem to be prevented by also keeping the incoming argument
variables live at the end of the function. This makes their original
stack slots attractive places to leave those variables, so the compiler
tends not clobber them for something else. It's still no guarantee, but
it handles some observed cases that prevent_tail_call() did not.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The effects of cgroup_disable=foo are:
- foo isn't auto-mounted if you mount all cgroups in a single hierarchy
- foo isn't visible as an individually mountable subsystem
As a result there will only ever be one call to foo->create(), at init time;
all processes will stay in this group, and the group will never be mounted on
a visible hierarchy. Any additional effects (e.g. not allocating metadata)
are up to the foo subsystem.
This doesn't handle early_init subsystems (their "disabled" bit isn't set be,
but it could easily be extended to do so if any of the early_init systems
wanted it - I think it would just involve some nastier parameter processing
since it would occur before the command-line argument parser had been run.
Hugh said:
Ballpark figures, I'm trying to get this question out rather than
processing the exact numbers: CONFIG_CGROUP_MEM_RES_CTLR adds 15% overhead
to the affected paths, booting with cgroup_disable=memory cuts that back to
1% overhead (due to slightly bigger struct page).
I'm no expert on distros, they may have no interest whatever in
CONFIG_CGROUP_MEM_RES_CTLR=y; and the rest of us can easily build with or
without it, or apply the cgroup_disable=memory patches.
Unix bench's execl test result on x86_64 was
== just after boot without mounting any cgroup fs.==
mem_cgorup=off : Execl Throughput 43.0 3150.1 732.6
mem_cgroup=on : Execl Throughput 43.0 2932.6 682.0
==
[lizf@cn.fujitsu.com: fix boot option parsing]
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Paul Menage <menage@google.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Sudhir Kumar <skumar@linux.vnet.ibm.com>
Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Markers do not mix well with CONFIG_PREEMPT_RCU because it uses
preempt_disable/enable() and not rcu_read_lock/unlock for minimal
intrusiveness. We would need call_sched and sched_barrier primitives.
Currently, the modification (connection and disconnection) of probes
from markers requires changes to the data structure done in RCU-style :
a new data structure is created, the pointer is changed atomically, a
quiescent state is reached and then the old data structure is freed.
The quiescent state is reached once all the currently running
preempt_disable regions are done running. We use the call_rcu mechanism
to execute kfree() after such quiescent state has been reached.
However, the new CONFIG_PREEMPT_RCU version of call_rcu and rcu_barrier
does not guarantee that all preempt_disable code regions have finished,
hence the race.
The "proper" way to do this is to use rcu_read_lock/unlock, but we don't
want to use it to minimize intrusiveness on the traced system. (we do
not want the marker code to call into much of the OS code, because it
would quickly restrict what can and cannot be instrumented, such as the
scheduler).
The temporary fix, until we get call_rcu_sched and rcu_barrier_sched in
mainline, is to use synchronize_sched before each call_rcu calls, so we
wait for the quiescent state in the system call code path. It will slow
down batch marker enable/disable, but will make sure the race is gone.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Silence two kerneldoc warnings.
Warning(kernel/audit.c:1276): No description found for parameter 'string'
Warning(kernel/audit.c:1276): No description found for parameter 'len'
[also fix a typo for bonus points]
Signed-off-by: Dave Jones <davej@codemonkey.org.uk>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Call mm_free_cgroup earlier. Otherwise a reference due to lazy mm switching
can prevent cgroup removal.
Signed-off-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The futex init function is called init(). This is a pain in the neck
when debugging when you code dies in ... init :-)
This renames it to futex_init().
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-hrt
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-hrt:
NOHZ: reevaluate idle sleep length after add_timer_on()
clocksource: revert: use init_timer_deferrable for clocksource_watchdog
|
|
relay doesn't reference the pages it adds, however we need a non-NULL
hook or splice_to_pipe() can oops.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
I found that relay files can be read by pread(2). I fix it,
for relay files are not capable of seeking.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
|
add_timer_on() can add a timer on a CPU which is currently in a long
idle sleep, but the timer wheel is not reevaluated by the nohz code on
that CPU. So a timer can be delayed for quite a long time. This
triggered a false positive in the clocksource watchdog code.
To avoid this we need to wake up the idle CPU and enforce the
reevaluation of the timer wheel for the next timer event.
Add a function, which checks a given CPU for idle state, marks the
idle task with NEED_RESCHED and sends a reschedule IPI to notify the
other CPU of the change in the timer wheel.
Call this function from add_timer_on().
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: stable@kernel.org
--
include/linux/sched.h | 6 ++++++
kernel/sched.c | 43 +++++++++++++++++++++++++++++++++++++++++++
kernel/timer.c | 10 +++++++++-
3 files changed, 58 insertions(+), 1 deletion(-)
|
|
Revert
commit 1077f5a917b7c630231037826b344b2f7f5b903f
Author: Parag Warudkar <parag.warudkar@gmail.com>
Date: Wed Jan 30 13:30:01 2008 +0100
clocksource.c: use init_timer_deferrable for clocksource_watchdog
clocksource_watchdog can use a deferrable timer - reduces wakeups from
idle per second.
The watchdog timer needs to run with the specified interval. Otherwise
it will miss the possible wrap of the watchdog clocksource.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@kernel.org
|
|
The printk() logic on when/how to get the console semaphore was
unreadable, this splits the code up into a few helper functions and
makes it easier to follow what is going on.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
In case we're accounting from a sub-namespace, the tgids reported will not
refer to the right namespace.
Save the pid_namespace we're accounting in on the acct_glbs and use it in
do_acct_process.
Two less :) places using the task_struct.tgid member.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This is minor, but dereferencing even current real_parent is not safe on debug
kernels, since the memory, this points to, can be unmapped - RCU protection is
required.
Besides, the tgid field is deprecated and is to be replaced with task_tgid_xxx
call (the 2nd patch), so RCU will be required anyway.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
As Paul pointed out, the ACCESS_ONCE are not needed because we already have
the explicit surrounding memory barriers.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Mike Mason <mmlnx@us.ibm.com>
Cc: Dipankar Sarma <dipankar@in.ibm.com>
Cc: David Smith <dsmith@redhat.com>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Adrian Bunk <adrian.bunk@movial.fi>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Add comments requested by Andrew.
Updated comments about synchronize_sched(). Since we use call_rcu and
rcu_barrier now, these comments were out of sync with the code.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Mike Mason <mmlnx@us.ibm.com>
Cc: Dipankar Sarma <dipankar@in.ibm.com>
Cc: David Smith <dsmith@redhat.com>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Adrian Bunk <adrian.bunk@movial.fi>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The printk() can deadlock because it can wake up klogd(), and
task enqueueing will try to read the time in order to set a hrtimer.
Reported-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Debugged-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Will be called each time the scheduling domains are rebuild.
Needed for architectures that don't have a static cpu topology.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Needed so it can be called from outside of sched.c.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Combine two unlikely's
Signed-off-by: Roel Kluin <12o3l@tiscali.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
TREE_AVG and APPROX_AVG are initial task placement policies that have been
disabled for a long while.. time to remove them.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (46 commits)
[NET] ifb: set separate lockdep classes for queue locks
[IPV6] KCONFIG: Fix description about IPV6_TUNNEL.
[TCP]: Fix shrinking windows with window scaling
netpoll: zap_completion_queue: adjust skb->users counter
bridge: use time_before() in br_fdb_cleanup()
[TG3]: Fix build warning on sparc32.
MAINTAINERS: bluez-devel is subscribers-only
audit: netlink socket can be auto-bound to pid other than current->pid (v2)
[NET]: Fix permissions of /proc/net
[SCTP]: Fix a race between module load and protosw access
[NETFILTER]: ipt_recent: sanity check hit count
[NETFILTER]: nf_conntrack_h323: logical-bitwise & confusion in process_setup()
[RT2X00] drivers/net/wireless/rt2x00/rt2x00dev.c: remove dead code, fix warning
[IPV4]: esp_output() misannotations
[8021Q]: vlan_dev misannotations
xfrm: ->eth_proto is __be16
[IPV4]: ipv4_is_lbcast() misannotations
[SUNRPC]: net/* NULL noise
[SCTP]: fix misannotated __sctp_rcv_asconf_lookup()
[PKT_SCHED]: annotate cls_u32
...
|
|
From: Pavel Emelyanov <xemul@openvz.org>
This patch is based on the one from Thomas.
The kauditd_thread() calls the netlink_unicast() and passes
the audit_pid to it. The audit_pid, in turn, is received from
the user space and the tool (I've checked the audit v1.6.9)
uses getpid() to pass one in the kernel. Besides, this tool
doesn't bind the netlink socket to this id, but simply creates
it allowing the kernel to auto-bind one.
That's the preamble.
The problem is that netlink_autobind() _does_not_ guarantees
that the socket will be auto-bound to the current pid. Instead
it uses the current pid as a hint to start looking for a free
id. So, in case of conflict, the audit messages can be sent
to a wrong socket. This can happen (it's unlikely, but can be)
in case some task opens more than one netlink sockets and then
the audit one starts - in this case the audit's pid can be busy
and its socket will be bound to another id.
The proposal is to introduce an audit_nlk_pid in audit subsys,
that will point to the netlink socket to send packets to. It
will most often be equal to audit_pid. The socket id can be
got from the skb's netlink CB right in the audit_receive_msg.
The audit_nlk_pid reset to 0 is not required, since all the
decisions are taken based on audit_pid value only.
Later, if the audit tools will bind the socket themselves, the
kernel will have to provide a way to setup the audit_nlk_pid
as well.
A good side effect of this patch is that audit_pid can later
be converted to struct pid, as it is not longer safe to use
pid_t-s in the presence of pid namespaces. But audit code still
uses the tgid from task_struct in the audit_signal_info and in
the audit_filter_syscall.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Revert commit 1ada5cba6a0318f90e45b38557e7b5206a9cba38 ("clocksource:
make clocksource watchdog cycle through online CPUs") due to the
regression reported by Gabriel C at
http://lkml.org/lkml/2008/2/24/281
(short vesion: it makes TSC be marked as always unstable on his
machine).
Cc: Andi Kleen <ak@suse.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Robert Hancock <hancockr@shaw.ca>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Gabriel C <nix.or.die@googlemail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
reduce wake-up granularity for better interactivity.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Wakeup-buddy tasks are cache-hot - this makes it a bit harder
for the load-balancer to tear them apart. (but it's still possible,
if the load is sufficiently assymetric)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|