aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2009-04-12blktrace: fix output of BLK_TC_PC eventsLi Zefan
BLK_TC_PC events should be treated differently with BLK_TC_FS events. Before this patch: # echo 1 > /sys/block/sda/sda1/trace/enable # echo pc > /sys/block/sda/sda1/trace/act_mask # echo blk > /debugfs/tracing/current_tracer # (generate some BLK_TC_PC events) # cat trace bash-2184 [000] 1774.275413: 8,7 I N [bash] bash-2184 [000] 1774.275435: 8,7 D N [bash] bash-2184 [000] 1774.275540: 8,7 I R [bash] bash-2184 [000] 1774.275547: 8,7 D R [bash] ksoftirqd/0-4 [000] 1774.275580: 8,7 C N 0 [0] bash-2184 [000] 1774.275648: 8,7 I R [bash] bash-2184 [000] 1774.275653: 8,7 D R [bash] ksoftirqd/0-4 [000] 1774.275682: 8,7 C N 0 [0] bash-2184 [000] 1774.275739: 8,7 I R [bash] bash-2184 [000] 1774.275744: 8,7 D R [bash] ksoftirqd/0-4 [000] 1774.275771: 8,7 C N 0 [0] bash-2184 [000] 1774.275804: 8,7 I R [bash] bash-2184 [000] 1774.275808: 8,7 D R [bash] ksoftirqd/0-4 [000] 1774.275836: 8,7 C N 0 [0] After this patch: # cat trace bash-2263 [000] 366.782149: 8,7 I N 0 (00 ..) [bash] bash-2263 [000] 366.782323: 8,7 D N 0 (00 ..) [bash] bash-2263 [000] 366.782557: 8,7 I R 8 (25 00 ..) [bash] bash-2263 [000] 366.782560: 8,7 D R 8 (25 00 ..) [bash] ksoftirqd/0-4 [000] 366.782582: 8,7 C N (25 00 ..) [0] bash-2263 [000] 366.782648: 8,7 I R 8 (5a 00 3f 00) [bash] bash-2263 [000] 366.782650: 8,7 D R 8 (5a 00 3f 00) [bash] ksoftirqd/0-4 [000] 366.782669: 8,7 C N (5a 00 3f 00) [0] bash-2263 [000] 366.782710: 8,7 I R 8 (5a 00 08 00) [bash] bash-2263 [000] 366.782713: 8,7 D R 8 (5a 00 08 00) [bash] ksoftirqd/0-4 [000] 366.782730: 8,7 C N (5a 00 08 00) [0] bash-2263 [000] 366.783375: 8,7 I R 36 (5a 00 08 00) [bash] bash-2263 [000] 366.783379: 8,7 D R 36 (5a 00 08 00) [bash] ksoftirqd/0-4 [000] 366.783404: 8,7 C N (5a 00 08 00) [0] This is what we do with PC events in user-space blktrace. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <49D32387.9040106@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-12blktrace: fix output of unknown eventsLi Zefan
Not all events are pc (packet command) events. An event is a pc event only if it has BLK_TC_PC bit set. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <49D3236D.3090705@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-12tracing, kmemtrace: Make kmem tracepoints use TRACE_EVENT macroZhaolei
TRACE_EVENT is a more generic way to define tracepoints. Doing so adds these new capabilities to this tracepoint: - zero-copy and per-cpu splice() tracing - binary tracing without printf overhead - structured logging records exposed under /debug/tracing/events - trace events embedded in function tracer output and other plugins - user-defined, per tracepoint filter expressions Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Acked-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Tom Zanussi <tzanussi@gmail.com> LKML-Reference: <49DEE6DA.80600@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-12tracing, kmemtrace: Separate include/trace/kmemtrace.h to kmemtrace part and ↵Zhaolei
tracepoint part Impact: refactor code for future changes Current kmemtrace.h is used both as header file of kmemtrace and kmem's tracepoints definition. Tracepoints' definition file may be used by other code, and should only have definition of tracepoint. We can separate include/trace/kmemtrace.h into 2 files: include/linux/kmemtrace.h: header file for kmemtrace include/trace/kmem.h: definition of kmem tracepoints Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Acked-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Tom Zanussi <tzanussi@gmail.com> LKML-Reference: <49DEE68A.5040902@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-12tracing: Document the event tracing systemTheodore Ts'o
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <1239479479-2603-3-git-send-email-tytso@mit.edu> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-12tracing: Add documentation for the power tracerTheodore Ts'o
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <srostedt@redhat.com> LKML-Reference: <1239479479-2603-4-git-send-email-tytso@mit.edu> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-10tracing, net, skb tracepoint: make skb tracepoint use the TRACE_EVENT() macroZhaolei
TRACE_EVENT is a more generic way to define a tracepoint. Doing so adds these new capabilities to this tracepoint: - zero-copy and per-cpu splice() tracing - binary tracing without printf overhead - structured logging records exposed under /debug/tracing/events - trace events embedded in function tracer output and other plugins - user-defined, per tracepoint filter expressions Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: "Steven Rostedt ;" <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Tom Zanussi <tzanussi@gmail.com> LKML-Reference: <49DD90D2.5020604@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-10x86, function-graph: only save return values on x86_64Steven Rostedt
Impact: speed up The return to handler portion of the function graph tracer should only need to save the return values. The caller already saved off the registers that the callee can modify. The returning function already saved the registers it modified. When we call our own trace function it too will save the registers that the callee must restore. There's no reason to save off anything more that the registers used to return the values. Note, I did a complete kernel build with this modification and the function graph tracer running on x86_64. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-10tracing/lockdep: report the time waited for a lockFrederic Weisbecker
While trying to optimize the new lock on reiserfs to replace the bkl, I find the lock tracing very useful though it lacks something important for performance (and latency) instrumentation: the time a task waits for a lock. That's what this patch implements: bash-4816 [000] 202.652815: lock_contended: lock_contended: &sb->s_type->i_mutex_key bash-4816 [000] 202.652819: lock_acquired: &rq->lock (0.000 us) <...>-4787 [000] 202.652825: lock_acquired: &rq->lock (0.000 us) <...>-4787 [000] 202.652829: lock_acquired: &rq->lock (0.000 us) bash-4816 [000] 202.652833: lock_acquired: &sb->s_type->i_mutex_key (16.005 us) As shown above, the "lock acquired" field is followed by the time it has been waiting for the lock. Usually, a lock contended entry is followed by a near lock_acquired entry with a non-zero time waited. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <1238975373-15739-1-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-10Merge branch 'tracing/urgent' into tracing/coreIngo Molnar
Merge reason: pick up both v2.6.30-rc1 [which includes tracing/urgent fixes] and pick up the current lineup of tracing/urgent fixes as well Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-10tracing: fix splice return too largeLai Jiangshan
I got these from strace: splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 12288 splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 12288 splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 12288 splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 16384 splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 8192 splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 8192 splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 8192 I wanted to splice_read 4096 bytes, but it returns 8192 or larger. It is because the return value of tracing_buffers_splice_read() does not include "zero out any left over data" bytes. But tracing_buffers_read() includes these bytes, we make them consistent. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <srostedt@redhat.com> LKML-Reference: <49D46674.9030804@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-10tracing: update file->f_pos when splice(2) itLai Jiangshan
Impact: Cleanup These two lines: if (unlikely(*ppos)) return -ESPIPE; in tracing_buffers_splice_read() are not needed, VFS layer has disabled seek(2). We remove these two lines, and then we can update file->f_pos. And tracing_buffers_read() updates file->f_pos, this fix make tracing_buffers_splice_read() updates file->f_pos too. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <srostedt@redhat.com> LKML-Reference: <49D46670.4010503@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-10tracing: allocate page when neededLai Jiangshan
Impact: Cleanup Sometimes, we open trace_pipe_raw, but we don't read(2) it, we just splice(2) it, thus, the page is not used. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <srostedt@redhat.com> LKML-Reference: <49D4666B.4010608@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-10tracing: disable seeking for trace_pipe_rawLai Jiangshan
Impact: disable pread() We set tracing_buffers_fops.llseek to no_llseek, but we can still perform pread() to read this file. That is not expected. This fix uses nonseekable_open() to disable it. tracing_buffers_fops.llseek is still set to no_llseek, it mark this file is a "non-seekable device" and is used by sys_splice(). See also do_splice() or manual of splice(2): ERRORS EINVAL Target file system doesn't support splicing; neither of the descriptors refers to a pipe; or offset given for non-seekable device. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <srostedt@redhat.com> LKML-Reference: <49D46668.8030806@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-09MN10300: Kill MN10300's own profiling KconfigDavid Howells
Kill MN10300's own profiling Kconfig as this is superfluous given that the profiling options have moved to init/Kconfig and arch/Kconfig. Not only is this now superfluous, but the dependencies are not correct. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-09FRV: Use <asm-generic/pgtable.h> in NOMMU modeDavid Howells
asm-frv/pgtable.h could just #include <asm-generic/pgtable.h> in NOMMU mode rather than #defining macros for lazy MMU and CPU stuff. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-09keys: Handle there being no fallback destination keyring for request_key()David Howells
When request_key() is called, without there being any standard process keyrings on which to fall back if a destination keyring is not specified, an oops is liable to occur when construct_alloc_key() calls down_write() on dest_keyring's semaphore. Due to function inlining this may be seen as an oops in down_write() as called from request_key_and_link(). This situation crops up during boot, where request_key() is called from within the kernel (such as in CIFS mounts) where nobody is actually logged in, and so PAM has not had a chance to create a session keyring and user keyrings to act as the fallback. To fix this, make construct_alloc_key() not attempt to cache a key if there is no fallback key if no destination keyring is given specifically. Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-09afs: BUG to BUG_ON changesStoyan Gaydarov
Signed-off-by: Stoyan Gaydarov <stoyboyker@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-09Merge branch 'x86-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: cpu_debug remove execute permission x86: smarten /proc/interrupts output for new counters x86: DMI match for the Dell DXP061 as it needs BIOS reboot x86: make 64 bit to use default_inquire_remote_apic x86, setup: un-resequence mode setting for VGA 80x34 and 80x60 modes x86, intel-iommu: fix X2APIC && !ACPI build failure
2009-04-09Merge branch 'tracing-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: tracing: consolidate documents blktrace: pass the right pointer to kfree() tracing/syscalls: use a dedicated file header tracing: append a comma to INIT_FTRACE_GRAPH
2009-04-09Merge branch 'sched-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: do not count frozen tasks toward load sched: refresh MAINTAINERS entry sched: Print sched_group::__cpu_power in sched_domain_debug cpuacct: add per-cgroup utime/stime statistics posixtimers, sched: Fix posix clock monotonicity sched_rt: don't allocate cpumask in fastpath cpuacct: make cpuacct hierarchy walk in cpuacct_charge() safe when rcupreempt is used -v2
2009-04-09Merge branches 'core-fixes-for-linus', 'irq-fixes-for-linus' and ↵Linus Torvalds
'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: printk: fix wrong format string iter for printk futex: comment requeue key reference semantics * 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: irq: fix cpumask memory leak on offstack cpumask kernels * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: posix-timers: fix RLIMIT_CPU && setitimer(CPUCLOCK_PROF) posix-timers: fix RLIMIT_CPU && fork() timers: add missing kernel-doc
2009-04-09MN10300: Convert obsolete no_irq_type to no_irq_chipThomas Gleixner
Convert the last remaining users to no_irq_chip. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dmLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm: dm kcopyd: fix callback race dm kcopyd: prepare for callback race fix dm: implement basic barrier support dm: remove dm_request loop dm: rework queueing and suspension dm: simplify dm_request loop dm: split DMF_BLOCK_IO flag into two dm: rearrange dm_wq_work dm: remove limited barrier support dm: add integrity support
2009-04-09module: try_then_request_module must waitHerbert Xu
Since the whole point of try_then_request_module is to retry the operation after a module has been loaded, we must wait for the module to fully load. Otherwise all sort of things start breaking, e.g., you won't be able to read your encrypted disks on the first attempt. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Tested-by: Maciej Rutecki <maciej.rutecki@gmail.com> Tested-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-09sched: do not count frozen tasks toward loadNathan Lynch
Freezing tasks via the cgroup freezer causes the load average to climb because the freezer's current implementation puts frozen tasks in uninterruptible sleep (D state). Some applications which perform job-scheduling functions consult the load average when making decisions. If a cgroup is frozen, the load average does not provide a useful measure of the system's utilization to such applications. This is especially inconvenient if the job scheduler employs the cgroup freezer as a mechanism for preempting low priority jobs. Contrast this with using SIGSTOP for the same purpose: the stopped tasks do not count toward system load. Change task_contributes_to_load() to return false if the task is frozen. This results in /proc/loadavg behavior that better meets users' expectations. Signed-off-by: Nathan Lynch <ntl@pobox.com> Acked-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Nigel Cunningham <nigel@tuxonice.net> Tested-by: Nigel Cunningham <nigel@tuxonice.net> Cc: <stable@kernel.org> Cc: containers@lists.linux-foundation.org Cc: linux-pm@lists.linux-foundation.org Cc: Matt Helsley <matthltc@us.ibm.com> LKML-Reference: <20090408194512.47a99b95@manatee.lan> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-09tracing: consolidate documentsLi Zefan
Move kmemtrace.txt, tracepoints.txt, ftrace.txt and mmiotrace.txt to the new trace/ directory. I didnt find any references to those documents in both source files and documents, so no extra work needs to be done. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Pekka Paalanen <pq@iki.fi> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> LKML-Reference: <49DD6E2B.6090200@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-09x86: cpu_debug remove execute permissionJaswinder Singh Rajput
It seems by mistake these files got execute permissions so removing it. Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> LKML-Reference: <1239211186.9037.2.camel@ht.satnam> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-09blktrace: pass the right pointer to kfree()Li Zefan
Impact: fix kfree crash with non-standard act_mask string If passing a string with leading white spaces to strstrip(), the returned ptr != the original ptr. This bug was introduced by me. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <49DD694C.8020902@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-09tracing/syscalls: use a dedicated file headerFrederic Weisbecker
Impact: fix build warnings and possibe compat misbehavior on IA64 Building a kernel on ia64 might trigger these ugly build warnings: CC arch/ia64/ia32/sys_ia32.o In file included from arch/ia64/ia32/sys_ia32.c:55: arch/ia64/ia32/ia32priv.h:290:1: warning: "elf_check_arch" redefined In file included from include/linux/elf.h:7, from include/linux/module.h:14, from include/linux/ftrace.h:8, from include/linux/syscalls.h:68, from arch/ia64/ia32/sys_ia32.c:18: arch/ia64/include/asm/elf.h:19:1: warning: this is the location of the previous definition [...] sys_ia32.c includes linux/syscalls.h which in turn includes linux/ftrace.h to import the syscalls tracing prototypes. But including ftrace.h can pull too much things for a low level file, especially on ia64 where the ia32 private headers conflict with higher level headers. Now we isolate the syscall tracing headers in their own lightweight file. Reported-by: Tony Luck <tony.luck@intel.com> Tested-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Jason Baron <jbaron@redhat.com> Cc: "Frank Ch. Eigler" <fche@redhat.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Jiaying Zhang <jiayingz@google.com> Cc: Michael Rubin <mrubin@google.com> Cc: Martin Bligh <mbligh@google.com> Cc: Michael Davidson <md@google.com> LKML-Reference: <20090408184058.GB6017@nowhere> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linusLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus: work_on_cpu(): rewrite it to create a kernel thread on demand kthread: move sched-realeted initialization from kthreadd context kthread: Don't looking for a task in create_kthread() #2
2009-04-08Merge git://git.infradead.org/battery-2.6Linus Torvalds
* git://git.infradead.org/battery-2.6: pda_power: Add optional OTG transceiver and voltage regulator support pcf50633_charger: Remove unused mbc_set_status function pcf50633_charger: Enable periodic charging restart
2009-04-08Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: cap_prctl: don't set error to 0 at 'no_change'
2009-04-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: igb: remove sysfs entry that was used to set the number of vfs igbvf: add new driver to support 82576 virtual functions drivers/net/eql.c: Fix a dev leakage. niu: Fix unused variable warning. r6040: set MODULE_VERSION bnx2: Don't use reserved names FEC driver: add missing #endif niu: Fix error handling mv643xx_eth: don't reset the rx coal timer on interface up smsc911x: correct debugging message on mii read timeout ethoc: fix library build errors netfilter: ctnetlink: fix regression in expectation handling netfilter: fix selection of "LED" target in netfilter netfilter: ip6tables regression fix
2009-04-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6: sparc: Hook up sys_preadv and sys_pwritev sparc64: add_node_ranges() must be __init serial: sunsu: sunsu_kbd_ms_init needs to be __devinit sparc: Fix section mismatch warnings in cs4231 sound driver. sparc64: Fix section mismatch warnings in PCI controller drivers. sparc64: Fix section mismatch warnings in power driver. sparc64: get_cells() can't be marked __init
2009-04-08Merge branch 'ext3-latency-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'ext3-latency-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext3: Try to avoid starting a transaction in writepage for data=writepage block_write_full_page: switch synchronous writes to use WRITE_SYNC_PLUG
2009-04-09work_on_cpu(): rewrite it to create a kernel thread on demandAndrew Morton
Impact: circular locking bugfix The various implemetnations and proposed implemetnations of work_on_cpu() are vulnerable to various deadlocks because they all used queues of some form. Unrelated pieces of kernel code thus gained dependencies wherein if one work_on_cpu() caller holds a lock which some other work_on_cpu() callback also takes, the kernel could rarely deadlock. Fix this by creating a short-lived kernel thread for each work_on_cpu() invokation. This is not terribly fast, but the only current caller of work_on_cpu() is pci_call_probe(). It would be nice to find some other way of doing the node-local allocations in the PCI probe code so that we can zap work_on_cpu() altogether. The code there is rather nasty. I can't think of anything simple at this time... Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-04-09kthread: move sched-realeted initialization from kthreadd contextOleg Nesterov
kthreadd is the single thread which implements ths "create" request, move sched_setscheduler/etc from create_kthread() to kthread_create() to improve the scalability. We should be careful with sched_setscheduler(), use _nochek helper. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Vitaliy Gusev <vgusev@openvz.org Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-04-09kthread: Don't looking for a task in create_kthread() #2Vitaliy Gusev
Remove the unnecessary find_task_by_pid_ns(). kthread() can just use "current" to get the same result. Signed-off-by: Vitaliy Gusev <vgusev@openvz.org> Acked-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-04-09dm kcopyd: fix callback raceMikulas Patocka
If the thread calling dm_kcopyd_copy is delayed due to scheduling inside split_job/segment_complete and the subjobs complete before the loop in split_job completes, the kcopyd callback could be invoked from the thread that called dm_kcopyd_copy instead of the kcopyd workqueue. dm_kcopyd_copy -> split_job -> segment_complete -> job->fn() Snapshots depend on the fact that callbacks are called from the singlethreaded kcopyd workqueue and expect that there is no racing between individual callbacks. The racing between callbacks can lead to corruption of exception store and it can also mean that exception store callbacks are called twice for the same exception - a likely reason for crashes reported inside pending_complete() / remove_exception(). This patch fixes two problems: 1. job->fn being called from the thread that submitted the job (see above). - Fix: hand over the completion callback to the kcopyd thread. 2. job->fn(read_err, write_err, job->context); in segment_complete reports the error of the last subjob, not the union of all errors. - Fix: pass job->write_err to the callback to report all error bits (it is done already in run_complete_job) Cc: stable@kernel.org Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-04-09dm kcopyd: prepare for callback race fixMikulas Patocka
Use a variable in segment_complete() to point to the dm_kcopyd_client struct and only release job->pages in run_complete_job() if any are defined. These changes are needed by the next patch. Cc: stable@kernel.org Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-04-09dm: implement basic barrier supportMikulas Patocka
Barriers are submitted to a worker thread that issues them in-order. The thread is modified so that when it sees a barrier request it waits for all pending IO before the request then submits the barrier and waits for it. (We must wait, otherwise it could be intermixed with following requests.) Errors from the barrier request are recorded in a per-device barrier_error variable. There may be only one barrier request in progress at once. For now, the barrier request is converted to a non-barrier request when sending it to the underlying device. This patch guarantees correct barrier behavior if the underlying device doesn't perform write-back caching. The same requirement existed before barriers were supported in dm. Bottom layer barrier support (sending barriers by target drivers) and handling devices with write-back caches will be done in further patches. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-04-09dm: remove dm_request loopMikulas Patocka
Remove queue_io return value and a loop in dm_request. IO may be submitted to a worker thread with queue_io(). queue_io() sets DMF_QUEUE_IO_TO_THREAD so that all further IO is queued for the thread. When the thread finishes its work, it clears DMF_QUEUE_IO_TO_THREAD and from this point on, requests are submitted from dm_request again. This will be used for processing barriers. Remove the loop in dm_request. queue_io() can submit I/Os to the worker thread even if DMF_QUEUE_IO_TO_THREAD was not set. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-04-09dm: rework queueing and suspensionMikulas Patocka
Rework shutting down on suspend and document the associated rules. Drop write lock in __split_and_process_bio to allow more processing concurrency. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-04-09dm: simplify dm_request loopAlasdair G Kergon
Refactor the code in dm_request(). Require the new DMF_BLOCK_FOR_SUSPEND flag on readahead bios we will discard so we don't drop such bios while processing a barrier. Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-04-09dm: split DMF_BLOCK_IO flag into twoAlasdair G Kergon
Split the DMF_BLOCK_IO flag into two. DMF_BLOCK_IO_FOR_SUSPEND is set when I/O must be blocked while suspending a device. DMF_QUEUE_IO_TO_THREAD is set when I/O must be queued to a worker thread for later processing. Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-04-09dm: rearrange dm_wq_workAlasdair G Kergon
Refactor dm_wq_work() to make later patch more readable. Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-04-09dm: remove limited barrier supportMikulas Patocka
Prepare for full barrier implementation: first remove the restricted support. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-04-09dm: add integrity supportMartin K. Petersen
This patch provides support for data integrity passthrough in the device mapper. - If one or more component devices support integrity an integrity profile is preallocated for the DM device. - If all component devices have compatible profiles the DM device is flagged as capable. - Handle integrity metadata when splitting and cloning bios. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-04-09cap_prctl: don't set error to 0 at 'no_change'Serge E. Hallyn
One-liner: capsh --print is broken without this patch. In certain cases, cap_prctl returns error > 0 for success. However, the 'no_change' label was always setting error to 0. As a result, for example, 'prctl(CAP_BSET_READ, N)' would always return 0. It should return 1 if a process has N in its bounding set (as by default it does). I'm keeping the no_change label even though it's now functionally the same as 'error'. Signed-off-by: Serge Hallyn <serue@us.ibm.com> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>