aboutsummaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2009-05-19futex: setup writeable mapping for futex ops which modify user space dataThomas Gleixner
The futex code installs a read only mapping via get_user_pages_fast() even if the futex op function has to modify user space data. The eventual fault was fixed up by futex_handle_fault() which walked the VMA with mmap_sem held. After the cleanup patches which removed the mmap_sem dependency of the futex code commit 4dc5b7a36a49eff97050894cf1b3a9a02523717 (futex: clean up fault logic) removed the private VMA walk logic from the futex code. This change results in a stale RO mapping which is not fixed up. Instead of reintroducing the previous fault logic we set up the mapping in get_user_pages_fast() read/write for all operations which modify user space data. Also handle private futexes in the same way and make the current unconditional access_ok(VERIFY_WRITE) depend on the futex op. Reported-by: Andreas Schwab <schwab@linux-m68k.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> CC: stable@kernel.org
2009-05-18Merge branches 'sched-fixes-for-linus-2' and 'core-fixes-for-linus-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-fixes-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: Fix fallback sched_clock()'s offset when using jiffies * 'core-fixes-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: lockdep: increase MAX_LOCKDEP_ENTRIES and MAX_LOCKDEP_CHAINS
2009-05-18Merge branch 'tracing-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: tracing: Append prompt in /debug/tracing/README file x86/function-graph: fix constraint for recording old return value
2009-05-18Merge commit 'v2.6.30-rc6' into perfcounters/coreIngo Molnar
Merge reason: this branch was on an -rc4 base, merge it up to -rc6 to get the latest upstream fixes. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-17Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6: PM: check sysdev_suspend(PMSG_FREEZE) return value
2009-05-17perf_counter: fix threaded task exitIngo Molnar
Flushing counters in __exit_signal() with irqs disabled is not a good idea as perf_counter_exit_task() acquires mutexes. So flush it before acquiring the tasklist lock. (Note, we still need a fix for when the PID has been unhashed.) [ Impact: fix crash with inherited counters ] Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-17perf_counter: Fix counter inheritancePeter Zijlstra
Srivatsa Vaddagiri reported that a Java workload triggers this warning in kernel/exit.c: WARN_ON_ONCE(!list_empty(&tsk->perf_counter_ctx.counter_list)); Add the inherited counter propagation on self-detach, this could cause counter leaks and incomplete stats in threaded code like the below: #include <pthread.h> #include <unistd.h> void *thread(void *arg) { sleep(5); return NULL; } void main(void) { pthread_t thr; pthread_create(&thr, NULL, thread, NULL); } Reported-by: Srivatsa Vaddagiri <vatsa@in.ibm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-17perf_counter: Fix inheritance cleanup codePeter Zijlstra
Clean up code that open-coded the list_{add,del}_counter() code in __perf_counter_exit_task() which consequently diverged. This could lead to software counter crashes. Also, fold the ctx->nr_counter inc/dec into those functions and clean up some of the related code. [ Impact: fix potential sw counter crash, cleanup ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-16Fix caller information for warn_slowpath_nullLinus Torvalds
Ian Campbell noticed that since "Eliminate thousands of warnings with gcc 3.2 build" (commit 57adc4d2dbf968fdbe516359688094eef4d46581) all WARN_ON()'s currently appear to come from warn_slowpath_null(), eg: WARNING: at kernel/softirq.c:143 warn_slowpath_null+0x1c/0x20() because now that warn_slowpath_null() is in the call path, the __builtin_return_address(0) returns that, rather than the place that caused the warning. Fix this by splitting up the warn_slowpath_null/fmt cases differently, using a common helper function, and getting the return address in the right place. This also happens to avoid the unnecessary stack usage for the non-stdargs case, and just generally cleans things up. Make the function name printout use %pS while at it. Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-05-15PM: check sysdev_suspend(PMSG_FREEZE) return valueBjorn Helgaas
Check the return value of sysdev_suspend(). I think this was a typo. Without this change, the following "if" check is always false. I also changed the error message so it's distinguishable from the similar message a few lines above. Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2009-05-15tracing: Append prompt in /debug/tracing/README fileGeunSik Lim
append prompt in /debug/tracing/README file. This is trivial issue. Fix typo Mini Howto file(README) for ftrace. [ Impact: cleanup ] Signed-off-by: GeunSik Lim <geunsik.lim@samsung.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: williams <williams@redhat.com> LKML-Reference: <1242289418.31161.45.camel@centos51> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-15Merge branch 'for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb: kgdb: gdb documentation fix kgdb,i386: use address that SP register points to in the exception frame sysrq, intel_fb: fix sysrq g collision
2009-05-15perf_counter: allow arch to supply event misc flags and instruction pointerPaul Mackerras
At present the values we put in overflow events for the misc flags indicating processor mode and the instruction pointer are obtained using the standard user_mode() and instruction_pointer() functions. Those functions tell you where the performance monitor interrupt was taken, which might not be exactly where the counter overflow occurred, for example because interrupts were disabled at the point where the overflow occurred, or because the processor had many instructions in flight and chose to complete some more instructions beyond the one that caused the counter overflow. Some architectures (e.g. powerpc) can supply more precise information about where the counter overflow occurred and the processor mode at that point. This introduces new functions, perf_misc_flags() and perf_instruction_pointer(), which arch code can override to provide more precise information if available. They have default implementations which are identical to the existing code. This also adds a new misc flag value, PERF_EVENT_MISC_HYPERVISOR, for the case where a counter overflow occurred in the hypervisor. We encode the processor mode in the 2 bits previously used to indicate user or kernel mode; the values for user and kernel mode are unchanged and hypervisor mode is indicated by both bits being set. [ Impact: generalize perfcounter core facilities ] Signed-off-by: Paul Mackerras <paulus@samba.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <18956.1272.818511.561835@cargo.ozlabs.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-15perf_counter: frequency based adaptive irq_period, 32-bit fixPeter Zijlstra
fix: kernel/built-in.o: In function `perf_counter_alloc': perf_counter.c:(.text+0x7ddc7): undefined reference to `__udivdi3' [ Impact: build fix on 32-bit systems ] Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> LKML-Reference: <1242394667.6642.1887.camel@laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-15perf_counter: frequency based adaptive irq_periodPeter Zijlstra
Instead of specifying the irq_period for a counter, provide a target interrupt frequency and dynamically adapt the irq_period to match this frequency. [ Impact: new perf-counter attribute/feature ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <20090515132018.646195868@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-15perf_counter: per user mlock giftPeter Zijlstra
Instead of a per-process mlock gift for perf-counters, use a per-user gift so that there is less of a DoS potential. [ Impact: allow less worst-case unprivileged memory consumption ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <20090515132018.496182835@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-15perf_counter: remove perf_disable/enable exportsPeter Zijlstra
Now that ACPI idle doesn't use it anymore, remove the exports. [ Impact: remove dead code/data ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <20090515132018.429826617@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-15sysrq, intel_fb: fix sysrq g collisionJason Wessel
Commit 79e539453b34e35f39299a899d263b0a1f1670bd introduced a regression where you cannot use sysrq 'g' to enter kgdb. The solution is to move the intel fb sysrq over to V for video instead of G for graphics. The SMP VOYAGER code to register for the sysrq-v is not anywhere to be found in the mainline kernel, so the comments in the code were cleaned up as well. This patch also cleans up the sysrq definitions for kgdb to make it generic for the kernel debugger, such that the sysrq 'g' can be used in the future to enter a gdbstub or another kernel debugger. Signed-off-by: Jason Wessel <jason.wessel@windriver.com> Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2009-05-15Revert "mm: add /proc controls for pdflush threads"Jens Axboe
This reverts commit fafd688e4c0c34da0f3de909881117d374e4c7af. Work is progressing to switch away from pdflush as the process backing for flushing out dirty data. So it seems pointless to add more knobs to control pdflush threads. The original author of the patch did not have any specific use cases for adding the knobs, so we can easily revert this before 2.6.30 to avoid having to maintain this API forever. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-05-15perf_counter: Rework the perf counter disable/enablePeter Zijlstra
The current disable/enable mechanism is: token = hw_perf_save_disable(); ... /* do bits */ ... hw_perf_restore(token); This works well, provided that the use nests properly. Except we don't. x86 NMI/INT throttling has non-nested use of this, breaking things. Therefore provide a reference counter disable/enable interface, where the first disable disables the hardware, and the last enable enables the hardware again. [ Impact: refactor, simplify the PMU disable/enable logic ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-15perf_counter: Fix perf_output_copy() WARN to account for overflowPeter Zijlstra
The simple reservation test in perf_output_copy() failed to take unsigned int overflow into account, fix this. [ Impact: fix false positive warning with more than 4GB of profiling data ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-12lockdep: increase MAX_LOCKDEP_ENTRIES and MAX_LOCKDEP_CHAINSIngo Molnar
Now that lockdep coverage has increased it has become easier to run out of entries: [ 21.401387] BUG: MAX_LOCKDEP_ENTRIES too low! [ 21.402007] turning off the locking correctness validator. [ 21.402007] Pid: 1555, comm: S99local Not tainted 2.6.30-rc5-tip #2 [ 21.402007] Call Trace: [ 21.402007] [<ffffffff81069789>] add_lock_to_list+0x53/0xba [ 21.402007] [<ffffffff810eb615>] ? lookup_mnt+0x19/0x53 [ 21.402007] [<ffffffff8106be14>] check_prev_add+0x14b/0x1c7 [ 21.402007] [<ffffffff8106c304>] validate_chain+0x474/0x52a [ 21.402007] [<ffffffff8106c6fc>] __lock_acquire+0x342/0x3c7 [ 21.402007] [<ffffffff8106c842>] lock_acquire+0xc1/0xe5 [ 21.402007] [<ffffffff810eb615>] ? lookup_mnt+0x19/0x53 [ 21.402007] [<ffffffff8153aedc>] _spin_lock+0x31/0x66 Double the size - as we've done in the past. [ Impact: allow lockdep to cover more locks ] Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-12perf_counter: call hw_perf_save_disable/restore around group_sched_inPaul Mackerras
I noticed that when enabling a group via the PERF_COUNTER_IOC_ENABLE ioctl on the group leader, the counters weren't enabled and counting immediately on return from the ioctl, but did start counting a little while later (presumably after a context switch). The reason was that __perf_counter_enable calls group_sched_in which calls hw_perf_group_sched_in, which on powerpc assumes that the caller has called hw_perf_save_disable already. Until commit 46d686c6 ("perf_counter: put whole group on when enabling group leader") it was true that all callers of group_sched_in had called hw_perf_save_disable first, and the powerpc hw_perf_group_sched_in relies on that (there isn't an x86 version). This fixes the problem by putting calls to hw_perf_save_disable / hw_perf_restore around the calls to group_sched_in and counter_sched_in in __perf_counter_enable. Having the calls to hw_perf_save_disable/restore around the counter_sched_in call is harmless and makes this call consistent with the other call sites of counter_sched_in, which have all called hw_perf_save_disable first. [ Impact: more precise counter group disable/enable functionality ] Signed-off-by: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> LKML-Reference: <18953.25733.53359.147452@cargo.ozlabs.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-11perf_counter: call atomic64_set for counter->countPaul Mackerras
A compile warning triggered because we are calling atomic_set(&counter->count). But since counter->count is an atomic64_t, we have to use atomic64_set. So the count can be set short, resulting in the reset ioctl only resetting the low word. [ Impact: clear counter properly during the reset ioctl ] Signed-off-by: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> LKML-Reference: <18951.48285.270311.981806@drongo.ozlabs.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-11perf_counter: don't count scheduler ticks as context switchesPaul Mackerras
The context-switch software counter gives inflated values at present because each scheduler tick and each process-wide counter enable/disable prctl gets counted as a context switch. This happens because perf_counter_task_tick, perf_counter_task_disable and perf_counter_task_enable all call perf_counter_task_sched_out, which calls perf_swcounter_event to record a context switch event. This fixes it by introducing a variant of perf_counter_task_sched_out with two underscores in front for internal use within the perf_counter code, and makes perf_counter_task_{tick,disable,enable} call it. This variant doesn't record a context switch event, and takes a struct perf_counter_context *. This adds the new variant rather than changing the behaviour or interface of perf_counter_task_sched_out because that is called from other code. [ Impact: fix inflated context-switch event counts ] Signed-off-by: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> LKML-Reference: <18951.48034.485580.498953@drongo.ozlabs.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-11perf_counter: Put whole group on when enabling group leaderPaul Mackerras
Currently, if you have a group where the leader is disabled and there are siblings that are enabled, and then you enable the leader, we only put the leader on the PMU, and not its enabled siblings. This is incorrect, since the enabled group members should be all on or all off at any given point. This fixes it by adding a call to group_sched_in in __perf_counter_enable in the case where we're enabling a group leader. To avoid the need for a forward declaration this also moves group_sched_in up before __perf_counter_enable. The actual content of group_sched_in is unchanged by this patch. [ Impact: fix bug in counter enable code ] Signed-off-by: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> LKML-Reference: <18951.34946.451546.691693@drongo.ozlabs.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-09Convert obvious places to deactivate_locked_super()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-05-09sched: Fix fallback sched_clock()'s offset when using jiffiesRon
Account for the initial offset to the jiffy count. [ Impact: fix printk timestamps on architectures using fallback sched_clock() ] Signed-off-by: Ron Lee <ron@debian.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-08kprobes: fix to use text_mutex around arm/disarm kprobeMasami Hiramatsu
Fix kprobes to lock text_mutex around some arch_arm/disarm_kprobe() which are newly added by commit de5bd88d5a5cce3cacea904d3503e5ebdb3852a2. Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Jim Keniston <jkenisto@us.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-05-08perf_counter: add PERF_RECORD_CPUPeter Zijlstra
Allow recording the CPU number the event was generated on. RFC: this leaves a u32 as reserved, should we fill in the node_id() there, or leave this open for future extention, as userspace can already easily do the cpu->node mapping if needed. [ Impact: extend perfcounter output record format ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> LKML-Reference: <20090508170029.008627711@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-08perf_counter: add PERF_RECORD_CONFIGPeter Zijlstra
Much like CONFIG_RECORD_GROUP records the hw_event.config to identify the values, allow to record this for all counters. [ Impact: extend perfcounter output record format ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> LKML-Reference: <20090508170028.923228280@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-08perf_counter: rework ioctl()sPeter Zijlstra
Corey noticed that ioctl()s on grouped counters didn't work on the whole group. This extends the ioctl() interface to take a second argument that is interpreted as a flags field. We then provide PERF_IOC_FLAG_GROUP to toggle the behaviour. Having this flag gives the greatest flexibility, allowing you to individually enable/disable/reset counters in a group, or all together. [ Impact: fix group counter enable/disable semantics ] Reported-by: Corey Ashford <cjashfor@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <20090508170028.837558214@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-08perf_counter: optimize perf_counter_task_tick()Peter Zijlstra
perf_counter_task_tick() does way too much work to find out there's nothing to do. Provide an easy short-circuit for the normal case where there are no counters on the system. [ Impact: micro-optimization ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> LKML-Reference: <20090508170028.750619201@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-06Eliminate thousands of warnings with gcc 3.2 buildAndi Kleen
When building with gcc 3.2 I get thousands of warnings such as include/linux/gfp.h: In function `allocflags_to_migratetype': include/linux/gfp.h:105: warning: null format string due to passing a NULL format string to warn_slowpath() in #define __WARN() warn_slowpath(__FILE__, __LINE__, NULL) Split this case out into a separate call. This also shrinks the kernel slightly: text data bss dec hex filename 4802274 707668 712704 6222646 5ef336 vmlinux text data bss dec hex filename 4799027 703572 712704 6215303 5ed687 vmlinux due to removeing one argument from the commonly-called __WARN(). [akpm@linux-foundation.org: reduce scope of `empty'] Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-05-06inotify: use GFP_NOFS in kernel_event() to work around a lockdep false-positiveWu Fengguang
There is what we believe to be a false positive reported by lockdep. inotify_inode_queue_event() => take inotify_mutex => kernel_event() => kmalloc() => SLOB => alloc_pages_node() => page reclaim => slab reclaim => dcache reclaim => inotify_inode_is_dead => take inotify_mutex => deadlock The plan is to fix this via lockdep annotation, but that is proving to be quite involved. The patch flips the allocation over to GFP_NFS to shut the warning up, for the 2.6.30 release. Hopefully we will fix this for real in 2.6.31. I'll queue a patch in -mm to switch it back to GFP_KERNEL so we don't forget. ================================= [ INFO: inconsistent lock state ] 2.6.30-rc2-next-20090417 #203 --------------------------------- inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage. kswapd0/380 [HC0[0]:SC0[0]:HE1:SE1] takes: (&inode->inotify_mutex){+.+.?.}, at: [<ffffffff8112f1b5>] inotify_inode_is_dead+0x35/0xb0 {RECLAIM_FS-ON-W} state was registered at: [<ffffffff81079188>] mark_held_locks+0x68/0x90 [<ffffffff810792a5>] lockdep_trace_alloc+0xf5/0x100 [<ffffffff810f5261>] __kmalloc_node+0x31/0x1e0 [<ffffffff81130652>] kernel_event+0xe2/0x190 [<ffffffff81130826>] inotify_dev_queue_event+0x126/0x230 [<ffffffff8112f096>] inotify_inode_queue_event+0xc6/0x110 [<ffffffff8110444d>] vfs_create+0xcd/0x140 [<ffffffff8110825d>] do_filp_open+0x88d/0xa20 [<ffffffff810f6b68>] do_sys_open+0x98/0x140 [<ffffffff810f6c50>] sys_open+0x20/0x30 [<ffffffff8100c272>] system_call_fastpath+0x16/0x1b [<ffffffffffffffff>] 0xffffffffffffffff irq event stamp: 690455 hardirqs last enabled at (690455): [<ffffffff81564fe4>] _spin_unlock_irqrestore+0x44/0x80 hardirqs last disabled at (690454): [<ffffffff81565372>] _spin_lock_irqsave+0x32/0xa0 softirqs last enabled at (690178): [<ffffffff81052282>] __do_softirq+0x202/0x220 softirqs last disabled at (690157): [<ffffffff8100d50c>] call_softirq+0x1c/0x50 other info that might help us debug this: 2 locks held by kswapd0/380: #0: (shrinker_rwsem){++++..}, at: [<ffffffff810d0bd7>] shrink_slab+0x37/0x180 #1: (&type->s_umount_key#17){++++..}, at: [<ffffffff8110cfbf>] shrink_dcache_memory+0x11f/0x1e0 stack backtrace: Pid: 380, comm: kswapd0 Not tainted 2.6.30-rc2-next-20090417 #203 Call Trace: [<ffffffff810789ef>] print_usage_bug+0x19f/0x200 [<ffffffff81018bff>] ? save_stack_trace+0x2f/0x50 [<ffffffff81078f0b>] mark_lock+0x4bb/0x6d0 [<ffffffff810799e0>] ? check_usage_forwards+0x0/0xc0 [<ffffffff8107b142>] __lock_acquire+0xc62/0x1ae0 [<ffffffff810f478c>] ? slob_free+0x10c/0x370 [<ffffffff8107c0a1>] lock_acquire+0xe1/0x120 [<ffffffff8112f1b5>] ? inotify_inode_is_dead+0x35/0xb0 [<ffffffff81562d43>] mutex_lock_nested+0x63/0x420 [<ffffffff8112f1b5>] ? inotify_inode_is_dead+0x35/0xb0 [<ffffffff8112f1b5>] ? inotify_inode_is_dead+0x35/0xb0 [<ffffffff81012fe9>] ? sched_clock+0x9/0x10 [<ffffffff81077165>] ? lock_release_holdtime+0x35/0x1c0 [<ffffffff8112f1b5>] inotify_inode_is_dead+0x35/0xb0 [<ffffffff8110c9dc>] dentry_iput+0xbc/0xe0 [<ffffffff8110cb23>] d_kill+0x33/0x60 [<ffffffff8110ce23>] __shrink_dcache_sb+0x2d3/0x350 [<ffffffff8110cffa>] shrink_dcache_memory+0x15a/0x1e0 [<ffffffff810d0cc5>] shrink_slab+0x125/0x180 [<ffffffff810d1540>] kswapd+0x560/0x7a0 [<ffffffff810ce160>] ? isolate_pages_global+0x0/0x2c0 [<ffffffff81065a30>] ? autoremove_wake_function+0x0/0x40 [<ffffffff8107953d>] ? trace_hardirqs_on+0xd/0x10 [<ffffffff810d0fe0>] ? kswapd+0x0/0x7a0 [<ffffffff8106555b>] kthread+0x5b/0xa0 [<ffffffff8100d40a>] child_rip+0xa/0x20 [<ffffffff8100cdd0>] ? restore_args+0x0/0x30 [<ffffffff81065500>] ? kthread+0x0/0xa0 [<ffffffff8100d400>] ? child_rip+0x0/0x20 [eparis@redhat.com: fix audit too] Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Matt Mackall <mpm@selenic.com> Cc: Christoph Lameter <clameter@sgi.com> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Eric Paris <eparis@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-05-06Merge branch 'core/locking' into perfcounters/coreIngo Molnar
Merge reason: we moved a mutex.h commit that originated from the perfcounters tree into core/locking - but now merge back that branch to solve a merge artifact and to pick up cleanups of this commit that happened in core/locking. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-05Merge branch 'timers/urgent' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: clockevents: prevent endless loop in tick_handle_periodic()
2009-05-05Merge branch 'irq/urgent' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'irq/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: Revert "genirq: assert that irq handlers are indeed running in hardirq context"
2009-05-05Merge branch 'sched-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: account system time properly
2009-05-05Merge branch 'core-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: kernel/posix-cpu-timers.c: fix sparse warning dma-debug: remove broken dma memory leak detection for 2.6.30 locking: Documentation: lockdep-design.txt, fix note of state bits
2009-05-05Merge branch 'tracing-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: tracing: x86, mmiotrace: fix range test tracing: fix ref count in splice pages
2009-05-05perf_counter: inheritable sample countersPeter Zijlstra
Redirect the output to the parent counter and put in some sanity checks. [ Impact: new perfcounter feature - inherited sampling counters ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> LKML-Reference: <20090505155437.331556171@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-05perf_counter: fix the output lockPeter Zijlstra
Use -1 instead of 0 as unlocked, since 0 is a valid cpu number. ( This is not an issue right now but will be once we allow multiple counters to output to the same mmap area. ) [ Impact: prepare code for multi-counter profile output ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> LKML-Reference: <20090505155437.232686598@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-05perf_counter: provide an mlock thresholdPeter Zijlstra
Provide a threshold to relax the mlock accounting, increasing usability. Each counter gets perf_counter_mlock_kb for free. [ Impact: allow more mmap buffering ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> LKML-Reference: <20090505155437.112113632@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-05perf_counter: add ioctl(PERF_COUNTER_IOC_RESET)Peter Zijlstra
Provide a way to reset an existing counter - this eases PAPI libraries around perfcounters. Similar to read() it doesn't collapse pending child counters. [ Impact: new perfcounter fd ioctl method to reset counters ] Suggested-by: Corey Ashford <cjashfor@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <20090505155437.022272933@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-05perf_counter: uncouple data_head updates from wakeupsPeter Zijlstra
Keep data_head up-to-date irrespective of notifications. This fixes the case where you disable a counter and don't get a notification for the last few pending events, and it also allows polling usage. [ Impact: increase precision of perfcounter mmap-ed fields ] Suggested-by: Corey Ashford <cjashfor@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <20090505155436.925084300@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-04perf_counter: convert perf_resource_mutex to a spinlockIngo Molnar
Now percpu counters can be initialized very early. But the init sequence uses mutex_lock(). Fortunately, perf_resource_mutex should be a spinlock anyway, so convert it. [ Impact: fix crash due to early init mutex use ] LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-04perf_counter: initialize the per-cpu context earlierIngo Molnar
percpu scheduling for perfcounters wants to take the context lock, but that lock first needs to be initialized. Currently it is an early_initcall() - but that is too late, the task tick runs much sooner than that. Call it explicitly from the scheduler init sequence instead. [ Impact: fix access-before-init crash ] LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-04perf_counter: round-robin per-CPU counters tooIngo Molnar
This used to be unstable when we had the rq->lock dependencies, but now that they are that of the past we can turn on percpu counter RR too. [ Impact: handle counter over-commit for per-CPU counters too ] LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-02mm: prevent divide error for small values of vm_dirty_bytesAndrea Righi
Avoid setting less than two pages for vm_dirty_bytes: this is necessary to avoid potential division by 0 (like the following) in get_dirty_limits(). [ 49.951610] divide error: 0000 [#1] PREEMPT SMP [ 49.952195] last sysfs file: /sys/devices/pci0000:00/0000:00:01.1/host0/target0:0:0/0:0:0:0/block/sda/uevent [ 49.952195] CPU 1 [ 49.952195] Modules linked in: pcspkr [ 49.952195] Pid: 3064, comm: dd Not tainted 2.6.30-rc3 #1 [ 49.952195] RIP: 0010:[<ffffffff802d39a9>] [<ffffffff802d39a9>] get_dirty_limits+0xe9/0x2c0 [ 49.952195] RSP: 0018:ffff88001de03a98 EFLAGS: 00010202 [ 49.952195] RAX: 00000000000000c0 RBX: ffff88001de03b80 RCX: 28f5c28f5c28f5c3 [ 49.952195] RDX: 0000000000000000 RSI: 00000000000000c0 RDI: 0000000000000000 [ 49.952195] RBP: ffff88001de03ae8 R08: 0000000000000000 R09: 0000000000000000 [ 49.952195] R10: ffff88001ddda9a0 R11: 0000000000000001 R12: 0000000000000001 [ 49.952195] R13: ffff88001fbc8218 R14: ffff88001de03b70 R15: ffff88001de03b78 [ 49.952195] FS: 00007fe9a435b6f0(0000) GS:ffff8800025d9000(0000) knlGS:0000000000000000 [ 49.952195] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 49.952195] CR2: 00007fe9a39ab000 CR3: 000000001de38000 CR4: 00000000000006e0 [ 49.952195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 49.952195] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 49.952195] Process dd (pid: 3064, threadinfo ffff88001de02000, task ffff88001ddda250) [ 49.952195] Stack: [ 49.952195] ffff88001fa0de00 ffff88001f2dbd70 ffff88001f9fe800 000080b900000000 [ 49.952195] 00000000000000c0 ffff8800027a6100 0000000000000400 ffff88001fbc8218 [ 49.952195] 0000000000000000 0000000000000600 ffff88001de03bb8 ffffffff802d3ed7 [ 49.952195] Call Trace: [ 49.952195] [<ffffffff802d3ed7>] balance_dirty_pages_ratelimited_nr+0x1d7/0x3f0 [ 49.952195] [<ffffffff80368f8e>] ? ext3_writeback_write_end+0x9e/0x120 [ 49.952195] [<ffffffff802cc7df>] generic_file_buffered_write+0x12f/0x330 [ 49.952195] [<ffffffff802cce8d>] __generic_file_aio_write_nolock+0x26d/0x460 [ 49.952195] [<ffffffff802cda32>] ? generic_file_aio_write+0x52/0xd0 [ 49.952195] [<ffffffff802cda49>] generic_file_aio_write+0x69/0xd0 [ 49.952195] [<ffffffff80365fa6>] ext3_file_write+0x26/0xc0 [ 49.952195] [<ffffffff803034d1>] do_sync_write+0xf1/0x140 [ 49.952195] [<ffffffff80290d1a>] ? get_lock_stats+0x2a/0x60 [ 49.952195] [<ffffffff80280730>] ? autoremove_wake_function+0x0/0x40 [ 49.952195] [<ffffffff8030411b>] vfs_write+0xcb/0x190 [ 49.952195] [<ffffffff803042d0>] sys_write+0x50/0x90 [ 49.952195] [<ffffffff8022ff6b>] system_call_fastpath+0x16/0x1b [ 49.952195] Code: 00 00 00 2b 05 09 1c 17 01 48 89 c6 49 0f af f4 48 c1 ee 02 48 89 f0 48 f7 e1 48 89 d6 31 d2 48 c1 ee 02 48 0f af 75 d0 48 89 f0 <48> f7 f7 41 8b 95 ac 01 00 00 48 89 c7 49 0f af d4 48 c1 ea 02 [ 49.952195] RIP [<ffffffff802d39a9>] get_dirty_limits+0xe9/0x2c0 [ 49.952195] RSP <ffff88001de03a98> [ 50.096523] ---[ end trace 008d7aa02f244d7b ]--- Signed-off-by: Andrea Righi <righi.andrea@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: David Rientjes <rientjes@google.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>