aboutsummaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2009-08-29rcu: Changes from reviews: avoid casts, fix/add warnings, improve commentsPaul E. McKenney
Changes suggested by review comments from Josh Triplett and Mathieu Desnoyers. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Josh Triplett <josh@joshtriplett.org> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: laijs@cn.fujitsu.com Cc: dipankar@in.ibm.com Cc: akpm@linux-foundation.org Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org LKML-Reference: <20090827220012.GA30525@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-29rcu: Create rcutree plugins to handle hotplug CPU for multi-level treesPaul E. McKenney
When offlining CPUs from a multi-level tree, there is the possibility of offlining the last CPU from a given node when there are preempted RCU read-side critical sections that started life on one of the CPUs on that node. In this case, the corresponding tasks will be enqueued via the task_struct's rcu_node_entry list_head onto one of the rcu_node's blocked_tasks[] lists. These tasks need to be moved somewhere else so that they will prevent the current grace period from ending. That somewhere is the root rcu_node. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: laijs@cn.fujitsu.com Cc: dipankar@in.ibm.com Cc: akpm@linux-foundation.org Cc: mathieu.desnoyers@polymtl.ca Cc: josht@linux.vnet.ibm.com Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org LKML-Reference: <20090827215816.GA30472@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-26rcu: Remove lockdep annotations from RCU's _notrace() API membersPaul E. McKenney
The lockdep annotations rcu_read_acquire() and rcu_read_release() might lead to infinite looping if called from lockdep. So this patch removes them. Formal repost of http://lkml.org/lkml/2009/8/25/309 on the strength of Lai Jiangshan's review. Suggested-by: Lai Jiangshan <laijs@cn.fujitsu.com> Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: dipankar@in.ibm.com Cc: akpm@linux-foundation.org Cc: josht@linux.vnet.ibm.com Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org LKML-Reference: <20090826015337.GA18904@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-24rcu: Add "notrace" to RCU function headers used by ftracePaul E. McKenney
Both rcu_read_lock_sched_notrace() and rcu_read_unlock_sched_notrace() are used by ftrace, and thus need to be marked "notrace". Unfortunately, my naive assumption that gcc would see the inner "notrace" does not hold. Kudos to Lai Jiangshan for noting this. Reported-by: Ingo Molnar <mingo@elte.hu> Bug-spotted-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: laijs@cn.fujitsu.com Cc: dipankar@in.ibm.com Cc: akpm@linux-foundation.org Cc: mathieu.desnoyers@polymtl.ca Cc: josht@linux.vnet.ibm.com Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org LKML-Reference: <12511321213243-git-send-email-> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-23rcu: Remove CONFIG_PREEMPT_RCUPaul E. McKenney
Now that CONFIG_TREE_PREEMPT_RCU is in place, there is no further need for CONFIG_PREEMPT_RCU. Remove it, along with whatever subtle bugs it may (or may not) contain. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: laijs@cn.fujitsu.com Cc: dipankar@in.ibm.com Cc: akpm@linux-foundation.org Cc: mathieu.desnoyers@polymtl.ca Cc: josht@linux.vnet.ibm.com Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org LKML-Reference: <125097461396-git-send-email-> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-23rcu: Merge preemptable-RCU functionality into hierarchical RCUPaul E. McKenney
Create a kernel/rcutree_plugin.h file that contains definitions for preemptable RCU (or, under the #else branch of the #ifdef, empty definitions for the classic non-preemptable semantics). These definitions fit into plugins defined in kernel/rcutree.c for this purpose. This variant of preemptable RCU uses a new algorithm whose read-side expense is roughly that of classic hierarchical RCU under CONFIG_PREEMPT. This new algorithm's update-side expense is similar to that of classic hierarchical RCU, and, in absence of read-side preemption or blocking, is exactly that of classic hierarchical RCU. Perhaps more important, this new algorithm has a much simpler implementation, saving well over 1,000 lines of code compared to mainline's implementation of preemptable RCU, which will hopefully be retired in favor of this new algorithm. The simplifications are obtained by maintaining per-task nesting state for running tasks, and using a simple lock-protected algorithm to handle accounting when tasks block within RCU read-side critical sections, making use of lessons learned while creating numerous user-level RCU implementations over the past 18 months. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: laijs@cn.fujitsu.com Cc: dipankar@in.ibm.com Cc: akpm@linux-foundation.org Cc: mathieu.desnoyers@polymtl.ca Cc: josht@linux.vnet.ibm.com Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org LKML-Reference: <12509746134003-git-send-email-> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-23rcu: Simplify rcu_pending()/rcu_check_callbacks() APIPaul E. McKenney
All calls from outside RCU are of the form: if (rcu_pending(cpu)) rcu_check_callbacks(cpu, user); This is silly, instead we put a call to rcu_pending() in rcu_check_callbacks(), and then make the outside calls be to rcu_check_callbacks(). This cuts down on the code a bit and also gives the compiler a better chance of optimizing. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: laijs@cn.fujitsu.com Cc: dipankar@in.ibm.com Cc: akpm@linux-foundation.org Cc: mathieu.desnoyers@polymtl.ca Cc: josht@linux.vnet.ibm.com Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org LKML-Reference: <125097461311-git-send-email-> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-23rcu: Consolidate sparse and lockdep declarations in include/linux/rcupdate.hPaul E. McKenney
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: laijs@cn.fujitsu.com Cc: dipankar@in.ibm.com Cc: akpm@linux-foundation.org Cc: mathieu.desnoyers@polymtl.ca Cc: josht@linux.vnet.ibm.com Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org LKML-Reference: <12509746132349-git-send-email-> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-23rcu: Renamings to increase RCU clarityPaul E. McKenney
Make RCU-sched, RCU-bh, and RCU-preempt be underlying implementations, with "RCU" defined in terms of one of the three. Update the outdated rcu_qsctr_inc() names, as these functions no longer increment anything. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: laijs@cn.fujitsu.com Cc: dipankar@in.ibm.com Cc: akpm@linux-foundation.org Cc: mathieu.desnoyers@polymtl.ca Cc: josht@linux.vnet.ibm.com Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org LKML-Reference: <12509746132696-git-send-email-> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-23rcu: Move private definitions from include/linux/rcutree.h to kernel/rcutree.hPaul E. McKenney
Some information hiding that makes it easier to merge preemptability into rcutree without descending into #include hell. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: laijs@cn.fujitsu.com Cc: dipankar@in.ibm.com Cc: akpm@linux-foundation.org Cc: mathieu.desnoyers@polymtl.ca Cc: josht@linux.vnet.ibm.com Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org LKML-Reference: <1250974613373-git-send-email-> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-22rcu: Expunge lingering references to CONFIG_CLASSIC_RCU, optimize on !SMPPaul E. McKenney
A couple of references to CONFIG_CLASSIC_RCU have survived. Although these are harmless, it is past time for them to go. The one in hardirq.h is strictly a readability problem. The two in pagemap.h appear to disable a !SMP performance optimization (which this patch re-enables). This does raise the issue as to whether pagemap.h should really be referring to the CPU implementation. Long term, I intend to make the RCU implementation driven by CONFIG_PREEMPT, at which point these should change from defined(CONFIG_TREE_RCU) to !defined(CONFIG_PREEMPT). In the meantime, is there something else that could be done in pagemap.h? Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: laijs@cn.fujitsu.com Cc: dipankar@in.ibm.com Cc: akpm@linux-foundation.org Cc: mathieu.desnoyers@polymtl.ca Cc: josht@linux.vnet.ibm.com Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org LKML-Reference: <20090822050851.GA8414@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-15cpu hotplug: Introduce cpu_notifier() to handle !HOTPLUG_CPU casePaul E. McKenney
This patch introduces a new cpu_notifier() API that is similar to hotcpu_notifier(), but which also notifies of CPUs coming online during boot in the !HOTPLUG_CPU case. Reported-by: Ingo Molnar <mingo@elte.hu> Reported-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Tested-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: laijs@cn.fujitsu.com Cc: dipankar@in.ibm.com Cc: josht@linux.vnet.ibm.com Cc: akpm@linux-foundation.org Cc: mathieu.desnoyers@polymtl.ca Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org Cc: benh@kernel.crashing.org LKML-Reference: <12503552312611-git-send-email-> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-15Merge commit 'v2.6.31-rc6' into core/rcuIngo Molnar
Merge reason: the branch was on pre-rc1 .30, update to latest. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-13Merge branch 'perfcounters-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf_counter: Report the cloning task as parent on perf_counter_fork() perf_counter: Fix an ipi-deadlock perf: Rework/fix the whole read vs group stuff perf_counter: Fix swcounter context invariance perf report: Don't show unresolved DSOs and symbols when -S/-d is used perf tools: Add a general option to enable raw sample records perf tools: Add a per tracepoint counter attribute to get raw sample perf_counter: Provide hw_perf_counter_setup_online() APIs perf list: Fix large list output by using the pager perf_counter, x86: Fix/improve apic fallback perf record: Add missing -C option support for specifying profile cpu perf tools: Fix dso__new handle() to handle deleted DSOs perf tools: Fix fallback to cplus_demangle() when bfd_demangle() is not available perf report: Show the tid too in -D perf record: Fix .tid and .pid fill-in when synthesizing events perf_counter, x86: Fix generic cache events on P6-mobile CPUs perf_counter, x86: Fix lapic printk message
2009-08-13Merge branch 'core-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: futex: Fix handling of bad requeue syscall pairing futex: Fix compat_futex to be same as futex for REQUEUE_PI locking, sched: Give waitqueue spinlocks their own lockdep classes futex: Update futex_q lock_ptr on requeue proxy lock
2009-08-13perf: Rework/fix the whole read vs group stuffPeter Zijlstra
Replace PERF_SAMPLE_GROUP with PERF_SAMPLE_READ and introduce PERF_FORMAT_GROUP to deal with group reads in a more generic way. This allows you to get group reads out of read() as well. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Corey J Ashford <cjashfor@us.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: stephane eranian <eranian@googlemail.com> LKML-Reference: <20090813103655.117411814@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-13perf_counter: Provide hw_perf_counter_setup_online() APIsIngo Molnar
Provide weak aliases for hw_perf_counter_setup_online(). This is used by the BTS patches (for v2.6.32), but it interacts with fixes so propagate this upstream. (it has no effect as of yet) Also export perf_counter_output() to architecture code. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-12NFS: Fix an O_DIRECT Oops...Trond Myklebust
We can't call nfs_readdata_release()/nfs_writedata_release() without first initialising and referencing args.context. Doing so inside nfs_direct_read_schedule_segment()/nfs_direct_write_schedule_segment() causes an Oops. We should rather be calling nfs_readdata_free()/nfs_writedata_free() in those cases. Looking at the O_DIRECT code, the "struct nfs_direct_req" is already referencing the nfs_open_context for us. Since the readdata and writedata structures carry a reference to that, we can simplify things by getting rid of the extra nfs_open_context references, so that we can replace all instances of nfs_readdata_release()/nfs_writedata_release(). Reported-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Tested-by: Catalin Marinas <catalin.marinas@arm.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-08-10Merge branch 'perfcounters-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (27 commits) perf_counter: Zero dead bytes from ftrace raw samples size alignment perf_counter: Subtract the buffer size field from the event record size perf_counter: Require CAP_SYS_ADMIN for raw tracepoint data perf_counter: Correct PERF_SAMPLE_RAW output perf tools: callchain: Fix bad rounding of minimum rate perf_counter tools: Fix libbfd detection for systems with libz dependency perf: "Longum est iter per praecepta, breve et efficax per exempla" perf_counter: Fix a race on perf_counter_ctx perf_counter: Fix tracepoint sampling to be part of generic sampling perf_counter: Work around gcc warning by initializing tracepoint record unconditionally perf tools: callchain: Fix sum of percentages to be 100% by displaying amount of ignored chains in fractal mode perf tools: callchain: Fix 'perf report' display to be callchain by default perf tools: callchain: Fix spurious 'perf report' warnings: ignore empty callchains perf record: Fix the -A UI for empty or non-existent perf.data perf util: Fix do_read() to fail on EOF instead of busy-looping perf list: Fix the output to not include tracepoints without an id perf_counter/powerpc: Fix oops on cpus without perf_counter hardware support perf stat: Fix tool option consistency: rename -S/--scale to -c/--scale perf report: Add debug help for the finding of symbol bugs - show the symtab origin (DSO, build-id, kernel, etc) perf report: Fix per task mult-counter stat reporting ...
2009-08-10perf_counter: Zero dead bytes from ftrace raw samples size alignmentFrederic Weisbecker
After aligning the ftrace raw samples, there are dead bytes storing random data from the stack. We don't want to leak these to userspace, then zero these out. Before: 0x2de88 [0x50]: event: 9 . . ... raw event: size 80 bytes . 0000: 09 00 00 00 01 00 50 00 d0 c7 00 81 ff ff ff ff ......P........ . 0010: 68 01 00 00 68 01 00 00 2c 00 00 00 00 00 00 00 h...h...,...... . 0020: 2c 00 00 00 2b 00 01 02 68 01 00 00 68 01 00 00 ,...+...h...h.. . 0030: 6b 6f 6e 64 65 6d 61 6e 64 2f 30 00 00 00 00 00 kondemand/0.... . 0040: 68 01 00 00 40 7f 46 81 ff ff ff ff 00 10 1b 7f h...@.F........ ^ ^ ^ ^ Leak After: 0x2d318 [0x50]: event: 9 . . ... raw event: size 80 bytes . 0000: 09 00 00 00 01 00 50 00 d0 c7 00 81 ff ff ff ff ......P........ . 0010: 68 01 00 00 68 01 00 00 68 14 00 00 00 00 00 00 h...h...h...... . 0020: 2c 00 00 00 2b 00 01 02 68 01 00 00 68 01 00 00 ,...+...h...h.. . 0030: 6b 6f 6e 64 65 6d 61 6e 64 2f 30 00 00 00 00 00 kondemand/0.... . 0040: 68 01 00 00 a0 80 46 81 ff ff ff ff 00 00 00 00 h.....F........ ^ ^ ^ ^ Fixed Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Mike Galbraith <efault@gmx.de> LKML-Reference: <1249915116-5210-1-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Mike Galbraith <efault@gmx.de>
2009-08-10perf_counter: Subtract the buffer size field from the event record sizeFrederic Weisbecker
We compute the perf raw sample size by aligning the raw ftrace event size plus the buffer size field itself. We do that instead of aligning only the perf raw sample size, so that we might economize some in some cases. But this buffer size field is not stored in the perf raw sample, we must then substract its size from the buffer once we computed the alignment unless we may get a useless u32 field in the buffer. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <20090810141129.GA5124@nowhere> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-10locking, sched: Give waitqueue spinlocks their own lockdep classesPeter Zijlstra
Give waitqueue spinlocks their own lockdep classes when they are initialised from init_waitqueue_head(). This means that struct wait_queue::func functions can operate other waitqueues. This is used by CacheFiles to catch the page from a backing fs being unlocked and to wake up another thread to take a copy of it. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Takashi Iwai <tiwai@suse.de> Cc: linux-cachefs@redhat.com Cc: torvalds@osdl.org Cc: akpm@linux-foundation.org LKML-Reference: <20090810113305.17284.81508.stgit@warthog.procyon.org.uk> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-10perf_counter: Correct PERF_SAMPLE_RAW outputPeter Zijlstra
PERF_SAMPLE_* output switches should unconditionally output the correct format, as they are the only way to unambiguously parse the PERF_EVENT_SAMPLE data. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <1249896447.17467.74.camel@twins> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-09Merge branch 'kvm-updates/2.6.31' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
* 'kvm-updates/2.6.31' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: Avoid redelivery of edge interrupt before next edge KVM: MMU: limit rmap chain length KVM: ia64: fix build failures due to ia64/unsigned long mismatches KVM: Make KVM_HPAGES_PER_HPAGE unsigned long to avoid build error on powerpc KVM: fix ack not being delivered when msi present KVM: s390: fix wait_queue handling KVM: VMX: Fix locking imbalance on emulation failure KVM: VMX: Fix locking order in handle_invalid_guest_state KVM: MMU: handle n_free_mmu_pages > n_alloc_mmu_pages in kvm_mmu_change_mmu_pages KVM: SVM: force new asid on vcpu migration KVM: x86: verify MTRR/PAT validity KVM: PIT: fix kpit_elapsed division by zero KVM: Fix KVM_GET_MSR_INDEX_LIST
2009-08-09perf_counter: Fix tracepoint sampling to be part of generic samplingFrederic Weisbecker
Based on Peter's comments, make tracepoint sampling generic just like all the other sampling bits are. This is a rename with no code changes: - PERF_SAMPLE_TP_RECORD to PERF_SAMPLE_RAW - struct perf_tracepoint_record to perf_raw_record We want the system in place that transport tracepoints raw samples events into the perf ring buffer to be generalized and usable by any type of counter. Reported-by; Peter Zijlstra <peterz@infradead.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <1249698400-5441-4-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-09perf_counter: Fix/complete ftrace event records samplingFrederic Weisbecker
This patch implements the kernel side support for ftrace event record sampling. A new counter sampling attribute is added: PERF_SAMPLE_TP_RECORD which requests ftrace events record sampling. In this case if a PERF_TYPE_TRACEPOINT counter is active and a tracepoint fires, we emit the tracepoint binary record to the perfcounter event buffer, as a sample. Result, after setting PERF_SAMPLE_TP_RECORD attribute from perf record: perf record -f -F 1 -a -e workqueue:workqueue_execution perf report -D 0x21e18 [0x48]: event: 9 . . ... raw event: size 72 bytes . 0000: 09 00 00 00 01 00 48 00 d0 c7 00 81 ff ff ff ff ......H........ . 0010: 0a 00 00 00 0a 00 00 00 21 00 00 00 00 00 00 00 ........!...... . 0020: 2b 00 01 02 0a 00 00 00 0a 00 00 00 65 76 65 6e +...........eve . 0030: 74 73 2f 31 00 00 00 00 00 00 00 00 0a 00 00 00 ts/1........... . 0040: e0 b1 31 81 ff ff ff ff ....... . 0x21e18 [0x48]: PERF_EVENT_SAMPLE (IP, 1): 10: 0xffffffff8100c7d0 period: 33 The raw ftrace binary record starts at offset 0020. Translation: struct trace_entry { type = 0x2b = 43; flags = 1; preempt_count = 2; pid = 0xa = 10; tgid = 0xa = 10; } thread_comm = "events/1" thread_pid = 0xa = 10; func = 0xffffffff8131b1e0 = flush_to_ldisc() What will come next? - Userspace support ('perf trace'), 'flight data recorder' mode for perf trace, etc. - The unconditional copy from the profiling callback brings some costs however if someone wants no such sampling to occur, and needs to be fixed in the future. For that we need to have an instant access to the perf counter attribute. This is a matter of a flag to add in the struct ftrace_event. - Take care of the events recursivity! Don't ever try to record a lock event for example, it seems some locking is used in the profiling fast path and lead to a tracing recursivity. That will be fixed using raw spinlock or recursivity protection. - [...] - Profit! :-) Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Tom Zanussi <tzanussi@gmail.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Gabriel Munteanu <eduard.munteanu@linux360.ro> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-09perf_counter, ftrace: Fix perf_counter integrationPeter Zijlstra
Adds possible second part to the assign argument of TP_EVENT(). TP_perf_assign( __perf_count(foo); __perf_addr(bar); ) Which, when specified make the swcounter increment with @foo instead of the usual 1, and report @bar for PERF_SAMPLE_ADDR (data address associated with the event) when this triggers a counter overflow. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/hch/xfs-icache-racesLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/hch/xfs-icache-races: xfs: fix freeing of inodes not yet added to the inode cache vfs: add __destroy_inode vfs: fix inode_init_always calling convention
2009-08-07Merge branch 'perfcounters-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf_counter: Fix double list iteration in per task precise stats perf: Auto-detect libelf perf symbol: Fix symbol parsing in certain cases: use the build-id as a symlink perf_counter/powerpc: Check oprofile_cpu_type for NULL before using it ftrace: Fix perf-tracepoint OOPS perf report: Add missing command line options to man page perf: Auto-detect libbfd perf report: Make --sort comm,dso,symbol the default
2009-08-07Merge git://git.infradead.org/mtd-2.6Linus Torvalds
* git://git.infradead.org/mtd-2.6: jffs2: Fix return value from jffs2_do_readpage_nolock() mtd: mtdblock: introduce mtdblks_lock mtd: remove 'SBC8240 Wind River' Device Driver Code mtd: OneNAND: OMAP2/3: free GPMC CS on module removal mtd: OneNAND: fix incorrect bufferram offset mtd: blkdevs: do not forget to get MTD devices mtd: fix the conversion from dev to mtd_info mtd: let include/linux/mtd/partitions.h stand on its own
2009-08-07Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: matrix_keypad - make matrix keymap size dynamic Input: wistron_btns - support Prestigio Wifi RF kill button Input: i8042 - add Asus G1S to noloop exception list
2009-08-07bzip2/lzma/gzip: fix comments describing decompressor APIPhillip Lougher
Fix and improve comments in decompress/generic.h that describe the decompressor API. Also remove an unused definition, and rename INBUF_LEN in lib/decompress_inflate.c to conform to bzip2/lzma naming. Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-08-07mm: make set_mempolicy(MPOL_INTERLEAV) N_HIGH_MEMORY awareKAMEZAWA Hiroyuki
At first, init_task's mems_allowed is initialized as this. init_task->mems_allowed == node_state[N_POSSIBLE] And cpuset's top_cpuset mask is initialized as this top_cpuset->mems_allowed = node_state[N_HIGH_MEMORY] Before 2.6.29: policy's mems_allowed is initialized as this. 1. update tasks->mems_allowed by its cpuset->mems_allowed. 2. policy->mems_allowed = nodes_and(tasks->mems_allowed, user's mask) Updating task's mems_allowed in reference to top_cpuset's one. cpuset's mems_allowed is aware of N_HIGH_MEMORY, always. In 2.6.30: After commit 58568d2a8215cb6f55caf2332017d7bdff954e1c ("cpuset,mm: update tasks' mems_allowed in time"), policy's mems_allowed is initialized as this. 1. policy->mems_allowd = nodes_and(task->mems_allowed, user's mask) Here, if task is in top_cpuset, task->mems_allowed is not updated from init's one. Assume user excutes command as #numactrl --interleave=all ,.... policy->mems_allowd = nodes_and(N_POSSIBLE, ALL_SET_MASK) Then, policy's mems_allowd can includes a possible node, which has no pgdat. MPOL's INTERLEAVE just scans nodemask of task->mems_allowd and access this directly. NODE_DATA(nid)->zonelist even if NODE_DATA(nid)==NULL Then, what's we need is making policy->mems_allowed be aware of N_HIGH_MEMORY. This patch does that. But to do so, extra nodemask will be on statck. Because I know cpumask has a new interface of CPUMASK_ALLOC(), I added it to node. This patch stands on old behavior. But I feel this fix itself is just a Band-Aid. But to do fundametal fix, we have to take care of memory hotplug and it takes time. (task->mems_allowd should be N_HIGH_MEMORY, I think.) mpol_set_nodemask() should be aware of N_HIGH_MEMORY and policy's nodemask should be includes only online nodes. In old behavior, this is guaranteed by frequent reference to cpuset's code. Now, most of them are removed and mempolicy has to check it by itself. To do check, a few nodemask_t will be used for calculating nodemask. But, size of nodemask_t can be big and it's not good to allocate them on stack. Now, cpumask_t has CPUMASK_ALLOC/FREE an easy code for get scratch area. NODEMASK_ALLOC/FREE shoudl be there. [akpm@linux-foundation.org: cleanups & tweaks] Tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Miao Xie <miaox@cn.fujitsu.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Paul Menage <menage@google.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: David Rientjes <rientjes@google.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-08-07vfs: add __destroy_inodeChristoph Hellwig
When we want to tear down an inode that lost the add to the cache race in XFS we must not call into ->destroy_inode because that would delete the inode that won the race from the inode cache radix tree. This patch provides the __destroy_inode helper needed to fix this, the actual fix will be in th next patch. As XFS was the only reason destroy_inode was exported we shift the export to the new __destroy_inode. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
2009-08-07vfs: fix inode_init_always calling conventionChristoph Hellwig
Currently inode_init_always calls into ->destroy_inode if the additional initialization fails. That's not only counter-intuitive because inode_init_always did not allocate the inode structure, but in case of XFS it's actively harmful as ->destroy_inode might delete the inode from a radix-tree that has never been added. This in turn might end up deleting the inode for the same inum that has been instanciated by another process and cause lots of cause subtile problems. Also in the case of re-initializing a reclaimable inode in XFS it would free an inode we still want to keep alive. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
2009-08-05Input: matrix_keypad - make matrix keymap size dynamicEric Miao
Remove assumption on the shift and size of rows/columns form matrix_keypad driver. Signed-off-by: Eric Miao <eric.y.miao@gmail.com> Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
2009-08-06ftrace: Fix perf-tracepoint OOPSPeter Zijlstra
Not all tracepoints are created equal, in specific the ftrace tracepoints are created with TRACE_EVENT_FORMAT() which does not generate the needed bits to tie them into perf counters. For those events, don't create the 'id' file and fail ->profile_enable when their ID is specified through other means. Reported-by: Chris Mason <chris.mason@oracle.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <1249497664.5890.4.camel@laptop> [ v2: fix build error in the !CONFIG_EVENT_PROFILE case ] Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-05KVM: fix ack not being delivered when msi presentMichael S. Tsirkin
kvm_notify_acked_irq does not check irq type, so that it sometimes interprets msi vector as irq. As a result, ack notifiers are not called, which typially hangs the guest. The fix is to track and check irq type. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2009-08-05drm/radeon: Add support for RS880 chipsAlex Deucher
These are new AMD IGP chips Signed-off-by: Alex Deucher <alexdeucher@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
2009-08-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6: tty-ldisc: be more careful in 'put_ldisc' locking tty-ldisc: turn ldisc user count into a proper refcount tty-ldisc: make refcount be atomic_t 'users' count
2009-08-04Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds
* 'for-linus' of git://git.kernel.dk/linux-2.6-block: Make SCSI SG v4 driver enabled by default and remove EXPERIMENTAL dependency, since udev depends on BSG block: Update topology documentation block: Stack optimal I/O size block: Add a wrapper for setting minimum request size without a queue block: Make blk_queue_stack_limits use the new stacking interface
2009-08-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (47 commits) ehea: Fix napi list corruption on ifconfig down igbvf: Allow VF driver to correctly recognize failure to set mac 3c59x: Fix build failure with gcc 3.2 sky2: Avoid transmits during sky2_down() iwlagn: do not send key clear commands when rfkill enabled libertas: Read buffer overflow drivers/net/wireless: introduce missing kfree drivers/net/wireless/iwlwifi: introduce missing kfree zd1211rw: fix unaligned access in zd_mac_rx cfg80211: fix regression on beacon world roaming feature cfg80211: add two missing NULL pointer checks ixgbe: Patch to modify 82598 PCIe completion timeout values bluetooth: rfcomm_init bug fix mlx4_en: Fix double pci unmapping. mISDN: Fix handling of receive buffer size in L1oIP pcnet32: VLB support fixes pcnet32: remove superfluous NULL pointer check in pcnet32_probe1() net: restore the original spinlock to protect unicast list netxen: fix coherent dma mask setting mISDN: Read buffer overflow ...
2009-08-04Merge branch 'drm-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6 * 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6: drm/ttm: Read buffer overflow drm/radeon: Read buffer overflow drm/ttm: Fix a sync object leak. drm/radeon/kms: fix memory leak in radeon_driver_load_kms drm/radeon/kms: fix nomodeset. drm/ttm: Fix a potential comparison of structs. drm/radeon/kms: fix rv515 VRAM initialisation. drm/radeon: add some new r7xx pci ids drm: Catch stop possible NULL pointer reference drm: Small logic fix in drm_mode_setcrtc
2009-08-04Merge branch 'perfcounters-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf_counter: Set the CONFIG_PERF_COUNTERS default to y if CONFIG_PROFILING=y perf: Fix read buffer overflow perf top: Add mwait_idle_with_hints to skip_symbols[] perf tools: Fix faulty check perf report: Update for the new FORK/EXIT events perf_counter: Full task tracing perf_counter: Collapse inherit on read() tracing, perf_counter: Add help text to CONFIG_EVENT_PROFILE perf_counter tools: Fix link errors with older toolchains
2009-08-04tty-ldisc: make refcount be atomic_t 'users' countLinus Torvalds
This is pure preparation of changing the ldisc reference counting to be a true refcount that defines the lifetime of the ldisc. But this is a purely syntactic change for now to make the next steps easier. This patch should make no semantic changes at all. But I wanted to make the ldisc refcount be an atomic (I will be touching it without locks soon enough), and I wanted to rename it so that there isn't quite as much confusion between 'ldo->refcount' (ldisk operations refcount) and 'ld->refcount' (ldisc refcount itself) in the same file. So it's now an atomic 'ld->users' count. It still starts at zero, despite having a reference from 'tty->ldisc', but that will change once we turn it into a _real_ refcount. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Tested-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Tested-by: Sergey Senozhatsky <sergey.senozhatsky@mail.by> Acked-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-08-04drm/radeon: add some new r7xx pci idsAlex Deucher
Signed-off-by: Alex Deucher <alexdeucher@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
2009-08-03Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6
2009-08-03cfg80211: fix regression on beacon world roaming featureLuis R. Rodriguez
A regression was added through patch a4ed90d6: "cfg80211: respect API on orig_flags on channel for beacon hint" We did indeed respect _orig flags but the intention was not clearly stated in the commit log. This patch fixes firmware issues picked up by iwlwifi when we lift passive scan of beaconing restrictions on channels its EEPROM has been configured to always enable. By doing so though we also disallowed beacon hints on devices registering their wiphy with custom world regulatory domains enabled, this happens to be currently ath5k, ath9k and ar9170. The passive scan and beacon restrictions on those devices would never be lifted even if we did find a beacon and the hardware did support such enhancements when world roaming. Since Johannes indicates iwlwifi firmware cannot be changed to allow beacon hinting we set up a flag now to specifically allow drivers to disable beacon hints for devices which cannot use them. We enable the flag on iwlwifi to disable beacon hints and by default enable it for all other drivers. It should be noted beacon hints lift passive scan flags and beacon restrictions when we receive a beacon from an AP on any 5 GHz non-DFS channels, and channels 12-14 on the 2.4 GHz band. We don't bother with channels 1-11 as those channels are allowed world wide. This should fix world roaming for ath5k, ath9k and ar9170, thereby improving scan time when we receive the first beacon from any AP, and also enabling beaconing operation (AP/IBSS/Mesh) on cards which would otherwise not be allowed to do so. Drivers not using custom regulatory stuff (wiphy_apply_custom_regulatory()) were not affected by this as the orig_flags for the channels would have been cleared upon wiphy registration. I tested this with a world roaming ath5k card. Cc: Jouni Malinen <jouni.malinen@atheros.com> Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com> Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-08-03bluetooth: rfcomm_init bug fixDave Young
rfcomm tty may be used before rfcomm_tty_driver initilized, The problem is that now socket layer init before tty layer, if userspace program do socket callback right here then oops will happen. reporting in: http://marc.info/?l=linux-bluetooth&m=124404919324542&w=2 make 3 changes: 1. remove #ifdef in rfcomm/core.c, make it blank function when rfcomm tty not selected in rfcomm.h 2. tune the rfcomm_init error patch to ensure tty driver initilized before rfcomm socket usage. 3. remove __exit for rfcomm_cleanup_sockets because above change need call it in a __init function. Reported-by: Oliver Hartkopp <oliver@hartkopp.net> Tested-by: Oliver Hartkopp <oliver@hartkopp.net> Signed-off-by: Dave Young <hidave.darkstar@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-03mtd: fix the conversion from dev to mtd_infoSaeed Bishara
The patch fixes a bug when converting dev to mtd_info by using the drvdata of the dev, the previous code used container_of(dev, struct mtd_info, dev), but won't work for the mtdXro devices as they created without being contained inside mtd_info structure. Signed-off-by: Saeed Bishara <saeed@marvell.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>