aboutsummaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2009-06-10Merge branch 'sched-docs-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-docs-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: Document memory barriers implied by sleep/wake-up primitives
2009-06-10Merge branch 'sched-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: fix typo in sched-rt-group.txt file ftrace: fix typo about map of kernel priority in ftrace.txt file. sched: properly define the sched_group::cpumask and sched_domain::span fields sched, timers: cleanup avenrun users sched, timers: move calc_load() to scheduler sched: Don't export sched_mc_power_savings on multi-socket single core system sched: emit thread info flags with stack trace sched: rt: document the risk of small values in the bandwidth settings sched: Replace first_cpu() with cpumask_first() in ILB nomination code sched: remove extra call overhead for schedule() sched: use group_first_cpu() instead of cpumask_first(sched_group_cpus()) wait: don't use __wake_up_common() sched: Nominate a power-efficient ilb in select_nohz_balancer() sched: Nominate idle load balancer from a semi-idle package. sched: remove redundant hierarchy walk in check_preempt_wakeup
2009-06-10Merge branch 'x86-kbuild-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-kbuild-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (46 commits) x86, boot: add new generated files to the appropriate .gitignore files x86, boot: correct the calculation of ZO_INIT_SIZE x86-64: align __PHYSICAL_START, remove __KERNEL_ALIGN x86, boot: correct sanity checks in boot/compressed/misc.c x86: add extension fields for bootloader type and version x86, defconfig: update kernel position parameters x86, defconfig: update to current, no material changes x86: make CONFIG_RELOCATABLE the default x86: default CONFIG_PHYSICAL_START and CONFIG_PHYSICAL_ALIGN to 16 MB x86: document new bzImage fields x86, boot: make kernel_alignment adjustable; new bzImage fields x86, boot: remove dead code from boot/compressed/head_*.S x86, boot: use LOAD_PHYSICAL_ADDR on 64 bits x86, boot: make symbols from the main vmlinux available x86, boot: determine compressed code offset at compile time x86, boot: use appropriate rep string for move and clear x86, boot: zero EFLAGS on 32 bits x86, boot: set up the decompression stack as early as possible x86, boot: straighten out ranges to copy/zero in compressed/head*.S x86, boot: stylistic cleanups for boot/compressed/head_64.S ... Fixed trivial conflict in arch/x86/configs/x86_64_defconfig manually
2009-06-10Merge branch 'irq-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'irq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (76 commits) x86, apic: Fix dummy apic read operation together with broken MP handling x86, apic: Restore irqs on fail paths x86: Print real IOAPIC version for x86-64 x86: enable_update_mptable should be a macro sparseirq: Allow early irq_desc allocation x86, io-apic: Don't mark pin_programmed early x86, irq: don't call mp_config_acpi_gsi() if update_mptable is not enabled x86, irq: update_mptable needs pci_routeirq x86: don't call read_apic_id if !cpu_has_apic x86, apic: introduce io_apic_irq_attr x86/pci: add 4 more return parameters to IO_APIC_get_PCI_irq_vector(), fix x86: read apic ID in the !acpi_lapic case x86: apic: Fixmap apic address even if apic disabled x86: display extended apic registers with print_local_APIC and cpu_debug code x86: read apic ID in the !acpi_lapic case x86: clean up and fix setup_clear/force_cpu_cap handling x86: apic: Check rev 3 fadt correctly for physical_apic bit x86/pci: update pirq_enable_irq() to setup io apic routing x86/acpi: move setup io apic routing out of CONFIG_ACPI scope x86/pci: add 4 more return parameters to IO_APIC_get_PCI_irq_vector() ...
2009-06-09cpumask: alloc zeroed cpumask for static cpumask_var_tsYinghai Lu
These are defined as static cpumask_var_t so if MAXSMP is not used, they are cleared already. Avoid surprises when MAXSMP is enabled. Signed-off-by: Yinghai Lu <yinghai.lu@kernel.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-06-08async: Fix lack of boot-time console due to insufficient synchronizationLinus Torvalds
Our async work synchronization was broken by "async: make sure independent async domains can't accidentally entangle" (commit d5a877e8dd409d8c702986d06485c374b705d340), because it would report the wrong lowest active async ID when there was both running and pending async work. This caused things like no being able to read the root filesystem, resulting in missing console devices and inability to run 'init', causing a boot-time panic. This fixes it by properly returning the lowest pending async ID: if there is any running async work, that will have a lower ID than any pending work, and we should _not_ look at the pending work list. There were alternative patches from Jaswinder and James, but this one also cleans up the code by removing the pointless 'ret' variable and the unnecesary testing for an empty list around 'for_each_entry()' (if the list is empty, the for_each_entry() thing just won't execute). Fixes-bug: http://bugzilla.kernel.org/show_bug.cgi?id=13474 Reported-and-tested-by: Chris Clayton <chris2553@googlemail.com> Cc: Jaswinder Singh Rajput <jaswinder@kernel.org> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-04ptrace: revert "ptrace_detach: the wrong wakeup breaks the ERESTARTxxx logic"Oleg Nesterov
Commit 95a3540da9c81a5987be810e1d9a83640a366bd5 ("ptrace_detach: the wrong wakeup breaks the ERESTARTxxx logic") removed the "extra" wake_up_process() from ptrace_detach(), but as Jan pointed out this breaks the compatibility. I believe the changelog is right and this wake_up() is wrong in many ways, but GDB assumes that ptrace(PTRACE_DETACH, child, 0, 0) always wakes up the tracee. Despite the fact this breaks SIGNAL_STOP_STOPPED/group_stop_count logic, and despite the fact this wake_up_process() can break another assumption: PTRACE_DETACH with SIGSTOP should leave the tracee in TASK_STOPPED case. Because the untraced child can dequeue SIGSTOP and call do_signal_stop() before ptrace_detach() calls wake_up_process(). Revert this change for now. We need some fixes even if we we want to keep the current behaviour, but these fixes are not for 2.6.30. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Roland McGrath <roland@redhat.com> Cc: Jan Kratochvil <jan.kratochvil@redhat.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-04ptrace: tracehook_report_clone: fix false positivesOleg Nesterov
The "trace || CLONE_PTRACE" check in tracehook_report_clone() is not right, - If the untraced task does clone(CLONE_PTRACE) the new child is not traced, we must not queue SIGSTOP. - If we forked the traced task, but the tracer exits and untraces both the forking task and the new child (after copy_process() drops tasklist_lock), we should not queue SIGSTOP too. Change the code to check task_ptrace() != 0 instead. This is still racy, but the race is harmless. We can race with another tracer attaching to this child, or the tracer can exit and detach in parallel. But giwen that we didn't do wake_up_new_task() yet, the child must have the pending SIGSTOP anyway. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Roland McGrath <roland@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-01Merge branch 'linus' into irq/numaIngo Molnar
Conflicts: arch/mips/sibyte/bcm1480/irq.c arch/mips/sibyte/sb1250/irq.c Merge reason: we gathered a few conflicts plus update to latest upstream fixes. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-26kmod: Release sub_info on cred allocation failure.Tetsuo Handa
call_usermodehelper_setup() forgot to kfree(sub_info) when prepare_usermodehelper_creds() failed. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-05-24Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6: PM: Do not hold dpm_list_mtx while disabling/enabling nonboot CPUs
2009-05-24async: make sure independent async domains can't accidentally entangleJames Bottomley
The problem occurs when async_synchronize_full_domain() is called when the async_pending list is not empty. This will cause lowest_running() to return the cookie of the first entry on the async_pending list, which might be nothing at all to do with the domain being asked for and thus cause the domain synchronization to wait for an unrelated domain. This can cause a deadlock if domain synchronization is used from one domain to wait for another. Fix by running over the async_pending list to see if any pending items actually belong to our domain (and return their cookies if they do). Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-05-24PM: Do not hold dpm_list_mtx while disabling/enabling nonboot CPUsRafael J. Wysocki
We shouldn't hold dpm_list_mtx while executing [disable|enable]_nonboot_cpus(), because theoretically this may lead to a deadlock as shown by the following example (provided by Johannes Berg): CPU 3 CPU 2 CPU 1 suspend/hibernate something: rtnl_lock() device_pm_lock() -> mutex_lock(&dpm_list_mtx) mutex_lock(&dpm_list_mtx) linkwatch_work -> rtnl_lock() disable_nonboot_cpus() -> flush CPU 3 workqueue Fortunately, device drivers are supposed to stop any activities that might lead to the registration of new device objects way before disable_nonboot_cpus() is called, so it shouldn't be necessary to hold dpm_list_mtx over the entire late part of device suspend and early part of device resume. Thus, during the late suspend and the early resume of devices acquire dpm_list_mtx only when dpm_list is going to be traversed and release it right after that. This patch is reported to fix the regressions tracked as http://bugzilla.kernel.org/show_bug.cgi?id=13245. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Alan Stern <stern@rowland.harvard.edu> Reported-by: Miles Lane <miles.lane@gmail.com> Tested-by: Ming Lei <tom.leiming@gmail.com>
2009-05-23sparseirq: Allow early irq_desc allocationPaul Mundt
Presently non-legacy IRQs have their irq_desc allocated with kzalloc_node(). This assumes that all callers of irq_to_desc_node_alloc() will be sufficiently late in the boot process that kmalloc is available. While porting sparseirq support to sh this blew up immediately, as at the time that we register the CPU's interrupt vector map only bootmem is available. Check slab_is_available() to work out which path to use. [ Impact: fix SH early boot crash with sparseirq enabled ] Signed-off-by: Paul Mundt <lethal@linux-sh.org> Acked-by: Yinghai Lu <yinghai@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Mel Gorman <mel@csn.ul.ie> LKML-Reference: <20090522014008.GA2806@linux-sh.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-19futex: setup writeable mapping for futex ops which modify user space dataThomas Gleixner
The futex code installs a read only mapping via get_user_pages_fast() even if the futex op function has to modify user space data. The eventual fault was fixed up by futex_handle_fault() which walked the VMA with mmap_sem held. After the cleanup patches which removed the mmap_sem dependency of the futex code commit 4dc5b7a36a49eff97050894cf1b3a9a02523717 (futex: clean up fault logic) removed the private VMA walk logic from the futex code. This change results in a stale RO mapping which is not fixed up. Instead of reintroducing the previous fault logic we set up the mapping in get_user_pages_fast() read/write for all operations which modify user space data. Also handle private futexes in the same way and make the current unconditional access_ok(VERIFY_WRITE) depend on the futex op. Reported-by: Andreas Schwab <schwab@linux-m68k.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> CC: stable@kernel.org
2009-05-19sched: properly define the sched_group::cpumask and sched_domain::span fieldsIngo Molnar
Properly document the variable-size structure tricks we are doing wrt. struct sched_group and sched_domain, and use the field[0] GCC extension instead of defining a vla array. Dont use unions for this, as pointed out by Linus. [ Impact: cleanup, un-confuse Sparse and LLVM ] Reported-by: Jeff Garzik <jeff@garzik.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> LKML-Reference: <alpine.LFD.2.01.0905180850110.3301@localhost.localdomain> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-18Merge branches 'sched-fixes-for-linus-2' and 'core-fixes-for-linus-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-fixes-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: Fix fallback sched_clock()'s offset when using jiffies * 'core-fixes-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: lockdep: increase MAX_LOCKDEP_ENTRIES and MAX_LOCKDEP_CHAINS
2009-05-18Merge branch 'tracing-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: tracing: Append prompt in /debug/tracing/README file x86/function-graph: fix constraint for recording old return value
2009-05-17Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6: PM: check sysdev_suspend(PMSG_FREEZE) return value
2009-05-16Fix caller information for warn_slowpath_nullLinus Torvalds
Ian Campbell noticed that since "Eliminate thousands of warnings with gcc 3.2 build" (commit 57adc4d2dbf968fdbe516359688094eef4d46581) all WARN_ON()'s currently appear to come from warn_slowpath_null(), eg: WARNING: at kernel/softirq.c:143 warn_slowpath_null+0x1c/0x20() because now that warn_slowpath_null() is in the call path, the __builtin_return_address(0) returns that, rather than the place that caused the warning. Fix this by splitting up the warn_slowpath_null/fmt cases differently, using a common helper function, and getting the return address in the right place. This also happens to avoid the unnecessary stack usage for the non-stdargs case, and just generally cleans things up. Make the function name printout use %pS while at it. Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-05-15PM: check sysdev_suspend(PMSG_FREEZE) return valueBjorn Helgaas
Check the return value of sysdev_suspend(). I think this was a typo. Without this change, the following "if" check is always false. I also changed the error message so it's distinguishable from the similar message a few lines above. Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2009-05-15tracing: Append prompt in /debug/tracing/README fileGeunSik Lim
append prompt in /debug/tracing/README file. This is trivial issue. Fix typo Mini Howto file(README) for ftrace. [ Impact: cleanup ] Signed-off-by: GeunSik Lim <geunsik.lim@samsung.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: williams <williams@redhat.com> LKML-Reference: <1242289418.31161.45.camel@centos51> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-15Merge branch 'for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb: kgdb: gdb documentation fix kgdb,i386: use address that SP register points to in the exception frame sysrq, intel_fb: fix sysrq g collision
2009-05-15sched, timers: cleanup avenrun usersThomas Gleixner
avenrun is an rough estimate so we don't have to worry about consistency of the three avenrun values. Remove the xtime lock dependency and provide a function to scale the values. Cleanup the users. [ Impact: cleanup ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra <peterz@infradead.org>
2009-05-15sched, timers: move calc_load() to schedulerThomas Gleixner
Dimitri Sivanich noticed that xtime_lock is held write locked across calc_load() which iterates over all online CPUs. That can cause long latencies for xtime_lock readers on large SMP systems. The load average calculation is an rough estimate anyway so there is no real need to protect the readers vs. the update. It's not a problem when the avenrun array is updated while a reader copies the values. Instead of iterating over all online CPUs let the scheduler_tick code update the number of active tasks shortly before the avenrun update happens. The avenrun update itself is handled by the CPU which calls do_timer(). [ Impact: reduce xtime_lock write locked section ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra <peterz@infradead.org>
2009-05-15sysrq, intel_fb: fix sysrq g collisionJason Wessel
Commit 79e539453b34e35f39299a899d263b0a1f1670bd introduced a regression where you cannot use sysrq 'g' to enter kgdb. The solution is to move the intel fb sysrq over to V for video instead of G for graphics. The SMP VOYAGER code to register for the sysrq-v is not anywhere to be found in the mainline kernel, so the comments in the code were cleaned up as well. This patch also cleans up the sysrq definitions for kgdb to make it generic for the kernel debugger, such that the sysrq 'g' can be used in the future to enter a gdbstub or another kernel debugger. Signed-off-by: Jason Wessel <jason.wessel@windriver.com> Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2009-05-15Revert "mm: add /proc controls for pdflush threads"Jens Axboe
This reverts commit fafd688e4c0c34da0f3de909881117d374e4c7af. Work is progressing to switch away from pdflush as the process backing for flushing out dirty data. So it seems pointless to add more knobs to control pdflush threads. The original author of the patch did not have any specific use cases for adding the knobs, so we can easily revert this before 2.6.30 to avoid having to maintain this API forever. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-05-12lockdep: increase MAX_LOCKDEP_ENTRIES and MAX_LOCKDEP_CHAINSIngo Molnar
Now that lockdep coverage has increased it has become easier to run out of entries: [ 21.401387] BUG: MAX_LOCKDEP_ENTRIES too low! [ 21.402007] turning off the locking correctness validator. [ 21.402007] Pid: 1555, comm: S99local Not tainted 2.6.30-rc5-tip #2 [ 21.402007] Call Trace: [ 21.402007] [<ffffffff81069789>] add_lock_to_list+0x53/0xba [ 21.402007] [<ffffffff810eb615>] ? lookup_mnt+0x19/0x53 [ 21.402007] [<ffffffff8106be14>] check_prev_add+0x14b/0x1c7 [ 21.402007] [<ffffffff8106c304>] validate_chain+0x474/0x52a [ 21.402007] [<ffffffff8106c6fc>] __lock_acquire+0x342/0x3c7 [ 21.402007] [<ffffffff8106c842>] lock_acquire+0xc1/0xe5 [ 21.402007] [<ffffffff810eb615>] ? lookup_mnt+0x19/0x53 [ 21.402007] [<ffffffff8153aedc>] _spin_lock+0x31/0x66 Double the size - as we've done in the past. [ Impact: allow lockdep to cover more locks ] Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-12Merge branch 'x86/apic' into irq/numaIngo Molnar
Merge reason: both topics modify the APIC code but were able to do it in parallel so far. An upcoming patch generates a conflict so merge them to avoid the conflict. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-11x86: add extension fields for bootloader type and versionH. Peter Anvin
A long ago, in days of yore, it all began with a god named Thor. There were vikings and boats and some plans for a Linux kernel header. Unfortunately, a single 8-bit field was used for bootloader type and version. This has generally worked without *too* much pain, but we're getting close to flat running out of ID fields. Add extension fields for both type and version. The type will be extended if it the old field is 0xE; the version is a simple MSB extension. Keep /proc/sys/kernel/bootloader_type containing (type << 4) + (ver & 0xf) for backwards compatiblity, but also add /proc/sys/kernel/bootloader_version which contains the full version number. [ Impact: new feature to support more bootloaders ] Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-05-11Merge commit 'v2.6.30-rc5' into sched/coreIngo Molnar
Merge reason: sched/core was on .30-rc1 before, update to latest fixes Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-09Convert obvious places to deactivate_locked_super()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-05-09sched: Fix fallback sched_clock()'s offset when using jiffiesRon
Account for the initial offset to the jiffy count. [ Impact: fix printk timestamps on architectures using fallback sched_clock() ] Signed-off-by: Ron Lee <ron@debian.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-08kprobes: fix to use text_mutex around arm/disarm kprobeMasami Hiramatsu
Fix kprobes to lock text_mutex around some arch_arm/disarm_kprobe() which are newly added by commit de5bd88d5a5cce3cacea904d3503e5ebdb3852a2. Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Jim Keniston <jkenisto@us.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-05-07sched: emit thread info flags with stack traceDavid Rientjes
When a thread is oom killed and fails to exit, it's helpful to know which threads have access to memory reserves if the machine livelocks. This is done by testing for the TIF_MEMDIE thread info flag and should be displayed alongside stack traces to identify tasks that have access to such reserves but are still stuck allocating pages, for instance. It would probably be helpful in other cases as well, so all thread info flags are emitted when showing a task. ( v2: fix warning reported by Stephen Rothwell ) [ Impact: extend debug printout info ] Signed-off-by: David Rientjes <rientjes@google.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephen Rothwell <sfr@canb.auug.org.au> LKML-Reference: <alpine.DEB.2.00.0905040136390.15831@chino.kir.corp.google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-06Eliminate thousands of warnings with gcc 3.2 buildAndi Kleen
When building with gcc 3.2 I get thousands of warnings such as include/linux/gfp.h: In function `allocflags_to_migratetype': include/linux/gfp.h:105: warning: null format string due to passing a NULL format string to warn_slowpath() in #define __WARN() warn_slowpath(__FILE__, __LINE__, NULL) Split this case out into a separate call. This also shrinks the kernel slightly: text data bss dec hex filename 4802274 707668 712704 6222646 5ef336 vmlinux text data bss dec hex filename 4799027 703572 712704 6215303 5ed687 vmlinux due to removeing one argument from the commonly-called __WARN(). [akpm@linux-foundation.org: reduce scope of `empty'] Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-05-06inotify: use GFP_NOFS in kernel_event() to work around a lockdep false-positiveWu Fengguang
There is what we believe to be a false positive reported by lockdep. inotify_inode_queue_event() => take inotify_mutex => kernel_event() => kmalloc() => SLOB => alloc_pages_node() => page reclaim => slab reclaim => dcache reclaim => inotify_inode_is_dead => take inotify_mutex => deadlock The plan is to fix this via lockdep annotation, but that is proving to be quite involved. The patch flips the allocation over to GFP_NFS to shut the warning up, for the 2.6.30 release. Hopefully we will fix this for real in 2.6.31. I'll queue a patch in -mm to switch it back to GFP_KERNEL so we don't forget. ================================= [ INFO: inconsistent lock state ] 2.6.30-rc2-next-20090417 #203 --------------------------------- inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage. kswapd0/380 [HC0[0]:SC0[0]:HE1:SE1] takes: (&inode->inotify_mutex){+.+.?.}, at: [<ffffffff8112f1b5>] inotify_inode_is_dead+0x35/0xb0 {RECLAIM_FS-ON-W} state was registered at: [<ffffffff81079188>] mark_held_locks+0x68/0x90 [<ffffffff810792a5>] lockdep_trace_alloc+0xf5/0x100 [<ffffffff810f5261>] __kmalloc_node+0x31/0x1e0 [<ffffffff81130652>] kernel_event+0xe2/0x190 [<ffffffff81130826>] inotify_dev_queue_event+0x126/0x230 [<ffffffff8112f096>] inotify_inode_queue_event+0xc6/0x110 [<ffffffff8110444d>] vfs_create+0xcd/0x140 [<ffffffff8110825d>] do_filp_open+0x88d/0xa20 [<ffffffff810f6b68>] do_sys_open+0x98/0x140 [<ffffffff810f6c50>] sys_open+0x20/0x30 [<ffffffff8100c272>] system_call_fastpath+0x16/0x1b [<ffffffffffffffff>] 0xffffffffffffffff irq event stamp: 690455 hardirqs last enabled at (690455): [<ffffffff81564fe4>] _spin_unlock_irqrestore+0x44/0x80 hardirqs last disabled at (690454): [<ffffffff81565372>] _spin_lock_irqsave+0x32/0xa0 softirqs last enabled at (690178): [<ffffffff81052282>] __do_softirq+0x202/0x220 softirqs last disabled at (690157): [<ffffffff8100d50c>] call_softirq+0x1c/0x50 other info that might help us debug this: 2 locks held by kswapd0/380: #0: (shrinker_rwsem){++++..}, at: [<ffffffff810d0bd7>] shrink_slab+0x37/0x180 #1: (&type->s_umount_key#17){++++..}, at: [<ffffffff8110cfbf>] shrink_dcache_memory+0x11f/0x1e0 stack backtrace: Pid: 380, comm: kswapd0 Not tainted 2.6.30-rc2-next-20090417 #203 Call Trace: [<ffffffff810789ef>] print_usage_bug+0x19f/0x200 [<ffffffff81018bff>] ? save_stack_trace+0x2f/0x50 [<ffffffff81078f0b>] mark_lock+0x4bb/0x6d0 [<ffffffff810799e0>] ? check_usage_forwards+0x0/0xc0 [<ffffffff8107b142>] __lock_acquire+0xc62/0x1ae0 [<ffffffff810f478c>] ? slob_free+0x10c/0x370 [<ffffffff8107c0a1>] lock_acquire+0xe1/0x120 [<ffffffff8112f1b5>] ? inotify_inode_is_dead+0x35/0xb0 [<ffffffff81562d43>] mutex_lock_nested+0x63/0x420 [<ffffffff8112f1b5>] ? inotify_inode_is_dead+0x35/0xb0 [<ffffffff8112f1b5>] ? inotify_inode_is_dead+0x35/0xb0 [<ffffffff81012fe9>] ? sched_clock+0x9/0x10 [<ffffffff81077165>] ? lock_release_holdtime+0x35/0x1c0 [<ffffffff8112f1b5>] inotify_inode_is_dead+0x35/0xb0 [<ffffffff8110c9dc>] dentry_iput+0xbc/0xe0 [<ffffffff8110cb23>] d_kill+0x33/0x60 [<ffffffff8110ce23>] __shrink_dcache_sb+0x2d3/0x350 [<ffffffff8110cffa>] shrink_dcache_memory+0x15a/0x1e0 [<ffffffff810d0cc5>] shrink_slab+0x125/0x180 [<ffffffff810d1540>] kswapd+0x560/0x7a0 [<ffffffff810ce160>] ? isolate_pages_global+0x0/0x2c0 [<ffffffff81065a30>] ? autoremove_wake_function+0x0/0x40 [<ffffffff8107953d>] ? trace_hardirqs_on+0xd/0x10 [<ffffffff810d0fe0>] ? kswapd+0x0/0x7a0 [<ffffffff8106555b>] kthread+0x5b/0xa0 [<ffffffff8100d40a>] child_rip+0xa/0x20 [<ffffffff8100cdd0>] ? restore_args+0x0/0x30 [<ffffffff81065500>] ? kthread+0x0/0xa0 [<ffffffff8100d400>] ? child_rip+0x0/0x20 [eparis@redhat.com: fix audit too] Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Matt Mackall <mpm@selenic.com> Cc: Christoph Lameter <clameter@sgi.com> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Eric Paris <eparis@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-05-05Merge branch 'timers/urgent' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: clockevents: prevent endless loop in tick_handle_periodic()
2009-05-05Merge branch 'irq/urgent' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'irq/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: Revert "genirq: assert that irq handlers are indeed running in hardirq context"
2009-05-05Merge branch 'sched-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: account system time properly
2009-05-05Merge branch 'core-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: kernel/posix-cpu-timers.c: fix sparse warning dma-debug: remove broken dma memory leak detection for 2.6.30 locking: Documentation: lockdep-design.txt, fix note of state bits
2009-05-05Merge branch 'tracing-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: tracing: x86, mmiotrace: fix range test tracing: fix ref count in splice pages
2009-05-05sched: rt: document the risk of small values in the bandwidth settingsPeter Zijlstra
Thomas noted that we should disallow sysctl_sched_rt_runtime == 0 for (!RT_GROUP) since the root group always has some RT tasks in it. Further, update the documentation to inspire clue. [ Impact: exclude corner-case sysctl_sched_rt_runtime value ] Reported-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20090505155436.863098054@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-02mm: prevent divide error for small values of vm_dirty_bytesAndrea Righi
Avoid setting less than two pages for vm_dirty_bytes: this is necessary to avoid potential division by 0 (like the following) in get_dirty_limits(). [ 49.951610] divide error: 0000 [#1] PREEMPT SMP [ 49.952195] last sysfs file: /sys/devices/pci0000:00/0000:00:01.1/host0/target0:0:0/0:0:0:0/block/sda/uevent [ 49.952195] CPU 1 [ 49.952195] Modules linked in: pcspkr [ 49.952195] Pid: 3064, comm: dd Not tainted 2.6.30-rc3 #1 [ 49.952195] RIP: 0010:[<ffffffff802d39a9>] [<ffffffff802d39a9>] get_dirty_limits+0xe9/0x2c0 [ 49.952195] RSP: 0018:ffff88001de03a98 EFLAGS: 00010202 [ 49.952195] RAX: 00000000000000c0 RBX: ffff88001de03b80 RCX: 28f5c28f5c28f5c3 [ 49.952195] RDX: 0000000000000000 RSI: 00000000000000c0 RDI: 0000000000000000 [ 49.952195] RBP: ffff88001de03ae8 R08: 0000000000000000 R09: 0000000000000000 [ 49.952195] R10: ffff88001ddda9a0 R11: 0000000000000001 R12: 0000000000000001 [ 49.952195] R13: ffff88001fbc8218 R14: ffff88001de03b70 R15: ffff88001de03b78 [ 49.952195] FS: 00007fe9a435b6f0(0000) GS:ffff8800025d9000(0000) knlGS:0000000000000000 [ 49.952195] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 49.952195] CR2: 00007fe9a39ab000 CR3: 000000001de38000 CR4: 00000000000006e0 [ 49.952195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 49.952195] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 49.952195] Process dd (pid: 3064, threadinfo ffff88001de02000, task ffff88001ddda250) [ 49.952195] Stack: [ 49.952195] ffff88001fa0de00 ffff88001f2dbd70 ffff88001f9fe800 000080b900000000 [ 49.952195] 00000000000000c0 ffff8800027a6100 0000000000000400 ffff88001fbc8218 [ 49.952195] 0000000000000000 0000000000000600 ffff88001de03bb8 ffffffff802d3ed7 [ 49.952195] Call Trace: [ 49.952195] [<ffffffff802d3ed7>] balance_dirty_pages_ratelimited_nr+0x1d7/0x3f0 [ 49.952195] [<ffffffff80368f8e>] ? ext3_writeback_write_end+0x9e/0x120 [ 49.952195] [<ffffffff802cc7df>] generic_file_buffered_write+0x12f/0x330 [ 49.952195] [<ffffffff802cce8d>] __generic_file_aio_write_nolock+0x26d/0x460 [ 49.952195] [<ffffffff802cda32>] ? generic_file_aio_write+0x52/0xd0 [ 49.952195] [<ffffffff802cda49>] generic_file_aio_write+0x69/0xd0 [ 49.952195] [<ffffffff80365fa6>] ext3_file_write+0x26/0xc0 [ 49.952195] [<ffffffff803034d1>] do_sync_write+0xf1/0x140 [ 49.952195] [<ffffffff80290d1a>] ? get_lock_stats+0x2a/0x60 [ 49.952195] [<ffffffff80280730>] ? autoremove_wake_function+0x0/0x40 [ 49.952195] [<ffffffff8030411b>] vfs_write+0xcb/0x190 [ 49.952195] [<ffffffff803042d0>] sys_write+0x50/0x90 [ 49.952195] [<ffffffff8022ff6b>] system_call_fastpath+0x16/0x1b [ 49.952195] Code: 00 00 00 2b 05 09 1c 17 01 48 89 c6 49 0f af f4 48 c1 ee 02 48 89 f0 48 f7 e1 48 89 d6 31 d2 48 c1 ee 02 48 0f af 75 d0 48 89 f0 <48> f7 f7 41 8b 95 ac 01 00 00 48 89 c7 49 0f af d4 48 c1 ea 02 [ 49.952195] RIP [<ffffffff802d39a9>] get_dirty_limits+0xe9/0x2c0 [ 49.952195] RSP <ffff88001de03a98> [ 50.096523] ---[ end trace 008d7aa02f244d7b ]--- Signed-off-by: Andrea Righi <righi.andrea@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: David Rientjes <rientjes@google.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-05-02clockevents: prevent endless loop in tick_handle_periodic()john stultz
tick_handle_periodic() can lock up hard when a one shot clock event device is used in combination with jiffies clocksource. Avoid an endless loop issue by requiring that a highres valid clocksource be installed before we call tick_periodic() in a loop when using ONESHOT mode. The result is we will only increment jiffies once per interrupt until a continuous hardware clocksource is available. Without this, we can run into a endless loop, where each cycle through the loop, jiffies is updated which increments time by tick_period or more (due to clock steering), which can cause the event programming to think the next event was before the newly incremented time and fail causing tick_periodic() to be called again and the whole process loops forever. [ Impact: prevent hard lock up ] Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@kernel.org
2009-05-01x86/irq: use move_irq_desc() in create_irq_nr()Yinghai Lu
move_irq_desc() will try to move irq_desc to the home node if the allocated one is not correct, in create_irq_nr(). ( This can happen on devices that are on different nodes that are using MSI, when drivers are loaded and unloaded randomly. ) v2: fix non-smp build v3: add NUMA_IRQ_DESC to eliminate #ifdefs [ Impact: improve irq descriptor locality on NUMA systems ] Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Rusty Russell <rusty@rustcorp.com.au> LKML-Reference: <49F95EAE.2050903@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-01Revert "genirq: assert that irq handlers are indeed running in hardirq context"Thomas Gleixner
This reverts commit 044d408409cc4e1bc75c886e27ca85c270db104c. The commit added a warning when handle_IRQ_event() is called outside of hard interrupt context. This breaks the generic tasklet based interrupt resend mechanism which is used when the hardware has no way to retrigger the interrupt. So we get a warning for a use case which is correct and worked for years. Remove it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-04-30kernel/posix-cpu-timers.c: fix sparse warningH Hartley Sweeten
Sparse reports the following in kernel/posix-cpu-timers.c: warning: symbol 'firing' shadows an earlier one Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Cc: Subrata Modak <subrata@linux.vnet.ibm.com> LKML-Reference: <BD79186B4FD85F4B8E60E381CAEE1909016C1AFE@mi8nycmail19.Mi8.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-29sched: account system time properlyEric Dumazet
Andrew Gallatin reported that IRQ and SOFTIRQ times were sometime not reported correctly on recent kernels, and even bisected to commit 457533a7d3402d1d91fbc125c8bd1bd16dcd3cd4 ([PATCH] fix scaled & unscaled cputime accounting) as the first bad commit. Further analysis pointed that commit 79741dd35713ff4f6fd0eafd59fa94e8a4ba922d ([PATCH] idle cputime accounting) was the real cause of the problem. account_process_tick() was not taking into account timer IRQ interrupting the idle task servicing a hard or soft irq. On mostly idle cpu, irqs were thus not accounted and top or mpstat could tell user/admin that cpu was 100 % idle, 0.00 % irq, 0.00 % softirq, while it was not. [ Impact: fix occasionally incorrect CPU statistics in top/mpstat ] Reported-by: Andrew Gallatin <gallatin@myri.com> Re-reported-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: rick.jones2@hp.com Cc: brice@myri.com Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> LKML-Reference: <49F84BC1.7080602@cosmosbay.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-29sched: Document memory barriers implied by sleep/wake-up primitivesDavid Howells
Add a section to the memory barriers document to note the implied memory barriers of sleep primitives (set_current_state() and wrappers) and wake-up primitives (wake_up() and co.). Also extend the in-code comments on the wake_up() functions to note these implied barriers. [ Impact: add documentation ] Signed-off-by: David Howells <dhowells@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> LKML-Reference: <20090428140138.1192.94723.stgit@warthog.procyon.org.uk> Signed-off-by: Ingo Molnar <mingo@elte.hu>