aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2008-06-24KVM: ioapic: fix lost interrupt when changing a device's irqAvi Kivity
The ioapic acknowledge path translates interrupt vectors to irqs. It currently uses a first match algorithm, stopping when it finds the first redirection table entry containing the vector. That fails however if the guest changes the irq to a different line, leaving the old redirection table entry in place (though masked). Result is interrupts not making it to the guest. Fix by always scanning the entire redirection table. Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-24KVM: MMU: Fix oops on guest userspace access to guest pagetableAvi Kivity
KVM has a heuristic to unshadow guest pagetables when userspace accesses them, on the assumption that most guests do not allow userspace to access pagetables directly. Unfortunately, in addition to unshadowing the pagetables, it also oopses. This never triggers on ordinary guests since sane OSes will clear the pagetables before assigning them to userspace, which will trigger the flood heuristic, unshadowing the pagetables before the first userspace access. One particular guest, though (Xenner) will run the kernel in userspace, triggering the oops. Since the heuristic is incorrect in this case, we can simply remove it. Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-24KVM: MMU: large page update_pte issue with non-PAE 32-bit guests (resend)Marcelo Tosatti
kvm_mmu_pte_write() does not handle 32-bit non-PAE large page backed guests properly. It will instantiate two 2MB sptes pointing to the same physical 2MB page when a guest large pte update is trapped. Instead of duplicating code to handle this, disallow directory level updates to happen through kvm_mmu_pte_write(), so the two 2MB sptes emulating one guest 4MB pte can be correctly created by the page fault handling path. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-24KVM: MMU: Fix rmap_write_protect() hugepage iteration bugMarcelo Tosatti
rmap_next() does not work correctly after rmap_remove(), as it expects the rmap chains not to change during iteration. Fix (for now) by restarting iteration from the beginning. Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-24KVM: close timer injection race window in __vcpu_runMarcelo Tosatti
If a timer fires after kvm_inject_pending_timer_irqs() but before local_irq_disable() the code will enter guest mode and only inject such timer interrupt the next time an unrelated event causes an exit. It would be simpler if the timer->pending irq conversion could be done with IRQ's disabled, so that the above problem cannot happen. For now introduce a new vcpu requests bit to cancel guest entry. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-24KVM: Fix race between timer migration and vcpu migrationMarcelo Tosatti
A guest vcpu instance can be scheduled to a different physical CPU between the test for KVM_REQ_MIGRATE_TIMER and local_irq_disable(). If that happens, the timer will only be migrated to the current pCPU on the next exit, meaning that guest LAPIC timer event can be delayed until a host interrupt is triggered. Fix it by cancelling guest entry if any vcpu request is pending. This has the side effect of nicely consolidating vcpu->requests checks. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-23alpha: fix compile error in arch/alpha/mm/init.cThorsten Kranzkowski
Commit 9267b4b3880d00dc2dab90f1d817c856939114f7 ("alpha: fix module load failures on smp (bug #10926)") causes a regression for my ev4 uniprocessor build: CC arch/alpha/mm/init.o /export/data/repositories/linux-2.6/arch/alpha/mm/init.c:34: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘typeof’ make[2]: *** [arch/alpha/mm/init.o] Error 1 make[1]: *** [arch/alpha/mm] Error 2 make: *** [sub-make] Error 2 This fixes it for me (compile and boot tested): Signed-off-by: Thorsten Kranzkowski <dl8bcu@dl8bcu.de> Acked-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-06-23Merge branch 'hotfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6Linus Torvalds
* 'hotfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: NFS: nfs_updatepage(): don't mark page as dirty if an error occurred NFS: Fix filehandle size comparisons in the mount code NFS: Reduce the NFS mount code stack usage.
2008-06-23NFS: nfs_updatepage(): don't mark page as dirty if an error occurredTrond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-06-23NFS: Fix filehandle size comparisons in the mount codeTrond Myklebust
Fix a sign issue in xdr_decode_fhstatus3() Fix incorrect comparison in nfs_validate_mount_data() Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-06-23NFS: Reduce the NFS mount code stack usage.Trond Myklebust
This appears to fix the Oops reported in http://bugzilla.kernel.org/show_bug.cgi?id=10826 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-06-23Merge branch 'core-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: futexes: fix fault handling in futex_lock_pi
2008-06-23Merge branch 'sched-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: refactor wait_for_completion_timeout() sched: fix wait_for_completion_timeout() spurious failure under heavy load sched: rt: dont stop the period timer when there are tasks wanting to run
2008-06-23Merge branch 'x86-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: xen: don't drop NX bit xen: mask unwanted pte bits in __supported_pte_mask xen: Use wmb instead of rmb in xen_evtchn_do_upcall(). x86: fix NULL pointer deref in __switch_to
2008-06-23Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: IB/mthca: Clear ICM pages before handing to FW
2008-06-23Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6: ALSA: sb - Fix wrong assertions ALSA: aw2 - Fix Oops at initialization
2008-06-23mm: fix race in COW logicNick Piggin
There is a race in the COW logic. It contains a shortcut to avoid the COW and reuse the page if we have the sole reference on the page, however it is possible to have two racing do_wp_page()ers with one causing the other to mistakenly believe it is safe to take the shortcut when it is not. This could lead to data corruption. Process 1 and process2 each have a wp pte of the same anon page (ie. one forked the other). The page's mapcount is 2. Then they both attempt to write to it around the same time... proc1 proc2 thr1 proc2 thr2 CPU0 CPU1 CPU3 do_wp_page() do_wp_page() trylock_page() can_share_swap_page() load page mapcount (==2) reuse = 0 pte unlock copy page to new_page pte lock page_remove_rmap(page); trylock_page() can_share_swap_page() load page mapcount (==1) reuse = 1 ptep_set_access_flags (allow W) write private key into page read from page ptep_clear_flush() set_pte_at(pte of new_page) Fix this by moving the page_remove_rmap of the old page after the pte clear and flush. Potentially the entire branch could be moved down here, but in order to stay consistent, I won't (should probably move all the *_mm_counter stuff with one patch). Signed-off-by: Nick Piggin <npiggin@suse.de> Acked-by: Hugh Dickins <hugh@veritas.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-06-23Fix ZERO_PAGE breakage with vmwareLinus Torvalds
Commit 89f5b7da2a6bad2e84670422ab8192382a5aeb9f ("Reinstate ZERO_PAGE optimization in 'get_user_pages()' and fix XIP") broke vmware, as reported by Jeff Chua: "This broke vmware 6.0.4. Jun 22 14:53:03.845: vmx| NOT_IMPLEMENTED /build/mts/release/bora-93057/bora/vmx/main/vmmonPosix.c:774" and the reason seems to be that there's an old bug in how we handle do FOLL_ANON on VM_SHARED areas in get_user_pages(), but since it only triggered if the whole page table was missing, nobody had apparently hit it before. The recent changes to 'follow_page()' made the FOLL_ANON logic trigger not just for whole missing page tables, but for individual pages as well, and exposed this problem. This fixes it by making the test for when FOLL_ANON is used more careful, and also makes the code easier to read and understand by moving the logic to a separate inline function. Reported-and-tested-by: Jeff Chua <jeff.chua.linux@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-06-23removed unused var real_tty on n_tty_ioctl()Gustavo Fernando Padovan
I noted that the 'struct tty_struct *real_tty' is not used in this function, so I removed the code about 'real_tty'. Signed-off-by: Gustavo Fernando Padovan <gustavo@las.ic.unicamp.br> Acked-by: Alan Cox <alan@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-06-23tty_driver: Update required method documentationAlan Cox
Some of the requirement rules are now more relaxed. Also correct a contradiction in the previous update Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-06-23[patch 2/3] vfs: dcache cleanupsMiklos Szeredi
Comment from Al Viro: add prepend_name() wrapper. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-06-23[patch 1/3] vfs: dcache sparse fixesMiklos Szeredi
Fix the following sparse warnings: fs/dcache.c:2183:19: warning: symbol 'filp_cachep' was not declared. Should it be static? fs/dcache.c:115:3: warning: context imbalance in 'dentry_iput' - unexpected unlock fs/dcache.c:188:2: warning: context imbalance in 'dput' - different lock contexts for basic block fs/dcache.c:400:2: warning: context imbalance in 'prune_one_dentry' - different lock contexts for basic block fs/dcache.c:431:22: warning: context imbalance in 'prune_dcache' - different lock contexts for basic block fs/dcache.c:563:2: warning: context imbalance in 'shrink_dcache_sb' - different lock contexts for basic block fs/dcache.c:1385:6: warning: context imbalance in 'd_delete' - wrong count at exit fs/dcache.c:1636:2: warning: context imbalance in '__d_unalias' - unexpected unlock fs/dcache.c:1735:2: warning: context imbalance in 'd_materialise_unique' - different lock contexts for basic block Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reviewed-by: Matthew Wilcox <willy@linux.intel.com> Acked-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-06-23[patch 3/3] vfs: make d_path() consistent across mount operationsAndreas Gruenbacher
The path that __d_path() computes can become slightly inconsistent when it races with mount operations: it grabs the vfsmount_lock when traversing mount points but immediately drops it again, only to re-grab it when it reaches the next mount point. The result is that the filename computed is not always consisent, and the file may never have had that name. (This is unlikely, but still possible.) Fix this by grabbing the vfsmount_lock for the whole duration of __d_path(). Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Signed-off-by: John Johansen <jjohansen@suse.de> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Acked-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-06-23IB/mthca: Clear ICM pages before handing to FWEli Cohen
Current memfree FW has a bug which in some cases, assumes that ICM pages passed to it are cleared. This patch uses __GFP_ZERO to allocate all ICM pages passed to the FW. Once firmware with a fix is released, we can make the workaround conditional on firmware version. This fixes the bug reported by Arthur Kepner <akepner@sgi.com> here: http://lists.openfabrics.org/pipermail/general/2008-May/050026.html Cc: <stable@kernel.org> Signed-off-by: Eli Cohen <eli@mellanox.co.il> [ Rewritten to be a one-liner using __GFP_ZERO instead of vmap()ing ICM memory and memset()ing it to 0. - Roland ] Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-06-23[patch 4/4] flock: remove unused fields from file_lock_operationsDenis V. Lunev
fl_insert and fl_remove are not used right now in the kernel. Remove them. Signed-off-by: Denis V. Lunev <den@openvz.org> Cc: Matthew Wilcox <matthew@wil.cx> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: "J. Bruce Fields" <bfields@fieldses.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-06-23[patch 3/4] vfs: fix ERR_PTR abuse in generic_readlinkMarcin Slusarz
generic_readlink calls ERR_PTR for negative and positive values (vfs_readlink returns length of "link"), but it should not (not an errno) and does not need to. Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Acked-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-06-23[patch 2/4] fs: make struct file arg to d_path constJan Engelhardt
Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-06-23[patch 1/4] vfs: path_{get,put}() cleanupsJan Blunck
Here are some more places where path_{get,put}() can be used instead of dput()/mntput() pair. Signed-off-by: Jan Blunck <jblunck@suse.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-06-23[patch for 2.6.26 4/4] vfs: utimensat(): fix write access check for futimens()Michael Kerrisk
The POSIX.1 draft spec for futimens()/utimensat() says: Only a process with the effective user ID equal to the user ID of the file, *or with write access to the file*, or with appropriate privileges may use futimens() or utimensat() with a null pointer as the times argument or with both tv_nsec fields set to the special value UTIME_NOW. The important piece here is "with write access to the file", and this matters for futimens(), which deals with an argument that is a file descriptor referring to the file whose timestamps are being updated, The standard is saying that the "writability" check is based on the file permissions, not the access mode with which the file is opened. (This behavior is consistent with the semantics of FreeBSD's futimes().) However, Linux is currently doing the latter -- futimens(fd, times) is a library function implemented as utimensat(fd, NULL, times, 0) and within the utimensat() implementation we have the code: f = fget(dfd); // dfd is 'fd' ... if (f) { if (!(f->f_mode & FMODE_WRITE)) goto mnt_drop_write_and_out; The check should instead be based on the file permissions. Thanks to Miklos for pointing out how to do this check. Miklos also pointed out a simplification that could be made to my first version of this patch, since the checks for the pathname and file descriptor cases can now be conflated. Acked-by: Miklos Szeredi <miklos@szeredi.hu> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Ulrich Drepper <drepper@redhat.com> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-06-23[patch for 2.6.26 3/4] vfs: utimensat(): fix error checking for ↵Michael Kerrisk
{UTIME_NOW,UTIME_OMIT} case The POSIX.1 draft spec for utimensat() says: Only a process with the effective user ID equal to the user ID of the file or with appropriate privileges may use futimens() or utimensat() with a non-null times argument that does not have both tv_nsec fields set to UTIME_NOW and does not have both tv_nsec fields set to UTIME_OMIT. If this condition is violated, then the error EPERM should result. However, the current implementation does not generate EPERM if one tv_nsec field is UTIME_NOW while the other is UTIME_OMIT. It should give this error for that case. This patch: a) Repairs that problem. b) Removes the now unneeded nsec_special() helper function. c) Adds some comments to explain the checks that are being performed. Thanks to Miklos, who provided comments on the previous iteration of this patch. As a result, this version is a little simpler and and its logic is better structured. Miklos suggested an alternative idea, migrating the is_owner_or_cap() checks into fs/attr.c:inode_change_ok() via the use of an ATTR_OWNER_CHECK flag. Maybe we could do that later, but for now I've gone with this version, which is IMO simpler, and can be more easily read as being correct. Acked-by: Miklos Szeredi <miklos@szeredi.hu> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Ulrich Drepper <drepper@redhat.com> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-06-23[patch for 2.6.26 1/4] vfs: utimensat(): ignore tv_sec if tv_nsec == ↵Michael Kerrisk
UTIME_OMIT or UTIME_NOW The POSIX.1 draft spec for utimensat() says that if a times[n].tv_nsec field is UTIME_OMIT or UTIME_NOW, then the value in the corresponding tv_sec field is ignored. See the last sentence of this para, from the spec: If the tv_nsec field of a timespec structure has the special value UTIME_NOW, the file's relevant timestamp shall be set to the greatest value supported by the file system that is not greater than the current time. If the tv_nsec field has the special value UTIME_OMIT, the file's relevant timestamp shall not be changed. In either case, the tv_sec field shall be ignored. However the current Linux implementation requires the tv_sec value to be zero (or the EINVAL error results). This requirement should be removed. Acked-by: Miklos Szeredi <miklos@szeredi.hu> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Ulrich Drepper <drepper@redhat.com> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-06-23[patch for 2.6.26 2/4] vfs: utimensat(): be consistent with utime() for ↵Michael Kerrisk
immutable and append-only files This patch fixes utimensat() to make its behavior consistent with that of utime()/utimes() when dealing with files marked immutable and append-only. The current utimensat() implementation also returns EPERM if 'times' is non-NULL and the tv_nsec fields are both UTIME_NOW. For consistency, the (times != NULL && times[0].tv_nsec == UTIME_NOW && times[1].tv_nsec == UTIME_NOW) case should be treated like the traditional utimes() case where 'times' is NULL. That is, the call should succeed for a file marked append-only and should give the error EACCES if the file is marked as immutable. The simple way to do this is to set 'times' to NULL if (times[0].tv_nsec == UTIME_NOW && times[1].tv_nsec == UTIME_NOW). This is also the natural approach, since POSIX.1 semantics consider the times == {{x, UTIME_NOW}, {y, UTIME_NOW}} to be exactly equivalent to the case for times == NULL. (Thanks to Miklos for pointing this out.) Patch 3 in this series relies on the simplification provided by this patch. Acked-by: Miklos Szeredi <miklos@szeredi.hu> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Ulrich Drepper <drepper@redhat.com> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-06-23[PATCH] fix cgroup-inflicted breakage in block_dev.cAl Viro
devcgroup_inode_permission() expects MAY_FOO, not FMODE_FOO; kindly keep your misdesign consistent if you positively have to inflict it on the kernel. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-06-23futexes: fix fault handling in futex_lock_piThomas Gleixner
This patch addresses a very sporadic pi-futex related failure in highly threaded java apps on large SMP systems. David Holmes reported that the pi_state consistency check in lookup_pi_state triggered with his test application. This means that the kernel internal pi_state and the user space futex variable are out of sync. First we assumed that this is a user space data corruption, but deeper investigation revieled that the problem happend because the pi-futex code is not handling a fault in the futex_lock_pi path when the user space variable needs to be fixed up. The fault happens when a fork mapped the anon memory which contains the futex readonly for COW or the page got swapped out exactly between the unlock of the futex and the return of either the new futex owner or the task which was the expected owner but failed to acquire the kernel internal rtmutex. The current futex_lock_pi() code drops out with an inconsistent in case it faults and returns -EFAULT to user space. User space has no way to fixup that state. When we wrote this code we thought that we could not drop the hash bucket lock at this point to handle the fault. After analysing the code again it turned out to be wrong because there are only two tasks involved which might modify the pi_state and the user space variable: - the task which acquired the rtmutex - the pending owner of the pi_state which did not get the rtmutex Both tasks drop into the fixup_pi_state() function before returning to user space. The first task which acquired the hash bucket lock faults in the fixup of the user space variable, drops the spinlock and calls futex_handle_fault() to fault in the page. Now the second task could acquire the hash bucket lock and tries to fixup the user space variable as well. It either faults as well or it succeeds because the first task already faulted the page in. One caveat is to avoid a double fixup. After returning from the fault handling we reacquire the hash bucket lock and check whether the pi_state owner has been modified already. Reported-by: David Holmes <david.holmes@sun.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: David Holmes <david.holmes@sun.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: <stable@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> kernel/futex.c | 93 ++++++++++++++++++++++++++++++++++++++++++++------------- 1 file changed, 73 insertions(+), 20 deletions(-)
2008-06-23ALSA: sb - Fix wrong assertionsTakashi Iwai
snd_assert() in save_mixer() and restore_mixer() in sb_mixer.c is just wrong. The debug code wasn't tested at all, obviously... Signed-off-by: Takashi Iwai <tiwai@suse.de>
2008-06-23ALSA: aw2 - Fix Oops at initializationTakashi Iwai
The irq handler may be called before the proper initialization of hardware. Call snd_aw2_saa7146_setup() before the irq handler registration. Signed-off-by: Takashi Iwai <tiwai@suse.de>
2008-06-23Merge branch 'linus' into sched/urgentIngo Molnar
2008-06-22Fix performance regression on lmbench select benchmarkLinus Torvalds
Christian Borntraeger reported that reinstating cond_resched() with CONFIG_PREEMPT caused a performance regression on lmbench: For example select file 500: 23 microseconds 32 microseconds and that's really because we totally unnecessarily do the cond_resched() in the innermost loop of select(), which is just silly. This moves it out from the innermost loop (which only ever loops ove the bits in a single "unsigned long" anyway), which makes the performance regression go away. Reported-and-tested-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-06-21Slab: Fix memory leak in fallback_alloc()Christoph Lameter
The zonelist patches caused the loop that checks for available objects in permitted zones to not terminate immediately. One object per zone per allocation may be allocated and then abandoned. Break the loop when we have successfully allocated one object. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-06-21Merge branch 'for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: Ext4: Fix online resize block group descriptor corruption
2008-06-21Merge branch 'release' of git://lm-sensors.org/kernel/mhoffman/hwmon-2.6Linus Torvalds
* 'release' of git://lm-sensors.org/kernel/mhoffman/hwmon-2.6: hwmon: (lm75) sensor reading bugfix hwmon: (abituguru3) update driver detection hwmon: (w83791d) new maintainer hwmon: (abituguru3) Identify Abit AW8D board as such hwmon: Update the sysfs interface documentation hwmon: (adt7473) Initialize max_duty_at_overheat before use hwmon: (lm85) Fix function RANGE_TO_REG()
2008-06-21Add return value to reserve_bootmem_node()Bernhard Walle
This patch changes the function reserve_bootmem_node() from void to int, returning -ENOMEM if the allocation fails. This fixes a build problem on x86 with CONFIG_KEXEC=y and CONFIG_NEED_MULTIPLE_NODES=y Signed-off-by: Bernhard Walle <bwalle@suse.de> Reported-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-06-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: netns: Don't receive new packets in a dead network namespace. sctp: Make sure N * sizeof(union sctp_addr) does not overflow. pppoe: warning fix ipv6: Drop packets for loopback address from outside of the box. ipv6: Remove options header when setsockopt's optlen is 0 mac80211: detect driver tx bugs
2008-06-20netns: Don't receive new packets in a dead network namespace.Eric W. Biederman
Alexey Dobriyan <adobriyan@gmail.com> writes: > Subject: ICMP sockets destruction vs ICMP packets oops > After icmp_sk_exit() nuked ICMP sockets, we get an interrupt. > icmp_reply() wants ICMP socket. > > Steps to reproduce: > > launch shell in new netns > move real NIC to netns > setup routing > ping -i 0 > exit from shell > > BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 > IP: [<ffffffff803fce17>] icmp_sk+0x17/0x30 > PGD 17f3cd067 PUD 17f3ce067 PMD 0 > Oops: 0000 [1] PREEMPT SMP DEBUG_PAGEALLOC > CPU 0 > Modules linked in: usblp usbcore > Pid: 0, comm: swapper Not tainted 2.6.26-rc6-netns-ct #4 > RIP: 0010:[<ffffffff803fce17>] [<ffffffff803fce17>] icmp_sk+0x17/0x30 > RSP: 0018:ffffffff8057fc30 EFLAGS: 00010286 > RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff81017c7db900 > RDX: 0000000000000034 RSI: ffff81017c7db900 RDI: ffff81017dc41800 > RBP: ffffffff8057fc40 R08: 0000000000000001 R09: 000000000000a815 > R10: 0000000000000000 R11: 0000000000000001 R12: ffffffff8057fd28 > R13: ffffffff8057fd00 R14: ffff81017c7db938 R15: ffff81017dc41800 > FS: 0000000000000000(0000) GS:ffffffff80525000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b > CR2: 0000000000000000 CR3: 000000017fcda000 CR4: 00000000000006e0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > Process swapper (pid: 0, threadinfo ffffffff8053a000, task ffffffff804fa4a0) > Stack: 0000000000000000 ffff81017c7db900 ffffffff8057fcf0 ffffffff803fcfe4 > ffffffff804faa38 0000000000000246 0000000000005a40 0000000000000246 > 000000000001ffff ffff81017dd68dc0 0000000000005a40 0000000055342436 > Call Trace: > <IRQ> [<ffffffff803fcfe4>] icmp_reply+0x44/0x1e0 > [<ffffffff803d3a0a>] ? ip_route_input+0x23a/0x1360 > [<ffffffff803fd645>] icmp_echo+0x65/0x70 > [<ffffffff803fd300>] icmp_rcv+0x180/0x1b0 > [<ffffffff803d6d84>] ip_local_deliver+0xf4/0x1f0 > [<ffffffff803d71bb>] ip_rcv+0x33b/0x650 > [<ffffffff803bb16a>] netif_receive_skb+0x27a/0x340 > [<ffffffff803be57d>] process_backlog+0x9d/0x100 > [<ffffffff803bdd4d>] net_rx_action+0x18d/0x250 > [<ffffffff80237be5>] __do_softirq+0x75/0x100 > [<ffffffff8020c97c>] call_softirq+0x1c/0x30 > [<ffffffff8020f085>] do_softirq+0x65/0xa0 > [<ffffffff80237af7>] irq_exit+0x97/0xa0 > [<ffffffff8020f198>] do_IRQ+0xa8/0x130 > [<ffffffff80212ee0>] ? mwait_idle+0x0/0x60 > [<ffffffff8020bc46>] ret_from_intr+0x0/0xf > <EOI> [<ffffffff80212f2c>] ? mwait_idle+0x4c/0x60 > [<ffffffff80212f23>] ? mwait_idle+0x43/0x60 > [<ffffffff8020a217>] ? cpu_idle+0x57/0xa0 > [<ffffffff8040f380>] ? rest_init+0x70/0x80 > Code: 10 5b 41 5c 41 5d 41 5e c9 c3 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 53 > 48 83 ec 08 48 8b 9f 78 01 00 00 e8 2b c7 f1 ff 89 c0 <48> 8b 04 c3 48 83 c4 08 > 5b c9 c3 66 66 66 66 66 2e 0f 1f 84 00 > RIP [<ffffffff803fce17>] icmp_sk+0x17/0x30 > RSP <ffffffff8057fc30> > CR2: 0000000000000000 > ---[ end trace ea161157b76b33e8 ]--- > Kernel panic - not syncing: Aiee, killing interrupt handler! Receiving packets while we are cleaning up a network namespace is a racy proposition. It is possible when the packet arrives that we have removed some but not all of the state we need to fully process it. We have the choice of either playing wack-a-mole with the cleanup routines or simply dropping packets when we don't have a network namespace to handle them. Since the check looks inexpensive in netif_receive_skb let's just drop the incoming packets. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-20sctp: Make sure N * sizeof(union sctp_addr) does not overflow.David S. Miller
As noticed by Gabriel Campana, the kmalloc() length arg passed in by sctp_getsockopt_local_addrs_old() can overflow if ->addr_num is large enough. Therefore, enforce an appropriate limit. Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-20pppoe: warning fixStephen Hemminger
Fix warning: drivers/net/pppoe.c: In function 'pppoe_recvmsg': drivers/net/pppoe.c:945: warning: comparison of distinct pointer types lacks a cast because skb->len is unsigned int and total_len is size_t Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-20Merge branch 'release' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6 * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6: [IA64] SN2: security hole in sn2_ptc_proc_write
2008-06-20alpha: resurrect Cypress IDE quirkIvan Kokshaysky
Which was removed in the hope that generic legacy IDE quirk in drivers/pci/probe.c is sufficient for Cypress IDE. It isn't, as this controller has non-standard BAR layout: secondary channel registers are in the BAR0-1 of the second PCI function - not in the BAR2-3 of the same function, as the generic quirk routine assumes. Signed-off-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-06-20alpha: fix compile failures with gcc-4.3 (bug #10438)Ivan Kokshaysky
Vast majority of these build failures are gcc-4.3 warnings about static functions and objects being referenced from non-static (read: "extern inline") functions, in conjunction with our -Werror. We cannot just convert "extern inline" to "static inline", as people keep suggesting all the time, because "extern inline" logic is crucial for generic kernel build. So - just make sure that all callees of critical "extern inline" functions are also "extern inline"; - use "static inline", wherever it's possible. traps.c: work around gcc-4.3 being too smart about array bounds-checking. TODO: add "gnu_inline" attribute to all our "extern inline" functions to ensure desired behaviour with future compilers. Signed-off-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-06-20alpha: link failure fixIvan Kokshaysky
With built-in scsi disk driver, the final link fails with a following error: `.exit.text' referenced in section `.rodata' of drivers/built-in.o: defined in discarded section `.exit.text' of drivers/built-in.o This happens with -Os (CONFIG_CC_OPTIMIZE_FOR_SIZE=y) with all gcc-4 versions, and also with -O2 and gcc-4.3. The problem is in sd.c:sd_major() being inlined into __exit function exit_sd(), and the compiler generating a jump table in .rodata section for the 'switch' statement in sd_major(). So we have references to discarded section. Fixed with a big hammer in the form of -fno-jump-tables. Note that jump tables vs. discarded sections is a generic problem, other architectures are just lucky not to suffer from it. But with a slightly more complex switch/case statement it can be reproduced on x86 as well. So maybe at some point we should consider -fno-jump-tables as a generic compile option... Signed-off-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>