aboutsummaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2008-11-18Merge branch 'master' into nextJames Morris
Conflicts: fs/cifs/misc.c Merge to resolve above, per the patch below. Signed-off-by: James Morris <jmorris@namei.org> diff --cc fs/cifs/misc.c index ec36410,addd1dc..0000000 --- a/fs/cifs/misc.c +++ b/fs/cifs/misc.c @@@ -347,13 -338,13 +338,13 @@@ header_assemble(struct smb_hdr *buffer /* BB Add support for establishing new tCon and SMB Session */ /* with userid/password pairs found on the smb session */ /* for other target tcp/ip addresses BB */ - if (current->fsuid != treeCon->ses->linux_uid) { + if (current_fsuid() != treeCon->ses->linux_uid) { cFYI(1, ("Multiuser mode and UID " "did not match tcon uid")); - read_lock(&GlobalSMBSeslock); - list_for_each(temp_item, &GlobalSMBSessionList) { - ses = list_entry(temp_item, struct cifsSesInfo, cifsSessionList); + read_lock(&cifs_tcp_ses_lock); + list_for_each(temp_item, &treeCon->ses->server->smb_ses_list) { + ses = list_entry(temp_item, struct cifsSesInfo, smb_ses_list); - if (ses->linux_uid == current->fsuid) { + if (ses->linux_uid == current_fsuid()) { if (ses->server == treeCon->ses->server) { cFYI(1, ("found matching uid substitute right smb_uid")); buffer->Uid = ses->Suid;
2008-11-15Revert "x86: blacklist DMAR on Intel G31/G33 chipsets"David Woodhouse
This reverts commit e51af6630848406fc97adbd71443818cdcda297b, which was wrongly hoovered up and submitted about a month after a better fix had already been merged. The better fix is commit cbda1ba898647aeb4ee770b803c922f595e97731 ("PCI/iommu: blacklist DMAR on Intel G31/G33 chipsets"), where we do this blacklisting based on the DMI identification for the offending motherboard, since sometimes this chipset (or at least a chipset with the same PCI ID) apparently _does_ actually have an IOMMU. Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-14Merge branch 'master' into nextJames Morris
Conflicts: security/keys/internal.h security/keys/process_keys.c security/keys/request_key.c Fixed conflicts above by using the non 'tsk' versions. Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14CRED: Make execve() take advantage of copy-on-write credentialsDavid Howells
Make execve() take advantage of copy-on-write credentials, allowing it to set up the credentials in advance, and then commit the whole lot after the point of no return. This patch and the preceding patches have been tested with the LTP SELinux testsuite. This patch makes several logical sets of alteration: (1) execve(). The credential bits from struct linux_binprm are, for the most part, replaced with a single credentials pointer (bprm->cred). This means that all the creds can be calculated in advance and then applied at the point of no return with no possibility of failure. I would like to replace bprm->cap_effective with: cap_isclear(bprm->cap_effective) but this seems impossible due to special behaviour for processes of pid 1 (they always retain their parent's capability masks where normally they'd be changed - see cap_bprm_set_creds()). The following sequence of events now happens: (a) At the start of do_execve, the current task's cred_exec_mutex is locked to prevent PTRACE_ATTACH from obsoleting the calculation of creds that we make. (a) prepare_exec_creds() is then called to make a copy of the current task's credentials and prepare it. This copy is then assigned to bprm->cred. This renders security_bprm_alloc() and security_bprm_free() unnecessary, and so they've been removed. (b) The determination of unsafe execution is now performed immediately after (a) rather than later on in the code. The result is stored in bprm->unsafe for future reference. (c) prepare_binprm() is called, possibly multiple times. (i) This applies the result of set[ug]id binaries to the new creds attached to bprm->cred. Personality bit clearance is recorded, but now deferred on the basis that the exec procedure may yet fail. (ii) This then calls the new security_bprm_set_creds(). This should calculate the new LSM and capability credentials into *bprm->cred. This folds together security_bprm_set() and parts of security_bprm_apply_creds() (these two have been removed). Anything that might fail must be done at this point. (iii) bprm->cred_prepared is set to 1. bprm->cred_prepared is 0 on the first pass of the security calculations, and 1 on all subsequent passes. This allows SELinux in (ii) to base its calculations only on the initial script and not on the interpreter. (d) flush_old_exec() is called to commit the task to execution. This performs the following steps with regard to credentials: (i) Clear pdeath_signal and set dumpable on certain circumstances that may not be covered by commit_creds(). (ii) Clear any bits in current->personality that were deferred from (c.i). (e) install_exec_creds() [compute_creds() as was] is called to install the new credentials. This performs the following steps with regard to credentials: (i) Calls security_bprm_committing_creds() to apply any security requirements, such as flushing unauthorised files in SELinux, that must be done before the credentials are changed. This is made up of bits of security_bprm_apply_creds() and security_bprm_post_apply_creds(), both of which have been removed. This function is not allowed to fail; anything that might fail must have been done in (c.ii). (ii) Calls commit_creds() to apply the new credentials in a single assignment (more or less). Possibly pdeath_signal and dumpable should be part of struct creds. (iii) Unlocks the task's cred_replace_mutex, thus allowing PTRACE_ATTACH to take place. (iv) Clears The bprm->cred pointer as the credentials it was holding are now immutable. (v) Calls security_bprm_committed_creds() to apply any security alterations that must be done after the creds have been changed. SELinux uses this to flush signals and signal handlers. (f) If an error occurs before (d.i), bprm_free() will call abort_creds() to destroy the proposed new credentials and will then unlock cred_replace_mutex. No changes to the credentials will have been made. (2) LSM interface. A number of functions have been changed, added or removed: (*) security_bprm_alloc(), ->bprm_alloc_security() (*) security_bprm_free(), ->bprm_free_security() Removed in favour of preparing new credentials and modifying those. (*) security_bprm_apply_creds(), ->bprm_apply_creds() (*) security_bprm_post_apply_creds(), ->bprm_post_apply_creds() Removed; split between security_bprm_set_creds(), security_bprm_committing_creds() and security_bprm_committed_creds(). (*) security_bprm_set(), ->bprm_set_security() Removed; folded into security_bprm_set_creds(). (*) security_bprm_set_creds(), ->bprm_set_creds() New. The new credentials in bprm->creds should be checked and set up as appropriate. bprm->cred_prepared is 0 on the first call, 1 on the second and subsequent calls. (*) security_bprm_committing_creds(), ->bprm_committing_creds() (*) security_bprm_committed_creds(), ->bprm_committed_creds() New. Apply the security effects of the new credentials. This includes closing unauthorised files in SELinux. This function may not fail. When the former is called, the creds haven't yet been applied to the process; when the latter is called, they have. The former may access bprm->cred, the latter may not. (3) SELinux. SELinux has a number of changes, in addition to those to support the LSM interface changes mentioned above: (a) The bprm_security_struct struct has been removed in favour of using the credentials-under-construction approach. (c) flush_unauthorized_files() now takes a cred pointer and passes it on to inode_has_perm(), file_has_perm() and dentry_open(). Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>
2008-11-14CRED: Wrap task credential accesses in the x86 archDavid Howells
Wrap access to task credentials so that they can be separated more easily from the task_struct during the introduction of COW creds. Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id(). Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more sense to use RCU directly rather than a convenient wrapper; these will be addressed by later patches. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Signed-off-by: James Morris <jmorris@namei.org>
2008-11-12Merge branch 'kvm-updates/2.6.28' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm * 'kvm-updates/2.6.28' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm: KVM: Fix pit memory leak if unable to allocate irq source id KVM: ia64: fix vmm_spin_{un}lock for !CONFIG_SMP KVM: VMX: Set IGMT bit in EPT entry KVM: Require the PCI subsystem x86: KVM guest: fix section mismatch warning in kvmclock.c KVM: ia64: Use guest signal mask when blocking KVM: MMU: increase per-vcpu rmap cache alloc size
2008-11-12Merge branch 'release' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6 * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (47 commits) ACPI: pci_link: remove acpi_irq_balance_set() interface fujitsu-laptop: Add DMI callback for Lifebook S6420 ACPI: EC: Don't do transaction from GPE handler in poll mode. ACPI: EC: lower interrupt storm treshold ACPICA: Use spinlock for acpi_{en|dis}able_gpe ACPI: EC: restart failed command ACPI: EC: wait for last write gpe ACPI: EC: make kernel messages more useful when GPE storm is detected ACPI: EC: revert msleep patch thinkpad_acpi: fingers off backlight if video.ko is serving this functionality sony-laptop: fingers off backlight if video.ko is serving this functionality msi-laptop: fingers off backlight if video.ko is serving this functionality fujitsu-laptop: fingers off backlight if video.ko is serving this functionality eeepc-laptop: fingers off backlight if video.ko is serving this functionality compal: fingers off backlight if video.ko is serving this functionality asus-acpi: fingers off backlight if video.ko is serving this functionality Acer-WMI: fingers off backlight if video.ko is serving this functionality ACPI video: if no ACPI backlight support, use vendor drivers ACPI: video: Ignore devices that aren't present in hardware Delete an unwanted return statement at evgpe.c ...
2008-11-11Merge branch 'misc' into releaseLen Brown
2008-11-11ACPI: pci_link: remove acpi_irq_balance_set() interfaceBjorn Helgaas
This removes the acpi_irq_balance_set() interface from the PCI interrupt link driver. x86 used acpi_irq_balance_set() to tell the PCI interrupt link driver to configure links to minimize IRQ sharing. But the link driver can easily figure out whether to turn on IRQ balancing based on the IRQ model (PIC/IOAPIC/etc), so we can get rid of that external interface. It's better for the driver to figure this out at init-time. If we set it externally via the x86 code, the interface reduces modularity, and we depend on the fact that acpi_process_madt() happens before we process the kernel command line. Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Len Brown <len.brown@intel.com>
2008-11-11KVM: Fix pit memory leak if unable to allocate irq source idAvi Kivity
Reported-By: Daniel Marjamäki <danielm77@spray.se> Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-11-11KVM: VMX: Set IGMT bit in EPT entrySheng Yang
There is a potential issue that, when guest using pagetable without vmexit when EPT enabled, guest would use PAT/PCD/PWT bits to index PAT msr for it's memory, which would be inconsistent with host side and would cause host MCE due to inconsistent cache attribute. The patch set IGMT bit in EPT entry to ignore guest PAT and use WB as default memory type to protect host (notice that all memory mapped by KVM should be WB). Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2008-11-11KVM: Require the PCI subsystemAvi Kivity
PCI device assignment makes calls to pci code, so require it to be built into the kernel. Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-11-11x86: KVM guest: fix section mismatch warning in kvmclock.cRakib Mullick
WARNING: arch/x86/kernel/built-in.o(.text+0x1722c): Section mismatch in reference from the function kvm_setup_secondary_clock() to the function .devinit.text:setup_secondary_APIC_clock() The function kvm_setup_secondary_clock() references the function __devinit setup_secondary_APIC_clock(). This is often because kvm_setup_secondary_clock lacks a __devinit annotation or the annotation of setup_secondary_APIC_clock is wrong. Signed-off-by: Md.Rakib H. Mullick <rakib.mullick@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Avi Kivity <avi@redhat.com>
2008-11-11Merge branch 'timers-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: timers: handle HRTIMER_CB_IRQSAFE_UNLOCKED correctly from softirq context nohz: disable tick_nohz_kick_tick() for now irq: call __irq_enter() before calling the tick_idle_check x86: HPET: enter hpet_interrupt_handler with interrupts disabled x86: HPET: read from HPET_Tn_CMP() not HPET_T0_CMP x86: HPET: convert WARN_ON to WARN_ON_ONCE
2008-11-11KVM: MMU: increase per-vcpu rmap cache alloc sizeMarcelo Tosatti
The page fault path can use two rmap_desc structures, if: - walk_addr's dirty pte update allocates one rmap_desc. - mmu_lock is dropped, sptes are zapped resulting in rmap_desc being freed. - fetch->mmu_set_spte allocates another rmap_desc. Increase to 4 for safety. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2008-11-10x86: Make NUMA on 32-bit depend on BROKENRafael J. Wysocki
While investigating the failure of hibernation on 32-bit x86 with CONFIG_NUMA set, as described in this message http://marc.info/?l=linux-kernel&m=122634118116226&w=4 I asked some people for help and I was told that it wasn't really worth the effort, because CONFIG_NUMA was generally broken on 32-bit x86 systems and it shouldn't be used in such configs. For this reason, make CONFIG_NUMA depend on BROKEN instead of EXPERIMENTAL on x86-32. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Andi Kleen <andi@firstfloor.org> Cc: Pavel Machek <pavel@suse.cz> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-10x86: HPET: enter hpet_interrupt_handler with interrupts disabledMatt Fleming
Some functions that may be called from this handler require that interrupts are disabled. Also, combining IRQF_DISABLED and IRQF_SHARED does not reliably disable interrupts in a handler, so remove IRQF_SHARED from the irq flags (this irq is not shared anyway). Signed-off-by: Matt Fleming <mjf@gentoo.org> Cc: mingo@elte.hu Cc: venkatesh.pallipadi@intel.com Cc: "Will Newton" <will.newton@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-11-10x86: HPET: read from HPET_Tn_CMP() not HPET_T0_CMPMatt Fleming
In hpet_next_event() we check that the value we just wrote to HPET_Tn_CMP(timer) has reached the chip. Currently, we're checking that the value we wrote to HPET_Tn_CMP(timer) is in HPET_T0_CMP, which, if timer is anything other than timer 0, is likely to fail. Signed-off-by: Matt Fleming <mjf@gentoo.org> Cc: mingo@elte.hu Cc: venkatesh.pallipadi@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-11-10x86: HPET: convert WARN_ON to WARN_ON_ONCEMatt Fleming
It is possible to flood the console with call traces if the WARN_ON condition is true because of the frequency with which this function is called. Signed-off-by: Matt Fleming <mjf@gentoo.org> Cc: mingo@elte.hu Cc: venkatesh.pallipadi@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-11-08Merge branch 'sched-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: optimize sched_clock() a bit sched: improve sched_clock() performance
2008-11-08sched: optimize sched_clock() a bitIngo Molnar
sched_clock() uses cycles_2_ns() needlessly - which is an irq-disabling variant of __cycles_2_ns(). Most of the time sched_clock() is called with irqs disabled already. The few places that call it with irqs enabled need to be updated. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-08sched: improve sched_clock() performanceIngo Molnar
in scheduler-intense workloads native_read_tsc() overhead accounts for 20% of the system overhead: 659567 system_call 41222.9375 686796 schedule 435.7843 718382 __switch_to 665.1685 823875 switch_mm 4526.7857 1883122 native_read_tsc 55385.9412 9761990 total 2.8468 this is large part due to the rdtsc_barrier() that is done before and after reading the TSC. But sched_clock() is not a precise clock in the GTOD sense, using such barriers is completely pointless. So remove the barriers and only use them in vget_cycles(). This improves lat_ctx performance by about 5%. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-07Merge branch 'oprofile-for-tip' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile into x86/urgent
2008-11-07Merge branch 'x86-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, xen: fix use of pgd_page now that it really does return a page
2008-11-07oprofile: Fix p6 counter overflow checkAndi Kleen
Fix the counter overflow check for CPUs with counter width > 32 I had a similar change in a different patch that I didn't submit and I didn't notice the problem earlier because it was always tested together. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Robert Richter <robert.richter@amd.com>
2008-11-07xen: make sure stray alias mappings are gone before pinningJeremy Fitzhardinge
Xen requires that all mappings of pagetable pages are read-only, so that they can't be updated illegally. As a result, if a page is being turned into a pagetable page, we need to make sure all its mappings are RO. If the page had been used for ioremap or vmalloc, it may still have left over mappings as a result of not having been lazily unmapped. This change makes sure we explicitly mop them all up before pinning the page. Unlike aliases created by kmap, the there can be vmalloc aliases even for non-high pages, so we must do the flush unconditionally. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Linux Memory Management List <linux-mm@kvack.org> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-06Merge branch 'x86-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: Revert "x86: default to reboot via ACPI" x86: align DirectMap in /proc/meminfo AMD IOMMU: fix lazy IO/TLB flushing in unmap path x86: add smp_mb() before sending INVALIDATE_TLB_VECTOR x86: remove VISWS and PARAVIRT around NR_IRQS puzzle x86: mention ACPI in top-level Kconfig menu x86: size NR_IRQS on 32-bit systems the same way as 64-bit x86: don't allow nr_irqs > NR_IRQS x86/docs: remove noirqbalance param docs x86: don't use tsc_khz to calculate lpj if notsc is passed x86, voyager: fix smp_intr_init() compile breakage AMD IOMMU: fix detection of NP capable IOMMUs
2008-11-06x86, xen: fix use of pgd_page now that it really does return a pageJeremy Fitzhardinge
Impact: fix 32-bit Xen guest boot crash On 32-bit PAE, pud_page, for no good reason, didn't really return a struct page *. Since Jan Beulich's fix "i386/PAE: fix pud_page()", pud_page does return a struct page *. Because PAE has 3 pagetable levels, the pud level is folded into the pgd level, so pgd_page() is the same as pud_page(), and now returns a struct page *. Update the xen/mmu.c code which uses pgd_page() accordingly. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-06Revert "x86: default to reboot via ACPI"Eduardo Habkost
This reverts commit c7ffa6c26277b403920e2255d10df849bd613380. the assumptio of this change was that this would not break any existing machine. Andrey Borzenkov reported troubles with the ACPI reboot method: the system would hang on reboot, necessiating a power cycle. Probably more systems are affected as well. Also, there are patches queued up for v2.6.29 to disable virtualization on emergency_restart() - which was the original motivation of this change. Reported-by: Andrey Borzenkov <arvidjaar@mail.ru> Bisected-by: Andrey Borzenkov <arvidjaar@mail.ru> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Acked-by: Avi Kivity <avi@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-06x86: align DirectMap in /proc/meminfoHugh Dickins
Impact: right-align /proc/meminfo consistent with other fields When the split-LRU patches added Inactive(anon) and Inactive(file) lines to /proc/meminfo, all counts were moved two columns rightwards to fit in. Now move x86's DirectMap lines two columns rightwards to line up. Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-06Merge branch 'iommu-fixes-2.6.28' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux-2.6-iommu into x86/urgent
2008-11-06AMD IOMMU: fix lazy IO/TLB flushing in unmap pathJoerg Roedel
Lazy flushing needs to take care of the unmap path too which is not yet implemented and leads to stale IO/TLB entries. This is fixed by this patch. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2008-11-06x86: add smp_mb() before sending INVALIDATE_TLB_VECTORSuresh Siddha
Impact: fix rare x2apic hang On x86, x2apic mode accesses for sending IPI's don't have serializing semantics. If the IPI receivner refers(in lock-free fashion) to some memory setup by the sender, the need for smp_mb() before sending the IPI becomes critical in x2apic mode. Add the smp_mb() in native_flush_tlb_others() before sending the IPI. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-06x86: remove VISWS and PARAVIRT around NR_IRQS puzzleYinghai Lu
Impact: fix warning message when PARAVIRT is set in config Remove stale #ifdef components from our IRQ sizing logic. x86/Voyager is the only holdout. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-06x86: mention ACPI in top-level Kconfig menuBjorn Helgaas
Impact: clarify menuconfig text Mention ACPI in the top-level menu to give a clue as to where it lives. This matches what ia64 does. Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-06x86: size NR_IRQS on 32-bit systems the same way as 64-bitYinghai Lu
Impact: make NR_IRQS big enough for system with lots of apic/pins If lots of IO_APIC's are there (or can be there), size the same way as 64-bit, depending on MAX_IO_APICS and NR_CPUS. This fixes the boot problem reported by Ben Hutchings on a 32-bit server with 5 IO-APICs and 240 IO-APIC pins. Signed-off-by: Yinghai <yinghai@kernel.org> Tested-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-06x86: don't allow nr_irqs > NR_IRQSBen Hutchings
Impact: fix boot hang on 32-bit systems with more than 224 IO-APIC pins On some 32-bit systems with a lot of IO-APICs probe_nr_irqs() can return a value larger than NR_IRQS. This will lead to probe_irq_on() overrunning the irq_desc array. I hit this when running net-next-2.6 (close to 2.6.28-rc3) on a Supermicro dual Xeon system. NR_IRQS is 224 but probe_nr_irqs() detects 5 IOAPICs and returns 240. Here are the log messages: Tue Nov 4 16:53:47 2008 ACPI: IOAPIC (id[0x01] address[0xfec00000] gsi_base[0]) Tue Nov 4 16:53:47 2008 IOAPIC[0]: apic_id 1, version 32, address 0xfec00000, GSI 0-23 Tue Nov 4 16:53:47 2008 ACPI: IOAPIC (id[0x02] address[0xfec81000] gsi_base[24]) Tue Nov 4 16:53:47 2008 IOAPIC[1]: apic_id 2, version 32, address 0xfec81000, GSI 24-47 Tue Nov 4 16:53:47 2008 ACPI: IOAPIC (id[0x03] address[0xfec81400] gsi_base[48]) Tue Nov 4 16:53:47 2008 IOAPIC[2]: apic_id 3, version 32, address 0xfec81400, GSI 48-71 Tue Nov 4 16:53:47 2008 ACPI: IOAPIC (id[0x04] address[0xfec82000] gsi_base[72]) Tue Nov 4 16:53:47 2008 IOAPIC[3]: apic_id 4, version 32, address 0xfec82000, GSI 72-95 Tue Nov 4 16:53:47 2008 ACPI: IOAPIC (id[0x05] address[0xfec82400] gsi_base[96]) Tue Nov 4 16:53:47 2008 IOAPIC[4]: apic_id 5, version 32, address 0xfec82400, GSI 96-119 Tue Nov 4 16:53:47 2008 ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 high edge) Tue Nov 4 16:53:47 2008 ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level) Tue Nov 4 16:53:47 2008 Enabling APIC mode: Flat. Using 5 I/O APICs Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Acked-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-05sched: re-tune balancingIngo Molnar
Impact: improve wakeup affinity on NUMA systems, tweak SMP systems Given the fixes+tweaks to the wakeup-buddy code, re-tweak the domain balancing defaults on NUMA and SMP systems. Turn on SD_WAKE_AFFINE which was off on x86 NUMA - there's no reason why we would not want to have wakeup affinity across nodes as well. (we already do this in the standard NUMA template.) lat_ctx on a NUMA box is particularly happy about this change: before: | phoenix:~/l> ./lat_ctx -s 0 2 | "size=0k ovr=2.60 | 2 5.70 after: | phoenix:~/l> ./lat_ctx -s 0 2 | "size=0k ovr=2.65 | 2 2.07 a 2.75x speedup. pipe-test is similarly happy about it too: | phoenix:~/sched-tests> ./pipe-test | 18.26 usecs/loop. | 14.70 usecs/loop. | 14.38 usecs/loop. | 10.55 usecs/loop. # +WAKE_AFFINE on domain0+domain1 | 8.63 usecs/loop. | 8.59 usecs/loop. | 9.03 usecs/loop. | 8.94 usecs/loop. | 8.96 usecs/loop. | 8.63 usecs/loop. Also: - disable SD_BALANCE_NEWIDLE on NUMA and SMP domains (keep it for siblings) - enable SD_WAKE_BALANCE on SMP domains Sysbench+postgresql improves all around the board, quite significantly: .28-rc3-11474e2c .28-rc3-11474e2c-tune ------------------------------------------------- 1: 571 688 +17.08% 2: 1236 1206 -2.55% 4: 2381 2642 +9.89% 8: 4958 5164 +3.99% 16: 9580 9574 -0.07% 32: 7128 8118 +12.20% 64: 7342 8266 +11.18% 128: 7342 8064 +8.95% 256: 7519 7884 +4.62% 512: 7350 7731 +4.93% ------------------------------------------------- SUM: 55412 59341 +6.62% So it's a win both for the runup portion, the peak area and the tail. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-04x86: don't use tsc_khz to calculate lpj if notsc is passedAlok Kataria
Impact: fix udelay when "notsc" boot parameter is passed With notsc passed on commandline, tsc may not be used for udelays, make sure that we do not use tsc_khz to calculate the lpj value in such cases. Reported-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Signed-off-by: Alok N Kataria <akataria@vmware.com> Cc: <stable@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-03Merge branch 'io-mappings-for-linus-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'io-mappings-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: io mapping: clean up #ifdefs io mapping: improve documentation i915: use io-mapping interfaces instead of a variety of mapping kludges resources: add io-mapping functions to dynamically map large device apertures x86: add iomap_atomic*()/iounmap_atomic() on 32-bit using fixmaps
2008-11-03io mapping: clean up #ifdefsKeith Packard
Impact: cleanup clean up ifdefs: change #ifdef CONFIG_X86_32/64 to CONFIG_HAVE_ATOMIC_IOMAP. flip around the #ifdef sections to clean up the structure. Signed-off-by: Keith Packard <keithp@keithp.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-03x86, voyager: fix smp_intr_init() compile breakageJames Bottomley
Impact: fix x86/Voyager build Looks like this became static on the rest of x86. Fix it up by adding an external definition to mach-voyager/setup.c Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-01Merge branch 'x86-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: fix AMDC1E and XTOPOLOGY conflict in cpufeature x86: build fix
2008-11-01x86: Clean up late e820 resource allocationLinus Torvalds
This makes the late e820 resources use 'insert_resource_expand_to_fit()' instead of doing a 'reserve_region_with_split()', and also avoids marking them as IORESOURCE_BUSY. This results in us being perfectly happy to use pre-existing PCI resources even if they were marked as being in a reserved region, while still avoiding any _new_ allocations in the reserved regions. It also makes for a simpler and more accurate resource tree. Example resource allocation from Jonathan Corbet, who has firmware that has an e820 reserved entry that covered a big range (e0000000-fed003ff), and that had various PCI resources in it set up by firmware. With old kernels, the reserved range would force us to re-allocate all pre-existing PCI resources, and his reserved range would end up looking like this: e0000000-fed003ff : reserved fec00000-fec00fff : IOAPIC 0 fed00000-fed003ff : HPET 0 where only the pre-allocated special regions (IOAPIC and HPET) were kept around. With 2.6.28-rc2, which uses 'reserve_region_with_split()', Jonathan's resource tree looked like this: e0000000-fe7fffff : reserved fe800000-fe8fffff : PCI Bus 0000:01 fe800000-fe8fffff : reserved fe900000-fe9d9aff : reserved fe9d9b00-fe9d9bff : 0000:00:1f.3 fe9d9b00-fe9d9bff : reserved fe9d9c00-fe9d9fff : 0000:00:1a.7 fe9d9c00-fe9d9fff : reserved fe9da000-fe9dafff : 0000:00:03.3 fe9da000-fe9dafff : reserved fe9db000-fe9dbfff : 0000:00:19.0 fe9db000-fe9dbfff : reserved fe9dc000-fe9dffff : 0000:00:1b.0 fe9dc000-fe9dffff : reserved fe9e0000-fe9fffff : 0000:00:19.0 fe9e0000-fe9fffff : reserved fea00000-fea7ffff : 0000:00:02.0 fea00000-fea7ffff : reserved fea80000-feafffff : 0000:00:02.1 fea80000-feafffff : reserved feb00000-febfffff : 0000:00:02.0 feb00000-febfffff : reserved fec00000-fed003ff : reserved fec00000-fec00fff : IOAPIC 0 fed00000-fed003ff : HPET 0 and because the reserved entry had been split and moved into the individual resources, and because it used the IORESOURCE_BUSY flag, the drivers that actually wanted to _use_ those resources couldn't actually attach to them: e1000e 0000:00:19.0: BAR 0: can't reserve mem region [0xfe9e0000-0xfe9fffff] HDA Intel 0000:00:1b.0: BAR 0: can't reserve mem region [0xfe9dc000-0xfe9dffff] with this patch, the resource tree instead becomes e0000000-fed003ff : reserved fe800000-fe8fffff : PCI Bus 0000:01 fe9d9b00-fe9d9bff : 0000:00:1f.3 fe9d9c00-fe9d9fff : 0000:00:1a.7 fe9d9c00-fe9d9fff : ehci_hcd fe9da000-fe9dafff : 0000:00:03.3 fe9db000-fe9dbfff : 0000:00:19.0 fe9db000-fe9dbfff : e1000e fe9dc000-fe9dffff : 0000:00:1b.0 fe9dc000-fe9dffff : ICH HD audio fe9e0000-fe9fffff : 0000:00:19.0 fe9e0000-fe9fffff : e1000e fea00000-fea7ffff : 0000:00:02.0 fea80000-feafffff : 0000:00:02.1 feb00000-febfffff : 0000:00:02.0 fec00000-fec00fff : IOAPIC 0 fed00000-fed003ff : HPET 0 ie the one reserved region now ends up surrounding all the PCI resources that were allocated inside of it by firmware, and because it is not marked BUSY, drivers have no problem attaching to the pre-allocated resources. Reported-and-tested-by: Jonathan Corbet <corbet@lwn.net> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Robert Hancock <hancockr@shaw.ca> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-31x86: fix AMDC1E and XTOPOLOGY conflict in cpufeatureVenki Pallipadi
Impact: fix xsave slowdown regression Fix two features from conflicting in feature bits. Fixes this performance regression: Subject: cpu2000(both float and int) 13% regression with 2.6.28-rc1 http://lkml.org/lkml/2008/10/28/36 Reported-by: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com> Bisected-by: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com> Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-31x86: add iomap_atomic*()/iounmap_atomic() on 32-bit using fixmapsKeith Packard
Impact: introduce new APIs, separate kmap code from CONFIG_HIGHMEM This takes the code used for CONFIG_HIGHMEM memory mappings except that it's designed for dynamic IO resource mapping. These fixmaps are available even with CONFIG_HIGHMEM turned off. Signed-off-by: Keith Packard <keithp@keithp.com> Signed-off-by: Eric Anholt <eric@anholt.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-31x86: build fixIngo Molnar
Impact: build fix on certain UP configs fix: arch/x86/kernel/cpu/common.c: In function 'cpu_init': arch/x86/kernel/cpu/common.c:1141: error: 'boot_cpu_id' undeclared (first use in this function) arch/x86/kernel/cpu/common.c:1141: error: (Each undeclared identifier is reported only once arch/x86/kernel/cpu/common.c:1141: error: for each function it appears in.) Pull in asm/smp.h on UP, so that we get the definition of boot_cpu_id. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linusLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus: lguest: fix irq vectors. lguest: fix early_ioremap. lguest: fix example launcher compile after moved asm-x86 dir.
2008-10-30Merge branch 'x86-fixes-for-linus-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: cpu_index build fix x86/voyager: fix missing cpu_index initialisation x86/voyager: fix compile breakage caused by dc1e35c6e95e8923cf1d3510438b63c600fee1e2 x86: fix /dev/mem mmap breakage when PAT is disabled x86/voyager: fix compile breakage casued by x86: move prefill_possible_map calling early x86: use CONFIG_X86_SMP instead of CONFIG_SMP x86/voyager: fix boot breakage caused by x86: boot secondary cpus through initial_code x86, uv: fix compile error in uv_hub.h i386/PAE: fix pud_page() x86: remove debug code from arch_add_memory() x86: start annotating early ioremap pointers with __iomem x86: two trivial sparse annotations x86: fix init_memory_mapping for [dc000000 - e0000000) - v2
2008-10-31lguest: fix irq vectors.Rusty Russell
do_IRQ: cannot handle IRQ -1 vector 0x20 cpu 0 ------------[ cut here ]------------ kernel BUG at arch/x86/kernel/irq_32.c:219! We're not ISA: we have a 1:1 mapping from vectors to irqs. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>