aboutsummaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2008-01-24Merge branch 'linux-2.6'Paul Mackerras
2008-01-24[POWERPC] Provide a way to protect 4k subpages when using 64k pagesPaul Mackerras
Using 64k pages on 64-bit PowerPC systems makes life difficult for emulators that are trying to emulate an ISA, such as x86, which use a smaller page size, since the emulator can no longer use the MMU and the normal system calls for controlling page protections. Of course, the emulator can emulate the MMU by checking and possibly remapping the address for each memory access in software, but that is pretty slow. This provides a facility for such programs to control the access permissions on individual 4k sub-pages of 64k pages. The idea is that the emulator supplies an array of protection masks to apply to a specified range of virtual addresses. These masks are applied at the level where hardware PTEs are inserted into the hardware page table based on the Linux PTEs, so the Linux PTEs are not affected. Note that this new mechanism does not allow any access that would otherwise be prohibited; it can only prohibit accesses that would otherwise be allowed. This new facility is only available on 64-bit PowerPC and only when the kernel is configured for 64k pages. The masks are supplied using a new subpage_prot system call, which takes a starting virtual address and length, and a pointer to an array of protection masks in memory. The array has a 32-bit word per 64k page to be protected; each 32-bit word consists of 16 2-bit fields, for which 0 allows any access (that is otherwise allowed), 1 prevents write accesses, and 2 or 3 prevent any access. Implicit in this is that the regions of the address space that are protected are switched to use 4k hardware pages rather than 64k hardware pages (on machines with hardware 64k page support). In fact the whole process is switched to use 4k hardware pages when the subpage_prot system call is used, but this could be improved in future to switch only the affected segments. The subpage protection bits are stored in a 3 level tree akin to the page table tree. The top level of this tree is stored in a structure that is appended to the top level of the page table tree, i.e., the pgd array. Since it will often only be 32-bit addresses (below 4GB) that are protected, the pointers to the first four bottom level pages are also stored in this structure (each bottom level page contains the protection bits for 1GB of address space), so the protection bits for addresses below 4GB can be accessed with one fewer loads than those for higher addresses. Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-01-22[MIPS] SMTC: Fix build error.Frank Rowand
Fix compile warning (which becomes compile error due to -Werror). Type of argument "flags" for spin_lock_irqsave() was incorrect in some functions. Signed-off-by: Frank Rowand <frank.rowand@am.sony.com> Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2008-01-21[POWERPC] Separate MPC52xx PSC FIFO registers from rest of PSCJohn Rigby
This is in preparation for the addition of MPC512x PSC support. The main difference in the 512x is in the fifo registers. Signed-off-by: John Rigby <jrigby@freescale.com> Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2008-01-18[POWERPC] mpc5200: eliminate mpc52xx_*_map_*() functions.Grant Likely
mpc5200 platform code defines a bunch of map functions which duplicate the functionality of of_iomap(). Remove them and use of_iomap() instead. Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2008-01-18[POWERPC] mpc5200: Add common mpc52xx_setup_pci() routineMarian Balakowicz
This patch moves a generic pci init code from lite5200 platform file to a common mpc52xx_setup_pci() routine and adds additional compatibility property verification. Signed-off-by: Marian Balakowicz <m8@semihalf.com> Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2008-01-17CRIS v10: vmlinux.lds.S: ix kernel oops on boot and use common definesJesper Nilsson
- Move alignment to page size of init data outside ifdef for BLK_DEV_INITRD. The reservation up to page size of memory after init data was previously not done if BLK_DEV_INITRD was undefined. This caused a kernel oops when init memory pages were freed after startup, data placed in the same page as the last init memory would also be freed and reused, with disastrous results. - Use macros for initcalls and .text sections. - Replace hardcoded page size constant with PAGE_SIZE define. - Change include/asm-cris/page.h to use the _AC macro to instead of testing __ASSEMBLY__. Signed-off-by: Jesper Nilsson <jesper.nilsson@axis.com> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Mikael Starvik <mikael.starvik@axis.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-01-17Merge branch 'for-2.6.25' of ↵Paul Mackerras
master.kernel.org:/pub/scm/linux/kernel/git/olof/pasemi into for-2.6.25
2008-01-17[POWERPC] Add hugepagesz boot-time parameterJon Tollefson
This adds the hugepagesz boot-time parameter for ppc64. It lets one pick the size for huge pages. The choices available are 64K and 16M when the base page size is 4k. It defaults to 16M (previously the only only choice) if nothing or an invalid choice is specified. Tested 64K huge pages successfully with the libhugetlbfs 1.2. Signed-off-by: Jon Tollefson <kniht@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-01-17[POWERPC] iSeries: eliminate pci_dn bussubnoStephen Rothwell
xlate_iomm_address() really wants the ds_addr to pass to the HV, so store that value (instead of the BAR number) when we allocate the device bars. This is not a fast path, so we can look up the device_node property there instead of using the bussubno field of the pci_dn. The other user of iseries_ds_addr() was already scanning the device tree, so looking up a property will not slow it down any more. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-01-17[POWERPC] The pci_dn pcidev is only used by EEHStephen Rothwell
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-01-17[POWERPC] The pci_dn class_code is only used by EEHStephen Rothwell
... so move it into the #ifdef CONFIG_EEH section. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-01-17[POWERPC] Add of_find_matching_node() helper functionGrant Likely
Similar to of_find_compatible_node(), of_find_matching_node() and for_each_matching_node() allow you to iterate over the device tree looking for specific nodes, except that they take of_device_id tables instead of strings. This also moves of_match_node() from driver/of/device.c to driver/of/base.c to colocate it with the of_find_matching_node which depends on it. Signed-off-by: Grant Likely <grant.likely@secretlab.ca> Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-01-17[POWERPC] Kill sparse warning in HPTE_V_COMPARE()Geert Uytterhoeven
Fixes sparse warning: constant 0xffffffffffffff80 is so big it is unsigned long Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-01-16lockdep: fix workqueue creation API lockdep interactionJohannes Berg
Dave Young reported warnings from lockdep that the workqueue API can sometimes try to register lockdep classes with the same key but different names. This is not permitted in lockdep. Unfortunately, I was unaware of that restriction when I wrote the code to debug workqueue problems with lockdep and used the workqueue name as the lockdep class name. This can obviously lead to the problem if the workqueue name is dynamic. This patch solves the problem by always using a constant name for the workqueue's lockdep class, namely either the constant name that was passed in or a string consisting of the variable name. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
2008-01-15libata: correct handling of TSS DVDAlan Cox
Devices that misreport the validity bit for word 93 look like SATA. If they are on the blacklist then we must not test for SATA but assume 40 wire in the 40 wire case (The TSSCorp reports 80 wire on SATA it seems!) Signed-off-by: Alan Cox <alan@redhat.com> Cc: Tejun Heo <htejun@gmail.com> Cc: Jeff Garzik <jeff@garzik.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>
2008-01-15libata: pata_platform: make probe and remove functions device type neutralAnton Vorontsov
Split pata_platform_{probe,remove} into two pieces: 1. pata_platform_{probe,remove} -- platform_device-dependant bits; 2. __ptata_platform_{probe,remove} -- device type neutral bits. This is done to not duplicate code for the OF-platform driver. Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com> Acked-by: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Olof Johansson <olof@lixom.net>
2008-01-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86: x86: fix RTC_AIE with CONFIG_HPET_EMULATE_RTC x86: asm-x86/msr.h: pull in linux/types.h x86: fix boot crash on HIGHMEM4G && SPARSEMEM
2008-01-15Merge branch 'merge' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: [POWERPC] Fix boot failure on POWER6 [POWERPC] Workaround for iommu page alignment
2008-01-15x86: asm-x86/msr.h: pull in linux/types.hMike Frysinger
Since the msr.h header uses types like __u32, it should pull in linux/types.h. [ mingo@elte.hu: affects user-space that includes this header. We dont actually like user-space including raw kernel headers but it's a longstanding practice and it's easy for the kernel to be nice about this. ] Signed-off-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-15[POWERPC] Fix boot failure on POWER6Paul Mackerras
Commit 473980a99316c0e788bca50996375a2815124ce1 added a call to clear the SLB shadow buffer before registering it. Unfortunately this means that we clear out the entries that slb_initialize has previously set in there. On POWER6, the hypervisor uses the SLB shadow buffer when doing partition switches, and that means that after the next partition switch, each non-boot CPU has no SLB entries to map the kernel text and data, which causes it to crash. This fixes it by reverting most of 473980a9 and instead clearing the 3rd entry explicitly in slb_initialize. This fixes the problem that 473980a9 was trying to solve, but without breaking POWER6. Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-01-14Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linusLinus Torvalds
* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus: [MIPS] Cacheops.h: Fix typo. [MIPS] Cobalt: Qube1 has no serial port so don't use it [MIPS] Cobalt: Fix ethernet interrupts for RaQ1 [MIPS] Kconfig fixes for BCM47XX platform
2008-01-14Revert "writeback: introduce writeback_control.more_io to indicate more io"Linus Torvalds
This reverts commit 2e6883bdf49abd0e7f0d9b6297fc3be7ebb2250b, as requested by Fengguang Wu. It's not quite fully baked yet, and while there are patches around to fix the problems it caused, they should get more testing. Says Fengguang: "I'll resend them both for -mm later on, in a more complete patchset". See http://bugzilla.kernel.org/show_bug.cgi?id=9738 for some of this discussion. Requested-by: Fengguang Wu <wfg@mail.ustc.edu.cn> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-01-15[MIPS] Cacheops.h: Fix typo.Ralf Baechle
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2008-01-14Merge branch 'i2c-for-linus' of git://jdelvare.pck.nerim.net/jdelvare-2.6Linus Torvalds
* 'i2c-for-linus' of git://jdelvare.pck.nerim.net/jdelvare-2.6: i2c-sibyte: Fix an error path i2c: Driver IDs are optional i2c: Spelling fixes i2c-omap: Fix NULL pointer dereferencing
2008-01-14i2c: Driver IDs are optionalJean Delvare
Document the fact that I2C driver IDs are optional. Signed-off-by: Jean Delvare <khali@linux-fr.org>
2008-01-14CRIS: define __ARCH_WANT_SYS_RT_SIGSUSPEND in unistd.h for CRISJesper Nilsson
This allows us to use the commong sys_rt_sigsuspend instead of having our own. Signed-off-by: Jesper Nilsson <jesper.nilsson@axis.com> Cc: Mikael Starvik <mikael.starvik@axis.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-01-13Merge master.kernel.org:/home/rmk/linux-2.6-armLinus Torvalds
* master.kernel.org:/home/rmk/linux-2.6-arm: [ARM] vfp: fix fuitod/fsitod instructions [ARM] pxa: silence warnings from cpu_is_xxx() macros
2008-01-13Merge branch 'merge' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: [POWERPC] Fix CPU hotplug when using the SLB shadow buffer [POWERPC] efika: add phy-handle property for fec_mpc52xx
2008-01-13Merge branch 'release' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6 * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: pnpacpi: print resource shortage message only once PM: ACPI and APM must not be enabled at the same time ACPI: apply quirk_ich6_lpc_acpi to more ICH8 and ICH9 ACPICA: fix acpi_serialize hang regression ACPI : Not register gsi for PCI IDE controller in legacy mode ACPI: Reintroduce run time configurable max_cstate for !CPU_IDLE case ACPI: Make sysfs interface in ACPI power optional. ACPI: EC: Enable boot EC before bus_scan increase PNP_MAX_PORT to 40 from 24
2008-01-13remove task_ppid_nr_nsRoland McGrath
task_ppid_nr_ns is called in three places. One of these should never have called it. In the other two, using it broke the existing semantics. This was presumably accidental. If the function had not been there, it would have been much more obvious to the eye that those patches were changing the behavior. We don't need this function. In task_state, the pid of the ptracer is not the ppid of the ptracer. In do_task_stat, ppid is the tgid of the real_parent, not its pid. I also moved the call outside of lock_task_sighand, since it doesn't need it. In sys_getppid, ppid is the tgid of the real_parent, not its pid. Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-01-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6.24Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6.24: sh: Force __access_ok() to obey address space limit. sh: Fix argument page dcache flushing regression.
2008-01-11Pull bugzilla-9535 into release branchLen Brown
2008-01-11Pull bugzilla-9194 into release branchLen Brown
2008-01-11PM: ACPI and APM must not be enabled at the same timeLen Brown
ACPI and APM used "pm_active" to guarantee that they would not be simultaneously active. But pm_active was recently moved under CONFIG_PM_LEGACY, so that without CONFIG_PM_LEGACY, pm_active became a NOP -- allowing ACPI and APM to both be simultaneously enabled. This caused unpredictable results, including boot hangs. Further, the code under CONFIG_PM_LEGACY is scheduled for removal. So replace pm_active with pm_flags. pm_flags depends only on CONFIG_PM, which is present for both CONFIG_APM and CONFIG_ACPI. http://bugzilla.kernel.org/show_bug.cgi?id=9194 Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2008-01-11Don't blatt first element of prv in sg_chain()Rusty Russell
I realize that sg chaining is a ploy to make the rest of the kernel devs feel the pain of the SCSI subsystem. But this was a little unsubtle. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-01-11[POWERPC] Fix CPU hotplug when using the SLB shadow bufferMichael Neuling
Before we register the SLB shadow buffer, we need to invalidate the entries in the buffer, otherwise we can end up stale entries from when we previously offlined the CPU. This does this invalidate as well as unregistering the buffer with PHYP before we offline the cpu. Tested and fixes crashes seen on 970MP (thanks to tonyb) and POWER5. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-01-11ACPI: apply quirk_ich6_lpc_acpi to more ICH8 and ICH9Zhao Yakui
It is important that these resources be reserved to avoid conflicts with well known ACPI registers. Signed-off-by: Zhao Yakui <yakui.zhao@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2008-01-11sh: Force __access_ok() to obey address space limit.Paul Mundt
When the thread_info->addr_limit changes were introduced, __access_ok() was missed in the conversion, allowing user processes to perform P1/P2 accesses under certain conditions. This has already been corrected with the nommu refactoring in later kernels. Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2008-01-10[ARM] pxa: silence warnings from cpu_is_xxx() macrosRussell King
If only a single CPU type is selected, __cpu_is_xxx() doesn't use its argument. This causes the compiler to issue a warning about an unused variable in the parent function. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2008-01-08[SOCK]: Adds a rcu_dereference() in sk_filterEric Dumazet
It seems commit fda9ef5d679b07c9d9097aaf6ef7f069d794a8f9 introduced a RCU protection for sk_filter(), without a rcu_dereference() Either we need a rcu_dereference(), either a comment should explain why we dont need it. I vote for the former. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-08[XFRM]: xfrm_algo_clone() allocates too much memoryEric Dumazet
alg_key_len is the length in bits of the key, not in bytes. Best way to fix this is to move alg_len() function from net/xfrm/xfrm_user.c to include/net/xfrm.h, and to use it in xfrm_algo_clone() alg_len() is renamed to xfrm_alg_len() because of its global exposition. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-08[NET]: Clone the sk_buff 'iif' field in __skb_clone()Paul Moore
Both NetLabel and SELinux (other LSMs may grow to use it as well) rely on the 'iif' field to determine the receiving network interface of inbound packets. Unfortunately, at present this field is not preserved across a skb clone operation which can lead to garbage values if the cloned skb is sent back through the network stack. This patch corrects this problem by properly copying the 'iif' field in __skb_clone() and removing the 'iif' field assignment from skb_act_clone() since it is no longer needed. Also, while we are here, put the assignments in the same order as the offsets to reduce cacheline bounces. Signed-off-by: Paul Moore <paul.moore@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-08[NET]: Add NAPI_STATE_DISABLE.David S. Miller
Create a bit to signal that a napi_disable() is in progress. This sets up infrastructure such that net_rx_action() can generically break out of the ->poll() loop on a NAPI context that has a pending napi_disable() yet is being bombed with packets (and thus would otherwise poll endlessly and not allow the napi_disable() to finish). Now, what napi_disable() does is first set the NAPI_STATE_DISABLE bit (to indicate that a disable is pending), then it polls for the NAPI_STATE_SCHED bit, and once the NAPI_STATE_SCHED bit is acquired the NAPI_STATE_DISABLE bit is cleared. Here, the test_and_set_bit() provides the necessary memory barrier between the various bitops. napi_schedule_prep() now tests for a pending disable as it's first action and won't try to obtain the NAPI_STATE_SCHED bit if a disable is pending. As a result, we can remove the netif_running() check in netif_rx_schedule_prep() because the NAPI disable pending state serves this purpose. And, it does so in a NAPI centric manner which is what we really want. Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-08[NET]: Do not grab device reference when scheduling a NAPI poll.David S. Miller
It is pointless, because everything that can make a device go away will do a napi_disable() first. The main impetus behind this is that now we can legally do a NAPI completion in generic code like net_rx_action() which a following changeset needs to do. net_rx_action() can only perform actions in NAPI centric ways, because there may be a one to many mapping between NAPI contexts and network devices (SKY2 is one example). We also want to get rid of this because it's an extra atomic in the NAPI paths, and also because it is one of the last instances where the NAPI interfaces care about net devices. The one remaining netdev detail the NAPI stuff cares about is the netif_running() check which will be killed off in a subsequent changeset. Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-08[SCTP]: Fix the name of the authentication event.Vlad Yasevich
The even should be called SCTP_AUTHENTICATION_INDICATION. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-08pl2303: Fix mode switching regressionAlan Cox
Cleaning out all the incorrect 'no change made' checks for termios settings showed up a problem with the PL2303. The hardware here seems to lose sync and bits if you tell it to make no changes. This shows up with a real world application. To fix this the driver check for meaningful hardware changes is restored but doing the tests correctly and as a tty layer function so it doesn't get duplicated wrongly everywhere if other drivers turn out to need it. Signed-off-by: Alan Cox <alan@redhat.com> Tested-by: Mirko Parthey <mirko.parthey@informatik.tu-chemnitz.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-01-08KEYS: fix macroSebastian Siewior
Commit 664cceb0093b755739e56572b836a99104ee8a75 changed the parameters of the function make_key_ref(). The macros that are used in case CONFIG_KEY is not defined did not change. Cc: David Howells <dhowells@redhat.com> Signed-off-by: Sebastian Siewior <sebastian@breakpoint.cc> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-01-07sh: Fix argument page dcache flushing regression.Carmelo Amoroso
In the do_execve() path, argument page handling used to explicitly call flush_dcache_page() for each page, this has since been reworked and uses flush_kernel_dcache_page() instead, which is presently a nop. Doing a simple modprobe/rmmod in a loop under busybox consistently manages to crash without providing a sane flush_kernel_dcache_page() implementation, so, plug in a simple implementation. Signed-off-by: Carmelo Amoroso <carmelo73@gmail.com> Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2008-01-06CPU hotplug: fix cpu_is_offline() on !CONFIG_HOTPLUG_CPUIngo Molnar
make randconfig bootup testing found that the cpufreq code crashes on bootup, if the powernow-k8 driver is enabled and if maxcpus=1 passed on the boot line to a !CONFIG_HOTPLUG_CPU kernel. First lockdep found out that there's an inconsistent unlock sequence: ===================================== [ BUG: bad unlock balance detected! ] ------------------------------------- swapper/1 is trying to release lock (&per_cpu(cpu_policy_rwsem, cpu)) at: [<ffffffff806ffd8e>] unlock_policy_rwsem_write+0x3c/0x42 but there are no more locks to release! Call Trace: [<ffffffff806ffd8e>] unlock_policy_rwsem_write+0x3c/0x42 [<ffffffff80251c29>] print_unlock_inbalance_bug+0x104/0x12c [<ffffffff80252f3a>] mark_held_locks+0x56/0x94 [<ffffffff806ffd8e>] unlock_policy_rwsem_write+0x3c/0x42 [<ffffffff807008b6>] cpufreq_add_dev+0x2a8/0x5c4 ... then shortly afterwards the cpufreq code crashed on an assert: ------------[ cut here ]------------ kernel BUG at drivers/cpufreq/cpufreq.c:1068! invalid opcode: 0000 [1] SMP [...] Call Trace: [<ffffffff805145d6>] sysdev_driver_unregister+0x5b/0x91 [<ffffffff806ff520>] cpufreq_register_driver+0x15d/0x1a2 [<ffffffff80cc0596>] powernowk8_init+0x86/0x94 [...] ---[ end trace 1e9219be2b4431de ]--- the bug was caused by maxcpus=1 bootup, which brought up the secondary core as !cpu_online() but !cpu_is_offline() either, which on on !CONFIG_HOTPLUG_CPU is always 0 (include/linux/cpu.h): /* CPUs don't go offline once they're online w/o CONFIG_HOTPLUG_CPU */ static inline int cpu_is_offline(int cpu) { return 0; } but the cpufreq code uses cpu_online() and cpu_is_offline() in a mixed way - the low-level drivers use cpu_online(), while the cpufreq core uses cpu_is_offline(). This opened up the possibility to add the non-initialized sysdev device of the secondary core: cpufreq-core: trying to register driver powernow-k8 cpufreq-core: adding CPU 0 powernow-k8: BIOS error - no PSB or ACPI _PSS objects cpufreq-core: initialization failed cpufreq-core: adding CPU 1 cpufreq-core: initialization failed which then blew up. The fix is to make cpu_is_offline() always the negation of cpu_online(). With that fix applied the kernel boots up fine without crashing: Calling initcall 0xffffffff80cc0510: powernowk8_init+0x0/0x94() powernow-k8: Found 1 AMD Athlon(tm) 64 X2 Dual Core Processor 3800+ processors (1 cpu cores) (version 2.20.00) powernow-k8: BIOS error - no PSB or ACPI _PSS objects initcall 0xffffffff80cc0510: powernowk8_init+0x0/0x94() returned -19. initcall 0xffffffff80cc0510 ran for 19 msecs: powernowk8_init+0x0/0x94() Calling initcall 0xffffffff80cc328f: init_lapic_nmi_sysfs+0x0/0x39() We could fix this by making CPU enumeration aware of max_cpus, but that would be more fragile IMO, and the cpu_online(cpu) != cpu_is_offline(cpu) possibility was quite confusing and a continuous source of bugs too. Most distributions have kernels with CPU hotplug enabled, so this bug remained hidden for a long time. Bug forensics: The broken cpu_is_offline() API variant was introduced via: commit a59d2e4e6977e7b94e003c96a41f07e96cddc340 Author: Rusty Russell <rusty@rustcorp.com.au> Date: Mon Mar 8 06:06:03 2004 -0800 [PATCH] minor cleanups for hotplug CPUs ( this predates linux-2.6.git, this commit is available from Thomas's historic git tree. ) Then 1.5 years later the cpufreq code made use of it: commit c32b6b8e524d2c337767d312814484d9289550cf Author: Ashok Raj <ashok.raj@intel.com> Date: Sun Oct 30 14:59:54 2005 -0800 [PATCH] create and destroy cpufreq sysfs entries based on cpu notifiers + if (cpu_is_offline(cpu)) + return 0; which is a correct use of the subtly broken new API. v2.6.15 then shipped with this bug included. then it took two more years for random-kernel qa to hit it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>