aboutsummaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2006-06-29[PATCH] genirq: add irq-chip supportThomas Gleixner
Enable platforms to use the irq-chip and irq-flow abstractions: allow setting of the chip, the type and provide highlevel handlers for common irq-flows. [rostedt@goodmis.org: misroute-irq: Don't call desc->chip->end because of edge interrupts] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-29[PATCH] genirq MSI fixesIngo Molnar
This is a fixed up and cleaned up replacement for genirq-msi-fixes.patch, which should solve the i386 4KSTACKS problem. I also added Ben's idea of pushing the __do_IRQ() check into generic_handle_irq(). I booted this with MSI enabled, but i only have MSI devices, not MSI-X devices. I'd still expect MSI-X to work now. irqchip migration helper: call __do_IRQ() if a descriptor is attached to an irqtype-style controller. This also fixes MSI-X IRQ handling on i386 and x86_64. Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Roland Dreier <rolandd@cisco.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-29[PATCH] genirq: coreThomas Gleixner
Core genirq support: add the irq-chip and irq-flow abstractions. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-29[PATCH] genirq: add IRQ_NOAUTOEN supportThomas Gleixner
Enable platforms to disable the automatic enabling of freshly set up irqs. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-29[PATCH] genirq: add IRQ_NOREQUEST supportThomas Gleixner
Enable platforms to disable request_irq() for certain interrupts. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-29[PATCH] genirq: add IRQ_NOPROBE supportThomas Gleixner
Introduce IRQ_NOPROBE: enables platforms to control chip-probing. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-29[PATCH] genirq: add genirq sw IRQ-retriggerThomas Gleixner
Enable platforms that do not have a hardware-assisted hardirq-resend mechanism to resend them via a softirq-driven IRQ emulation mechanism. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-29[PATCH] genirq: doc: comment include/linux/irq.h structuresIngo Molnar
Better document the hw_interrupt_type and irq_desc structures. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-29[PATCH] genirq: add ->retrigger() irq op to consolidate hw_irq_resend()Ingo Molnar
Add ->retrigger() irq op to consolidate hw_irq_resend() implementations. (Most architectures had it defined to NOP anyway.) NOTE: ia64 needs testing. i386 and x86_64 tested. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-29[PATCH] genirq: cleanup: turn ARCH_HAS_IRQ_PER_CPU into CONFIG_IRQ_PER_CPUIngo Molnar
Cleanup: change ARCH_HAS_IRQ_PER_CPU into a Kconfig method. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-29[PATCH] genirq: cleanup: merge pending_irq_cpumask[] into irq_desc[]Ingo Molnar
Consolidation: remove the pending_irq_cpumask[NR_IRQS] array and move it into the irq_desc[NR_IRQS].pending_mask field. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-29[PATCH] genirq: cleanup: merge irq_dir[], smp_affinity_entry[] into irq_desc[]Ingo Molnar
Consolidation: remove the irq_dir[NR_IRQS] and the smp_affinity_entry[NR_IRQS] arrays and move them into the irq_desc[] array. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-29[PATCH] genirq: cleanup: include/linux/irq.hIngo Molnar
Small cleanups in include/linux/irq.h. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-29[PATCH] genirq: cleanup: reduce irq_desc_t use, mark it obsoleteIngo Molnar
Cleanup: remove irq_desc_t use from the generic IRQ code, and mark it obsolete. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-29[PATCH] genirq: cleanup: misc code cleanupsIngo Molnar
Assorted code cleanups to the generic IRQ code. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-29[PATCH] genirq: cleanup: remove fastcallIngo Molnar
Now that i386 defaults to regparm, explicit uses of fastcall are not needed anymore. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-29[PATCH] genirq: cleanup: remove irq_descp()Ingo Molnar
Cleanup: remove irq_descp() - explicit use of irq_desc[] is shorter and more readable. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-29[PATCH] genirq: cleanup: merge irq_affinity[] into irq_desc[]Ingo Molnar
Consolidation: remove the irq_affinity[NR_IRQS] array and move it into the irq_desc[NR_IRQS].affinity field. [akpm@osdl.org: sparc64 build fix] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-29[PATCH] genirq: rename desc->handler to desc->chipIngo Molnar
This patch-queue improves the generic IRQ layer to be truly generic, by adding various abstractions and features to it, without impacting existing functionality. While the queue can be best described as "fix and improve everything in the generic IRQ layer that we could think of", and thus it consists of many smaller features and lots of cleanups, the one feature that stands out most is the new 'irq chip' abstraction. The irq-chip abstraction is about describing and coding and IRQ controller driver by mapping its raw hardware capabilities [and quirks, if needed] in a straightforward way, without having to think about "IRQ flow" (level/edge/etc.) type of details. This stands in contrast with the current 'irq-type' model of genirq architectures, which 'mixes' raw hardware capabilities with 'flow' details. The patchset supports both types of irq controller designs at once, and converts i386 and x86_64 to the new irq-chip design. As a bonus side-effect of the irq-chip approach, chained interrupt controllers (master/slave PIC constructs, etc.) are now supported by design as well. The end result of this patchset intends to be simpler architecture-level code and more consolidation between architectures. We reused many bits of code and many concepts from Russell King's ARM IRQ layer, the merging of which was one of the motivations for this patchset. This patch: rename desc->handler to desc->chip. Originally i did not want to do this, because it's a big patch. But having both "desc->handler", "desc->handle_irq" and "action->handler" caused a large degree of confusion and made the code appear alot less clean than it truly is. I have also attempted a dual approach as well by introducing a desc->chip alias - but that just wasnt robust enough and broke frequently. So lets get over with this quickly. The conversion was done automatically via scripts and converts all the code in the kernel. This renaming patch is the first one amongst the patches, so that the remaining patches can stay flexible and can be merged and split up without having some big monolithic patch act as a merge barrier. [akpm@osdl.org: build fix] [akpm@osdl.org: another build fix] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-29[PATCH] i4l: remove unneeded include/linux/isdn/tpam.hKarsten Keil
The TPAM isdn driver was removed in 2.6.12, but include/linux/isdn/tpam.h was missed. Signed-off-by: Karsten Keil <kkeil@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-29[PATCH] Keys: Allow in-kernel key requestor to pass auxiliary data to upcallerDavid Howells
The proposed NFS key type uses its own method of passing key requests to userspace (upcalling) rather than invoking /sbin/request-key. This is because the responsible userspace daemon should already be running and will be contacted through rpc_pipefs. This patch permits the NFS filesystem to pass auxiliary data to the upcall operation (struct key_type::request_key) so that the upcaller can use a pre-existing communications channel more easily. Signed-off-by: David Howells <dhowells@redhat.com> Acked-By: Kevin Coffman <kwc@citi.umich.edu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-28Merge master.kernel.org:/pub/scm/linux/kernel/git/wim/linux-2.6-watchdogLinus Torvalds
* master.kernel.org:/pub/scm/linux/kernel/git/wim/linux-2.6-watchdog: [WATCHDOG] Documentation/watchdog update [WATCHDOG] convert AT91RM9200 watchdog to platform driver [WATCHDOG] add WDIOC_GETTIMELEFT ioctl [WATCHDOG] Pre-Timeout flags
2006-06-28[PATCH] Fix plist include dependencyThomas Gleixner
plist.h uses container_of, which is defined in kernel.h. Include kernel.h in plist.h as the kernel.h include does not longer happen automatically on all architectures. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-28[PATCH] SPI: infrastructure to initialize spi_device.mode earlyDavid Brownell
This patch adds earlier initialization of spi_device.mode, as needed on boards using nondefault chipselect polarity. An example would be ones using the RS5C348 RTC without an external signal inverter between the RTC chipselect and the SPI controller. Without this mechanism, the first setup() call for that chip would wrongly enable chips, corrupting transfers to/from other chips sharing that SPI bus. Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-28[PATCH] ide: fix error handling for drives which clear the FIFO on errorAlan Cox
If the controller FIFO cleared automatically on error we must not try and drain it as this will hang some chips. Based in concept on a broken patch from -mm some while back Signed-off-by: Alan Cox <alan@redhat.com> Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com> Cc: Bartlomiej Zolnierkiewicz <B.Zolnierkiewicz@elka.pw.edu.pl> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-28[PATCH] ac97_codec: make bitfield unsignedRandy Dunlap
Make a 1-bit bitfield unsigned (no space for sign bit). Removes 24 sparse warnings from this one file: include/linux/ac97_codec.h:262:13: error: dubious one-bit signed bitfield Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-28[PATCH] remove active field from tty buffer structurePaul Fulghum
Remove 'active' field from tty buffer structure. This was added in 2.6.16 as part of a patch to make the new tty buffering SMP safe. This field is unnecessary with the more intelligently written flush_to_ldisc that adds receive_room handling. Removing this field reverts to simpler logic where the tail buffer is always the 'active' buffer, which should not be freed by flush_to_ldisc. (active == buffer being filled with new data) The result is simpler, smaller, and faster tty buffer code. Signed-off-by: Paul Fulghum <paulkf@microgate.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-28[PATCH] remove TTY_DONT_FLIPPaul Fulghum
Remove TTY_DONT_FLIP tty flag. This flag was introduced in 2.1.X kernels to prevent the N_TTY line discipline functions read_chan() and n_tty_receive_buf() from running at the same time. 2.2.15 introduced tty->read_lock to protect access to the N_TTY read buffer, which is the only state requiring protection between these two functions. The current TTY_DONT_FLIP implementation is broken for SMP, and is not universally honored by drivers that send data directly to the line discipline receive_buf function. Because TTY_DONT_FLIP is not necessary, is broken in implementation, and is not universally honored, it is removed. Signed-off-by: Paul Fulghum <paulkf@microgate.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-28[PATCH] Add EXPORT_UNUSED_SYMBOL and EXPORT_UNUSED_SYMBOL_GPLArjan van de Ven
Temporarily add EXPORT_UNUSED_SYMBOL and EXPORT_UNUSED_SYMBOL_GPL. These will be used as a transition measure for symbols that aren't used in the kernel and are on the way out. When a module uses such a symbol, a warning is printk'd at modprobe time. The main reason for removing unused exports is size: eacho export takes roughly between 100 and 150 bytes of kernel space in the binary. This patch gives users the option to immediately get this size gain via a config option. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-28[PATCH] mark address_space_operations constChristoph Hellwig
Same as with already do with the file operations: keep them in .rodata and prevents people from doing runtime patching. Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Steven French <sfrench@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27Merge git://git.infradead.org/mtd-2.6Linus Torvalds
* git://git.infradead.org/mtd-2.6: [MTD] NAND: Select chip before checking write protect status [MTD] CORE mtdchar.c: fix off-by-one error in lseek() [MTD] NAND: Fix typo in mtd/nand/ts7250.c [JFFS2][XATTR] coexistence between xattr and write buffering support. [JFFS2][XATTR] Fix wrong copyright [JFFS2][XATTR] Re-define xd->refcnt as atomic_t [JFFS2][XATTR] Fix memory leak with jffs2_xattr_ref [JFFS2][XATTR] rid unnecessary writing of delete marker. [JFFS2][XATTR] Fix ACL bug when updating null xattr by null ACL. [JFFS2][XATTR] using 'delete marker' for xdatum/xref deletion [MTD] Fix off-by-one error in physmap.c [MTD] Remove unused 'nr_banks' variable from ixp2000 map driver [MTD NAND] s3c2412 support in s3c2410.c [MTD] Initialize 'writesize' [MTD] NAND: ndfc fix address offset thinko [MTD] NAND: S3C2410 convert prinks to dev_*()s [MTD] NAND: Missing fixups
2006-06-27Merge branch 'upstream-linus' of ↵Linus Torvalds
master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev * 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev: [PATCH] ata_piix: add ICH6/7/8 to Kconfig [PATCH] sata_sil: disable hotplug interrupts on two ATI IXPs [PATCH] libata: cosmetic updates [PATCH] ata: add some NVIDIA chipset IDs [PATCH] libata reduce timeouts [PATCH] libata: implement ata_port_max_devices() [PATCH] libata: make two functions global [PATCH] libata: update ata_do_simple_cmd() [PATCH] libata: move ata_do_simple_cmd() below ata_exec_internal() [PATCH] libata: clear EH action on device detach [PATCH] libata: implement and use ata_deh_dev_action() [PATCH] libata: move ata_eh_clear_action() upward [PATCH] libata.h needs scatterlist.h [libata] sata_vsc: partially revert a PCI ID-related commit [libata] Bump versions
2006-06-27[PATCH] drivers/char/ipmi/ipmi_msghandler.c: make proc_ipmi_root staticAdrian Bunk
Make struct proc_ipmi_root static. Besides this, tremove removes an unused #ifdef CONFIG_PROC_FS from include/linux/ipmi.h. Acked-by: Corey Minyard <minyard@acm.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27[PATCH] rtmutex: Propagate priority settings into PI lock chainsThomas Gleixner
When the priority of a task, which is blocked on a lock, changes we must propagate this change into the PI lock chain. Therefor the chain walk code is changed to get rid of the references to current to avoid false positives in the deadlock detector, as setscheduler might be called by a task which holds the lock on which the task whose priority is changed is blocked. Also add some comments about the get/put_task_struct usage to avoid confusion. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27[PATCH] pi-futex: futex_lock_pi/futex_unlock_pi supportIngo Molnar
This adds the actual pi-futex implementation, based on rt-mutexes. [dino@in.ibm.com: fix an oops-causing race] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27[PATCH] pi-futex: rt mutex testerThomas Gleixner
RT-mutex tester: scriptable tester for rt mutexes, which allows userspace scripting of mutex unit-tests (and dynamic tests as well), using the actual rt-mutex implementation of the kernel. [akpm@osdl.org: fixlet] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27[PATCH] pi-futex: rt mutex debugIngo Molnar
Runtime debugging functionality for rt-mutexes. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27[PATCH] pi-futex: rt mutex coreIngo Molnar
Core functions for the rt-mutex subsystem. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27[PATCH] pi-futex: scheduler support for piIngo Molnar
Add framework to boost/unboost the priority of RT tasks. This consists of: - caching the 'normal' priority in ->normal_prio - providing a functions to set/get the priority of the task - make sched_setscheduler() aware of boosting The effective_prio() cleanups also fix a priority-calculation bug pointed out by Andrey Gelman, in set_user_nice(). has_rt_policy() fix: Peter Williams <pwil3058@bigpond.net.au> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Cc: Andrey Gelman <agelman@012.net.il> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27[PATCH] pi-futex: add plist implementationIngo Molnar
Add the priority-sorted list (plist) implementation. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27[PATCH] pi-futex: introduce debug_check_no_locks_freed()Ingo Molnar
Add debug_check_no_locks_freed(), as a central inline to add bad-lock-free-debugging functionality to. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27[PATCH] pi-futex: futex code cleanupsIngo Molnar
We are pleased to announce "lightweight userspace priority inheritance" (PI) support for futexes. The following patchset and glibc patch implements it, ontop of the robust-futexes patchset which is included in 2.6.16-mm1. We are calling it lightweight for 3 reasons: - in the user-space fastpath a PI-enabled futex involves no kernel work (or any other PI complexity) at all. No registration, no extra kernel calls - just pure fast atomic ops in userspace. - in the slowpath (in the lock-contention case), the system call and scheduling pattern is in fact better than that of normal futexes, due to the 'integrated' nature of FUTEX_LOCK_PI. [more about that further down] - the in-kernel PI implementation is streamlined around the mutex abstraction, with strict rules that keep the implementation relatively simple: only a single owner may own a lock (i.e. no read-write lock support), only the owner may unlock a lock, no recursive locking, etc. Priority Inheritance - why, oh why??? ------------------------------------- Many of you heard the horror stories about the evil PI code circling Linux for years, which makes no real sense at all and is only used by buggy applications and which has horrible overhead. Some of you have dreaded this very moment, when someone actually submits working PI code ;-) So why would we like to see PI support for futexes? We'd like to see it done purely for technological reasons. We dont think it's a buggy concept, we think it's useful functionality to offer to applications, which functionality cannot be achieved in other ways. We also think it's the right thing to do, and we think we've got the right arguments and the right numbers to prove that. We also believe that we can address all the counter-arguments as well. For these reasons (and the reasons outlined below) we are submitting this patch-set for upstream kernel inclusion. What are the benefits of PI? The short reply: ---------------- User-space PI helps achieving/improving determinism for user-space applications. In the best-case, it can help achieve determinism and well-bound latencies. Even in the worst-case, PI will improve the statistical distribution of locking related application delays. The longer reply: ----------------- Firstly, sharing locks between multiple tasks is a common programming technique that often cannot be replaced with lockless algorithms. As we can see it in the kernel [which is a quite complex program in itself], lockless structures are rather the exception than the norm - the current ratio of lockless vs. locky code for shared data structures is somewhere between 1:10 and 1:100. Lockless is hard, and the complexity of lockless algorithms often endangers to ability to do robust reviews of said code. I.e. critical RT apps often choose lock structures to protect critical data structures, instead of lockless algorithms. Furthermore, there are cases (like shared hardware, or other resource limits) where lockless access is mathematically impossible. Media players (such as Jack) are an example of reasonable application design with multiple tasks (with multiple priority levels) sharing short-held locks: for example, a highprio audio playback thread is combined with medium-prio construct-audio-data threads and low-prio display-colory-stuff threads. Add video and decoding to the mix and we've got even more priority levels. So once we accept that synchronization objects (locks) are an unavoidable fact of life, and once we accept that multi-task userspace apps have a very fair expectation of being able to use locks, we've got to think about how to offer the option of a deterministic locking implementation to user-space. Most of the technical counter-arguments against doing priority inheritance only apply to kernel-space locks. But user-space locks are different, there we cannot disable interrupts or make the task non-preemptible in a critical section, so the 'use spinlocks' argument does not apply (user-space spinlocks have the same priority inversion problems as other user-space locking constructs). Fact is, pretty much the only technique that currently enables good determinism for userspace locks (such as futex-based pthread mutexes) is priority inheritance: Currently (without PI), if a high-prio and a low-prio task shares a lock [this is a quite common scenario for most non-trivial RT applications], even if all critical sections are coded carefully to be deterministic (i.e. all critical sections are short in duration and only execute a limited number of instructions), the kernel cannot guarantee any deterministic execution of the high-prio task: any medium-priority task could preempt the low-prio task while it holds the shared lock and executes the critical section, and could delay it indefinitely. Implementation: --------------- As mentioned before, the userspace fastpath of PI-enabled pthread mutexes involves no kernel work at all - they behave quite similarly to normal futex-based locks: a 0 value means unlocked, and a value==TID means locked. (This is the same method as used by list-based robust futexes.) Userspace uses atomic ops to lock/unlock these mutexes without entering the kernel. To handle the slowpath, we have added two new futex ops: FUTEX_LOCK_PI FUTEX_UNLOCK_PI If the lock-acquire fastpath fails, [i.e. an atomic transition from 0 to TID fails], then FUTEX_LOCK_PI is called. The kernel does all the remaining work: if there is no futex-queue attached to the futex address yet then the code looks up the task that owns the futex [it has put its own TID into the futex value], and attaches a 'PI state' structure to the futex-queue. The pi_state includes an rt-mutex, which is a PI-aware, kernel-based synchronization object. The 'other' task is made the owner of the rt-mutex, and the FUTEX_WAITERS bit is atomically set in the futex value. Then this task tries to lock the rt-mutex, on which it blocks. Once it returns, it has the mutex acquired, and it sets the futex value to its own TID and returns. Userspace has no other work to perform - it now owns the lock, and futex value contains FUTEX_WAITERS|TID. If the unlock side fastpath succeeds, [i.e. userspace manages to do a TID -> 0 atomic transition of the futex value], then no kernel work is triggered. If the unlock fastpath fails (because the FUTEX_WAITERS bit is set), then FUTEX_UNLOCK_PI is called, and the kernel unlocks the futex on the behalf of userspace - and it also unlocks the attached pi_state->rt_mutex and thus wakes up any potential waiters. Note that under this approach, contrary to other PI-futex approaches, there is no prior 'registration' of a PI-futex. [which is not quite possible anyway, due to existing ABI properties of pthread mutexes.] Also, under this scheme, 'robustness' and 'PI' are two orthogonal properties of futexes, and all four combinations are possible: futex, robust-futex, PI-futex, robust+PI-futex. glibc support: -------------- Ulrich Drepper and Jakub Jelinek have written glibc support for PI-futexes (and robust futexes), enabling robust and PI (PTHREAD_PRIO_INHERIT) POSIX mutexes. (PTHREAD_PRIO_PROTECT support will be added later on too, no additional kernel changes are needed for that). [NOTE: The glibc patch is obviously inofficial and unsupported without matching upstream kernel functionality.] the patch-queue and the glibc patch can also be downloaded from: http://redhat.com/~mingo/PI-futex-patches/ Many thanks go to the people who helped us create this kernel feature: Steven Rostedt, Esben Nielsen, Benedikt Spranger, Daniel Walker, John Cooper, Arjan van de Ven, Oleg Nesterov and others. Credits for related prior projects goes to Dirk Grambow, Inaky Perez-Gonzalez, Bill Huey and many others. Clean up the futex code, before adding more features to it: - use u32 as the futex field type - that's the ABI - use __user and pointers to u32 instead of unsigned long - code style / comment style cleanups - rename hash-bucket name from 'bh' to 'hb'. I checked the pre and post futex.o object files to make sure this patch has no code effects. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Cc: Ulrich Drepper <drepper@redhat.com> Cc: Jakub Jelinek <jakub@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27[PATCH] sched: mc/smt power savings sched policySiddha, Suresh B
sysfs entries 'sched_mc_power_savings' and 'sched_smt_power_savings' in /sys/devices/system/cpu/ control the MC/SMT power savings policy for the scheduler. Based on the values (1-enable, 0-disable) for these controls, sched groups cpu power will be determined for different domains. When power savings policy is enabled and under light load conditions, scheduler will minimize the physical packages/cpu cores carrying the load and thus conserving power(with a perf impact based on the workload characteristics... see OLS 2005 CMP kernel scheduler paper for more details..) Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Con Kolivas <kernel@kolivas.org> Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27[PATCH] sched_domain: handle kmalloc failureSrivatsa Vaddagiri
Try to handle mem allocation failures in build_sched_domains by bailing out and cleaning up thus-far allocated memory. The patch has a direct consequence that we disable load balancing completely (even at sibling level) upon *any* memory allocation failure. [Lee.Schermerhorn@hp.com: bugfix] Signed-off-by: Srivatsa Vaddagir <vatsa@in.ibm.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Ingo Molnar <mingo@elte.hu> Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27[PATCH] sched: implement smpnicePeter Williams
Problem: The introduction of separate run queues per CPU has brought with it "nice" enforcement problems that are best described by a simple example. For the sake of argument suppose that on a single CPU machine with a nice==19 hard spinner and a nice==0 hard spinner running that the nice==0 task gets 95% of the CPU and the nice==19 task gets 5% of the CPU. Now suppose that there is a system with 2 CPUs and 2 nice==19 hard spinners and 2 nice==0 hard spinners running. The user of this system would be entitled to expect that the nice==0 tasks each get 95% of a CPU and the nice==19 tasks only get 5% each. However, whether this expectation is met is pretty much down to luck as there are four equally likely distributions of the tasks to the CPUs that the load balancing code will consider to be balanced with loads of 2.0 for each CPU. Two of these distributions involve one nice==0 and one nice==19 task per CPU and in these circumstances the users expectations will be met. The other two distributions both involve both nice==0 tasks being on one CPU and both nice==19 being on the other CPU and each task will get 50% of a CPU and the user's expectations will not be met. Solution: The solution to this problem that is implemented in the attached patch is to use weighted loads when determining if the system is balanced and, when an imbalance is detected, to move an amount of weighted load between run queues (as opposed to a number of tasks) to restore the balance. Once again, the easiest way to explain why both of these measures are necessary is to use a simple example. Suppose that (in a slight variation of the above example) that we have a two CPU system with 4 nice==0 and 4 nice=19 hard spinning tasks running and that the 4 nice==0 tasks are on one CPU and the 4 nice==19 tasks are on the other CPU. The weighted loads for the two CPUs would be 4.0 and 0.2 respectively and the load balancing code would move 2 tasks resulting in one CPU with a load of 2.0 and the other with load of 2.2. If this was considered to be a big enough imbalance to justify moving a task and that task was moved using the current move_tasks() then it would move the highest priority task that it found and this would result in one CPU with a load of 3.0 and the other with a load of 1.2 which would result in the movement of a task in the opposite direction and so on -- infinite loop. If, on the other hand, an amount of load to be moved is calculated from the imbalance (in this case 0.1) and move_tasks() skips tasks until it find ones whose contributions to the weighted load are less than this amount it would move two of the nice==19 tasks resulting in a system with 2 nice==0 and 2 nice=19 on each CPU with loads of 2.1 for each CPU. One of the advantages of this mechanism is that on a system where all tasks have nice==0 the load balancing calculations would be mathematically identical to the current load balancing code. Notes: struct task_struct: has a new field load_weight which (in a trade off of space for speed) stores the contribution that this task makes to a CPU's weighted load when it is runnable. struct runqueue: has a new field raw_weighted_load which is the sum of the load_weight values for the currently runnable tasks on this run queue. This field always needs to be updated when nr_running is updated so two new inline functions inc_nr_running() and dec_nr_running() have been created to make sure that this happens. This also offers a convenient way to optimize away this part of the smpnice mechanism when CONFIG_SMP is not defined. int try_to_wake_up(): in this function the value SCHED_LOAD_BALANCE is used to represent the load contribution of a single task in various calculations in the code that decides which CPU to put the waking task on. While this would be a valid on a system where the nice values for the runnable tasks were distributed evenly around zero it will lead to anomalous load balancing if the distribution is skewed in either direction. To overcome this problem SCHED_LOAD_SCALE has been replaced by the load_weight for the relevant task or by the average load_weight per task for the queue in question (as appropriate). int move_tasks(): The modifications to this function were complicated by the fact that active_load_balance() uses it to move exactly one task without checking whether an imbalance actually exists. This precluded the simple overloading of max_nr_move with max_load_move and necessitated the addition of the latter as an extra argument to the function. The internal implementation is then modified to move up to max_nr_move tasks and max_load_move of weighted load. This slightly complicates the code where move_tasks() is called and if ever active_load_balance() is changed to not use move_tasks() the implementation of move_tasks() should be simplified accordingly. struct sched_group *find_busiest_group(): Similar to try_to_wake_up(), there are places in this function where SCHED_LOAD_SCALE is used to represent the load contribution of a single task and the same issues are created. A similar solution is adopted except that it is now the average per task contribution to a group's load (as opposed to a run queue) that is required. As this value is not directly available from the group it is calculated on the fly as the queues in the groups are visited when determining the busiest group. A key change to this function is that it is no longer to scale down *imbalance on exit as move_tasks() uses the load in its scaled form. void set_user_nice(): has been modified to update the task's load_weight field when it's nice value and also to ensure that its run queue's raw_weighted_load field is updated if it was runnable. From: "Siddha, Suresh B" <suresh.b.siddha@intel.com> With smpnice, sched groups with highest priority tasks can mask the imbalance between the other sched groups with in the same domain. This patch fixes some of the listed down scenarios by not considering the sched groups which are lightly loaded. a) on a simple 4-way MP system, if we have one high priority and 4 normal priority tasks, with smpnice we would like to see the high priority task scheduled on one cpu, two other cpus getting one normal task each and the fourth cpu getting the remaining two normal tasks. but with current smpnice extra normal priority task keeps jumping from one cpu to another cpu having the normal priority task. This is because of the busiest_has_loaded_cpus, nr_loaded_cpus logic.. We are not including the cpu with high priority task in max_load calculations but including that in total and avg_load calcuations.. leading to max_load < avg_load and load balance between cpus running normal priority tasks(2 Vs 1) will always show imbalanace as one normal priority and the extra normal priority task will keep moving from one cpu to another cpu having normal priority task.. b) 4-way system with HT (8 logical processors). Package-P0 T0 has a highest priority task, T1 is idle. Package-P1 Both T0 and T1 have 1 normal priority task each.. P2 and P3 are idle. With this patch, one of the normal priority tasks on P1 will be moved to P2 or P3.. c) With the current weighted smp nice calculations, it doesn't always make sense to look at the highest weighted runqueue in the busy group.. Consider a load balance scenario on a DP with HT system, with Package-0 containing one high priority and one low priority, Package-1 containing one low priority(with other thread being idle).. Package-1 thinks that it need to take the low priority thread from Package-0. And find_busiest_queue() returns the cpu thread with highest priority task.. And ultimately(with help of active load balance) we move high priority task to Package-1. And same continues with Package-0 now, moving high priority task from package-1 to package-0.. Even without the presence of active load balance, load balance will fail to balance the above scenario.. Fix find_busiest_queue to use "imbalance" when it is lightly loaded. [kernel@kolivas.org: sched: store weighted load on up] [kernel@kolivas.org: sched: add discrete weighted cpu load function] [suresh.b.siddha@intel.com: sched: remove dead code] Signed-off-by: Peter Williams <pwil3058@bigpond.com.au> Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com> Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Con Kolivas <kernel@kolivas.org> Cc: John Hawkes <hawkes@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27[PATCH] chardev: GPIO for SCx200 & PC-8736x: use dev_dbg in common moduleJim Cromie
Use of dev_dbg() and friends is considered good practice. dev_dbg() needs a struct device *devp, but nsc_gpio is only a helper module, so it doesnt have/need its own. To provide devp to the user-modules (scx200 & pc8736x _gpio), we add it to the vtable, and set it during init. Also squeeze nsc_gpio_dump()'s format a little. [ 199.259879] pc8736x_gpio.0: io09: 0x0044 TS OD PUE EDGE LO DEBOUNCE Signed-off-by: Jim Cromie <jim.cromie@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27[PATCH] chardev: GPIO for SCx200 & PC-8736x: migrate file-ops to common moduleJim Cromie
Now that the read(), write() file-ops are dispatching gpio-ops via the vtable, they are generic, and can be moved 'verbatim' to the nsc_gpio common-support module. After the move, various symbols are renamed to update 'scx200_' to 'nsc_', and headers are adjusted accordingly. Signed-off-by: Jim Cromie <jim.cromie@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27[PATCH] chardev: GPIO for SCx200 & PC-8736x: add gpio-ops vtableJim Cromie
Abstract the gpio operations into a new nsc_gpio_ops vtable. Signed-off-by: Jim Cromie <jim.cromie@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27[PATCH] chardev: GPIO for SCx200 & PC-8736x: device minor numbers are ↵Jim Cromie
unsigned ints Per kernel headers, device minor numbers are unsigned ints. Do the same in this driver. Signed-off-by: Jim Cromie <jim.cromie@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27[PATCH] chardev: GPIO for SCx200 & PC-8736x: whitespace pre-cleanJim Cromie
GPIO SUPPORT FOR SCx200 & PC8736x The patch-set reworks the 2.4 vintage scx200_gpio driver for modern 2.6, and refactors GPIO support to reuse it in a new driver for the GPIO on PC-8736x chips. Its handy for the Soekris.com net-4801, which has both chips. These patches have been seen recently on Kernel-Mentors, and then Kernel-Newbies ML, where Jesper Juhl kindly reviewed it. His feedback has been incorporated. Thanks Jesper ! Its also gone to soekris-tech@soekris.com for possible testing by linux folks, I've gotten 1 promise so far. Theyre mostly BSD folk over there, but we'll see.. Device-file & Sysfs The driver preserves the existing device-file interface, including the write/cmd set, but adds v to 'view' the pin-settings & configs by inducing, via gpio_dump(), a dev_info() call. Its a fairly crappy way to get status, but it sticks to the syslog approach, conservatively. Allowing users to voluntarily trigger logging is good, it gives them a familiar way to confirm their app's control & use of the pins, and I've thus reduced the pin-mode-updates from dev_info to dev_dbg. I've recently bolted on a proto sysfs interface for both new drivers. Im not including those patches here; they (the patch + doc-pre-patch) are still quite raw (and unreviewed on KNML), and since they 'invent' a convention for GPIO, a proper vetting is needed. Since this patchset is much bigger than my previous ones, Id like to keep things simpler, and address it 1st, before bolting on more stuff. The driver-split The Geode CPU and the PC-87366 Super-IO chip have GPIO units which share a common pin-architecture (same pin features, with same bits controlling), but with different addressing mechanics and port organizations. The vintage driver expresses the pin capabilities with pin-mode commands [OoPpTt],etc that change the pin configurations, and since the 2 chips share pin-arch, we can reuse the read(), write() commands, once the implementation is suitably adjusted. The patchset adds a vtable: struct nsc_gpio_ops, to abstract the existing gpio operations, then adjusts fileops.write() code to invoke operations via that vtable. Driver specific open()s set private_data to the vtable so its available for use by write(). The vtable gets the gpio_dump() too, since its user-friendly, and (could be construed as) part of the current device-file interface. To support use of dev_dbg() in write() & _dump(), the vtable gets a dev ptr too, set by both scx200 & pc8736x _gpio drivers. heres how the pins are presented in syslog: [ 1890.176223] scx200_gpio.0: io00: 0x0044 TS OD PUE EDGE LO DEBOUNCE [ 1890.287223] scx200_gpio.0: io01: 0x0003 OE PP PUD EDGE LO nsc_gpio.c: new file is new home of several file-ops methods, which are modified to get their vtable from filp->private_data, and use it where needed. scx200_gpio.c: keeps some of its existing gpio routines, but now wires them up via the vtable (they're invoked by nsc_gpio.c:nsc_gpio_write() thru this vtable). A driver-spcific open() initializes filp->private_data with the vtable. Once the split is clean, and the scx200_gpio driver is working, we copy and modify the function and variable names, and rework the access-method bodies for the different addressing scheme. Heres a working overview of the patchset: # series file for GPIO # Spring Cleaning gpio-scx/patch.preclean # scripts/Lindent fixes, editor-ctrl comments # API Modernization gpio-scx/patch.api26 # what I learned from LDD3 gpio-scx/patch.platform-dev-2 # get pdev, support for dev_dbg() gpio-scx/patch.unsigned-minor # fix to match std practice # Debuggability gpio-scx/patch.dump-diet # shrink gpio_dump() gpio-scx/patch.viewpins # add new 'command' to call dump() gpio-scx/patch.init-refactor # pull shadow-register init to sub # Access-Abstraction (add vtable) gpio-scx/patch.access-vtable # introduce nsg_gpio_ops vtable, w dump gpio-scx/patch.vtable-calls # add & use the vtable in scx200_gpio gpio-scx/patch.nscgpio-shell # add empty driver for common-fops # move code under abstraction gpio-scx/patch.migrate-fops # move file-ops methods from scx200_gpio gpio-scx/patch.common-dump # mv scx200.c:scx200_gpio_dump() to nsc_gpio.c gpio-scx/patch.add-pc8736x-gpio # add new driver, like old, w chip adapt # gpio-scx/patch.add-DEBUG # enable all dev_dbg()s # Cleanups # finish printk -> dev_dbg() etc gpio-scx/patch.pdev-pc8736x # new drvr needs pdev too, gpio-scx/patch.devdbg-nscgpio # add device to 'vtable', use in dev_dbg() # gpio-scx/patch.pin-config-view # another 'c' 'command' # gpio-scx/quiet-getset # take out excess dbg stuff (pretty quiet now) gpio-scx/patch.shadow-current # imitate scx200_gpio's shadow regs in pc87* # post KMentors-post patches .. gpio-scx/patch.mutexes # use mutexes for config-locks gpio-scx/patch.viewpins-values # extend dump to obsolete separate 'c' cmd gpio-scx/patch.kconfig # add stuff for kbuild # TBC # combine api26 with pdev, which is just one step. # merge c&v commands to single do-all-fn # delay viewpins, dump-diet should also un-ifdef it too. diff.sys-gpio-rollup-1 This patch: Removed editor format-control comments, and used scripts/Lindent to clean up whitespace, then deleted the bogus chunks :-( Signed-off-by: Jim Cromie <jim.cromie@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>