aboutsummaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2006-06-30[PATCH] zoned vm counters: zone_reclaim: remove ↵Christoph Lameter
/proc/sys/vm/zone_reclaim_interval The zone_reclaim_interval was necessary because we were not able to determine how many unmapped pages exist in a zone. Therefore we had to scan in intervals to figure out if any pages were unmapped. With the zoned counters and NR_ANON_PAGES we now know the number of pagecache pages and the number of mapped pages in a zone. So we can simply skip the reclaim if there is an insufficient number of unmapped pages. We use SWAP_CLUSTER_MAX as the boundary. Drop all support for /proc/sys/vm/zone_reclaim_interval. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-29[NETLINK]: Encapsulate eff_cap usage within security framework.Darrel Goeddel
This patch encapsulates the usage of eff_cap (in netlink_skb_params) within the security framework by extending security_netlink_recv to include a required capability parameter and converting all direct usage of eff_caps outside of the lsm modules to use the interface. It also updates the SELinux implementation of the security_netlink_send and security_netlink_recv hooks to take advantage of the sid in the netlink_skb_params struct. This also enables SELinux to perform auditing of netlink capability checks. Please apply, for 2.6.18 if possible. Signed-off-by: Darrel Goeddel <dgoeddel@trustedcs.com> Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpcLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (43 commits) [POWERPC] Use little-endian bit from firmware ibm,pa-features property [POWERPC] Make sure smp_processor_id works very early in boot [POWERPC] U4 DART improvements [POWERPC] todc: add support for Time-Of-Day-Clock [POWERPC] Make lparcfg.c work when both iseries and pseries are selected [POWERPC] Fix idr locking in init_new_context [POWERPC] mpc7448hpc2 (taiga) board config file [POWERPC] Add tsi108 pci and platform device data register function [POWERPC] Add general support for mpc7448hpc2 (Taiga) platform [POWERPC] Correct the MAX_CONTEXT definition powerpc: minor cleanups for mpc86xx [POWERPC] Make sure we select CONFIG_NEW_LEDS if ADB_PMU_LED is set [POWERPC] Simplify the code defining the 64-bit CPU features [POWERPC] powerpc: kconfig warning fix [POWERPC] Consolidate some of kernel/misc*.S [POWERPC] Remove unused function call_with_mmu_off [POWERPC] update asm-powerpc/time.h [POWERPC] Clean up it_lp_queue.h [POWERPC] Skip the "copy down" of the kernel if it is already at zero. [POWERPC] Add the use of the firmware soft-reset-nmi to kdump. ...
2006-06-29Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6Linus Torvalds
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6: [PATCH] i386: export memory more than 4G through /proc/iomem [PATCH] 64bit Resource: finally enable 64bit resource sizes [PATCH] 64bit Resource: convert a few remaining drivers to use resource_size_t where needed [PATCH] 64bit resource: change pnp core to use resource_size_t [PATCH] 64bit resource: change pci core and arch code to use resource_size_t [PATCH] 64bit resource: change resource core to use resource_size_t [PATCH] 64bit resource: introduce resource_size_t for the start and end of struct resource [PATCH] 64bit resource: fix up printks for resources in misc drivers [PATCH] 64bit resource: fix up printks for resources in arch and core code [PATCH] 64bit resource: fix up printks for resources in pcmcia drivers [PATCH] 64bit resource: fix up printks for resources in video drivers [PATCH] 64bit resource: fix up printks for resources in ide drivers [PATCH] 64bit resource: fix up printks for resources in mtd drivers [PATCH] 64bit resource: fix up printks for resources in pci core and hotplug drivers [PATCH] 64bit resource: fix up printks for resources in networks drivers [PATCH] 64bit resource: fix up printks for resources in sound drivers [PATCH] 64bit resource: C99 changes for struct resource declarations Fixed up trivial conflict in drivers/ide/pci/cmd64x.c (the printk that was changed by the 64-bit resources had been deleted in the meantime ;)
2006-06-29[PATCH] genirq: add chip->eoi(), fastack -> fasteoiIngo Molnar
Clean up the fastack concept by turning it into fasteoi and introducing the ->eoi() method for chips. This also allows the cleanup of an i386 EOI quirk - now the quirk is cleanly separated from the pure ACK implementation. Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Roland Dreier <rolandd@cisco.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-29[PATCH] genirq: fasteoi handler: handle interrupt disablingBenjamin Herrenschmidt
Note when a disable interrupt happened with the fasteoi handler as well so that delayed disable can be implemented with fasteoi-type controllers. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-29[PATCH] genirq: more verbose debugging on unexpected IRQ vectorsIngo Molnar
One frequent sign of IRQ handling bugs is the appearance of unexpected vectors. Print out all the IRQ state in that case. We dont want this patch upstream, but it is useful during initial testing. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-29[PATCH] genirq: cleanup: no_irq_type -> no_irq_chip renameIngo Molnar
Rename no_irq_type to no_irq_chip. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-29[PATCH] genirq: add SA_TRIGGER supportThomas Gleixner
Enable drivers to request an IRQ with a given irq-flow (trigger/polarity) setting. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-29[PATCH] genirq: add irq-wake (power-management) supportThomas Gleixner
Enable platforms to set the irq-wake (power-management) properties of an IRQ. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-29[PATCH] genirq: add handle_bad_irq()Ingo Molnar
Handle bad IRQ vectors via the irqchip mechanism. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-29[PATCH] genirq: add irq-chip supportThomas Gleixner
Enable platforms to use the irq-chip and irq-flow abstractions: allow setting of the chip, the type and provide highlevel handlers for common irq-flows. [rostedt@goodmis.org: misroute-irq: Don't call desc->chip->end because of edge interrupts] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-29[PATCH] genirq: coreThomas Gleixner
Core genirq support: add the irq-chip and irq-flow abstractions. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-29[PATCH] genirq: update copyrightsIngo Molnar
Update/add copyrights in the generic IRQ code. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-29[PATCH] genirq: add IRQ_NOAUTOEN supportThomas Gleixner
Enable platforms to disable the automatic enabling of freshly set up irqs. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-29[PATCH] genirq: add IRQ_NOREQUEST supportThomas Gleixner
Enable platforms to disable request_irq() for certain interrupts. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-29[PATCH] genirq: add IRQ_NOPROBE supportThomas Gleixner
Introduce IRQ_NOPROBE: enables platforms to control chip-probing. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-29[PATCH] genirq: add genirq sw IRQ-retriggerThomas Gleixner
Enable platforms that do not have a hardware-assisted hardirq-resend mechanism to resend them via a softirq-driven IRQ emulation mechanism. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-29[PATCH] genirq: cleanup: no_irq_type cleanupsIngo Molnar
Clean up no_irq_type: share the NOP functions where possible, and properly name the ack_bad() function. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-29[PATCH] genirq: doc: handle_IRQ_event() and __do_IRQ() commentsIngo Molnar
Document handle_IRQ_event() and __do_IRQ(). Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-29[PATCH] genirq: add ->retrigger() irq op to consolidate hw_irq_resend()Ingo Molnar
Add ->retrigger() irq op to consolidate hw_irq_resend() implementations. (Most architectures had it defined to NOP anyway.) NOTE: ia64 needs testing. i386 and x86_64 tested. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-29[PATCH] genirq: debug: better debug printout in enable_irq()Thomas Gleixner
Make enable_irq() debug printouts user-readable. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-29[PATCH] genirq: cleanup: turn ARCH_HAS_IRQ_PER_CPU into CONFIG_IRQ_PER_CPUIngo Molnar
Cleanup: change ARCH_HAS_IRQ_PER_CPU into a Kconfig method. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-29[PATCH] genirq: cleanup: merge pending_irq_cpumask[] into irq_desc[]Ingo Molnar
Consolidation: remove the pending_irq_cpumask[NR_IRQS] array and move it into the irq_desc[NR_IRQS].pending_mask field. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-29[PATCH] genirq: cleanup: merge irq_dir[], smp_affinity_entry[] into irq_desc[]Ingo Molnar
Consolidation: remove the irq_dir[NR_IRQS] and the smp_affinity_entry[NR_IRQS] arrays and move them into the irq_desc[] array. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-29[PATCH] genirq: cleanup: reduce irq_desc_t use, mark it obsoleteIngo Molnar
Cleanup: remove irq_desc_t use from the generic IRQ code, and mark it obsolete. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-29[PATCH] genirq: cleanup: misc code cleanupsIngo Molnar
Assorted code cleanups to the generic IRQ code. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-29[PATCH] genirq: cleanup: remove fastcallIngo Molnar
Now that i386 defaults to regparm, explicit uses of fastcall are not needed anymore. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-29[PATCH] genirq: cleanup: remove irq_descp()Ingo Molnar
Cleanup: remove irq_descp() - explicit use of irq_desc[] is shorter and more readable. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-29[PATCH] genirq: cleanup: merge irq_affinity[] into irq_desc[]Ingo Molnar
Consolidation: remove the irq_affinity[NR_IRQS] array and move it into the irq_desc[NR_IRQS].affinity field. [akpm@osdl.org: sparc64 build fix] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-29[PATCH] genirq: sem2mutex probe_sem -> probing_activeIngo Molnar
Convert the irq auto-probing semaphore to a mutex. (This allows us to find probing API usage bugs sooner, via the mutex debugging code.) Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-29[PATCH] genirq: rename desc->handler to desc->chipIngo Molnar
This patch-queue improves the generic IRQ layer to be truly generic, by adding various abstractions and features to it, without impacting existing functionality. While the queue can be best described as "fix and improve everything in the generic IRQ layer that we could think of", and thus it consists of many smaller features and lots of cleanups, the one feature that stands out most is the new 'irq chip' abstraction. The irq-chip abstraction is about describing and coding and IRQ controller driver by mapping its raw hardware capabilities [and quirks, if needed] in a straightforward way, without having to think about "IRQ flow" (level/edge/etc.) type of details. This stands in contrast with the current 'irq-type' model of genirq architectures, which 'mixes' raw hardware capabilities with 'flow' details. The patchset supports both types of irq controller designs at once, and converts i386 and x86_64 to the new irq-chip design. As a bonus side-effect of the irq-chip approach, chained interrupt controllers (master/slave PIC constructs, etc.) are now supported by design as well. The end result of this patchset intends to be simpler architecture-level code and more consolidation between architectures. We reused many bits of code and many concepts from Russell King's ARM IRQ layer, the merging of which was one of the motivations for this patchset. This patch: rename desc->handler to desc->chip. Originally i did not want to do this, because it's a big patch. But having both "desc->handler", "desc->handle_irq" and "action->handler" caused a large degree of confusion and made the code appear alot less clean than it truly is. I have also attempted a dual approach as well by introducing a desc->chip alias - but that just wasnt robust enough and broke frequently. So lets get over with this quickly. The conversion was done automatically via scripts and converts all the code in the kernel. This renaming patch is the first one amongst the patches, so that the remaining patches can stay flexible and can be merged and split up without having some big monolithic patch act as a merge barrier. [akpm@osdl.org: build fix] [akpm@osdl.org: another build fix] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-28[PATCH] load_module() cleanupAndrew Morton
Undo bizarre declaration in load_module(). Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-28[PATCH] Add EXPORT_UNUSED_SYMBOL and EXPORT_UNUSED_SYMBOL_GPLArjan van de Ven
Temporarily add EXPORT_UNUSED_SYMBOL and EXPORT_UNUSED_SYMBOL_GPL. These will be used as a transition measure for symbols that aren't used in the kernel and are on the way out. When a module uses such a symbol, a warning is printk'd at modprobe time. The main reason for removing unused exports is size: eacho export takes roughly between 100 and 150 bytes of kernel space in the binary. This patch gives users the option to immediately get this size gain via a config option. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-28[POWERPC] Add the use of the firmware soft-reset-nmi to kdump.David Wilder
With this patch, kdump uses the firmware soft-reset NMI for two purposes: 1) Initiate the kdump (take a crash dump) by issuing a soft-reset. 2) Break a CPU out of a deadlock condition that is detected during kdump processing. When a soft-reset is initiated each CPU will enter system_reset_exception() and set its corresponding bit in the global bit-array cpus_in_sr then call die(). When die() finds the CPU's bit set in cpu_in_sr crash_kexec() is called to initiate a crash dump. The first CPU to enter crash_kexec() is called the "crashing CPU". All other CPUs are "secondary CPUs". The secondary CPU's pass through to crash_kexec_secondary() and sleep. The crashing CPU waits for all CPUs to enter via soft-reset then boots the kdump kernel (see crash_soft_reset_check()) When the system crashes due to a panic or exception, crash_kexec() is called by panic() or die(). The crashing CPU sends an IPI to all other CPUs to notify them of the pending shutdown. If a CPU is in a deadlock or hung state with interrupts disabled, the IPI will not be delivered. The result being, that the kdump kernel is not booted. This problem is solved with the use of a firmware generated soft-reset. After the crashing_cpu has issued the IPI, it waits for 10 sec for all CPUs to enter crash_ipi_callback(). A CPU signifies its entry to crash_ipi_callback() by setting its corresponding bit in the cpus_in_crash bit array. After 10 sec, if one or more CPUs have not set their bit in cpus_in_crash we assume that the CPU(s) is deadlocked. The operator is then prompted to generate a soft-reset to break the deadlock. Each CPU enters the soft reset handler as described above. Two conditions must be handled at this point: 1) The system crashed because the operator generated a soft-reset. See 2) The system had crashed before the soft-reset was generated ( in the case of a Panic or oops). The first CPU to enter crash_kexec() uses the state of the kexec_lock to determine this state. If kexec_lock is already held then condition 2 is true and crash_kexec_secondary() is called, else; this CPU is flagged as the crashing CPU, the kexec_lock is acquired and crash_kexec() proceeds as described above. Each additional CPUs responding to the soft-reset will pass through crash_kexec() to kexec_secondary(). All secondary CPUs call crash_ipi_callback() readying them self's for the shutdown. When ready they clear their bit in cpus_in_sr. The crashing CPU waits in kexec_secondary() until all other CPUs have cleared their bits in cpus_in_sr. The kexec kernel boot is then started. Signed-off-by: Haren Myneni <haren@us.ibm.com> Signed-off-by: David Wilder <dwilder@us.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
2006-06-27[PATCH] Remove redundant NULL checks before [kv]free - in kernel/Jesper Juhl
Remove redundant kfree NULL checks from kernel/ Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27[PATCH] futex_requeue() optimizationSebastien Dugue
In futex_requeue(), when the 2 futexes keys hash to the same bucket, there is no need to move the futex_q to the end of the bucket list. Signed-off-by: Sebastien Dugue <sebastien.dugue@bull.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27[PATCH] rtmutex: Propagate priority settings into PI lock chainsThomas Gleixner
When the priority of a task, which is blocked on a lock, changes we must propagate this change into the PI lock chain. Therefor the chain walk code is changed to get rid of the references to current to avoid false positives in the deadlock detector, as setscheduler might be called by a task which holds the lock on which the task whose priority is changed is blocked. Also add some comments about the get/put_task_struct usage to avoid confusion. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27[PATCH] rtmutex: Modify rtmutex-tester to test the setscheduler propagationThomas Gleixner
Make test suite setscheduler calls asynchronously. Remove the waits in the test cases and add a new testcase to verify the correctness of the setscheduler priority propagation. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27[PATCH] Drop tasklist lock in do_sched_setschedulerThomas Gleixner
There is no need to hold tasklist_lock across the setscheduler call, when we pin the task structure with get_task_struct(). Interrupts are disabled in setscheduler anyway and the permission checks do not need interrupts disabled. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27[PATCH] pi-futex: futex_lock_pi/futex_unlock_pi supportIngo Molnar
This adds the actual pi-futex implementation, based on rt-mutexes. [dino@in.ibm.com: fix an oops-causing race] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27[PATCH] pi-futex: rt mutex futex apiIngo Molnar
Add proxy-locking rt-mutex functionality needed by pi-futexes. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27[PATCH] pi-futex: rt mutex testerThomas Gleixner
RT-mutex tester: scriptable tester for rt mutexes, which allows userspace scripting of mutex unit-tests (and dynamic tests as well), using the actual rt-mutex implementation of the kernel. [akpm@osdl.org: fixlet] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27[PATCH] pi-futex: rt mutex debugIngo Molnar
Runtime debugging functionality for rt-mutexes. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27[PATCH] pi-futex: rt mutex coreIngo Molnar
Core functions for the rt-mutex subsystem. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27[PATCH] pi-futex: scheduler support for piIngo Molnar
Add framework to boost/unboost the priority of RT tasks. This consists of: - caching the 'normal' priority in ->normal_prio - providing a functions to set/get the priority of the task - make sched_setscheduler() aware of boosting The effective_prio() cleanups also fix a priority-calculation bug pointed out by Andrey Gelman, in set_user_nice(). has_rt_policy() fix: Peter Williams <pwil3058@bigpond.net.au> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Cc: Andrey Gelman <agelman@012.net.il> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27[PATCH] pi-futex: futex code cleanupsIngo Molnar
We are pleased to announce "lightweight userspace priority inheritance" (PI) support for futexes. The following patchset and glibc patch implements it, ontop of the robust-futexes patchset which is included in 2.6.16-mm1. We are calling it lightweight for 3 reasons: - in the user-space fastpath a PI-enabled futex involves no kernel work (or any other PI complexity) at all. No registration, no extra kernel calls - just pure fast atomic ops in userspace. - in the slowpath (in the lock-contention case), the system call and scheduling pattern is in fact better than that of normal futexes, due to the 'integrated' nature of FUTEX_LOCK_PI. [more about that further down] - the in-kernel PI implementation is streamlined around the mutex abstraction, with strict rules that keep the implementation relatively simple: only a single owner may own a lock (i.e. no read-write lock support), only the owner may unlock a lock, no recursive locking, etc. Priority Inheritance - why, oh why??? ------------------------------------- Many of you heard the horror stories about the evil PI code circling Linux for years, which makes no real sense at all and is only used by buggy applications and which has horrible overhead. Some of you have dreaded this very moment, when someone actually submits working PI code ;-) So why would we like to see PI support for futexes? We'd like to see it done purely for technological reasons. We dont think it's a buggy concept, we think it's useful functionality to offer to applications, which functionality cannot be achieved in other ways. We also think it's the right thing to do, and we think we've got the right arguments and the right numbers to prove that. We also believe that we can address all the counter-arguments as well. For these reasons (and the reasons outlined below) we are submitting this patch-set for upstream kernel inclusion. What are the benefits of PI? The short reply: ---------------- User-space PI helps achieving/improving determinism for user-space applications. In the best-case, it can help achieve determinism and well-bound latencies. Even in the worst-case, PI will improve the statistical distribution of locking related application delays. The longer reply: ----------------- Firstly, sharing locks between multiple tasks is a common programming technique that often cannot be replaced with lockless algorithms. As we can see it in the kernel [which is a quite complex program in itself], lockless structures are rather the exception than the norm - the current ratio of lockless vs. locky code for shared data structures is somewhere between 1:10 and 1:100. Lockless is hard, and the complexity of lockless algorithms often endangers to ability to do robust reviews of said code. I.e. critical RT apps often choose lock structures to protect critical data structures, instead of lockless algorithms. Furthermore, there are cases (like shared hardware, or other resource limits) where lockless access is mathematically impossible. Media players (such as Jack) are an example of reasonable application design with multiple tasks (with multiple priority levels) sharing short-held locks: for example, a highprio audio playback thread is combined with medium-prio construct-audio-data threads and low-prio display-colory-stuff threads. Add video and decoding to the mix and we've got even more priority levels. So once we accept that synchronization objects (locks) are an unavoidable fact of life, and once we accept that multi-task userspace apps have a very fair expectation of being able to use locks, we've got to think about how to offer the option of a deterministic locking implementation to user-space. Most of the technical counter-arguments against doing priority inheritance only apply to kernel-space locks. But user-space locks are different, there we cannot disable interrupts or make the task non-preemptible in a critical section, so the 'use spinlocks' argument does not apply (user-space spinlocks have the same priority inversion problems as other user-space locking constructs). Fact is, pretty much the only technique that currently enables good determinism for userspace locks (such as futex-based pthread mutexes) is priority inheritance: Currently (without PI), if a high-prio and a low-prio task shares a lock [this is a quite common scenario for most non-trivial RT applications], even if all critical sections are coded carefully to be deterministic (i.e. all critical sections are short in duration and only execute a limited number of instructions), the kernel cannot guarantee any deterministic execution of the high-prio task: any medium-priority task could preempt the low-prio task while it holds the shared lock and executes the critical section, and could delay it indefinitely. Implementation: --------------- As mentioned before, the userspace fastpath of PI-enabled pthread mutexes involves no kernel work at all - they behave quite similarly to normal futex-based locks: a 0 value means unlocked, and a value==TID means locked. (This is the same method as used by list-based robust futexes.) Userspace uses atomic ops to lock/unlock these mutexes without entering the kernel. To handle the slowpath, we have added two new futex ops: FUTEX_LOCK_PI FUTEX_UNLOCK_PI If the lock-acquire fastpath fails, [i.e. an atomic transition from 0 to TID fails], then FUTEX_LOCK_PI is called. The kernel does all the remaining work: if there is no futex-queue attached to the futex address yet then the code looks up the task that owns the futex [it has put its own TID into the futex value], and attaches a 'PI state' structure to the futex-queue. The pi_state includes an rt-mutex, which is a PI-aware, kernel-based synchronization object. The 'other' task is made the owner of the rt-mutex, and the FUTEX_WAITERS bit is atomically set in the futex value. Then this task tries to lock the rt-mutex, on which it blocks. Once it returns, it has the mutex acquired, and it sets the futex value to its own TID and returns. Userspace has no other work to perform - it now owns the lock, and futex value contains FUTEX_WAITERS|TID. If the unlock side fastpath succeeds, [i.e. userspace manages to do a TID -> 0 atomic transition of the futex value], then no kernel work is triggered. If the unlock fastpath fails (because the FUTEX_WAITERS bit is set), then FUTEX_UNLOCK_PI is called, and the kernel unlocks the futex on the behalf of userspace - and it also unlocks the attached pi_state->rt_mutex and thus wakes up any potential waiters. Note that under this approach, contrary to other PI-futex approaches, there is no prior 'registration' of a PI-futex. [which is not quite possible anyway, due to existing ABI properties of pthread mutexes.] Also, under this scheme, 'robustness' and 'PI' are two orthogonal properties of futexes, and all four combinations are possible: futex, robust-futex, PI-futex, robust+PI-futex. glibc support: -------------- Ulrich Drepper and Jakub Jelinek have written glibc support for PI-futexes (and robust futexes), enabling robust and PI (PTHREAD_PRIO_INHERIT) POSIX mutexes. (PTHREAD_PRIO_PROTECT support will be added later on too, no additional kernel changes are needed for that). [NOTE: The glibc patch is obviously inofficial and unsupported without matching upstream kernel functionality.] the patch-queue and the glibc patch can also be downloaded from: http://redhat.com/~mingo/PI-futex-patches/ Many thanks go to the people who helped us create this kernel feature: Steven Rostedt, Esben Nielsen, Benedikt Spranger, Daniel Walker, John Cooper, Arjan van de Ven, Oleg Nesterov and others. Credits for related prior projects goes to Dirk Grambow, Inaky Perez-Gonzalez, Bill Huey and many others. Clean up the futex code, before adding more features to it: - use u32 as the futex field type - that's the ABI - use __user and pointers to u32 instead of unsigned long - code style / comment style cleanups - rename hash-bucket name from 'bh' to 'hb'. I checked the pre and post futex.o object files to make sure this patch has no code effects. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Cc: Ulrich Drepper <drepper@redhat.com> Cc: Jakub Jelinek <jakub@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27[PATCH] BUG() if setscheduler is called from interrupt contextSteven Rostedt
Thomas Gleixner is adding the call to a rtmutex function in setscheduler. This call grabs a spin_lock that is not always protected by interrupts disabled. So this means that setscheduler cant be called from interrupt context. To prevent this from happening in the future, this patch adds a BUG_ON(in_interrupt()) in that function. (Thanks to akpm <aka. Andrew Morton> for this suggestion). Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27[PATCH] sched: uninline task_rq_lock()Oleg Nesterov
Saves 543 bytes from sched.o (gcc 3.3.3). Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Ingo Molnar <mingo@elte.hu> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Con Kolivas <kernel@kolivas.org> Cc: Peter Williams <pwil3058@bigpond.net.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-27[PATCH] sched: mc/smt power savings sched policySiddha, Suresh B
sysfs entries 'sched_mc_power_savings' and 'sched_smt_power_savings' in /sys/devices/system/cpu/ control the MC/SMT power savings policy for the scheduler. Based on the values (1-enable, 0-disable) for these controls, sched groups cpu power will be determined for different domains. When power savings policy is enabled and under light load conditions, scheduler will minimize the physical packages/cpu cores carrying the load and thus conserving power(with a perf impact based on the workload characteristics... see OLS 2005 CMP kernel scheduler paper for more details..) Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Con Kolivas <kernel@kolivas.org> Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>