aboutsummaryrefslogtreecommitdiff
path: root/init/Kconfig
AgeCommit message (Collapse)Author
2008-04-14slub: No need for per node slab counters if !SLUB_DEBUGChristoph Lameter
The per node counters are used mainly for showing data through the sysfs API. If that API is not compiled in then there is no point in keeping track of this data. Disable counters for the number of slabs and the number of total slabs if !SLUB_DEBUG. Incrementing the per node counters is also accessing a potentially contended cacheline so this could actually be a performance benefit to embedded systems. SLABINFO support is also affected. It now must depends on SLUB_DEBUG (which is on by default). Patch also avoids a check for a NULL kmem_cache_node pointer in new_slab() if the system is not compiled with NUMA support. [penberg@cs.helsinki.fi: fix oops and move ->nr_slabs into CONFIG_SLUB_DEBUG] Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2008-03-10rcu: move PREEMPT_RCU config option back under PREEMPTPaul E. McKenney
The original preemptible-RCU patch put the choice between classic and preemptible RCU into kernel/Kconfig.preempt, which resulted in build failures on machines not supporting CONFIG_PREEMPT. This choice was therefore moved to init/Kconfig, which worked, but placed the choice between classic and preemptible RCU at the top level, a very obtuse choice indeed. This patch changes from the Kconfig "choice" mechanism to a pair of booleans, only one of which (CONFIG_PREEMPT_RCU) is user-visible, and is located in kernel/Kconfig.preempt, where one would expect it to be. The other (CONFIG_CLASSIC_RCU) is in init/Kconfig so that it is available to all architectures, hopefully avoiding build breakage. Thanks to Roman Zippel for suggesting this approach. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Dipankar Sarma <dipankar@in.ibm.com> Cc: Josh Triplett <josh@freedesktop.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6: debugfs: fix sparse warnings Driver core: Fix cleanup when failing device_add(). driver core: Remove dpm_sysfs_remove() from error path of device_add() PM: fix new mutex-locking bug in the PM core PM: Do not acquire device semaphores upfront during suspend kobject: properly initialize ksets sysfs: CONFIG_SYSFS_DEPRECATED fix driver core: fix up Kconfig text for CONFIG_SYSFS_DEPRECATED
2008-03-04Memory controller: rename to Memory Resource ControllerBalbir Singh
Rename Memory Controller to Memory Resource Controller. Reflect the same changes in the CONFIG definition for the Memory Resource Controller. Group together the config options for Resource Counters and Memory Resource Controller. Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-04sysfs: CONFIG_SYSFS_DEPRECATED fixIngo Molnar
CONFIG_SYSFS_DEPRECATED=y changed its meaning recently and causes regressions in working setups that had SYSFS_DEPRECATED disabled. so rename it to SYSFS_DEPRECATED_V2 so that testers pick up the new default via 'make oldconfig', even if their old .config's disabled CONFIG_SYSFS_DEPRECATED ... Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-03-04driver core: fix up Kconfig text for CONFIG_SYSFS_DEPRECATEDGreg Kroah-Hartman
As things get moved into this config option, the hard date of 2006 does not work anymore, so update the text to be more descriptive. Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-02-23cgroup memory controller: document huge memory/cache overhead in KconfigAndi Kleen
Document huge memory/cache overhead of memory controller in Kconfig I was a little surprised that 2.6.25-rc* increased struct page for the memory controller. At least on many x86-64 machines it will not fit into a single cache line now anymore and also costs considerable amounts of RAM. At earlier review I remembered asking for a external data structure for this. It's also quite unobvious that a innocent looking Kconfig option with a single line Kconfig description has such a negative effect. This patch attempts to document these disadvantages at least so that users configuring their kernel can make a informed decision. Signed-off-by: Andi Kleen <ak@suse.de> Cc: Balbir Singh <balbir@in.ibm.com> Acked-by: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-13sched: rt-group: make rt groups scheduling configurablePeter Zijlstra
Make the rt group scheduler compile time configurable. Keep it experimental for now. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-09brk: help text typo fixIngo Molnar
Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-02-08namespaces: cleanup the code managed with PID_NS optionPavel Emelyanov
Just like with the user namespaces, move the namespace management code into the separate .c file and mark the (already existing) PID_NS option as "depend on NAMESPACES" [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Kirill Korotaev <dev@sw.ru> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08namespaces: cleanup the code managed with the USER_NS optionPavel Emelyanov
Make the user_namespace.o compilation depend on this option and move the init_user_ns into user.c file to make the kernel compile and work without the namespaces support. This make the user namespace code be organized similar to other namespaces'. Also mask the USER_NS option as "depend on NAMESPACES". [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Kirill Korotaev <dev@sw.ru> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08namespaces: move the IPC namespace under IPC_NS optionPavel Emelyanov
Currently the IPC namespace management code is spread over the ipc/*.c files. I moved this code into ipc/namespace.c file which is compiled out when needed. The linux/ipc_namespace.h file is used to store the prototypes of the functions in namespace.c and the stubs for NAMESPACES=n case. This is done so, because the stub for copy_ipc_namespace requires the knowledge of the CLONE_NEWIPC flag, which is in sched.h. But the linux/ipc.h file itself in included into many many .c files via the sys.h->sem.h sequence so adding the sched.h into it will make all these .c depend on sched.h which is not that good. On the other hand the knowledge about the namespaces stuff is required in 4 .c files only. Besides, this patch compiles out some auxiliary functions from ipc/sem.c, msg.c and shm.c files. It turned out that moving these functions into namespaces.c is not that easy because they use many other calls and macros from the original file. Moving them would make this patch complicated. On the other hand all these functions can be consolidated, so I will send a separate patch doing this a bit later. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Kirill Korotaev <dev@sw.ru> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08namespaces: move the UTS namespace under UTS_NS optionPavel Emelyanov
Currently all the namespace management code is in the kernel/utsname.c file, so just compile it out and make stubs in the appropriate header. The init namespace itself is in init/version.c and is in the kernel all the time. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Kirill Korotaev <dev@sw.ru> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08namespaces: add the NAMESPACES config optionPavel Emelyanov
The option is selectable if EMBEDDED is chosen only. When the EMBEDDED is off namespaces will be on. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Kirill Korotaev <dev@sw.ru> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-07Memory controller: cgroups setupBalbir Singh
Setup the memory cgroup and add basic hooks and controls to integrate and work with the cgroup. Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Pavel Emelianov <xemul@openvz.org> Cc: Paul Menage <menage@google.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Kirill Korotaev <dev@sw.ru> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: David Rientjes <rientjes@google.com> Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-07Memory controller: resource countersPavel Emelianov
With fixes from David Rientjes <rientjes@google.com> Introduce generic structures and routines for resource accounting. Each resource accounting cgroup is supposed to aggregate it, cgroup_subsystem_state and its resource-specific members within. Signed-off-by: Pavel Emelianov <xemul@openvz.org> Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Paul Menage <menage@google.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Kirill Korotaev <dev@sw.ru> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Signed-off-by: David Rientjes <rientjes@google.com> Cc: Pavel Emelianov <xemul@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-06brk randomization: introduce CONFIG_COMPAT_BRKIngo Molnar
based on similar patch from: Pavel Machek <pavel@ucw.cz> Introduce CONFIG_COMPAT_BRK. If disabled then the kernel is free (but not obliged to) randomize the brk area. Heap randomization breaks ancient binaries, so we keep COMPAT_BRK enabled by default. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-02-05slob: correct Kconfig descriptionMatt Mackall
Signed-off-by: Matt Mackall <mpm@selenic.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05maps4: make page monitoring /proc file optionalMatt Mackall
Make /proc/ page monitoring configurable This puts the following files under an embedded config option: /proc/pid/clear_refs /proc/pid/smaps /proc/pid/pagemap /proc/kpagecount /proc/kpageflags [akpm@linux-foundation.org: Kconfig fix] Signed-off-by: Matt Mackall <mpm@selenic.com> Cc: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05timerfd: un-break CONFIG_TIMERFDDavide Libenzi
Remove the broken status to CONFIG_TIMERFD. Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk-manpages@gmx.net> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-03Move Kconfig.instrumentation to arch/Kconfig and init/KconfigMathieu Desnoyers
Move the instrumentation Kconfig to arch/Kconfig for architecture dependent options - oprofile - kprobes and init/Kconfig for architecture independent options - profiling - markers Remove the "Instrumentation Support" menu. Everything moves to "General setup". Delete the kernel/Kconfig.instrumentation file. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2008-02-03Create arch/KconfigMathieu Desnoyers
Puts the content of arch/Kconfig in the "General setup" menu. Linus: > Should it come with a re-duplication of it's content into each > architecture, which was the case previously ? The oprofile and kprobes > menu entries were litteraly cut and pasted from one architecture to > another. Should we put its content in init/Kconfig then ? I don't think it's a good idea to go back to making it per-architecture, although that extensive "depends on <list-of-archiectures-here>" might indicate that there certainly is room for cleanup there. And I don't think it's wrong keeping it in kernel/Kconfig.xyz per se, I just think it's wrong to (a) lump the code together when it really doesn't necessarily need to and (b) show it to users as some kind of choice that is tied together (whether it then has common code or not). On the per-architecture side, I do think it would be better to *not* have internal architecture knowledge in a generic file, and as such a line like depends on X86_32 || IA64 || PPC || S390 || SPARC64 || X86_64 || AVR32 really shouldn't exist in a file like kernel/Kconfig.instrumentation. It would be much better to do depends on ARCH_SUPPORTS_KPROBES in that generic file, and then architectures that do support it would just have a bool ARCH_SUPPORTS_KPROBES default y in *their* architecture files. That would seem to be much more logical, and is readable both for arch maintainers *and* for people who have no clue - and don't care - about which architecture is supposed to support which interface... Sam Ravnborg: Stuff it into a new file: arch/Kconfig We can then extend this file to include all the 'trailing' Kconfig things that are anyway equal for all ARCHs. But it should be kept clean - so if we introduce such a file then we should use ARCH_HAS_whatever in the arch specific Kconfig files to enable stuff that is not shared. [...] The above suggestion is actually not exactly the best way to do it... First the naming.. A quick grep shows following usage today (in Kconfig files) ARCH_HAS 51 ARCH_SUPPORTS 4 HAVE_ARCH 7 ARCH_HAS is the clear winner. In the common Kconfig file do: config FOO depends on ARCH_HAS_FOO bool "bla bla" config ARCH_HAS_FOO def_bool n In the arch specific Kconfig file in a suitable place do: config SUITABLE_OPTION select ARCH_HAS_FOO The naming of ARCH_HAS_ is fixed and shall be: ARCH_HAS_<config option it will enable> Only a single line added pr. architecture. And we will end up with a (maybe even commented) list of trivial selects. - Yet another update : Moving to HAVE_* now. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Jeff Dike <jdike@addtoit.com> Cc: David Howells <dhowells@redhat.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2008-01-31RCU: add help text for "RCU implementation type"Paul E. McKenney
This patch supplies help text for the "RCU implementation type" kernel configuration choice. Reported-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-28kconfig: use environment optionRoman Zippel
Use the environment option to provide the ARCH symbol and the KERNELVERSION symbol. Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2008-01-28sh: syscall audit support.Yuichi Nakamura
Support syscall auditing.. Signed-off-by: Yuichi Nakamura <ynakam@hitachisoft.jp> Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2008-01-25Preempt-RCU: implementationPaul E. McKenney
This patch implements a new version of RCU which allows its read-side critical sections to be preempted. It uses a set of counter pairs to keep track of the read-side critical sections and flips them when all tasks exit read-side critical section. The details of this implementation can be found in this paper - http://www.rdrop.com/users/paulmck/RCU/OLSrtRCU.2006.08.11a.pdf and the article- http://lwn.net/Articles/253651/ This patch was developed as a part of the -rt kernel development and meant to provide better latencies when read-side critical sections of RCU don't disable preemption. As a consequence of keeping track of RCU readers, the readers have a slight overhead (optimizations in the paper). This implementation co-exists with the "classic" RCU implementations and can be switched to at compiler. Also includes RCU tracing summarized in debugfs. [ akpm@linux-foundation.org: build fixes on non-preempt architectures ] Signed-off-by: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com> Signed-off-by: Paul E. McKenney <paulmck@us.ibm.com> Reviewed-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-24sysfs: make SYSFS_DEPRECATED depend on SYSFSRandy Dunlap
Make SYSFS_DEPRECATED depend on SYSFS since files that check CONFIG_SYSFS_DEPRECATED don't check for CONFIG_SYSFS first. Also don't prompt user about SYSFS_DEPRECATED if SYSFS=n. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-02Unify /proc/slabinfo configurationLinus Torvalds
Both SLUB and SLAB really did almost exactly the same thing for /proc/slabinfo setup, using duplicate code and per-allocator #ifdef's. This just creates a common CONFIG_SLABINFO that is enabled by both SLUB and SLAB, and shares all the setup code. Maybe SLOB will want this some day too. Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-12-02sched: cpu accounting controller (V2)Srivatsa Vaddagiri
Commit cfb5285660aad4931b2ebbfa902ea48a37dfffa1 removed a useful feature for us, which provided a cpu accounting resource controller. This feature would be useful if someone wants to group tasks only for accounting purpose and doesnt really want to exercise any control over their cpu consumption. The patch below reintroduces the feature. It is based on Paul Menage's original patch (Commit 62d0df64065e7c135d0002f069444fbdfc64768f), with these differences: - Removed load average information. I felt it needs more thought (esp to deal with SMP and virtualized platforms) and can be added for 2.6.25 after more discussions. - Convert group cpu usage to be nanosecond accurate (as rest of the cfs stats are) and invoke cpuacct_charge() from the respective scheduler classes - Make accounting scalable on SMP systems by splitting the usage counter to be per-cpu - Move the code from kernel/cpu_acct.c to kernel/sched.c (since the code is not big enough to warrant a new file and also this rightly needs to live inside the scheduler. Also things like accessing rq->lock while reading cpu usage becomes easier if the code lived in kernel/sched.c) The patch also modifies the cpu controller not to provide the same accounting information. Tested-by: Balbir Singh <balbir@linux.vnet.ibm.com> Tested the patches on top of 2.6.24-rc3. The patches work fine. Ran some simple tests like cpuspin (spin on the cpu), ran several tasks in the same group and timed them. Compared their time stamps with cpuacct.usage. Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-11-23Blackfin arch: punt CONFIG_BFIN -- we already have CONFIG_BLACKFINMike Frysinger
Signed-off-by: Mike Frysinger <michael.frysinger@analog.com> Signed-off-by: Bryan Wu <bryan.wu@analog.com>
2007-11-14pidns: Place under CONFIG_EXPERIMENTALEric W. Biederman
This is my trivial patch to swat innumerable little bugs with a single blow. After some intensive review (my apologies for not having gotten to this sooner) what we have looks like a good base to build on with the current pid namespace code but it is not complete, and it is still much to simple to find issues where the kernel does the wrong thing outside of the initial pid namespace. Until the dust settles and we are certain we have the ABI and the implementation is as correct as humanly possible let's keep process ID namespaces behind CONFIG_EXPERIMENTAL. Allowing us the option of fixing any ABI or other bugs we find as long as they are minor. Allowing users of the kernel to avoid those bugs simply by ensuring their kernel does not have support for multiple pid namespaces. [akpm@linux-foundation.org: coding-style cleanups] Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Adrian Bunk <bunk@kernel.org> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Kir Kolyshkin <kir@swsoft.com> Cc: Kirill Korotaev <dev@sw.ru> Cc: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-14revert "Task Control Groups: example CPU accounting subsystem"Andrew Morton
Revert 62d0df64065e7c135d0002f069444fbdfc64768f. This was originally intended as a simple initial example of how to create a control groups subsystem; it wasn't intended for mainline, but I didn't make this clear enough to Andrew. The CFS cgroup subsystem now has better functionality for the per-cgroup usage accounting (based directly on CFS stats) than the "usage" status file in this patch, and the "load" status file is rather simplistic - although having a per-cgroup load average report would be a useful feature, I don't believe this patch actually provides it. If it gets into the final 2.6.24 we'd probably have to support this interface for ever. Cc: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-24sched: mark CONFIG_FAIR_GROUP_SCHED as !EXPERIMENTALIngo Molnar
mark CONFIG_FAIR_GROUP_SCHED as !EXPERIMENTAL. All bugs have been fixed and it's perfect ;-) Signed-off-by: Ingo Molnar <mingo@elte.hu>
2007-10-21[PATCH] audit: watching subtreesAl Viro
New kind of audit rule predicates: "object is visible in given subtree". The part that can be sanely implemented, that is. Limitations: * if you have hardlink from outside of tree, you'd better watch it too (or just watch the object itself, obviously) * if you mount something under a watched tree, tell audit that new chunk should be added to watched subtrees * if you umount something in a watched tree and it's still mounted elsewhere, you will get matches on events happening there. New command tells audit to recalculate the trees, trimming such sources of false positives. Note that it's _not_ about path - if something mounted in several places (multiple mount, bindings, different namespaces, etc.), the match does _not_ depend on which one we are using for access. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2007-10-19Hook up group scheduler with control groupsSrivatsa Vaddagiri
Enable "cgroup" (formerly containers) based fair group scheduling. This will let administrator create arbitrary groups of tasks (using "cgroup" pseudo filesystem) and control their cpu bandwidth usage. [akpm@linux-foundation.org: fix cpp condition] Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Paul Menage <menage@google.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-19cgroups: implement namespace tracking subsystemSerge E. Hallyn
When a task enters a new namespace via a clone() or unshare(), a new cgroup is created and the task moves into it. This version names cgroups which are automatically created using cgroup_clone() as "node_<pid>" where pid is the pid of the unsharing or cloned process. (Thanks Pavel for the idea) This is safe because if the process unshares again, it will create /cgroups/(...)/node_<pid>/node_<pid> The only possibilities (AFAICT) for a -EEXIST on unshare are 1. pid wraparound 2. a process fails an unshare, then tries again. Case 1 is unlikely enough that I ignore it (at least for now). In case 2, the node_<pid> will be empty and can be rmdir'ed to make the subsequent unshare() succeed. Changelog: Name cloned cgroups as "node_<pid>". [clg@fr.ibm.com: fix order of cgroup subsystems in init/Kconfig] Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Cc: Paul Menage <menage@google.com> Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-19Task Control Groups: simple task cgroup debug info subsystemPaul Menage
This example subsystem exports debugging information as an aid to diagnosing refcount leaks, etc, in the cgroup framework. Signed-off-by: Paul Menage <menage@google.com> Cc: Serge E. Hallyn <serue@us.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Paul Jackson <pj@sgi.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-19Task Control Groups: example CPU accounting subsystemPaul Menage
This example demonstrates how to use the generic cgroup subsystem for a simple resource tracker that counts, for the processes in a cgroup, the total CPU time used and the %CPU used in the last complete 10 second interval. Portions contributed by Balbir Singh <balbir@in.ibm.com> Signed-off-by: Paul Menage <menage@google.com> Cc: Serge E. Hallyn <serue@us.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Paul Jackson <pj@sgi.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-19Task Control Groups: make cpusets a client of cgroupsPaul Menage
Remove the filesystem support logic from the cpusets system and makes cpusets a cgroup subsystem The "cpuset" filesystem becomes a dummy filesystem; attempts to mount it get passed through to the cgroup filesystem with the appropriate options to emulate the old cpuset filesystem behaviour. Signed-off-by: Paul Menage <menage@google.com> Cc: Serge E. Hallyn <serue@us.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Paul Jackson <pj@sgi.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-19Task Control Groups: basic task cgroup frameworkPaul Menage
Generic Process Control Groups -------------------------- There have recently been various proposals floating around for resource management/accounting and other task grouping subsystems in the kernel, including ResGroups, User BeanCounters, NSProxy cgroups, and others. These all need the basic abstraction of being able to group together multiple processes in an aggregate, in order to track/limit the resources permitted to those processes, or control other behaviour of the processes, and all implement this grouping in different ways. This patchset provides a framework for tracking and grouping processes into arbitrary "cgroups" and assigning arbitrary state to those groupings, in order to control the behaviour of the cgroup as an aggregate. The intention is that the various resource management and virtualization/cgroup efforts can also become task cgroup clients, with the result that: - the userspace APIs are (somewhat) normalised - it's easier to test e.g. the ResGroups CPU controller in conjunction with the BeanCounters memory controller, or use either of them as the resource-control portion of a virtual server system. - the additional kernel footprint of any of the competing resource management systems is substantially reduced, since it doesn't need to provide process grouping/containment, hence improving their chances of getting into the kernel This patch: Add the main task cgroups framework - the cgroup filesystem, and the basic structures for tracking membership and associating subsystem state objects to tasks. Signed-off-by: Paul Menage <menage@google.com> Cc: Serge E. Hallyn <serue@us.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Paul Jackson <pj@sgi.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17Move PREEMPT_NOTIFIERS into an always-included KconfigAvi Kivity
Kconfig.preempt is not included on some archs (for example, m68k). On those archs, the Kconfig machinery complains that KVM selects an undefined symbol PREEMPT_NOTIFIERS (which lives in Kconfig.preempt). So move the offending symbol into a Kconfig file which is included by everyone. Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Avi Kivity <avi@qumranet.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16remove PAGE_GROUP_BY_MOBILITYMel Gorman
Grouping pages by mobility can be disabled at compile-time. This was considered undesirable by a number of people. However, in the current stack of patches, it is not a simple case of just dropping the configurable patch as it would cause merge conflicts. This patch backs out the configuration option. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16Add a configure option to group pages by mobilityMel Gorman
The grouping mechanism has some memory overhead and a more complex allocation path. This patch allows the strategy to be disabled for small memory systems or if it is known the workload is suffering because of the strategy. It also acts to show where the page groupings strategy interacts with the standard buddy allocator. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Joel Schopp <jschopp@austin.ibm.com> Cc: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-15sched: group scheduler, fix coding style issuesSrivatsa Vaddagiri
Fix coding style issues reported by Randy Dunlap and others Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com> Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-15sched: enable CONFIG_FAIR_GROUP_SCHED=y by defaultIngo Molnar
enable CONFIG_FAIR_GROUP_SCHED=y by default. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-15sched: fair-group sched, cleanupsIngo Molnar
fair-group sched, cleanups. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-15sched: add fair-user schedulerSrivatsa Vaddagiri
Enable user-id based fair group scheduling. This is useful for anyone who wants to test the group scheduler w/o having to enable CONFIG_CGROUPS. A separate scheduling group (i.e struct task_grp) is automatically created for every new user added to the system. Upon uid change for a task, it is made to move to the corresponding scheduling group. A /proc tunable (/proc/root_user_share) is also provided to tune root user's quota of cpu bandwidth. Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-15sched: clean up code under CONFIG_FAIR_GROUP_SCHEDSrivatsa Vaddagiri
With the view of supporting user-id based fair scheduling (and not just container-based fair scheduling), this patch renames several functions and makes them independent of whether they are being used for container or user-id based fair scheduling. Also fix a problem reported by KAMEZAWA Hiroyuki (wrt allocating less-sized array for tg->cfs_rq[] and tf->se[]). Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-15sched: group-scheduler coreSrivatsa Vaddagiri
Add interface to control cpu bandwidth allocation to task-groups. (not yet configurable, due to missing CONFIG_CONTAINERS) Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
2007-09-19disable sys_timerfd() for 2.6.23Andrew Morton
There is still some confusion and disagreement over what this interface should actually do. So it is best that we disable it in 2.6.23 until we get that fully sorted out. (sys_timerfd() was present in 2.6.22 but it was apparently broken, so here we assume that nobody is using it yet). Cc: Michael Kerrisk <mtk-manpages@gmx.net> Cc: Davide Libenzi <davidel@xmailserver.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>