aboutsummaryrefslogtreecommitdiff
path: root/mm/slab.c
AgeCommit message (Collapse)Author
2008-04-28mm: move cache_line_size() to <linux/cache.h>Pekka Enberg
Not all architectures define cache_line_size() so as suggested by Andrew move the private implementations in mm/slab.c and mm/slob.c to <linux/cache.h>. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Reviewed-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28mm: have zonelist contains structs with both a zone pointer and zone_idxMel Gorman
Filtering zonelists requires very frequent use of zone_idx(). This is costly as it involves a lookup of another structure and a substraction operation. As the zone_idx is often required, it should be quickly accessible. The node idx could also be stored here if it was found that accessing zone->node is significant which may be the case on workloads where nodemasks are heavily used. This patch introduces a struct zoneref to store a zone pointer and a zone index. The zonelist then consists of an array of these struct zonerefs which are looked up as necessary. Helpers are given for accessing the zone index as well as the node index. [kamezawa.hiroyu@jp.fujitsu.com: Suggested struct zoneref instead of embedding information in pointers] [hugh@veritas.com: mm-have-zonelist: fix memcg ooms] [hugh@veritas.com: just return do_try_to_free_pages] [hugh@veritas.com: do_try_to_free_pages gfp_mask redundant] Signed-off-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Christoph Lameter <clameter@sgi.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Christoph Lameter <clameter@sgi.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28mm: use two zonelist that are filtered by GFP maskMel Gorman
Currently a node has two sets of zonelists, one for each zone type in the system and a second set for GFP_THISNODE allocations. Based on the zones allowed by a gfp mask, one of these zonelists is selected. All of these zonelists consume memory and occupy cache lines. This patch replaces the multiple zonelists per-node with two zonelists. The first contains all populated zones in the system, ordered by distance, for fallback allocations when the target/preferred node has no free pages. The second contains all populated zones in the node suitable for GFP_THISNODE allocations. An iterator macro is introduced called for_each_zone_zonelist() that interates through each zone allowed by the GFP flags in the selected zonelist. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Christoph Lameter <clameter@sgi.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28mm: introduce node_zonelist() for accessing the zonelist for a GFP maskMel Gorman
Introduce a node_zonelist() helper function. It is used to lookup the appropriate zonelist given a node and a GFP mask. The patch on its own is a cleanup but it helps clarify parts of the two-zonelist-per-node patchset. If necessary, it can be merged with the next patch in this set without problems. Reviewed-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Christoph Lameter <clameter@sgi.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-19nodemask: use new node_to_cpumask_ptr functionMike Travis
* Use new node_to_cpumask_ptr. This creates a pointer to the cpumask for a given node. This definition is in mm patch: asm-generic-add-node_to_cpumask_ptr-macro.patch * Use new set_cpus_allowed_ptr function. Depends on: [mm-patch]: asm-generic-add-node_to_cpumask_ptr-macro.patch [sched-devel]: sched: add new set_cpus_allowed_ptr function [x86/latest]: x86: add cpus_scnprintf function Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: Greg Banks <gnb@melbourne.sgi.com> Cc: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-03-26slab: fix cache_cache bootstrap in kmem_cache_init()Daniel Yeisley
Commit 556a169dab38b5100df6f4a45b655dddd3db94c1 ("slab: fix bootstrap on memoryless node") introduced bootstrap-time cache_cache list3s for all nodes but forgot that initkmem_list3 needs to be accessed by [somevalue + node]. This patch fixes list_add() corruption in mm/slab.c seen on the ES7000. Cc: Mel Gorman <mel@csn.ul.ie> Cc: Olaf Hering <olaf@aepfle.de> Cc: Christoph Lameter <clameter@sgi.com> Signed-off-by: Dan Yeisley <dan.yeisley@unisys.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Christoph Lameter <clameter@sgi.com>
2008-03-19mm: fix various kernel-doc commentsRandy Dunlap
Fix various kernel-doc notation in mm/: filemap.c: add function short description; convert 2 to kernel-doc fremap.c: change parameter 'prot' to @prot pagewalk.c: change "-" in function parameters to ":" slab.c: fix short description of kmem_ptr_validate() swap.c: fix description & parameters of put_pages_list() swap_state.c: fix function parameters vmalloc.c: change "@returns" to "Returns:" since that is not a parameter Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-06slab: NUMA slab allocator migration bugfixJoe Korty
NUMA slab allocator cpu migration bugfix The NUMA slab allocator (specifically, cache_alloc_refill) is not refreshing its local copies of what cpu and what numa node it is on, when it drops and reacquires the irq block that it inherited from its caller. As a result those values become invalid if an attempt to migrate the process to another numa node occured while the irq block had been dropped. The solution is to make cache_alloc_refill reload these variables whenever it drops and reacquires the irq block. The error is very difficult to hit. When it does occur, one gets the following oops + stack traceback bits in check_spinlock_acquired: kernel BUG at mm/slab.c:2417 cache_alloc_refill+0xe6 kmem_cache_alloc+0xd0 ... This patch was developed against 2.6.23, ported to and compiled-tested only against 2.6.25-rc4. Signed-off-by: Joe Korty <joe.korty@ccur.com> Signed-off-by: Christoph Lameter <clameter@sgi.com>
2008-03-06slab - use angle brackets for include of kmalloc_sizes.hJoe Perches
Make them all use angle brackets and the directory name. Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Christoph Lameter <clameter@sgi.com>
2008-03-06slab numa fallback logic: Do not pass unfiltered flags to page allocatorChristoph Lameter
The NUMA fallback logic should be passing local_flags to kmem_get_pages() and not simply the flags passed in. Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Christoph Lameter <clameter@sgi.com>
2008-02-14slab: avoid double initialization & do initialization in 1 placeMarcin Slusarz
- alloc_slabmgmt: initialize all slab fields in 1 place - slab->nodeid was initialized twice: in alloc_slabmgmt and immediately after it in cache_grow Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> CC: Christoph Lameter <clameter@sgi.com> Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Christoph Lameter <clameter@sgi.com>
2008-01-25cpu-hotplug: replace per-subsystem mutexes with get_online_cpus()Gautham R Shenoy
This patch converts the known per-subsystem mutexes to get_online_cpus put_online_cpus. It also eliminates the CPU_LOCK_ACQUIRE and CPU_LOCK_RELEASE hotplug notification events. Signed-off-by: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-01-25slab: fix bootstrap on memoryless nodePekka Enberg
If the node we're booting on doesn't have memory, bootstrapping kmalloc() caches resorts to fallback_alloc() which requires ->nodelists set for all nodes. Fix that by calling set_up_list3s() for CACHE_CACHE in kmem_cache_init(). As kmem_getpages() is called with GFP_THISNODE set, this used to work before because of breakage in 2.6.22 and before with GFP_THISNODE returning pages from the wrong node if a node had no memory. So it may have worked accidentally and in an unsafe manner because the pages would have been associated with the wrong node which could trigger bug ons and locking troubles. Tested-by: Mel Gorman <mel@csn.ul.ie> Tested-by: Olaf Hering <olaf@aepfle.de> Reviewed-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> [ With additional one-liner by Olaf Hering - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-01-24slab: partially revert list3 changesMel Gorman
Partial revert the changes made by 04231b3002ac53f8a64a7bd142fde3fa4b6808c6 to the kmem_list3 management. On a machine with a memoryless node, this BUG_ON was triggering static void *____cache_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid) { struct list_head *entry; struct slab *slabp; struct kmem_list3 *l3; void *obj; int x; l3 = cachep->nodelists[nodeid]; BUG_ON(!l3); Signed-off-by: Mel Gorman <mel@csn.ul.ie> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Christoph Lameter <clameter@sgi.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Nishanth Aravamudan <nacc@us.ibm.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-01-02Unify /proc/slabinfo configurationLinus Torvalds
Both SLUB and SLAB really did almost exactly the same thing for /proc/slabinfo setup, using duplicate code and per-allocator #ifdef's. This just creates a common CONFIG_SLABINFO that is enabled by both SLUB and SLAB, and shares all the setup code. Maybe SLOB will want this some day too. Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-12-05Add EXPORT_SYMBOL(ksize);Tetsuo Handa
mm/slub.c exports ksize(), but mm/slob.c and mm/slab.c don't. It's used by binfmt_flat, which can be built as a module. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Christoph Lameter <clameter@sgi.com> Cc: Matt Mackall <mpm@selenic.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-30Fix kmem_cache_free performance regression in slabMatthew Wilcox
The database performance group have found that half the cycles spent in kmem_cache_free are spent in this one call to BUG_ON. Moving it into the CONFIG_SLAB_DEBUG-only function cache_free_debugcheck() is a performance win of almost 0.5% on their particular benchmark. The call was added as part of commit ddc2e812d592457747c4367fb73edcaa8e1e49ff with the comment that "overhead should be minimal". It may have been minimal at the time, but it isn't now. [ Quoth Pekka Enberg: "I don't think the BUG_ON per se caused the performance regression but rather the virt_to_head_page() changes to virt_to_cache() that were added later." ] Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Acked-by: Pekka J Enberg <penberg@cs.helsinki.fi> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-14slab: fix typo in allocation failure handlingAkinobu Mita
This patch fixes wrong array index in allocation failure handling. Cc: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-20spelling fixes: mm/Simon Arlott
Spelling fixes in mm/. Signed-off-by: Simon Arlott <simon@fire.lp0.eu> Signed-off-by: Adrian Bunk <bunk@kernel.org>
2007-10-18cpu hotplug: slab: fix memory leak in cpu hotplug error pathAkinobu Mita
This patch fixes memory leak in error path. In reality, we don't need to call cpuup_canceled(cpu) for now. But upcoming cpu hotplug error handling change needs this. Cc: Christoph Lameter <clameter@sgi.com> Cc: Gautham R Shenoy <ego@in.ibm.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Gautham R Shenoy <ego@in.ibm.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18cpu hotplug: slab: cleanup cpuup_callback()Akinobu Mita
cpuup_callback() is too long. This patch factors out CPU_UP_CANCELLED and CPU_UP_PREPARE handlings from cpuup_callback(). Cc: Christoph Lameter <clameter@sgi.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Gautham R Shenoy <ego@in.ibm.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17Delete gcc-2.95 compatible structure definition.Robert P. J. Day
Since nothing earlier than gcc-3.2 is supported for kernel compilation, that 2.95 hack can be removed. Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17Slab API: remove useless ctor parameter and reorder parametersChristoph Lameter
Slab constructors currently have a flags parameter that is never used. And the order of the arguments is opposite to other slab functions. The object pointer is placed before the kmem_cache pointer. Convert ctor(void *object, struct kmem_cache *s, unsigned long flags) to ctor(struct kmem_cache *s, void *object) throughout the kernel [akpm@linux-foundation.org: coupla fixes] Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16Group short-lived and reclaimable kernel allocationsMel Gorman
This patch marks a number of allocations that are either short-lived such as network buffers or are reclaimable such as inode allocations. When something like updatedb is called, long-lived and unmovable kernel allocations tend to be spread throughout the address space which increases fragmentation. This patch groups these allocations together as much as possible by adding a new MIGRATE_TYPE. The MIGRATE_RECLAIMABLE type is for allocations that can be reclaimed on demand, but not moved. i.e. they can be migrated by deleting them and re-reading the information from elsewhere. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Cc: Andy Whitcroft <apw@shadowen.org> Cc: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16Categorize GFP flagsChristoph Lameter
The function of GFP_LEVEL_MASK seems to be unclear. In order to clear up the mystery we get rid of it and replace GFP_LEVEL_MASK with 3 sets of GFP flags: GFP_RECLAIM_MASK Flags used to control page allocator reclaim behavior. GFP_CONSTRAINT_MASK Flags used to limit where allocations can occur. GFP_SLAB_BUG_MASK Flags that the slab allocator BUG()s on. These replace the uses of GFP_LEVEL mask in the slab allocators and in vmalloc.c. The use of the flags not included in these sets may occur as a result of a slab allocation standing in for a page allocation when constructing scatter gather lists. Extraneous flags are cleared and not passed through to the page allocator. __GFP_MOVABLE/RECLAIMABLE, __GFP_COLD and __GFP_COMP will now be ignored if passed to a slab allocator. Change the allocation of allocator meta data in SLAB and vmalloc to not pass through flags listed in GFP_CONSTRAINT_MASK. SLAB already removes the __GFP_THISNODE flag for such allocations. Generalize that to also cover vmalloc. The use of GFP_CONSTRAINT_MASK also includes __GFP_HARDWALL. The impact of allocator metadata placement on access latency to the cachelines of the object itself is minimal since metadata is only referenced on alloc and free. The attempt is still made to place the meta data optimally but we consistently allow fallback both in SLAB and vmalloc (SLUB does not need to allocate metadata like that). Allocator metadata may serve multiple in kernel users and thus should not be subject to the limitations arising from a single allocation context. [akpm@linux-foundation.org: fix fallback_alloc()] Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16Memoryless nodes: Slab supportChristoph Lameter
Slab should not allocate control structures for nodes without memory. This may seem to work right now but its unreliable since not all allocations can fall back due to the use of GFP_THISNODE. Switching a few for_each_online_node's to N_NORMAL_MEMORY will allow us to only allocate for nodes that have regular memory. Signed-off-by: Christoph Lameter <clameter@sgi.com> Acked-by: Nishanth Aravamudan <nacc@us.ibm.com> Acked-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Acked-by: Bob Picco <bob.picco@hp.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@skynet.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16Slab allocators: fail if ksize is called with a NULL parameterChristoph Lameter
A NULL pointer means that the object was not allocated. One cannot determine the size of an object that has not been allocated. Currently we return 0 but we really should BUG() on attempts to determine the size of something nonexistent. krealloc() interprets NULL to mean a zero sized object. Handle that separately in krealloc(). Signed-off-by: Christoph Lameter <clameter@sgi.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Matt Mackall <mpm@selenic.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-08-22slab: skip calling cache_free_alien() when the platform is not numa capableSiddha, Suresh B
Skip calling cache_free_alien() when the platform is not numa capable. This will avoid cache misses that happen while accessing slabp (which is per page memory reference) to get nodeid. Instead use a global variable to skip the call, which is mostly likely to be present in the cache. This gives a 0.8% performance boost with the database oltp workload on a quad-core SMP platform and by any means the number is not small :) Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-24slab: correctly handle __GFP_ZEROAndrew Morton
Use the correct local variable when calling into the page allocator. Local `flags' can have __GFP_ZERO set, which causes us to pass __GFP_ZERO into the page allocator, possibly from illegal contexts. The page allocator will later do prep_zero_page()->kmap_atomic(..., KM_USER0) from irq contexts and will then go BUG. Cc: Mike Galbraith <efault@gmx.de> Acked-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-20mm: Remove slab destructors from kmem_cache_create().Paul Mundt
Slab destructors were no longer supported after Christoph's c59def9f222d44bb7e2f0a559f2906191a0862d7 change. They've been BUGs for both slab and slub, and slob never supported them either. This rips out support for the dtor pointer from kmem_cache_create() completely and fixes up every single callsite in the kernel (there were about 224, not including the slab allocator definitions themselves, or the documentation references). Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2007-07-19Fix up non-NUMA SLAB configuration for zero-sized allocationsLinus Torvalds
I suspect Christoph tested his code only in the NUMA configuration, for the combination of SLAB+non-NUMA the zero-sized kmalloc's would not work. Of course, this would only trigger in configurations where those zero- sized allocations happen (not very common), so that may explain why it wasn't more widely noticed. Seen by by Andi Kleen under qemu, and there seems to be a report by Michael Tsirkin on it too. Cc: Andi Kleen <ak@suse.de> Cc: Roland Dreier <rdreier@cisco.com> Cc: Michael S. Tsirkin <mst@dev.mellanox.co.il> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Christoph Lameter <clameter@sgi.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19FRV: work around a possible compiler bugDavid Howells
Work around a possible bug in the FRV compiler. What appears to be happening is that gcc resolves the __builtin_constant_p() in kmalloc() to true, but then fails to reduce the therefore constant conditions in the if-statements it guards to constant results. When compiling with -O2 or -Os, one single spurious error crops up in cpuup_callback() in mm/slab.c. This can be avoided by making the memsize variable const. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17kallsyms: make KSYM_NAME_LEN include space for trailing '\0'Tejun Heo
KSYM_NAME_LEN is peculiar in that it does not include the space for the trailing '\0', forcing all users to use KSYM_NAME_LEN + 1 when allocating buffer. This is nonsense and error-prone. Moreover, when the caller forgets that it's very likely to subtly bite back by corrupting the stack because the last position of the buffer is always cleared to zero. This patch increments KSYM_NAME_LEN by one and updates code accordingly. * off-by-one bug in asm-powerpc/kprobes.h::kprobe_lookup_name() macro is fixed. * Where MODULE_NAME_LEN and KSYM_NAME_LEN were used together, MODULE_NAME_LEN was treated as if it didn't include space for the trailing '\0'. Fix it. Signed-off-by: Tejun Heo <htejun@gmail.com> Acked-by: Paulo Marques <pmarques@grupopie.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17Slab allocators: Cleanup zeroing allocationsChristoph Lameter
It becomes now easy to support the zeroing allocs with generic inline functions in slab.h. Provide inline definitions to allow the continued use of kzalloc, kmem_cache_zalloc etc but remove other definitions of zeroing functions from the slab allocators and util.c. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17Slab allocators: support __GFP_ZERO in all allocatorsChristoph Lameter
A kernel convention for many allocators is that if __GFP_ZERO is passed to an allocator then the allocated memory should be zeroed. This is currently not supported by the slab allocators. The inconsistency makes it difficult to implement in derived allocators such as in the uncached allocator and the pool allocators. In addition the support zeroed allocations in the slab allocators does not have a consistent API. There are no zeroing allocator functions for NUMA node placement (kmalloc_node, kmem_cache_alloc_node). The zeroing allocations are only provided for default allocs (kzalloc, kmem_cache_zalloc_node). __GFP_ZERO will make zeroing universally available and does not require any addititional functions. So add the necessary logic to all slab allocators to support __GFP_ZERO. The code is added to the hot path. The gfp flags are on the stack and so the cacheline is readily available for checking if we want a zeroed object. Zeroing while allocating is now a frequent operation and we seem to be gradually approaching a 1-1 parity between zeroing and not zeroing allocs. The current tree has 3476 uses of kmalloc vs 2731 uses of kzalloc. Signed-off-by: Christoph Lameter <clameter@sgi.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17Slab allocators: consistent ZERO_SIZE_PTR support and NULL result semanticsChristoph Lameter
Define ZERO_OR_NULL_PTR macro to be able to remove the checks from the allocators. Move ZERO_SIZE_PTR related stuff into slab.h. Make ZERO_SIZE_PTR work for all slab allocators and get rid of the WARN_ON_ONCE(size == 0) that is still remaining in SLAB. Make slub return NULL like the other allocators if a too large memory segment is requested via __kmalloc. Signed-off-by: Christoph Lameter <clameter@sgi.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17Slab allocators: consolidate code for krealloc in mm/util.cChristoph Lameter
The size of a kmalloc object is readily available via ksize(). ksize is provided by all allocators and thus we can implement krealloc in a generic way. Implement krealloc in mm/util.c and drop slab specific implementations of krealloc. Signed-off-by: Christoph Lameter <clameter@sgi.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-16mm/slab.c: start_cpu_timer() should be __cpuinitAdrian Bunk
start_cpu_timer() should be __cpuinit (which also matches what it's callers are). __devinit didn't cause problems, it simply wasted a few bytes of memory for the common CONFIG_HOTPLUG_CPU=n case. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-16Make /proc/slabinfo use seq_list_xxx helpersPavel Emelianov
This entry prints a header in .start callback. This is OK, but the more elegant solution would be to move this into the .show callback and use seq_list_start_head() in .start one. I have left it as is in order to make the patch just switch to new API and noting more. [adobriyan@sw.ru: Wrong pointer was used as kmem_cache pointer] Signed-off-by: Pavel Emelianov <xemul@openvz.org> Cc: Christoph Lameter <clameter@sgi.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-05Fix slab redzone alignmentDavid Woodhouse
Commit b46b8f19c9cd435ecac4d9d12b39d78c137ecd66 fixed a couple of bugs by switching the redzone to 64 bits. Unfortunately, it neglected to ensure that the _second_ redzone, after the slab object, is aligned correctly. This caused illegal instruction faults on sparc32, which for some reason not entirely clear to me are not trapped and fixed up. Two things need to be done to fix this: - increase the object size, rounding up to alignof(long long) so that the second redzone can be aligned correctly. - If SLAB_STORE_USER is set but alignof(long long)==8, allow a full 64 bits of space for the user word at the end of the buffer, even though we may not _use_ the whole 64 bits. This patch should be a no-op on any 64-bit architecture or any 32-bit architecture where alignof(long long) == 4. Of the others, it's tested on ppc32 by myself and a very similar patch was tested on sparc32 by Mark Fortescue, who reported the new problem. Also, fix the conditions for FORCED_DEBUG, which hadn't been adjusted to the new sizes. Again noticed by Mark. Signed-off-by: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-01SLAB: remove WARN_ON_ONCE for zero sized objects for 2.6.22 releaseChristoph Lameter
We agreed to remove the WARN_ON_ONCE before 2.6.22 is released. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-06-08slab: fix alien cache handlingChristoph Lameter
cache_free_alien must be called regardless if we use alien caches or not. cache_free_alien() will do the right thing if there are no alien caches available. Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Paul Mundt <lethal@linux-sh.org> Acked-by: Pekka J Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-19mm/slab: fix section mismatch warningSam Ravnborg
Use the new __init_refok marker to avoid the section mismatch warning from slab.c Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2007-05-17Slab allocators: define common size limitationsChristoph Lameter
Currently we have a maze of configuration variables that determine the maximum slab size. Worst of all it seems to vary between SLAB and SLUB. So define a common maximum size for kmalloc. For conveniences sake we use the maximum size ever supported which is 32 MB. We limit the maximum size to a lower limit if MAX_ORDER does not allow such large allocations. For many architectures this patch will have the effect of adding large kmalloc sizes. x86_64 adds 5 new kmalloc sizes. So a small amount of memory will be needed for these caches (contemporary SLAB has dynamically sizeable node and cpu structure so the waste is less than in the past) Most architectures will then be able to allocate object with sizes up to MAX_ORDER. We have had repeated breakage (in fact whenever we doubled the number of supported processors) on IA64 because one or the other struct grew beyond what the slab allocators supported. This will avoid future issues and f.e. avoid fixes for 2k and 4k cpu support. CONFIG_LARGE_ALLOCS is no longer necessary so drop it. It fixes sparc64 with SLAB. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-17Remove SLAB_CTOR_CONSTRUCTORChristoph Lameter
SLAB_CTOR_CONSTRUCTOR is always specified. No point in checking it. Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: David Howells <dhowells@redhat.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Steven French <sfrench@us.ibm.com> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Dave Kleikamp <shaggy@austin.ibm.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Anton Altaparmakov <aia21@cantab.net> Cc: Mark Fasheh <mark.fasheh@oracle.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Jan Kara <jack@ucw.cz> Cc: David Chinner <dgc@sgi.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-17slab: warn on zero-length allocationsChristoph Lameter
slub warns on this, and we're working on making kmalloc(0) return NULL. Let's make slab warn as well so our testers detect such callers more rapidly. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-17Slab allocators: Drop support for destructorsChristoph Lameter
There is no user of destructors left. There is no reason why we should keep checking for destructors calls in the slab allocators. The RFC for this patch was discussed at http://marc.info/?l=linux-kernel&m=117882364330705&w=2 Destructors were mainly used for list management which required them to take a spinlock. Taking a spinlock in a destructor is a bit risky since the slab allocators may run the destructors anytime they decide a slab is no longer needed. Patch drops destructor support. Any attempt to use a destructor will BUG(). Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09Move remote node draining out of slab allocatorsChristoph Lameter
Currently the slab allocators contain callbacks into the page allocator to perform the draining of pagesets on remote nodes. This requires SLUB to have a whole subsystem in order to be compatible with SLAB. Moving node draining out of the slab allocators avoids a section of code in SLUB. Move the node draining so that is is done when the vm statistics are updated. At that point we are already touching all the cachelines with the pagesets of a processor. Add a expire counter there. If we have to update per zone or global vm statistics then assume that the pageset will require subsequent draining. The expire counter will be decremented on each vm stats update pass until it reaches zero. Then we will drain one batch from the pageset. The draining will cause vm counter updates which will then cause another expiration until the pcp is empty. So we will drain a batch every 3 seconds. Note that remote node draining is a somewhat esoteric feature that is required on large NUMA systems because otherwise significant portions of system memory can become trapped in pcp queues. The number of pcp is determined by the number of processors and nodes in a system. A system with 4 processors and 2 nodes has 8 pcps which is okay. But a system with 1024 processors and 512 nodes has 512k pcps with a high potential for large amount of memory being caught in them. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09vmstat: use our own timer eventsChristoph Lameter
vmstat is currently using the cache reaper to periodically bring the statistics up to date. The cache reaper does only exists in SLUB as a way to provide compatibility with SLAB. This patch removes the vmstat calls from the slab allocators and provides its own handling. The advantage is also that we can use a different frequency for the updates. Refreshing vm stats is a pretty fast job so we can run this every second and stagger this by only one tick. This will lead to some overlap in large systems. F.e a system running at 250 HZ with 1024 processors will have 4 vm updates occurring at once. However, the vm stats update only accesses per node information. It is only necessary to stagger the vm statistics updates per processor in each node. Vm counter updates occurring on distant nodes will not cause cacheline contention. We could implement an alternate approach that runs the first processor on each node at the second and then each of the other processor on a node on a subsequent tick. That may be useful to keep a large amount of the second free of timer activity. Maybe the timer folks will have some feedback on this one? [jirislaby@gmail.com: add missing break] Cc: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09Add suspend-related notifications for CPU hotplugRafael J. Wysocki
Since nonboot CPUs are now disabled after tasks and devices have been frozen and the CPU hotplug infrastructure is used for this purpose, we need special CPU hotplug notifications that will help the CPU-hotplug-aware subsystems distinguish normal CPU hotplug events from CPU hotplug events related to a system-wide suspend or resume operation in progress. This patch introduces such notifications and causes them to be used during suspend and resume transitions. It also changes all of the CPU-hotplug-aware subsystems to take these notifications into consideration (for now they are handled in the same way as the corresponding "normal" ones). [oleg@tv-sign.ru: cleanups] Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Gautham R Shenoy <ego@in.ibm.com> Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>