path: root/mm
Age  Commit message  Author
2009-04-01vmscan: clip swap_cluster_max in shrink_all_memory()Johannes Weiner
shrink_inactive_list() scans in sc->swap_cluster_max chunks until it hits the scan limit it was passed.

    shrink_inactive_list()
    {
        do {
            isolate_pages(swap_cluster_max)
            shrink_page_list()
        } while (nr_scanned < max_scan);
    }

This assumes that swap_cluster_max is not bigger than the scan limit because the latter is checked only after at least one iteration.

In shrink_all_memory() sc->swap_cluster_max is initialized to the overall reclaim goal in the beginning but not decreased while reclaim is making progress which leads to subsequent calls to shrink_inactive_list() reclaiming way too much in the one iteration that is done unconditionally.

Set sc->swap_cluster_max always to the proper goal before doing

    shrink_all_zones()
      shrink_list()
        shrink_inactive_list().

While the current shrink_all_memory() happily reclaims more than actually requested, this patch fixes it to never exceed the goal:

unpatched
    wanted=10000 reclaimed=13356
    wanted=10000 reclaimed=19711
    wanted=10000 reclaimed=10289
    wanted=10000 reclaimed=17306
    wanted=10000 reclaimed=10700
    wanted=10000 reclaimed=10004
    wanted=10000 reclaimed=13301
    wanted=10000 reclaimed=10976
    wanted=10000 reclaimed=10605
    wanted=10000 reclaimed=10088
    wanted=10000 reclaimed=15000

patched
    wanted=10000 reclaimed=10000
    wanted=10000 reclaimed=9599
    wanted=10000 reclaimed=8476
    wanted=10000 reclaimed=8326
    wanted=10000 reclaimed=10000
    wanted=10000 reclaimed=10000
    wanted=10000 reclaimed=9919
    wanted=10000 reclaimed=10000
    wanted=10000 reclaimed=10000
    wanted=10000 reclaimed=10000
    wanted=10000 reclaimed=10000
    wanted=10000 reclaimed=9624
    wanted=10000 reclaimed=10000
    wanted=10000 reclaimed=10000
    wanted=8500 reclaimed=8092
    wanted=316 reclaimed=316

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: MinChan Kim <minchan.kim@gmail.com>
Acked-by: Nigel Cunningham <ncunningham@crca.org.au>
Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
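The overshoot described in this commit message can be reproduced with a small user-space model. This is only a sketch under simplifying assumptions: every page in a chunk counts as scanned, and `scan_in_chunks()` and `clip_chunk()` are hypothetical names, not kernel functions.

```c
#include <assert.h>

/*
 * Toy model of the loop shape quoted in the commit message: the scan
 * limit is only tested after a whole chunk has been processed.
 * Nothing here is kernel code.
 */
static unsigned long scan_in_chunks(unsigned long chunk, unsigned long max_scan)
{
	unsigned long nr_scanned = 0;

	assert(chunk > 0);	/* a zero-sized chunk would never terminate */
	do {
		nr_scanned += chunk;	/* stands in for isolate + shrink */
	} while (nr_scanned < max_scan);

	return nr_scanned;
}

/* Hypothetical helper: clip the chunk to what is still wanted. */
static unsigned long clip_chunk(unsigned long goal, unsigned long reclaimed)
{
	return reclaimed < goal ? goal - reclaimed : 0;
}
```

In this model, a chunk still set to the original goal of 10000 with only 316 pages left to scan processes all 10000 pages in its single unconditional iteration, whereas a chunk clipped with `clip_chunk(10000, 9684)` stops at 316.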
2009-04-01mm: shrink_all_memory(): use sc.nr_reclaimedMinChan Kim
Commit a79311c14eae4bb946a97af25f3e1b17d625985d "vmscan: bail out of direct reclaim after swap_cluster_max pages" moved the nr_reclaimed counter into the scan control to accumulate the number of all reclaimed pages in a reclaim invocation.

shrink_all_memory() can use the same mechanism. It increases code consistency and readability.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: MinChan Kim <minchan.kim@gmail.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-01mm: don't call mark_page_accessed() in do_swap_page()KOSAKI Motohiro
Commit bf3f3bc5e734706730c12a323f9b2068052aa1f0 (mm: don't mark_page_accessed in fault path) only removed the mark_page_accessed() in filemap_fault(). Therefore, swap-backed pages and file-backed pages behave inconsistently.

mark_page_accessed() should be removed from do_swap_page() as well.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-01mm: introduce for_each_populated_zone() macroKOSAKI Motohiro
Impact: cleanup

In almost all cases, for_each_zone() is used together with populated_zone(), because most functions do not need information about memoryless nodes. Therefore, for_each_populated_zone() can help simplify the code.

This patch has no functional change.

[akpm@linux-foundation.org: small cleanup]
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
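The idea of folding the populated check into the iterator can be sketched in user space. The struct, the `_sketch` macros, and the array-based iteration below are assumptions made for illustration; the kernel walks its own zone lists rather than a plain array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a memory zone. */
struct zone_sketch {
	unsigned long present_pages;	/* 0 means the zone has no memory */
};

static int populated(const struct zone_sketch *z)
{
	return z->present_pages != 0;
}

/* Plain iteration over every zone in an array. */
#define for_each_zone_sketch(z, zones, n) \
	for ((z) = (zones); (z) < (zones) + (n); (z)++)

/* Same iteration, but the loop body only runs for populated zones. */
#define for_each_populated_zone_sketch(z, zones, n) \
	for_each_zone_sketch(z, zones, n) \
		if (!populated(z)) { /* skip empty zone */ } else

static unsigned long total_present(const struct zone_sketch *zones, size_t n)
{
	const struct zone_sketch *z;
	unsigned long sum = 0;

	assert(zones != NULL || n == 0);
	for_each_populated_zone_sketch(z, zones, n)
		sum += z->present_pages;
	return sum;
}

static size_t count_populated(const struct zone_sketch *zones, size_t n)
{
	const struct zone_sketch *z;
	size_t count = 0;

	assert(zones != NULL || n == 0);
	for_each_populated_zone_sketch(z, zones, n)
		count++;
	return count;
}
```

The trailing `if ... else` lets callers attach a single statement or a braced block to the macro exactly as they would to a plain `for` loop, so the explicit populated check disappears from each call site.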
2009-04-01vmscan: rename sc.may_swap to may_unmapJohannes Weiner
sc.may_swap does not only influence reclaiming of anon pages but pages mapped into pagetables in general, which also includes mapped file pages.

In shrink_page_list():

    if (!sc->may_swap && page_mapped(page))
        goto keep_locked;

For anon pages, this makes sense as they are always mapped and reclaiming them always requires swapping.

But mapped file pages are skipped here as well and it has nothing to do with swapping.

The real effect of the knob is whether mapped pages are unmapped and reclaimed or not. Rename it to `may_unmap' to have its name match its actual meaning more precisely.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: MinChan Kim <minchan.kim@gmail.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
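The point of the rename can be shown with a tiny model of the check quoted above. The `_sketch` types and `keep_page()` are hypothetical; only the shape of the condition is taken from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, stripped-down versions of the kernel structures. */
struct scan_control_sketch {
	bool may_unmap;	/* may reclaim unmap pages from page tables? */
};

struct page_sketch {
	bool mapped;	/* mapped into at least one page table */
	bool anon;	/* anonymous (swap-backed) rather than file-backed */
};

/* Returns true when reclaim has to leave the page alone. */
static bool keep_page(const struct scan_control_sketch *sc,
		      const struct page_sketch *page)
{
	assert(sc && page);
	/* Note that page->anon plays no part in the decision. */
	return !sc->may_unmap && page->mapped;
}
```

Because the condition only looks at whether the page is mapped, a mapped file page is kept just like an anon page, which is the behaviour the new name is meant to describe.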
2009-04-01oom_kill: don't call for int_sqrt(0)Cyrill Gorcunov
There is no need to call int_sqrt() if the argument is 0.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Christoph Lameter <cl@linux-foundation.org>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-01vmap: remove needless lock and list in vmapMinChan Kim
vmap's dirty_list is unused. It is meant for optimizing flushing, but Nick has not written that code yet, so we do not need it until then.

This patch removes vmap_block's dirty_list and the code related to it.

Signed-off-by: MinChan Kim <minchan.kim@gmail.com>
Acked-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-01mm: mminit_validate_memmodel_limits(): remove redundant testCyrill Gorcunov
If start_pfn overlaps the upper bound, there is no need to test end_pfn again since it has already been trimmed.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-03-31Merge branch 'cpumask-for-linus' of ↵Rusty Russell
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip Conflicts: arch/x86/include/asm/topology.h drivers/oprofile/buffer_sync.c (Both cases: changed in Linus' tree, removed in Ingo's).
2009-03-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-cpumaskLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-cpumask:
  oprofile: Thou shalt not call __exit functions from __init functions
  cpumask: remove the now-obsoleted pcibus_to_cpumask(): generic
  cpumask: remove cpumask_t from core
  cpumask: convert rcutorture.c
  cpumask: use new cpumask_ functions in core code.
  cpumask: remove references to struct irqaction's mask field.
  cpumask: use mm_cpumask() wrapper: kernel/fork.c
  cpumask: use set_cpu_active in init/main.c
  cpumask: remove node_to_first_cpu
  cpumask: fix seq_bitmap_*() functions.
  cpumask: remove dangerous CPU_MASK_ALL_PTR, &CPU_MASK_ALL
2009-03-30Merge branch 'locking-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (33 commits) lockdep: fix deadlock in lockdep_trace_alloc lockdep: annotate reclaim context (__GFP_NOFS), fix SLOB lockdep: annotate reclaim context (__GFP_NOFS), fix lockdep: build fix for !PROVE_LOCKING lockstat: warn about disabled lock debugging lockdep: use stringify.h lockdep: simplify check_prev_add_irq() lockdep: get_user_chars() redo lockdep: simplify get_user_chars() lockdep: add comments to mark_lock_irq() lockdep: remove macro usage from mark_held_locks() lockdep: fully reduce mark_lock_irq() lockdep: merge the !_READ mark_lock_irq() helpers lockdep: merge the _READ mark_lock_irq() helpers lockdep: simplify mark_lock_irq() helpers #3 lockdep: further simplify mark_lock_irq() helpers lockdep: simplify the mark_lock_irq() helpers lockdep: split up mark_lock_irq() lockdep: generate usage strings lockdep: generate the state bit definitions ...
2009-03-31tracing, Text Edit Lock: cleanupIngo Molnar
Remove incorrectly introduced headers. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-30Merge branch 'linus' into cpumask-for-linusIngo Molnar
Conflicts: arch/x86/kernel/cpu/common.c
2009-03-30lockdep: annotate reclaim context (__GFP_NOFS), fix SLOBIngo Molnar
Impact: build fix

Fix typo in mm/slob.c:

    mm/slob.c:469: error: ‘flags’ undeclared (first use in this function)
    mm/slob.c:469: error: (Each undeclared identifier is reported only once
    mm/slob.c:469: error: for each function it appears in.)

Cc: Nick Piggin <npiggin@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20090128135457.350751756@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-30Merge branch 'x86-stage-3-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-stage-3-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (190 commits) Revert "cpuacct: reduce one NULL check in fast-path" Revert "x86: don't compile vsmp_64 for 32bit" x86: Correct behaviour of irq affinity x86: early_ioremap_init(), use __fix_to_virt(), because we are sure it's safe x86: use default_cpu_mask_to_apicid for 64bit x86: fix set_extra_move_desc calling x86, PAT, PCI: Change vma prot in pci_mmap to reflect inherited prot x86/dmi: fix dmi_alloc() section mismatches x86: e820 fix various signedness issues in setup.c and e820.c x86: apic/io_apic.c define msi_ir_chip and ir_ioapic_chip all the time x86: irq.c keep CONFIG_X86_LOCAL_APIC interrupts together x86: irq.c use same path for show_interrupts x86: cpu/cpu.h cleanup x86: Fix a couple of sparse warnings in arch/x86/kernel/apic/io_apic.c Revert "x86: create a non-zero sized bm_pte only when needed" x86: pci-nommu.c cleanup x86: io_delay.c cleanup x86: rtc.c cleanup x86: i8253 cleanup x86: kdebugfs.c cleanup ...
2009-03-30trivial: Fix dubious bitwise 'or' usage spotted by sparse.Alexey Zaytsev
It doesn't change the semantics, but it looks like the logical 'or' was meant to be used here. Signed-off-by: Alexey Zaytsev <alexey.zaytsev@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-03-30cpumask: use new cpumask_ functions in core code.Rusty Russell
Impact: cleanup Time to clean up remaining laggards using the old cpu_ functions. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Trond.Myklebust@netapp.com
2009-03-30cpumask: remove dangerous CPU_MASK_ALL_PTR, &CPU_MASK_ALLRusty Russell
Impact: cleanup

(Thanks to Al Viro for reminding me of this, via Ingo)

CPU_MASK_ALL is the (deprecated) "all bits set" cpumask, defined as so:

    #define CPU_MASK_ALL (cpumask_t) { { ... } }

Taking the address of such a temporary is questionable at best, unfortunately 321a8e9d (cpumask: add CPU_MASK_ALL_PTR macro) added CPU_MASK_ALL_PTR:

    #define CPU_MASK_ALL_PTR (&CPU_MASK_ALL)

Which formalizes this practice. One day gcc could bite us over this usage (though we seem to have gotten away with it so far).

So replace everywhere which used &CPU_MASK_ALL or CPU_MASK_ALL_PTR with the modern "cpu_all_mask" (a real const struct cpumask *).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Ingo Molnar <mingo@elte.hu>
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Mike Travis <travis@sgi.com>
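The difference between pointing at a temporary and pointing at a real object can be sketched outside the kernel. All `_sketch` names below are hypothetical, and the single-word mask is a simplification; the sketch only contrasts a C99 compound literal with a named constant object.

```c
#include <assert.h>

/* Hypothetical one-word bitmask standing in for a cpumask. */
struct mask_sketch {
	unsigned long bits[1];
};

/*
 * Old style: every use creates an unnamed object. Inside a function its
 * storage only lasts as long as the enclosing block, so a pointer to it
 * must not be kept beyond that block.
 */
#define MASK_ALL_SKETCH ((struct mask_sketch){ { ~0UL } })

/* New style: one real, named, read-only object and a pointer to it. */
static const struct mask_sketch all_bits_sketch = { { ~0UL } };
#define all_mask_sketch (&all_bits_sketch)

static int is_full(const struct mask_sketch *m)
{
	assert(m);
	return m->bits[0] == ~0UL;
}
```

A pointer obtained through `all_mask_sketch` always refers to the same static object, so it can be stored or passed around freely, which is the property the commit attributes to cpu_all_mask.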
2009-03-28Merge branch 'linus' into x86/coreIngo Molnar
2009-03-28Merge branch 'devel' of master.kernel.org:/home/rmk/linux-2.6-armLinus Torvalds
* 'devel' of master.kernel.org:/home/rmk/linux-2.6-arm: (422 commits) [ARM] 5435/1: fix compile warning in sanity_check_meminfo() [ARM] 5434/1: ARM: OMAP: Fix mailbox compile for 24xx [ARM] pxa: fix the bad assumption that PCMCIA sockets always start with 0 [ARM] pxa: fix Colibri PXA300 and PXA320 LCD backlight pins imxfb: Fix TFT mode i.MX21/27: remove ifdef CONFIG_FB_IMX imxfb: add clock support mxc: add arch_reset() function clkdev: add possibility to get a clock based on the device name i.MX1: remove fb support from mach-imx [ARM] pxa: build arch/arm/plat-pxa/mfp.c only when PXA3xx or ARCH_MMP defined Gemini: Add support for Teltonika RUT100 Gemini: gpiolib based GPIO support v2 MAINTAINERS: add myself as Gemini architecture maintainer ARM: Add Gemini architecture v3 [ARM] OMAP: Fix compile for omap2_init_common_hw() MAINTAINERS: Add myself as Faraday ARM core variant maintainer ARM: Add support for FA526 v2 [ARM] acorn,ebsa110,footbridge,integrator,sa1100: Convert asm/io.h to linux/io.h [ARM] collie: fix two minor formatting nits ...
2009-03-28Merge branch 'origin' into develRussell King
Conflicts: sound/soc/pxa/pxa2xx-i2s.c
2009-03-27Merge branch 'core/percpu' into percpu-cpumask-x86-for-linus-2Ingo Molnar
Conflicts: arch/parisc/kernel/irq.c arch/x86/include/asm/fixmap_64.h arch/x86/include/asm/setup.h kernel/irq/handle.c Semantic merge: arch/x86/include/asm/fixmap.h Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-26Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6:
  slob: fix lockup in slob_free()
  slub: use get_track()
  slub: rename calculate_min_partial() to set_min_partial()
  slub: add min_partial sysfs tunable
  slub: move min_partial to struct kmem_cache
  SLUB: Fix default slab order for big object sizes
  SLUB: Do not pass 8k objects through to the page allocator
  SLUB: Introduce and use SLUB_MAX_SIZE and SLUB_PAGE_SHIFT constants
  slob: clean up the code
  SLUB: Use ->objsize from struct kmem_cache_cpu in slab_free()
2009-03-26Merge branch 'for-2.6.30' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds
* 'for-2.6.30' of git://git.kernel.dk/linux-2.6-block:
  Get rid of pdflush_operation() in emergency sync and remount
  btrfs: get rid of current_is_pdflush() in btrfs_btree_balance_dirty
  Move the default_backing_dev_info out of readahead.c and into backing-dev.c
  block: Repeated lines in switching-sched.txt
  bsg: Remove bogus check against request_queue->max_sectors
  block: WARN in __blk_put_request() for potential bio leak
  loop: fix circular locking in loop_clr_fd()
  loop: support barrier writes
  bsg: add support for tail queuing
  cpqarray: enable bus mastering
  block: genhd.h cleanup patch
  block: add private bio_set for bio integrity allocations
  block: genhd.h comment needs updating
  block: get rid of unused blkdev_free_rq() define
  block: remove various blk_queue_*() setting functions in blk_init_queue_node()
  cciss: add BUILD_BUG_ON() for catching bad CommandList_struct alignment
  block: don't create bio_vec slabs of less than the inline number
  block: cleanup bio_alloc_bioset()
2009-03-26Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: (71 commits) SELinux: inode_doinit_with_dentry drop no dentry printk SELinux: new permission between tty audit and audit socket SELinux: open perm for sock files smack: fixes for unlabeled host support keys: make procfiles per-user-namespace keys: skip keys from another user namespace keys: consider user namespace in key_permission keys: distinguish per-uid keys in different namespaces integrity: ima iint radix_tree_lookup locking fix TOMOYO: Do not call tomoyo_realpath_init unless registered. integrity: ima scatterlist bug fix smack: fix lots of kernel-doc notation TOMOYO: Don't create securityfs entries unless registered. TOMOYO: Fix exception policy read failure. SELinux: convert the avc cache hash list to an hlist SELinux: code readability with avc_cache SELinux: remove unused av.decided field SELinux: more careful use of avd in avc_has_perm_noaudit SELinux: remove the unused ae.used SELinux: check seqno when updating an avc_node ...
2009-03-26writeback: double the dirty thresholdsWu Fengguang
Enlarge default dirty ratios from 5/10 to 10/20. This fixes [Bug #12809] iozone regression with 2.6.29-rc6.

The iozone benchmarks are performed on a 1200M file, with 8GB ram.

    iozone -i 0 -i 1 -i 2 -i 3 -i 4 -r 4k -s 64k -s 512m -s 1200m -b tmp.xls
    iozone -B -r 4k -s 64k -s 512m -s 1200m -b tmp.xls

The performance regression is triggered by commit 1cf6e7d83bf3 (mm: task dirty accounting fix), which makes more correct/thorough dirty accounting.

The default 5/10 dirty ratios were picked (a) with the old dirty logic and (b) largely at random and (c) designed to be aggressive. In particular, that (a) means that having fixed some of the dirty accounting, maybe the real bug is now that it was always too aggressive, just hidden by an accounting issue.

The enlarged 10/20 dirty ratios are just about enough to fix the regression.

[ We will have to look at how this affects the old fsync() latency issue, but that probably will need independent work. - Linus ]

Cc: Nick Piggin <npiggin@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reported-by: "Lin, Ming M" <ming.m.lin@intel.com>
Tested-by: "Lin, Ming M" <ming.m.lin@intel.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
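For a rough sense of scale, the ratios can be turned into sizes for the 8GB test machine. This is a back-of-the-envelope sketch that assumes all 8192 MiB count as dirtyable memory, which the kernel does not do; `threshold_mib()` is a hypothetical helper, not a kernel function.

```c
#include <assert.h>

/*
 * Size of a dirty threshold in MiB for a given percentage, under the
 * simplifying assumption that all of `dirtyable_mib` is eligible.
 * Integer division rounds down.
 */
static unsigned long threshold_mib(unsigned long dirtyable_mib,
				   unsigned int ratio_percent)
{
	assert(ratio_percent <= 100);
	return dirtyable_mib * ratio_percent / 100;
}
```

Under this simplification the old 5/10 pair corresponds to about 409 and 819 MiB, and the new 10/20 pair to about 819 and 1638 MiB, so the 1200M test file is larger than the old upper figure and smaller than the new one. The commit message does not state that this is the mechanism behind the regression.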
2009-03-26Move the default_backing_dev_info out of readahead.c and into backing-dev.cJens Axboe
It really makes no sense to have it in readahead.c, so move it where it belongs. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-03-25Merge branch 'for-next' of ↵Russell King
git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6 into devel
2009-03-24Merge branches 'topic/slob/cleanups', 'topic/slob/fixes', 'topic/slub/core', ↵Pekka Enberg
'topic/slub/cleanups' and 'topic/slub/perf' into for-linus
2009-03-24Merge branch 'master' into nextJames Morris
2009-03-23slob: fix lockup in slob_free()Nick Piggin
Don't hold SLOB lock when freeing the page. Reduces lock hold width. See the following thread for discussion of the bug: http://marc.info/?l=linux-kernel&m=123709983214143&w=2 Reported-by: Ingo Molnar <mingo@elte.hu> Acked-by: Matt Mackall <mpm@selenic.com> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2009-03-23slub: use get_track()Akinobu Mita
Use get_track() in set_track() Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2009-03-20tracing, Text Edit Lock - kprobes architecture independent support, nommu fixIngo Molnar
Impact: build fix on SH !CONFIG_MMU

Stephen Rothwell reported this linux-next build failure on the SH architecture:

    kernel/built-in.o: In function `disable_all_kprobes':
    kernel/kprobes.c:1382: undefined reference to `text_mutex'
    [...]

And observed:

| Introduced by commit 4460fdad85becd569f11501ad5b91814814335ff ("tracing,
| Text Edit Lock - kprobes architecture independent support") from the
| tracing tree. text_mutex is defined in mm/memory.c which is only built
| if CONFIG_MMU is defined, which is not true for sh allmodconfig.

Move this lock to kernel/extable.c (which is already home to various kernel text related routines), which file is always built-in.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
LKML-Reference: <20090320110602.86351a91.sfr@canb.auug.org.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-18Merge branches 'x86/cleanups', 'x86/cpu', 'x86/debug', 'x86/mce2', 'x86/mm', ↵Ingo Molnar
'x86/mtrr', 'x86/setup', 'x86/setup-memory', 'x86/urgent', 'x86/uv', 'x86/x2apic' and 'linus' into x86/core Conflicts: arch/parisc/kernel/irq.c
2009-03-16Merge branches 'tracing/ftrace', 'tracing/syscalls' and 'linus' into ↵Ingo Molnar
tracing/core Conflicts: arch/parisc/kernel/irq.c
2009-03-15highmem: atomic highmem kmap page pinningNicolas Pitre
Most ARM machines have a non IO coherent cache, meaning that the dma_map_*() set of functions must clean and/or invalidate the affected memory manually before DMA occurs. And because the majority of those machines have a VIVT cache, the cache maintenance operations must be performed using virtual addresses.

When a highmem page is kunmap'd, its mapping (and cache) remains in place in case it is kmap'd again. However if dma_map_page() is then called with such a page, some cache maintenance on the remaining mapping must be performed. In that case, page_address(page) is non null and we can use that to synchronize the cache.

It is unlikely but still possible for kmap() to race and recycle the virtual address obtained above, and use it for another page before some on-going cache invalidation loop in dma_map_page() is done. In that case, the new mapping could end up with dirty cache lines for another page, and the unsuspecting cache invalidation loop in dma_map_page() might simply discard those dirty cache lines resulting in data loss.

For example, let's consider this sequence of events:

  - dma_map_page(..., DMA_FROM_DEVICE) is called on a highmem page.

    --> - vaddr = page_address(page) is non null. In this case it is likely that the page has valid cache lines associated with vaddr. Remember that the cache is VIVT.

        --> for (i = vaddr; i < vaddr + PAGE_SIZE; i += 32)
                invalidate_cache_line(i);

        *** preemption occurs in the middle of the loop above ***

  - kmap_high() is called for a different page.

    --> - last_pkmap_nr wraps to zero and flush_all_zero_pkmaps() is called. The pkmap_count value for the page passed to dma_map_page() above happens to be 1, so the page is unmapped. But prior to that, flush_cache_kmaps() cleared the cache for it. So far so good.

        - A fresh pkmap entry is assigned for this kmap request. Murphy's law says this pkmap entry will eventually happen to use the same vaddr as the one which used to belong to the other page being processed by dma_map_page() in the preempted thread above.

  - The kmap_high() caller starts dirtying the cache using the just assigned virtual mapping for its page.

  *** the first thread is rescheduled ***

        - The for(...) loop is resumed, but now cached data belonging to a different physical page is being discarded!

And this is not only a preemption issue as ARM can be SMP as well, making the above scenario just as likely. Hence the need for some kind of pkmap page pinning which can be used in any context, primarily for the benefit of dma_map_page() on ARM.

This provides the necessary interface to cope with the above issue if ARCH_NEEDS_KMAP_HIGH_GET is defined, otherwise the resulting code is unchanged.

Signed-off-by: Nicolas Pitre <nico@marvell.com>
Reviewed-by: MinChan Kim <minchan.kim@gmail.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
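The pinning idea can be modelled with a plain reference count per mapping slot. This is a single-threaded sketch with hypothetical names and no locking, which real code would need. It assumes a count of 0 means a free slot, 1 means mapped but unused, and anything above 1 means the slot is in use; the commit message only confirms that a count of 1 makes a mapping eligible for flushing.

```c
#include <assert.h>

#define NR_SLOTS_SKETCH 4

/* 0: free, 1: mapped but unused, n > 1: mapped with n - 1 users. */
static int slot_count[NR_SLOTS_SKETCH];

/* Pin an existing mapping so that it cannot be recycled; 0 if unmapped. */
static int pin_slot(int slot)
{
	assert(slot >= 0 && slot < NR_SLOTS_SKETCH);
	if (slot_count[slot] < 1)
		return 0;
	slot_count[slot]++;
	return 1;
}

/* Drop a pin taken with pin_slot(). */
static void unpin_slot(int slot)
{
	assert(slot >= 0 && slot < NR_SLOTS_SKETCH);
	assert(slot_count[slot] > 1);
	slot_count[slot]--;
}

/* Recycle every mapped-but-unused slot; returns how many were freed. */
static int flush_unused_slots(void)
{
	int i, flushed = 0;

	for (i = 0; i < NR_SLOTS_SKETCH; i++) {
		if (slot_count[i] == 1) {
			slot_count[i] = 0;
			flushed++;
		}
	}
	return flushed;
}
```

A cache maintenance loop that calls `pin_slot()` first and `unpin_slot()` afterwards keeps its virtual address from being handed to another page while the loop runs, because `flush_unused_slots()` skips any slot whose count is above 1.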
2009-03-14vmscan: pgmoved should be cleared after updating recent_rotatedDaisuke Nishimura
pgmoved should be cleared after updating recent_rotated. Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Rik van Riel <riel@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-03-14Merge branches 'x86/apic', 'x86/asm', 'x86/cleanups', 'x86/debug', ↵Ingo Molnar
'x86/kconfig', 'x86/mm', 'x86/ptrace', 'x86/setup' and 'x86/urgent'; commit 'v2.6.29-rc8' into x86/core
2009-03-14VM, x86, PAT: add a new vm flag to track full pfnmap at mmapPallipadi, Venkatesh
Impact: cleanup

Add a new vm flag VM_PFN_AT_MMAP to identify a PFNMAP that is fully mapped with remap_pfn_range. Patch removes the overloading of VM_INSERTPAGE from the earlier patch.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Nick Piggin <npiggin@suse.de>
LKML-Reference: <20090313233543.GA19909@linux-os.sc.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
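A dedicated flag makes the "fully mapped at mmap time" test a single bit check. The sketch below uses made-up bit values and `_SKETCH` names; the real flag values live in the kernel headers and are not reproduced here.

```c
#include <assert.h>

/* Hypothetical bit values chosen only for this illustration. */
#define VM_PFNMAP_SKETCH	0x1UL
#define VM_INSERTPAGE_SKETCH	0x2UL
#define VM_PFN_AT_MMAP_SKETCH	0x4UL

struct vma_sketch {
	unsigned long vm_flags;
};

/* True when the whole range was mapped at mmap time. */
static int is_linear_pfn_mapping_sketch(const struct vma_sketch *vma)
{
	assert(vma);
	return (vma->vm_flags & VM_PFN_AT_MMAP_SKETCH) != 0;
}
```

With a flag of its own, the check no longer depends on VM_INSERTPAGE being set alongside VM_PFNMAP, so that flag keeps a single meaning.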
2009-03-13Merge branch 'x86/core' into x86/kconfigIngo Molnar
2009-03-13Merge branch 'cpus4096' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-x86 into cpus4096
2009-03-13Merge commit 'v2.6.29-rc8' into cpus4096Ingo Molnar
2009-03-13cpumask: replace node_to_cpumask with cpumask_of_node.Rusty Russell
Impact: cleanup node_to_cpumask (and the blecherous node_to_cpumask_ptr which contained a declaration) are replaced now everyone implements cpumask_of_node. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-03-13VM, x86, PAT: Change is_linear_pfn_mapping to not use vm_pgoffPallipadi, Venkatesh
Impact: fix false positive PAT warnings - also fix VirtualBox hang

Use of vma->vm_pgoff to identify the pfnmaps that are fully mapped at mmap time is broken. vm_pgoff is set by generic mmap code even for cases where drivers are setting up the mappings at the fault time.

The problem was originally reported here:

    http://marc.info/?l=linux-kernel&m=123383810628583&w=2

Change is_linear_pfn_mapping logic to overload VM_INSERTPAGE flag along with VM_PFNMAP to mean full PFNMAP setup at mmap time.

Problem also tracked at:

    http://bugzilla.kernel.org/show_bug.cgi?id=12800

Reported-by: Thomas Hellstrom <thellstrom@vmware.com>
Tested-by: Frans Pop <elendil@planet.nl>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: "ebiederm@xmission.com" <ebiederm@xmission.com>
Cc: <stable@kernel.org> # only for 2.6.29.1, not .28
LKML-Reference: <20090313004527.GA7176@linux-os.sc.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-13Merge branch 'core/locking' into tracing/ftraceIngo Molnar
2009-03-13Merge branch 'linus' into core/lockingIngo Molnar
2009-03-12memcg: use correct scan number at reclaimKOSAKI Motohiro
Even when page reclaim is under mem_cgroup, the number of pages to scan is determined by the status of the global LRU. Fix that.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-03-11percpu: fix spurious alignment WARN in legacy SMP percpu allocatorTejun Heo
Impact: remove spurious WARN on legacy SMP percpu allocator

Commit f2a8205c4ef1af917d175c36a4097ae5587791c8 incorrectly added an overly tight WARN_ON_ONCE() on alignments for the UP and legacy SMP percpu allocators. Commit e317603694bfd17b28a40de9d65e1a4ec12f816e fixed it for UP, but the legacy SMP allocator was forgotten. Fix it.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Sachin P. Sant <sachinp@in.ibm.com>
2009-03-10Merge branch 'x86/core' into tracing/ftraceIngo Molnar
Semantic merge: kernel/trace/trace_functions_graph.c Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-10percpu: generalize embedding first chunk setup helperTejun Heo
Impact: code reorganization Separate out embedding first chunk setup helper from x86 embedding first chunk allocator and put it in mm/percpu.c. This will be used by the default percpu first chunk allocator and possibly by other archs. Signed-off-by: Tejun Heo <tj@kernel.org>