aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2009-01-06mm: write_cache_pages integrity fixNick Piggin
In write_cache_pages, nr_to_write is heeded even for data-integrity syncs, so the function will return success after writing out nr_to_write pages, even if that was not sufficient to guarantee data integrity. The callers tend to set it to values that could break data interity semantics easily in practice. For example, nr_to_write can be set to mapping->nr_pages * 2, however if a file has a single, dirty page, then fsync is called, subsequent pages might be concurrently added and dirtied, then write_cache_pages might writeout two of these newly dirty pages, while not writing out the old page that should have been written out. Fix this by ignoring nr_to_write if it is a data integrity sync. This is a data integrity bug. The reason this has been done in the past is to avoid stalling sync operations behind page dirtiers. "If a file has one dirty page at offset 1000000000000000 then someone does an fsync() and someone else gets in first and starts madly writing pages at offset 0, we want to write that page at 1000000000000000. Somehow." What we do today is return success after an arbitrary amount of pages are written, whether or not we have provided the data-integrity semantics that the caller has asked for. Even this doesn't actually fix all stall cases completely: in the above situation, if the file has a huge number of pages in pagecache (but not dirty), then mapping->nrpages is going to be huge, even if pages are being dirtied. This change does indeed make the possibility of long stalls lager, and that's not a good thing, but lying about data integrity is even worse. We have to either perform the sync, or return -ELINUXISLAME so at least the caller knows what has happened. There are subsequent competing approaches in the works to solve the stall problems properly, without compromising data integrity. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Chris Mason <chris.mason@oracle.com> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06mm: write_cache_pages writepage error fixNick Piggin
In write_cache_pages, if ret signals a real error, but we still have some pages left in the pagevec, done would be set to 1, but the remaining pages would continue to be processed and ret will be overwritten in the process. It could easily be overwritten with success, and thus success will be returned even if there is an error. Thus the caller is told all writes succeeded, wheras in reality some did not. Fix this by bailing immediately if there is an error, and retaining the first error code. This is a data integrity bug. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Chris Mason <chris.mason@oracle.com> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06mm: write_cache_pages early loop terminationNick Piggin
We'd like to break out of the loop early in many situations, however the existing code has been setting mapping->writeback_index past the final page in the pagevec lookup for cyclic writeback. This is a problem if we don't process all pages up to the final page. Currently the code mostly keeps writeback_index reasonable and hacked around this by not breaking out of the loop or writing pages outside the range in these cases. Keep track of a real "done index" that enables us to terminate the loop in a much more flexible manner. Needed by the subsequent patch to preserve writepage errors, and then further patches to break out of the loop early for other reasons. However there are no functional changes with this patch alone. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Chris Mason <chris.mason@oracle.com> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06mm: write_cache_pages cyclic fixNick Piggin
In write_cache_pages, scanned == 1 is supposed to mean that cyclic writeback has circled through zero, thus we should not circle again. However it gets set to 1 after the first successful pagevec lookup. This leads to cases where not enough data gets written. Counterexample: file with first 10 pages dirty, writeback_index == 5, nr_to_write == 10. Then the 5 last pages will be found, and scanned will be set to 1, after writing those out, we will not cycle back to get the first 5. Rework this logic, now we'll always cycle unless we started off from index 0. When cycling, only write out as far as 1 page before the start page from the first cycle (so we don't write parts of the file twice). Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Chris Mason <chris.mason@oracle.com> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06do_mpage_readpage(): don't submit lots of small bios on boundaryMiquel van Smoorenburg
While tracing I/O patterns with blktrace (a great tool) a few weeks ago I identified a minor issue in fs/mpage.c As the comment above mpage_readpages() says, a fs's get_block function will set BH_Boundary when it maps a block just before a block for which extra I/O is required. Since get_block() can map a range of pages, for all these pages the BH_Boundary flag will be set. But we only need to push what I/O we have accumulated at the last block of this range. This makes do_mpage_readpage() send out the largest possible bio instead of a bunch of page-sized ones in the BH_Boundary case. Signed-off-by: Miquel van Smoorenburg <mikevs@xs4all.net> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06oom: print triggering task's cpuset and mems allowedDavid Rientjes
When cpusets are enabled, it's necessary to print the triggering task's set of allowable nodes so the subsequently printed meminfo can be interpreted correctly. We also print the task's cpuset name for informational purposes. [rientjes@google.com: task lock current before dereferencing cpuset] Cc: Paul Menage <menage@google.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06oom: fix zone_scan_mutex nameDavid Rientjes
zone_scan_mutex is actually a spinlock, so name it appropriately. Signed-off-by: David Rientjes <rientjes@google.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06mm: invoke oom-killer from page faultNick Piggin
Rather than have the pagefault handler kill a process directly if it gets a VM_FAULT_OOM, have it call into the OOM killer. With increasingly sophisticated oom behaviour (cpusets, memory cgroups, oom killing throttling, oom priority adjustment or selective disabling, panic on oom, etc), it's silly to unconditionally kill the faulting process at page fault time. Create a hook for pagefault oom path to call into instead. Only converted x86 and uml so far. [akpm@linux-foundation.org: make __out_of_memory() static] [akpm@linux-foundation.org: fix comment] Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Jeff Dike <jdike@addtoit.com> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06mm: move_pages: no need to set pp->page to ZERO_PAGE(0) by defaultBrice Goglin
pp->page is never used when not set to the right page, so there is no need to set it to ZERO_PAGE(0) by default. Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr> Acked-by: Christoph Lameter <cl@linux-foundation.org> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06mm: rework do_pages_move() to work on page_sized chunksBrice Goglin
Rework do_pages_move() to work by page-sized chunks of struct page_to_node that are passed to do_move_page_to_node_array(). We now only have to allocate a single page instead a possibly very large vmalloc area to store all page_to_node entries. As a result, new_page_node() will now have a very small lookup, hidding much of the overall sys_move_pages() overhead. Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr> Signed-off-by: Nathalie Furmento <Nathalie.Furmento@labri.fr> Acked-by: Christoph Lameter <cl@linux-foundation.org> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06mm: don't mark_page_accessed in shmem_faultHugh Dickins
Following "mm: don't mark_page_accessed in fault path", which now places a mark_page_accessed() in zap_pte_range(), we should remove the mark_page_accessed() from shmem_fault(). Signed-off-by: Hugh Dickins <hugh@veritas.com> Cc: Nick Piggin <npiggin@suse.de> Cc: Johannes Weiner <hannes@saeurebad.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06mm: don't mark_page_accessed in fault pathNick Piggin
Doing a mark_page_accessed at fault-time, then doing SetPageReferenced at unmap-time if the pte is young has a number of problems. mark_page_accessed is supposed to be roughly the equivalent of a young pte for unmapped references. Unfortunately it doesn't come with any context: after being called, reclaim doesn't know who or why the page was touched. So calling mark_page_accessed not only adds extra lru or PG_referenced manipulations for pages that are already going to have pte_young ptes anyway, but it also adds these references which are difficult to work with from the context of vma specific references (eg. MADV_SEQUENTIAL pte_young may not wish to contribute to the page being referenced). Then, simply doing SetPageReferenced when zapping a pte and finding it is young, is not a really good solution either. SetPageReferenced does not correctly promote the page to the active list for example. So after removing mark_page_accessed from the fault path, several mmap()+touch+munmap() would have a very different result from several read(2) calls for example, which is not really desirable. Signed-off-by: Nick Piggin <npiggin@suse.de> Acked-by: Johannes Weiner <hannes@saeurebad.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06mm: report the MMU pagesize in /proc/pid/smapsMel Gorman
The KernelPageSize entry in /proc/pid/smaps is the pagesize used by the kernel to back a VMA. This matches the size used by the MMU in the majority of cases. However, one counter-example occurs on PPC64 kernels whereby a kernel using 64K as a base pagesize may still use 4K pages for the MMU on older processor. To distinguish, this patch reports MMUPageSize as the pagesize used by the MMU in /proc/pid/smaps. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Cc: "KOSAKI Motohiro" <kosaki.motohiro@jp.fujitsu.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06mm: report the pagesize backing a VMA in /proc/pid/smapsMel Gorman
It is useful to verify a hugepage-aware application is using the expected pagesizes for its memory regions. This patch creates an entry called KernelPageSize in /proc/pid/smaps that is the size of page used by the kernel to back a VMA. The entry is not called PageSize as it is possible the MMU uses a different size. This extension should not break any sensible parser that skips lines containing unrecognised information. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Acked-by: "KOSAKI Motohiro" <kosaki.motohiro@jp.fujitsu.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dmLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm: dm snapshot: extend exception store functions dm snapshot: split out exception store implementations dm snapshot: rename struct exception_store dm snapshot: separate out exception store interface dm mpath: move trigger_event to system workqueue dm: add name and uuid to sysfs dm table: rework reference counting dm: support barriers on simple devices dm request: extend target interface dm request: add caches dm ioctl: allow dm_copy_name_and_uuid to return only one field dm log: ensure log bitmap fits on log device dm log: move region_size validation dm log: avoid reinitialising io_req on every operation dm: consolidate target deregistration error handling dm raid1: fix error count dm log: fix dm_io_client leak on error paths dm snapshot: change yield to msleep dm table: drop reference at unbind
2009-01-06dm snapshot: extend exception store functionsJonathan Brassow
Supply dm_add_exception as a callback to the read_metadata function. Add a status function ready for a later patch and name the functions consistently. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-01-06dm snapshot: split out exception store implementationsAlasdair G Kergon
Move the existing snapshot exception store implementations out into separate files. Later patches will place these behind a new interface in preparation for alternative implementations. Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-01-06dm snapshot: rename struct exception_storeJonathan Brassow
Rename struct exception_store to dm_exception_store. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-01-06dm snapshot: separate out exception store interfaceJonathan Brassow
Pull structures that bridge the gap between snapshot and exception store out of dm-snap.h and put them in a new .h file - dm-exception-store.h. This file will define the API for new exception stores. Ultimately, dm-snap.h is unnecessary, since only dm-snap.c should be using it. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-01-06dm mpath: move trigger_event to system workqueueAlasdair G Kergon
The same workqueue is used both for sending uevents and processing queued I/O. Deadlock has been reported in RHEL5 when sending a uevent was blocked waiting for the queued I/O to be processed. Use scheduled_work() for the asynchronous uevents instead. Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-01-06dm: add name and uuid to sysfsMilan Broz
Implement simple read-only sysfs entry for device-mapper block device. This patch adds a simple sysfs directory named "dm" under block device properties and implements - name attribute (string containing mapped device name) - uuid attribute (string containing UUID, or empty string if not set) The kobject is embedded in mapped_device struct, so no additional memory allocation is needed for initializing sysfs entry. During the processing of sysfs attribute we need to lock mapped device which is done by a new function dm_get_from_kobj, which returns the md associated with kobject and increases the usage count. Each 'show attribute' function is responsible for its own locking. Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-01-06dm table: rework reference countingMikulas Patocka
Rework table reference counting. The existing code uses a reference counter. When the last reference is dropped and the counter reaches zero, the table destructor is called. Table reference counters are acquired/released from upcalls from other kernel code (dm_any_congested, dm_merge_bvec, dm_unplug_all). If the reference counter reaches zero in one of the upcalls, the table destructor is called from almost random kernel code. This leads to various problems: * dm_any_congested being called under a spinlock, which calls the destructor, which calls some sleeping function. * the destructor attempting to take a lock that is already taken by the same process. * stale reference from some other kernel code keeps the table constructed, which keeps some devices open, even after successful return from "dmsetup remove". This can confuse lvm and prevent closing of underlying devices or reusing device minor numbers. The patch changes reference counting so that the table destructor can be called only at predetermined places. The table has always exactly one reference from either mapped_device->map or hash_cell->new_map. After this patch, this reference is not counted in table->holders. A pair of dm_create_table/dm_destroy_table functions is used for table creation/destruction. Temporary references from the other code increase table->holders. A pair of dm_table_get/dm_table_put functions is used to manipulate it. When the table is about to be destroyed, we wait for table->holders to reach 0. Then, we call the table destructor. We use active waiting with msleep(1), because the situation happens rarely (to one user in 5 years) and removing the device isn't performance-critical task: the user doesn't care if it takes one tick more or not. This way, the destructor is called only at specific points (dm_table_destroy function) and the above problems associated with lazy destruction can't happen. Finally remove the temporary protection added to dm_any_congested(). Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-01-06dm: support barriers on simple devicesAndi Kleen
Implement barrier support for single device DM devices This patch implements barrier support in DM for the common case of dm linear just remapping a single underlying device. In this case we can safely pass the barrier through because there can be no reordering between devices. NB. Any DM device might cease to support barriers if it gets reconfigured so code must continue to allow for a possible -EOPNOTSUPP on every barrier bio submitted. - agk Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-01-06dm request: extend target interfaceKiyoshi Ueda
This patch adds the following target interfaces for request-based dm. map_rq : for mapping a request rq_end_io : for finishing a request busy : for avoiding performance regression from bio-based dm. Target can tell dm core not to map requests now, and that may help requests in the block layer queue to be bigger by I/O merging. In bio-based dm, this behavior is done by device drivers managing the block layer queue. But in request-based dm, dm core has to do that since dm core manages the block layer queue. Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-01-06dm request: add cachesKiyoshi Ueda
This patch prepares some kmem_caches for request-based dm. Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-01-06dm ioctl: allow dm_copy_name_and_uuid to return only one fieldMilan Broz
Allow NULL buffer in dm_copy_name_and_uuid if you only want to return one of the fields. (Required by a following patch that adds these fields to sysfs.) Signed-off-by: Milan Broz <mbroz@redhat.com> Reviewed-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-01-06dm log: ensure log bitmap fits on log deviceMilan Broz
Check that the log bitmap will fit within the log device. Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-01-06dm log: move region_size validationMilan Broz
Move log size validation from mirror target to log constructor. Removed PAGE_SIZE restriction we no longer think necessary. Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-01-06dm log: avoid reinitialising io_req on every operationTakahiro Yasui
rw_header function updates three members of io_req data every time when I/O is processed. bi_rw and notify.fn are never modified once they get initialized, and so they can be set in advance. header_to_disk() can also be pulled out of write_header() since only one caller needs it and write_header() can be replaced by rw_header() directly. Signed-off-by: Takahiro Yasui <tyasui@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-01-06dm: consolidate target deregistration error handlingMikulas Patocka
Change dm_unregister_target to return void and use BUG() for error reporting. dm_unregister_target can only fail because of programming bug in the target driver. It can't fail because of user's behavior or disk errors. This patch changes unregister_target to return void and use BUG if someone tries to unregister non-registered target or unregister target that is in use. This patch removes code duplication (testing of error codes in all dm targets) and reports bugs in just one place, in dm_unregister_target. In some target drivers, these return codes were ignored, which could lead to a situation where bugs could be missed. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-01-06dm raid1: fix error countJonathan Brassow
Always increase the error count when I/O on a leg of a mirror fails. The error count is used to decide whether to select an alternative mirror leg. If the target doesn't use the "handle_errors" feature, the error count is not updated and the bio can get requeued forever by the read callback. Fix it by increasing error_count before the handle_errors feature checking. Cc: stable@kernel.org Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-01-06dm log: fix dm_io_client leak on error pathsTakahiro Yasui
In create_log_context function, dm_io_client_destroy function needs to be called, when memory allocation of disk_header, sync_bits and recovering_bits failed, but dm_io_client_destroy is not called. Cc: stable@kernel.org Signed-off-by: Takahiro Yasui <tyasui@redhat.com> Acked-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-01-06dm snapshot: change yield to msleepMikulas Patocka
Change yield() to msleep(1). If the thread had realtime priority, yield() doesn't really yield, so the yielding process would loop indefinitely and cause machine lockup. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-01-06dm table: drop reference at unbindMikulas Patocka
Move one dm_table_put() so that the last reference in the thread gets dropped in __unbind(). This is required for a following patch, dm-table-rework-reference-counting.patch, which will change the logic in such a way that table destructor is called only at specific points in the code. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-01-05Merge branch 'for-next' of git://git.o-hand.com/linux-mfdLinus Torvalds
* 'for-next' of git://git.o-hand.com/linux-mfd: (30 commits) mfd: Fix section mismatch in da903x mfd: move drivers/i2c/chips/menelaus.c to drivers/mfd mfd: move drivers/i2c/chips/tps65010.c to drivers/mfd mfd: dm355evm msp430 driver mfd: Add missing break from wm3850-core mfd: Add WM8351 support mfd: Support configurable numbers of DCDCs and ISINKs on WM8350 mfd: Handle missing WM8350 platform data mfd: Add WM8352 support mfd: Use irq_to_desc in twl4030 code power_supply: Add Dialog DA9030 battery charger driver mfd: Dialog DA9030 battery charger MFD driver mfd: Register WM8400 codec device mfd: Pass driver_data onto child devices mfd: Fix twl4030-core.c build error mfd: twl4030 regulator bug fixes mfd: twl4030: create some regulator devices mfd: twl4030: cleanup symbols and OMAP dependency mfd: twl4030: simplified child creation code power_supply: Add battery health reporting for WM8350 ...
2009-01-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linusLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus: module: convert to stop_machine_create/destroy. stop_machine: introduce stop_machine_create/destroy. parisc: fix module loading failure of large kernel modules module: fix module loading failure of large kernel modules for parisc module: fix warning of unused function when !CONFIG_PROC_FS kernel/module.c: compare symbol values when marking symbols as exported in /proc/kallsyms. remove CONFIG_KMOD
2009-01-05Merge branch 'core-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: swiotlb: Don't include linux/swiotlb.h twice in lib/swiotlb.c intel-iommu: fix build error with INTR_REMAP=y and DMAR=n swiotlb: add missing __init annotations
2009-01-05Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm: dlm: fs/dlm/ast.c: fix warning dlm: add new debugfs entry dlm: add time stamp of blocking callback dlm: change lock time stamping dlm: improve how bast mode handling dlm: remove extra blocking callback check dlm: replace schedule with cond_resched dlm: remove kmap/kunmap dlm: trivial annotation of be16 value dlm: fix up memory allocation flags
2009-01-05Merge branch 'i2c-next' of git://aeryn.fluff.org.uk/bjdooks/linuxLinus Torvalds
* 'i2c-next' of git://aeryn.fluff.org.uk/bjdooks/linux: i2c-omap: fix type of irq handler function i2c-s3c2410: Change IRQ to be plain integer. i2c-s3c2410: Allow more than one i2c-s3c2410 adapter i2c-s3c2410: Remove default platform data. i2c-s3c2410: Use platform data for gpio configuration i2c-s3c2410: Fixup style problems from checkpatch.pl i2c-omap: Enable I2C wakeups for 34xx i2c-omap: reprogram OCP_SYSCONFIG register after reset i2c-omap: convert 'rev1' flag to generic 'rev' u8 i2c-omap: fix I2C timeouts due to recursive omap_i2c_{un,}idle() i2c-omap: Clean-up i2c-omap i2c-omap: Don't compile in OMAP15xx I2C ISR for non-OMAP15xx builds i2c-omap: Mark init-only functions as __init i2c-omap: Add support for omap34xx i2c-omap: FIFO handling support and broken hw workaround for i2c-omap i2c-omap: Add high-speed support to omap-i2c i2c-omap: Close suspected race between omap_i2c_idle() and omap_i2c_isr() i2c-omap: Do not use interruptible wait call in omap_i2c_xfer_msg Fix up apparently-trivial conflict in drivers/i2c/busses/i2c-s3c2410.c
2009-01-05Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: (22 commits) HID: fix error condition propagation in hid-sony driver HID: fix reference count leak hidraw HID: add proper support for pensketch 12x9 tablet HID: don't allow DealExtreme usb-radio be handled by usb hid driver HID: fix default Kconfig setting for TopSpeed driver HID: driver for TopSeed Cyberlink quirky remote HID: make boot protocol drivers depend on EMBEDDED HID: avoid sparse warning in HID_COMPAT_LOAD_DRIVER HID: hiddev cleanup -- handle all error conditions properly HID: force feedback driver for GreenAsia 0x12 PID HID: switch specialized drivers from "default y" to !EMBEDDED HID: set proper dev.parent in hidraw HID: add dynids facility HID: use GFP_KERNEL in hid_alloc_buffers HID: usbhid, use usb_endpoint_xfer_int HID: move usbhid flags to usbhid.h HID: add n-trig digitizer support HID: add phys and name ioctls to hidraw HID: struct device - replace bus_id with dev_name(), dev_set_name() HID: automatically call usbhid_set_leds in usbhid driver ...
2009-01-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmwLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: (27 commits) GFS2: Use DEFINE_SPINLOCK GFS2: Fix use-after-free bug on umount (try #2) Revert "GFS2: Fix use-after-free bug on umount" GFS2: Streamline alloc calculations for writes GFS2: Send useful information with uevent messages GFS2: Fix use-after-free bug on umount GFS2: Remove ancient, unused code GFS2: Move four functions from super.c GFS2: Fix bug in gfs2_lock_fs_check_clean() GFS2: Send some sensible sysfs stuff GFS2: Kill two daemons with one patch GFS2: Move gfs2_recoverd into recovery.c GFS2: Fix "truncate in progress" hang GFS2: Clean up & move gfs2_quotad GFS2: Add more detail to debugfs glock dumps GFS2: Banish struct gfs2_rgrpd_host GFS2: Move rg_free from gfs2_rgrpd_host to gfs2_rgrpd GFS2: Move rg_igeneration into struct gfs2_rgrpd GFS2: Banish struct gfs2_dinode_host GFS2: Move i_size from gfs2_dinode_host and rename it to i_disksize ...
2009-01-05igb: fix anoying type mismatch warning on rx/tx queue sizingLinus Torvalds
When using "min()", the types of both sides should match. With the cpu mask changes, the type of num_online_cpus() will now depend on config options. Use "min_t()" with an explicit type instead. And make the rx/tx case look the same too, just for sanity. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6: (30 commits) sparc: Fix minor SPARC32 compile error sparc: Remove reg*.h from Kbuild sparc: Clean arch-specific code in prom_common.c sparc: Kill asm/reg*.h sparc: Use 64BIT config entry MAINTAINERS: update sparc maintainer sparc: unify ipcbuf.h sparc: Update 64-bit defconfig. sparc: remove NO_PROC_ID - it is no longer used sparc: drop get_tbr() in traps.h sparc: fix warning in userspace header traps.h sparc: fix warnings in userspace header byteorder.h sparc: fix warning in userspace header jsflash.h sparc: unify openprom.h sparc64: delete unused linux_prom64_ranges from openprom_64.h sparc: prepare openprom for unification sparc: remove linux_prom_pci_assigned_addresses from openprom_32.h sparc: remove ebus definitions from openprom*.h sparc: unify siginfo.h sparc: unify ptrace.h ...
2009-01-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (44 commits) qlge: Fix sparse warnings for tx ring indexes. qlge: Fix sparse warning regarding rx buffer queues. qlge: Fix sparse endian warning in ql_hw_csum_setup(). qlge: Fix sparse endian warning for inbound packet control block flags. qlge: Fix sparse warnings for byte swapping in qlge_ethool.c myri10ge: print MAC and serial number on probe failure pkt_sched: cls_u32: Fix locking in u32_change() iucv: fix cpu hotplug af_iucv: Free iucv path/socket in path_pending callback af_iucv: avoid left over IUCV connections from failing connects af_iucv: New error return codes for connect() net/ehea: bitops work on unsigned longs Revert "net: Fix for initial link state in 2.6.28" tcp: Kill extraneous SPLICE_F_NONBLOCK checks. tcp: don't mask EOF and socket errors on nonblocking splice receive dccp: Integrate the TFRC library with DCCP dccp: Clean up ccid.c after integration of CCID plugins dccp: Lockless integration of CCID congestion-control plugins qeth: get rid of extra argument after printk to dev_* conversion qeth: No large send using EDDP for HiperSockets. ...
2009-01-05Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6: ALSA: ice1724 - Fix a typo in IEC958 PCM name ASoC: fix davinci-sffsdr buglet ALSA: sound/usb: Use negated usb_endpoint_xfer_control, etc ALSA: hda - cxt5051 report jack state ALSA: hda - add basic jack reporting functions to patch_conexant.c ALSA: Use usb_set/get_intfdata ASoC: Clean up kerneldoc warnings ASoC: Fix pxa2xx-pcm checks for invalid DMA channels LSA: hda - Add HP Acacia detection ALSA: hda - fix name for ALC1200 ALSA: sound/usb: use USB API functions rather than constants ASoC: TWL4030: DAPM based capture implementation ASoC: TWL4030: Make the enum filter generic for twl4030
2009-01-05Merge branch 'next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq: [CPUFREQ] Fix on resume, now preserves user policy min/max. [CPUFREQ] Add Celeron Core support to p4-clockmod. [CPUFREQ] add to speedstep-lib additional fsb values for core processors [CPUFREQ] Disable sysfs ui for p4-clockmod. [CPUFREQ] p4-clockmod: reduce noise [CPUFREQ] clean up speedstep-centrino and reduce cpumask_t usage
2009-01-05Merge branch 'upstream-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2: (138 commits) ocfs2: Access the right buffer_head in ocfs2_merge_rec_left. ocfs2: use min_t in ocfs2_quota_read() ocfs2: remove unneeded lvb casts ocfs2: Add xattr support checking in init_security ocfs2: alloc xattr bucket in ocfs2_xattr_set_handle ocfs2: calculate and reserve credits for xattr value in mknod ocfs2/xattr: fix credits calculation during index create ocfs2/xattr: Always updating ctime during xattr set. ocfs2/xattr: Remove extend_trans call and add its credits from the beginning ocfs2/dlm: Fix race during lockres mastery ocfs2/dlm: Fix race in adding/removing lockres' to/from the tracking list ocfs2/dlm: Hold off sending lockres drop ref message while lockres is migrating ocfs2/dlm: Clean up errors in dlm_proxy_ast_handler() ocfs2/dlm: Fix a race between migrate request and exit domain ocfs2: One more hamming code optimization. ocfs2: Another hamming code optimization. ocfs2: Don't hand-code xor in ocfs2_hamming_encode(). ocfs2: Enable metadata checksums. ocfs2: Validate superblock with checksum and ecc. ocfs2: Checksum and ECC for directory blocks. ...
2009-01-05Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: inotify: fix type errors in interfaces fix breakage in reiserfs_new_inode() fix the treatment of jfs special inodes vfs: remove duplicate code in get_fs_type() add a vfs_fsync helper sys_execve and sys_uselib do not call into fsnotify zero i_uid/i_gid on inode allocation inode->i_op is never NULL ntfs: don't NULL i_op isofs check for NULL ->i_op in root directory is dead code affs: do not zero ->i_op kill suid bit only for regular files vfs: lseek(fd, 0, SEEK_CUR) race condition
2009-01-05mm lockless pagecache barrier fixNick Piggin
An XFS workload showed up a bug in the lockless pagecache patch. Basically it would go into an "infinite" loop, although it would sometimes be able to break out of the loop! The reason is a missing compiler barrier in the "increment reference count unless it was zero" case of the lockless pagecache protocol in the gang lookup functions. This would cause the compiler to use a cached value of struct page pointer to retry the operation with, rather than reload it. So the page might have been removed from pagecache and freed (refcount==0) but the lookup would not correctly notice the page is no longer in pagecache, and keep attempting to increment the refcount and failing, until the page gets reallocated for something else. This isn't a data corruption because the condition will be detected if the page has been reallocated. However it can result in a lockup. Linus points out that ACCESS_ONCE is also required in that pointer load, even if it's absence is not causing a bug on our particular build. The most general way to solve this is just to put an rcu_dereference in radix_tree_deref_slot. Assembly of find_get_pages, before: .L220: movq (%rbx), %rax #* ivtmp.1162, tmp82 movq (%rax), %rdi #, prephitmp.1149 .L218: testb $1, %dil #, prephitmp.1149 jne .L217 #, testq %rdi, %rdi # prephitmp.1149 je .L203 #, cmpq $-1, %rdi #, prephitmp.1149 je .L217 #, movl 8(%rdi), %esi # <variable>._count.counter, c testl %esi, %esi # c je .L218 #, after: .L212: movq (%rbx), %rax #* ivtmp.1109, tmp81 movq (%rax), %rdi #, ret testb $1, %dil #, ret jne .L211 #, testq %rdi, %rdi # ret je .L197 #, cmpq $-1, %rdi #, ret je .L211 #, movl 8(%rdi), %esi # <variable>._count.counter, c testl %esi, %esi # c je .L212 #, (notice the obvious infinite loop in the first example, if page->count remains 0) Signed-off-by: Nick Piggin <npiggin@suse.de> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-05i2o: Update my addressAlan Cox
Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>