aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2008-04-18[XFS] cleanup vnode use in xfs_create/mknod/mkdirChristoph Hellwig
SGI-PV: 976035 SGI-Modid: xfs-linux-melb:xfs-kern:30546a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-04-18[XFS] cleanup vnode use in dmapi callsChristoph Hellwig
SGI-PV: 976035 SGI-Modid: xfs-linux-melb:xfs-kern:30545a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-04-18[XFS] Use power-of-2 sized buffers to reduce overheadDavid Chinner
Now that the ktrace_enter() code is using atomics, the non-power-of-2 buffer sizes - which require modulus operations to get the index - are showing up as using substantial CPU in the profiles. Force the buffer sizes to be rounded up to the nearest power of two and use masking rather than modulus operations to convert the index counter to the buffer index. This reduces ktrace_enter overhead to 8% of a CPU time, and again almost halves the trace intensive test runtime. SGI-PV: 977546 SGI-Modid: xfs-linux-melb:xfs-kern:30538a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-04-18[XFS] Use atomic counters for ktrace buffer indexesDavid Chinner
ktrace_enter() is consuming vast amounts of CPU time due to the use of a single global lock for protecting buffer index increments. Change it to use per-buffer atomic counters - this reduces ktrace_enter() overhead during a trace intensive test on a 4p machine from 58% of all CPU time to 12% and halves test runtime. SGI-PV: 977546 SGI-Modid: xfs-linux-melb:xfs-kern:30537a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-04-18[XFS] Update c/mtime correctly on truncatesDavid Chinner
XFS changes the c/mtime of an inode when truncating it to the same size. The c/mtime is only supposed to change if the size is changed. Not to be confused with ftruncate, where the c/mtime is supposed to be changed even if the size is not changed. The Linux VFS encodes this semantic difference in the flags it sends down to ->setattr, which XFS currently ignores. We need to make XFS pay attention to the VFS flags and hence Do The Right Thing. SGI-PV: 977547 SGI-Modid: xfs-linux-melb:xfs-kern:30536a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-04-18[XFS] don't encode parent in nfs filehandles unless nessecaryChristoph Hellwig
As Dave pointed out after the export ops changes we now always encode the parent into the filehandle for regular files, but it's not actually needed when the filesystem is export with no_subtree_check. This one-liner fixes xfs_fs_encode_fh to skip encoding the parent unless nessecary. SGI-PV: 976035 SGI-Modid: xfs-linux-melb:xfs-kern:30535a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-04-18[XFS] kill xfs_rwlock/xfs_rwunlockChristoph Hellwig
We can just use xfs_ilock/xfs_iunlock instead and get rid of the ugly bhv_vrwlock_t. SGI-PV: 976035 SGI-Modid: xfs-linux-melb:xfs-kern:30533a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-04-18[XFS] kill xfs_get_dir_entryChristoph Hellwig
Instead of of xfs_get_dir_entry use a macro to get the xfs_inode from the dentry in the callers and grab the reference manually. Only grab the reference once as it's fine to keep it over the dmapi calls. (And even that reference is actually superflous in Linux but I'll leave that for another patch) SGI-PV: 976035 SGI-Modid: xfs-linux-melb:xfs-kern:30531a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-04-18[XFS] vnode cleanup in xfs_fs_subr.cChristoph Hellwig
Cleanup the unneeded intermediate vnode step in the flushing helpers and go directly from the xfs_inode to the struct address_space. SGI-PV: 976035 SGI-Modid: xfs-linux-melb:xfs-kern:30530a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-04-18[XFS] cleanup xfs_vn_mknodChristoph Hellwig
- use proper goto based unwinding instead of the current mess of multiple conditionals - rename ip to inode because that's the normal convention for Linux inodes while ip is the convention for xfs_inodes - remove unlikely checks for the default_acl - branches marked unlikely might lead to extreme branch bredictor slowdons if taken and for some workloads a default acl is quite common - properly indent the switch statements - remove xfs_has_fs_struct as nfsd has a fs_struct in any semi-recent kernel SGI-PV: 976035 SGI-Modid: xfs-linux-melb:xfs-kern:30529a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-04-18[XFS] Use atomics for iclog reference countingDavid Chinner
Now that we update the log tail LSN less frequently on transaction completion, we pass the contention straight to the global log state lock (l_iclog_lock) during transaction completion. We currently have to take this lock to decrement the iclog reference count. there is a reference count on each iclog, so we need to take þhe global lock for all refcount changes. When large numbers of processes are all doing small trnasctions, the iclog reference counts will be quite high, and the state change that absolutely requires the l_iclog_lock is the except rather than the norm. Change the reference counting on the iclogs to use atomic_inc/dec so that we can use atomic_dec_and_lock during transaction completion and avoid the need for grabbing the l_iclog_lock for every reference count decrement except the one that matters - the last. SGI-PV: 975671 SGI-Modid: xfs-linux-melb:xfs-kern:30505a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-04-18[XFS] Prevent AIL lock contention during transaction completionDavid Chinner
When hundreds of processors attempt to commit transactions at the same time, they can contend on the AIL lock when updating the tail LSN held in the in-core log structure. At the moment, the tail LSN is only needed when actually writing out an iclog, so it really does not need to be updated on every single transaction completion - only those that result in switching iclogs and flushing them to disk. The result is that we reduce the number of times we need to grab the AIL lock and the log grant lock by up to two orders of magnitude on large processor count machines. The problem has previously been hidden by AIL lock contention walking the AIL list which was recently solved and uncovered this issue. SGI-PV: 975671 SGI-Modid: xfs-linux-melb:xfs-kern:30504a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-04-18[XFS] Use xfs_inode_clean() in more placesDavid Chinner
Remove open coded checks for the whether the inode is clean and replace them with an inlined function. SGI-PV: 977461 SGI-Modid: xfs-linux-melb:xfs-kern:30503a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-04-18[XFS] Remove the xfs_icluster structureDavid Chinner
Remove the xfs_icluster structure and replace with a radix tree lookup. We don't need to keep a list of inodes in each cluster around anymore as we can look them up quickly when we need to. The only time we need to do this now is during inode writeback. Factor the inode cluster writeback code out of xfs_iflush and convert it to use radix_tree_gang_lookup() instead of walking a list of inodes built when we first read in the inodes. This remove 3 pointers from each xfs_inode structure and the xfs_icluster structure per inode cluster. Hence we reduce the cache footprint of the xfs_inodes by between 5-10% depending on cluster sparseness. To be truly efficient we need a radix_tree_gang_lookup_range() call to stop searching once we are past the end of the cluster instead of trying to find a full cluster's worth of inodes. Before (ia64): $ cat /sys/slab/xfs_inode/object_size 536 After: $ cat /sys/slab/xfs_inode/object_size 512 SGI-PV: 977460 SGI-Modid: xfs-linux-melb:xfs-kern:30502a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-04-18[XFS] Don't block pdflush when writing back inodesDavid Chinner
When pdflush is writing back inodes, it can get stuck on inode cluster buffers that are currently under I/O. This occurs when we write data to multiple inodes in the same inode cluster at the same time. Effectively, delayed allocation marks the inode dirty during the data writeback. Hence if the inode cluster was flushed during the writeback of the first inode, the writeback of the second inode will block waiting for the inode cluster write to complete before writing it again for the newly dirtied inode. Basically, we want to avoid this from happening so we don't block pdflush and slow down all of writeback. Hence we introduce a non-blocking async inode flush flag that pdflush uses. If this flag is set, we use non-blocking operations (e.g. try locks) whereever we can to avoid blocking or extra I/O being issued. SGI-PV: 970925 SGI-Modid: xfs-linux-melb:xfs-kern:30501a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-04-18[XFS] Factor xfs_itobp() and xfs_inotobp().David Chinner
The only difference between the functions is one passes an inode for the lookup, the other passes an inode number. However, they don't do the same validity checking or set all the same state on the buffer that is returned yet they should. Factor the functions into a common implementation. SGI-PV: 970925 SGI-Modid: xfs-linux-melb:xfs-kern:30500a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-04-18[XFS] Fix regression due to refcache removalLachlan McIlroy
SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:30490a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Donald Douwsma <donaldd@sgi.com>
2008-04-18[XFS] Remove the xfs_refcacheDonald Douwsma
Remove the xfs_refcache, it was only needed while we were still building for 2.4 kernels. SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:30472a Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-04-18[XFS] make inode reclaim synchronise with xfs_iflush_done()Lachlan McIlroy
On a forced shutdown, xfs_finish_reclaim() will skip flushing the inode. If the inode flush lock is not already held and there is an outstanding xfs_iflush_done() then we might free the inode prematurely. By acquiring and releasing the flush lock we will synchronise with xfs_iflush_done(). SGI-PV: 909874 SGI-Modid: xfs-linux-melb:xfs-kern:30468a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: David Chinner <dgc@sgi.com>
2008-04-18[XFS] actually check error returned by xfs_flush_pages, clean up andNiv Sardi
bailout if fails. SGI-PV: 973041 SGI-Modid: xfs-linux-melb:xfs-kern:30462a Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
2008-04-16Linux 2.6.25Linus Torvalds
2008-04-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6: it821x: do not describe noraid parameter with its value Pb1200/DBAu1200: fix bad IDE resource size Au1200: IDE driver build fix Au1200: kill IDE driver function prototypes avr32 mustn't select HAVE_IDE
2008-04-17it821x: do not describe noraid parameter with its valuePaul Bolle
Describe noraid parameter with its name (and not its value). Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-04-17Pb1200/DBAu1200: fix bad IDE resource sizeSergei Shtylyov
The header files for the Pb1200/DBAu1200 boards have wrong definition for the IDE interface's decoded range length -- it should be 512 bytes according to what the IDE driver does. In addition, the IDE platform device claims 1 byte too many for its memory resource -- fix the platform code and the IDE driver in accordance. Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-04-17Au1200: IDE driver build fixSergei Shtylyov
The driver fails to compile with CONFIG_BLK_DEV_IDE_AU1XXX_MDMA2_DBDMA enabled: drivers/ide/mips/au1xxx-ide.c: In function `auide_build_dmatable': drivers/ide/mips/au1xxx-ide.c:256: error: implicit declaration of function `sg_virt' drivers/ide/mips/au1xxx-ide.c:275: error: implicit declaration of function `sg_next' drivers/ide/mips/au1xxx-ide.c:275: warning: assignment makes pointer from integer without a cast Fix this by including <linux/scatterlist.h>. While at it, remove the #include's without which the driver happily builds. Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-04-17Au1200: kill IDE driver function prototypesSergei Shtylyov
Fix these warnings emitted when compiling drivers/ide/mips/au1xxx-ide.c: include/asm/mach-au1x00/au1xxx_ide.h:137: warning: 'auide_tune_drive' declared `static' but never defined include/asm/mach-au1x00/au1xxx_ide.h:138: warning: 'auide_tune_chipset' declared `static' but never defined by wiping out the whole "function prototyping" section from the header file <asm-mips/mach-au1x00/au1xxx_ide.h> as it mostly declared functions that are already dead in the IDE driver; move the only useful prototype into the driver. Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-04-17avr32 mustn't select HAVE_IDEAdrian Bunk
There's a libata based PATA driver for avr32, but no support for drivers/ide/ on avr32. This patch fixes the following compile error: <-- snip --> ... CC [M] drivers/ide/ide-cd.o In file included from /home/bunk/linux/kernel-2.6/git/linux-2.6/drivers/ide/ide-cd.c:37: /home/bunk/linux/kernel-2.6/git/linux-2.6/include/linux/ide.h:209:21: error: asm/ide.h: No such file or directory make[3]: *** [drivers/ide/ide-cd.o] Error 1 <-- snip --> Signed-off-by: Adrian Bunk <bunk@kernel.org> Cc: Haavard Skinnemoen <hskinnemoen@atmel.com> Acked-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2008-04-16Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds
* 'for-linus' of git://git.kernel.dk/linux-2.6-block: block: update git url for blktrace io context: increment task attachment count in ioc_task_link()
2008-04-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6: USB: remove broken usb-serial num_endpoints check USB: option: Add new vendor ID and device ID for AMOI HSDPA modem USB: support more Huawei data card product IDs USB: option.c: add more device IDs USB: Obscure Maxon BP3-USB Device Support 16d8:6280 for option driver
2008-04-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: [TCP]: Add return value indication to tcp_prune_ofo_queue(). PS3: gelic: fix the oops on the broken IE returned from the hypervisor b43legacy: fix DMA mapping leakage mac80211: remove message on receiving unexpected unencrypted frames Update rt2x00 MAINTAINERS entry Add rfkill to MAINTAINERS file rfkill: Fix device type check when toggling states b43legacy: Fix usage of struct device used for DMAing ssb: Fix usage of struct device used for DMAing MAINTAINERS: move to generic repository for iwlwifi b43legacy: fix initvals loading on bcm4303 rtl8187: Add missing priv->vif assignments netconsole: only set CON_PRINTBUFFER if the user specifies a netconsole [CAN]: Update documentation of struct sockaddr_can MAINTAINERS: isdn4linux@listserv.isdn4linux.de is subscribers-only [TCP]: Fix never pruned tcp out-of-order queue. [NET_SCHED] sch_api: fix qdisc_tree_decrease_qlen() loop
2008-04-16AFS: Do not describe debug parameters with their valuePaul Bolle
Describe debug parameters with their names (and not their values). Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-15USB: remove broken usb-serial num_endpoints checkGreg Kroah-Hartman
The num_interrupt_in, num_bulk_in, and other checks in the usb-serial code are just wrong, there are too many different devices out there with different numbers of endpoints. We need to just be sticking with the device ids instead of trying to catch this kind of thing. It broke too many different devices. This fixes a large number of usb-serial devices to get them working properly again. Cc: Oliver Neukum <oliver@neukum.org> Cc: stable <stable@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-15USB: option: Add new vendor ID and device ID for AMOI HSDPA modemtang kai
This patch add new vendor ID and device ID for AMOI HSDPA modem. From: tang kai <tangk73@hotmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-15USB: support more Huawei data card product IDsfangxiaozhi
- declare the unusal device for Huawei data card devices in unusual_devs.h - disable the product ID matching for Huawei data card devices in usb_match_device function of driver.c - declare the product IDs in option.c. Signed-off-by: fangxiaozhi <huananhu@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-15USB: option.c: add more device IDsMatthias Urlichs
Add devices by AMOI and NovatelWireless. Signed-Off-By: Matthias Urlichs <matthias@urlichs.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-15USB: Obscure Maxon BP3-USB Device Support 16d8:6280 for option driverJames Cameron
The modem was detected, the ttyUSB{0,1,2} appeared, a call could be made, and the expected data rate was achieved. Tested for an hour or two, total of 100Mb. I shall do more testing. Signed-off-by: James Cameron <quozl@laptop.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-15[TCP]: Add return value indication to tcp_prune_ofo_queue().Vitaliy Gusev
Returns non-zero if tp->out_of_order_queue was seen non-empty. This allows tcp_try_rmem_schedule() to return early. Signed-off-by: Vitaliy Gusev <vgusev@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-15acpi: unneccessary to scan the PCI bus already scannedyakui.zhao@intel.com
http://bugzilla.kernel.org/show_bug.cgi?id=10124 this change: commit 08f1c192c3c32797068bfe97738babb3295bbf42 Author: Muli Ben-Yehuda <muli@il.ibm.com> Date: Sun Jul 22 00:23:39 2007 +0300 x86-64: introduce struct pci_sysdata to facilitate sharing of ->sysdata This patch introduces struct pci_sysdata to x86 and x86-64, and converts the existing two users (NUMA, Calgary) to use it. This lays the groundwork for having other users of sysdata, such as the PCI domains work. The Calgary bits are tested, the NUMA bits just look ok. replaces pcibios_scan_root by pci_scan_bus_parented... but in pcibios_scan_root we have a check about scanned busses. Cc: <yakui.zhao@intel.com> Cc: Stian Jordet <stian@jordet.net> Cc: Len Brown <lenb@kernel.org> Cc: Greg KH <greg@kroah.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "Yinghai Lu" <yhlu.kernel@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-15acpi thermal trip points increased to 12Krzysztof Helt
The THERMAL_MAX_TRIPS value is set to 10. It is too few for the Compaq AP550 machine which has 12 trip points. Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl> Cc: Len Brown <lenb@kernel.org> Cc: Zhang Rui <rui.zhang@intel.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-15acpi: bus: check once more for an empty list after locking itChuck Ebbert
List could have become empty after the unlocked check that was made earlier, so check again inside the lock. Should fix https://bugzilla.redhat.com/show_bug.cgi?id=427765 Signed-off-by: Chuck Ebbert <cebbert@redhat.com> Cc: <stable@kernel.org> Cc: Len Brown <lenb@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-15spi: spi_s3c24xx must initialize num_chipselectBen Dooks
The SPI core now expects num_chipselect to be set correctly as due to added checks on the chip being selected before an transfer is allowed. This patch adds a num_cs field to the platform data which needs to be set correctly before adding the SPI platform device. Signed-off-by: Ben Dooks <ben-linux@fluff.org> Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-15spi: spi_s3c24xx must initialize bus_numBen Dooks
Pass the bus number we expect the S3C24XX SPI driver to attach to via the platform data. Signed-off-by: Ben Dooks <ben-linux@fluff.org> Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-15spi: spi_s3c24xx driver must init completionBen Dooks
The s3c24xx_spi_txrx() function should initialise the completion each time before using it, otherwise we end up with the possibility of returning success before the interrupt handler has processed all the data. Signed-off-by: Ben Dooks <ben-linux@fluff.org> Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-15vfs: fix possible deadlock in ext2, ext3, ext4 when using xattrsJan Kara
mb_cache_entry_alloc() was allocating cache entries with GFP_KERNEL. But filesystems are calling this function while holding xattr_sem so possible recursion into the fs violates locking ordering of xattr_sem and transaction start / i_mutex for ext2-4. Change mb_cache_entry_alloc() so that filesystems can specify desired gfp mask and use GFP_NOFS from all of them. Signed-off-by: Jan Kara <jack@suse.cz> Reported-by: Dave Jones <davej@redhat.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-15Documentation: correct overcommit caveat in hugetlbpage.txtNishanth Aravamudan
As shown by Gurudas Pai recently, we can put hugepages into the surplus state (by echo 0 > /proc/sys/vm/nr_hugepages), even when /proc/sys/vm/nr_overcommit_hugepages is 0. This is actually correct, to allow the original goal (shrink the static pool to 0) to succeed (we are converting hugepages to surplus because they are in use). However, the documentation does not accurately reflect this case. Update it. Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com> Acked-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-15add "Isolate" migratetype name to /proc/pagetypeinfoKOSAKI Motohiro
In a5d76b54a3f3a40385d7f76069a2feac9f1bad63 (memory unplug: page isolation by KAMEZAWA Hiroyuki), "isolate" migratetype added. but unfortunately, it doesn't treat /proc/pagetypeinfo display logic. this patch add "Isolate" to pagetype name field. /proc/pagetype before: ------------------------------------------------------------------------------------------------------------------------ Free pages count per migrate type at order 0 1 2 3 4 5 6 7 8 9 10 Node 0, zone DMA, type Unmovable 1 2 2 2 1 2 2 1 1 0 0 Node 0, zone DMA, type Reclaimable 0 0 0 0 0 0 0 0 0 0 0 Node 0, zone DMA, type Movable 2 3 3 1 3 3 2 0 0 0 0 Node 0, zone DMA, type Reserve 0 0 0 0 0 0 0 0 0 0 1 Node 0, zone DMA, type <NULL> 0 0 0 0 0 0 0 0 0 0 0 Node 0, zone Normal, type Unmovable 1 9 7 4 1 1 1 1 0 0 0 Node 0, zone Normal, type Reclaimable 5 2 0 0 1 1 0 0 0 1 0 Node 0, zone Normal, type Movable 0 1 1 0 0 0 1 0 0 1 60 Node 0, zone Normal, type Reserve 0 0 0 0 0 0 0 0 0 0 1 Node 0, zone Normal, type <NULL> 0 0 0 0 0 0 0 0 0 0 0 Node 0, zone HighMem, type Unmovable 0 0 1 1 1 0 1 1 2 2 0 Node 0, zone HighMem, type Reclaimable 0 0 0 0 0 0 0 0 0 0 0 Node 0, zone HighMem, type Movable 236 62 6 2 2 1 1 0 1 1 16 Node 0, zone HighMem, type Reserve 0 0 0 0 0 0 0 0 0 0 1 Node 0, zone HighMem, type <NULL> 0 0 0 0 0 0 0 0 0 0 0 Number of blocks type Unmovable Reclaimable Movable Reserve <NULL> Node 0, zone DMA 1 0 2 1 0 Node 0, zone Normal 10 40 169 1 0 Node 0, zone HighMem 2 0 283 1 0 after: ------------------------------------------------------------------------------------------------------------------------ Free pages count per migrate type at order 0 1 2 3 4 5 6 7 8 9 10 Node 0, zone DMA, type Unmovable 1 2 2 2 1 2 2 1 1 0 0 Node 0, zone DMA, type Reclaimable 0 0 0 0 0 0 0 0 0 0 0 Node 0, zone DMA, type Movable 2 3 3 1 3 3 2 0 0 0 0 Node 0, zone DMA, type Reserve 0 0 0 0 0 0 0 0 0 0 1 Node 0, zone DMA, type Isolate 0 0 0 0 0 0 0 0 0 0 0 Node 0, zone Normal, type Unmovable 0 2 1 1 0 1 0 0 0 0 0 Node 0, zone Normal, type Reclaimable 1 1 1 1 1 0 1 1 1 0 0 Node 0, zone Normal, type Movable 0 1 1 1 0 1 0 1 0 0 196 Node 0, zone Normal, type Reserve 0 0 0 0 0 0 0 0 0 0 1 Node 0, zone Normal, type Isolate 0 0 0 0 0 0 0 0 0 0 0 Node 0, zone HighMem, type Unmovable 0 1 0 0 0 1 1 1 2 2 0 Node 0, zone HighMem, type Reclaimable 0 0 0 0 0 0 0 0 0 0 0 Node 0, zone HighMem, type Movable 1 0 1 1 0 0 0 0 1 0 200 Node 0, zone HighMem, type Reserve 0 0 0 0 0 0 0 0 0 0 1 Node 0, zone HighMem, type Isolate 0 0 0 0 0 0 0 0 0 0 0 Number of blocks type Unmovable Reclaimable Movable Reserve Isolate Node 0, zone DMA 1 0 2 1 0 Node 0, zone Normal 8 4 207 1 0 Node 0, zone HighMem 2 0 283 1 0 Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-15Fix typos in Documentation/filesystems/seq_file.txtDmitri Vorobiev
A couple of typos crept into the newly added document about the seq_file interface. This patch corrects those typos and simultaneously deletes unnecessary trailing spaces. Signed-off-by: Dmitri Vorobiev <dmitri.vorobiev@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-15uml: compile error fixWANG Cong
This patch fixes this error: In file included from /home/wangcong/projects/linux-2.6/arch/um/kernel/smp.c:9: include2/asm/tlb.h: In function `tlb_remove_page': include2/asm/tlb.h:101: error: implicit declaration of function `page_cache_release' And since including <linux/pagemap.h> in <linux/swap.h> will break sparc, we add this #include in uml's own header. Acked-by: Jeff Dike <jdike@addtoit.com> Signed-off-by: WANG Cong <wangcong@zeuux.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-15memcg: fix oops in oom handlingLi Zefan
When I used a test program to fork mass processes and immediately move them to a cgroup where the memory limit is low enough to trigger oom kill, I got oops: BUG: unable to handle kernel NULL pointer dereference at 0000000000000808 IP: [<ffffffff8045c47f>] _spin_lock_irqsave+0x8/0x18 PGD 4c95f067 PUD 4406c067 PMD 0 Oops: 0002 [1] SMP CPU 2 Modules linked in: Pid: 11973, comm: a.out Not tainted 2.6.25-rc7 #5 RIP: 0010:[<ffffffff8045c47f>] [<ffffffff8045c47f>] _spin_lock_irqsave+0x8/0x18 RSP: 0018:ffff8100448c7c30 EFLAGS: 00010002 RAX: 0000000000000202 RBX: 0000000000000009 RCX: 000000000001c9f3 RDX: 0000000000000100 RSI: 0000000000000001 RDI: 0000000000000808 RBP: ffff81007e444080 R08: 0000000000000000 R09: ffff8100448c7900 R10: ffff81000105f480 R11: 00000100ffffffff R12: ffff810067c84140 R13: 0000000000000001 R14: ffff8100441d0018 R15: ffff81007da56200 FS: 00007f70eb1856f0(0000) GS:ffff81007fbad3c0(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000808 CR3: 000000004498a000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process a.out (pid: 11973, threadinfo ffff8100448c6000, task ffff81007da533e0) Stack: ffffffff8023ef5a 00000000000000d0 ffffffff80548dc0 00000000000000d0 ffff810067c84140 ffff81007e444080 ffffffff8026cef9 00000000000000d0 ffff8100441d0000 00000000000000d0 ffff8100441d0000 ffff8100505445c0 Call Trace: [<ffffffff8023ef5a>] ? force_sig_info+0x25/0xb9 [<ffffffff8026cef9>] ? oom_kill_task+0x77/0xe2 [<ffffffff8026d696>] ? mem_cgroup_out_of_memory+0x55/0x67 [<ffffffff802910ad>] ? mem_cgroup_charge_common+0xec/0x202 [<ffffffff8027997b>] ? handle_mm_fault+0x24e/0x77f [<ffffffff8022c4af>] ? default_wake_function+0x0/0xe [<ffffffff8027a17a>] ? get_user_pages+0x2ce/0x3af [<ffffffff80290fee>] ? mem_cgroup_charge_common+0x2d/0x202 [<ffffffff8027a441>] ? make_pages_present+0x8e/0xa4 [<ffffffff8027d1ab>] ? mmap_region+0x373/0x429 [<ffffffff8027d7eb>] ? do_mmap_pgoff+0x2ff/0x364 [<ffffffff80210471>] ? sys_mmap+0xe5/0x111 [<ffffffff8020bfc9>] ? tracesys+0xdc/0xe1 Code: 00 00 01 48 8b 3c 24 e9 46 d4 dd ff f0 ff 07 48 8b 3c 24 e9 3a d4 dd ff fe 07 48 8b 3c 24 e9 2f d4 dd ff 9c 58 fa ba 00 01 00 00 <f0> 66 0f c1 17 38 f2 74 06 f3 90 8a 17 eb f6 c3 fa b8 00 01 00 RIP [<ffffffff8045c47f>] _spin_lock_irqsave+0x8/0x18 RSP <ffff8100448c7c30> CR2: 0000000000000808 ---[ end trace c3702fa668021ea4 ]--- It's reproducable in a x86_64 box, but doesn't happen in x86_32. This is because tsk->sighand is not guarded by RCU, so we have to hold tasklist_lock, just as what out_of_memory() does. Signed-off-by: Li Zefan <lizf@cn.fujitsu> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Pavel Emelianov <xemul@openvz.org> Cc: Paul Menage <menage@google.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: David Rientjes <rientjes@cs.washington.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-15serial: fix platform driver hotplug/coldplugKay Sievers
Since 43cc71eed1250755986da4c0f9898f9a635cb3bf, the platform modalias is prefixed with "platform:". Add MODULE_ALIAS() to the hotpluggable serial platform drivers, to re-enable auto loading. NOTE that Kconfig for some of these drivers doesn't allow modular builds, and thus doesn't match the driver source's unload support. Presumably their unload code is buggy and/or weakly tested... [dbrownell@users.sourceforge.net: more drivers, registration fixes] Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Acked-by: Peter Korsgaard <jacmet@sunsite.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>