aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2009-02-05revert "rlimit: permit setting RLIMIT_NOFILE to RLIM_INFINITY"Andrew Morton
Revert commit 0c2d64fb6cae9aae480f6a46cfe79f8d7d48b59f because it causes (arguably poorly designed) existing userspace to spend interminable periods closing billions of not-open file descriptors. We could bring this back, with some sort of opt-in tunable in /proc, which defaults to "off". Peter's alanysis follows: : I spent several hours trying to get to the bottom of a serious : performance issue that appeared on one of our servers after upgrading to : 2.6.28. In the end it's what could be considered a userspace bug that : was triggered by a change in 2.6.28. Since this might also affect other : people I figured I'd at least document what I found here, and maybe we : can even do something about it: : : : So, I upgraded some of debian.org's machines to 2.6.28.1 and immediately : the team maintaining our ftp archive complained that one of their : scripts that previously ran in a few minutes still hadn't even come : close to being done after an hour or so. Downgrading to 2.6.27 fixed : that. : : Turns out that script is forking a lot and something in it or python or : whereever closes all the file descriptors it doesn't want to pass on. : That is, it starts at zero and goes up to ulimit -n/RLIMIT_NOFILE and : closes them all with a few exceptions. : : Turns out that takes a long time when your limit -n is now 2^20 (1048576). : : With 2.6.27.* the ulimit -n was the standard 1024, but with 2.6.28 it is : now a thousand times that. : : 2.6.28 included a patch titled "rlimit: permit setting RLIMIT_NOFILE to : RLIM_INFINITY" (0c2d64fb6cae9aae480f6a46cfe79f8d7d48b59f)[1] that : allows, as the title implies, to set the limit for number of files to : infinity. : : Closer investigation showed that the broken default ulimit did not apply : to "system" processes (like stuff started from init). In the end I : could establish that all processes that passed through pam_limit at one : point had the bad resource limit. : : Apparently the pam library in Debian etch (4.0) initializes the limits : to some default values when it doesn't have any settings in limit.conf : to override them. Turns out that for nofiles this is RLIM_INFINITY. : Commenting out "case RLIMIT_NOFILE" in pam_limit.c:267 of our pam : package version 0.79-5 fixes that - tho I'm not sure what side effects : that has. : : Debian lenny (the upcoming 5.0 version) doesn't have this issue as it : uses a different pam (version). Reported-by: Peter Palfrader <weasel@debian.org> Cc: Adam Tkac <vonsch@gmail.com> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: <stable@kernel.org> [2.6.28.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-05shm: fix shmctl(SHM_INFO) lockup with !CONFIG_SHMEMTony Battersby
shm_get_stat() assumes that the inode is a "struct shmem_inode_info", which is incorrect for !CONFIG_SHMEM (see fs/ramfs/inode.c: ramfs_get_inode() vs. mm/shmem.c: shmem_get_inode()). This bad assumption can cause shmctl(SHM_INFO) to lockup when shm_get_stat() tries to spin_lock(&info->lock). Users of !CONFIG_SHMEM may encounter this lockup simply by invoking the 'ipcs' command. Reported by Jiri Olsa back in February 2008: http://lkml.org/lkml/2008/2/29/74 Signed-off-by: Tony Battersby <tonyb@cybernetics.com> Cc: Jiri Kosina <jkosina@suse.cz> Reported-by: Jiri Olsa <olsajiri@gmail.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: <stable@kernel.org> [2.6.everything] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-05fbmem: don't call copy_from/to_user() with mutex heldAndrea Righi
Avoid calling copy_from/to_user() with fb_info->lock mutex held in fbmem ioctl(). fb_mmap() is called under mm->mmap_sem (A) held, that also acquires fb_info->lock (B); fb_ioctl() takes fb_info->lock (B) and does copy_from/to_user() that might acquire mm->mmap_sem (A), causing a deadlock. NOTE: it doesn't push down the fb_info->lock in each own driver's fb_ioctl(), so there are still potential deadlocks elsewhere. Signed-off-by: Andrea Righi <righi.andrea@gmail.com> Cc: Dave Jones <davej@redhat.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Johannes Weiner <hannes@saeurebad.de> Cc: Krzysztof Helt <krzysztof.h1@wp.pl> Cc: Harvey Harrison <harvey.harrison@gmail.com> Cc: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-05rtc: rtc-dm355evm driverDavid Brownell
Simple RTC driver for the MSP430 firmware on the DM355 EVM board. Other than not supporting atomic reads/writes of all four bytes, this is reasonable as a basic no-alarm RTC. Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Kevin Hilman <khilman@deeprootsystems.com> Acked-by: Alessandro Zummo <a.zummo@towertech.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-05misc: dell-laptop should depend on POWER_SUPPLYMatthew Garrett
dell-laptop makes use of the power supply class information to choose which backlight interface to change. Add a depends on it. Signed-off-by: Matthew Garrett <mjg@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Len Brown <lenb@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-05generic swap(): don't return a value from swap()Peter Zijlstra
The swap() macro is accidentally retuning the value of its first argument. Change it into a doesn't-return-anything macro before someone goes and relies upon this behaviour. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Wu Fengguang <wfg@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-05hpilo: open/close fixDavid Altobelli
The device can take a while to respond to an open/close request, so increase the time kernel will wait for response (1 ms to 10ms). Also, properly clean up a channel on a failed open, by calling the channel close routine. Just freeing the memory isn't sufficient, the device needs to be informed that the channel is no longer open, and the device memory cleared of references to freed dma buffer. Signed-off-by: David Altobelli <david.altobelli@hp.com> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-05kernel/async.c: fix printk warningsAndrew Morton
alpha: kernel/async.c: In function 'run_one_entry': kernel/async.c:141: warning: format '%lli' expects type 'long long int', but argument 2 has type 'async_cookie_t' kernel/async.c:149: warning: format '%lli' expects type 'long long int', but argument 2 has type 'async_cookie_t' kernel/async.c:149: warning: format '%lld' expects type 'long long int', but argument 4 has type 's64' kernel/async.c: In function 'async_synchronize_cookie_special': kernel/async.c:250: warning: format '%lli' expects type 'long long int', but argument 3 has type 's64' Cc: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-05Btrfs: Fix memory leak in cache_drop_leaf_refChris Mason
The code wasn't doing a kfree on the sorted array Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-02-05ALSA: hda - Fix misc workqueue issuesTakashi Iwai
Some fixes regarding snd-hda-intel workqueue: - Use create_singlethread_workqueue() instead of create_workqueue() as per-CPU work isn't required. - Allocate workq name string properly - Renamed the workq name to "hd-audio*" to be more obvious. Signed-off-by: Takashi Iwai <tiwai@suse.de>
2009-02-04gianfar: Fix potential soft reset raceAndy Fleming
SOFT_RESET must be asserted for at least 3 TX clocks in order for it to work properly. The syncs in the gfar_write() commands have been hiding this, but we need to guarantee it. Signed-off-by: Andy Fleming <afleming@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-04gianfar: Fix BD_LENGTH_MASK definitionAndy Fleming
BD_LENGTH_MASK is supposed to catch the low 16-bits of the status field, not the low byte. The old way, we would never be able to clean up tx packets with sizes divisible by 256. Signed-off-by: Andy Fleming <afleming@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-04cxgb3: Fix lro switchDivy Le Ray
The LRO switch is always set to 1 in the rx processing loop. It breaks the accelerated iSCSI receive traffic. Fix its computation. Signed-off-by: Divy Le Ray <divy@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-04ALSA: hda - Add quirk for FSC Amilo Xi2550Takashi Iwai
Added model=fujisu-pi2515 for FSC Amilo Xi2550 with ALC883 codec. Refernece: Novell bnc#450979 https://bugzilla.novell.com/show_bug.cgi?id=450979 Signed-off-by: Takashi Iwai <tiwai@suse.de>
2009-02-04Merge branch 'x86-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: APIC: enable workaround on AMD Fam10h CPUs xen: disable interrupts before saving in percpu x86: add x86@kernel.org to MAINTAINERS x86: push old stack address on irqstack for unwinder irq, x86: fix lock status with numa_migrate_irq_desc x86: add cache descriptors for Intel Core i7 x86/Voyager: make it build and boot
2009-02-04Merge branch 'sched-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: add missing kernel-doc in sched.h
2009-02-04Merge branch 'tracing-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: ftrace: do_each_pid_task() needs rcu lock
2009-02-04iwlwifi: save PCI state before suspend, restore after resumeReinette Chatre
This is the right thing to do and fixes the following warning: [ 115.012278] ------------[ cut here ]------------ [ 115.012281] WARNING: at drivers/pci/pci-driver.c:370 pci_legacy_suspend+0x85/0xc2() [ 115.012285] Hardware name: Latitude D630 [ 115.012301] PCI PM: Device state not saved by iwl3945_pci_suspend+0x0/0x4c [iwl3945] [ 115.012304] Modules linked in: fuse nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc ipv6 acpi_cpufreq kvm_intel kvm snd_hda_codec_idt snd_hda_intel snd_hda_codec snd_hwdep arc4 snd_seq_device snd_pcm_oss snd_mixer_oss ecb snd_pcm cryptomgr aead snd_timer crypto_blkcipher snd snd_page_alloc ohci1394 crypto_hash crypto_algapi ch341 ieee1394 usbserial thermal iwl3945 mac80211 led_class lib80211 tg3 processor i2c_i801 i2c_core sg cfg80211 libphy usbhid battery ac button sr_mod cdrom evdev dcdbas ata_generic ata_piix libata sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd usbcore [last unloaded: microcode] [ 115.012374] Pid: 4163, comm: pm-suspend Not tainted 2.6.29-rc3-00227-gf1dd849-dirty #67 [ 115.012377] Call Trace: [ 115.012382] [<ffffffff8023d04d>] warn_slowpath+0xb1/0xed [ 115.012387] [<ffffffff80450b5e>] ? _spin_unlock_irqrestore+0x5c/0x78 [ 115.012390] [<ffffffff80254f08>] ? up+0x34/0x39 [ 115.012394] [<ffffffff80362319>] ? acpi_ut_release_mutex+0x5d/0x61 [ 115.012397] [<ffffffff803584b2>] ? acpi_get_data+0x5e/0x70 [ 115.012400] [<ffffffff80363dd9>] ? acpi_bus_get_device+0x25/0x39 [ 115.012403] [<ffffffff80363e98>] ? acpi_bus_power_manageable+0x11/0x29 [ 115.012406] [<ffffffff803462f7>] ? acpi_pci_power_manageable+0x17/0x19 [ 115.012410] [<ffffffff8033ddfd>] ? pci_set_power_state+0xcc/0x101 [ 115.012418] [<ffffffffa01f28e9>] ? iwl3945_pci_suspend+0x0/0x4c [iwl3945] [ 115.012422] [<ffffffff803401e6>] pci_legacy_suspend+0x85/0xc2 [ 115.012425] [<ffffffff80340316>] pci_pm_suspend+0x34/0x86 [ 115.012429] [<ffffffff8039d7ce>] pm_op+0x52/0xe5 [ 115.012432] [<ffffffff8039dd78>] device_suspend+0x32a/0x451 [ 115.012436] [<ffffffff80269ec2>] suspend_devices_and_enter+0x3e/0x13a [ 115.012439] [<ffffffff8026a128>] enter_state+0x110/0x164 [ 115.012442] [<ffffffff8026a233>] state_store+0xb7/0xd7 [ 115.012446] [<ffffffff8032f95f>] kobj_attr_store+0x17/0x19 [ 115.012449] [<ffffffff80307d64>] sysfs_write_file+0xe4/0x119 [ 115.012453] [<ffffffff802baa7a>] vfs_write+0xae/0x137 [ 115.012456] [<ffffffff802babc7>] sys_write+0x47/0x70 [ 115.012459] [<ffffffff8020b73a>] system_call_fastpath+0x16/0x1b [ 115.012467] ---[ end trace 829828966f6f24dc ]--- Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Ming Lei <tom.leiming@gmail.com> Cc: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-04iwlwifi: clean key table in iwl_clear_stations_tableReinette Chatre
Cleans uCode key table bit map iwl_clear_stations_table since all stations are cleared also the key table must be. Since the keys are not removed properly on suspend by mac80211 this may result in exhausting key table on resume leading to memory corruption during removal Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-02-04Revert "configfs: Silence lockdep on mkdir(), rmdir() and ↵Mark Fasheh
configfs_depend_item()" This reverts commit 0e0333429a6280e6eb3c98845e4eed90d5f8078a. I committed this by accident - Joel and Louis are working with the lockdep maintainer to provide a better solution than just turning lockdep off. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Acked-by: <Joel Becker <joel.becker@oracle.com>
2009-02-04Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6: ALSA: pcm_oss: AFMT_S24_LE is set twice in return value ALSA: ASoC: email - update email addresses. OMAP: ASoC: Fix spinlock misuse in omap-pcm.c ALSA: hda - No widget selection for volume knob widgets in proc output ALSA: hda - Add support of iMac 24 Aluminium ALSA: alsa: time reaches -1, tested 0 ALSA: hda - Add quirk for another HP dv5 model
2009-02-04Merge branch 'fix/asoc' into for-linusTakashi Iwai
2009-02-04Merge branch 'fix/hda' into for-linusTakashi Iwai
2009-02-04ALSA: pcm_oss: AFMT_S24_LE is set twice in return valueRoel Kluin
AFMT_S24_LE is set twice in return value vi sound/core/oss/pcm_oss.c +640 #define AFMT_S24_LE 0x00008000 #define AFMT_S24_BE 0x00010000 Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2009-02-04Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/blackfin-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/blackfin-2.6: (40 commits) Blackfin arch: Remove outdated code Blackfin arch: Fix udelay implementation Blackfin arch: Update Copyright information Blackfin arch: Add BF561 PPI POLS, POLC Masks Blackfin arch: Update CM-BF527 kernel config Blackfin arch: define bfin_memmap as static since it is only used here Blackfin arch: cplb mananger: use a do...while loop rather than a for loop Blackfin arch: fix bug - traps test case 19 for exception 0x2d fails Blackfin arch: add platform device bfin_mii-bus and KSZ8893M switch driver platform resources to board files Blackfin arch: build jtag tty driver as a module by default Blackfin arch: fix 2 bugs related to debug Blackfin arch: Add ANOMALY_05000380 to BF54x to kill the compile warning Blackfin arch: Fix bug - 561 SMP kernel can't boot from jffs2 Blackfin arch: base SIC_IWR# programming on whether the MMR exists Blackfin arch: read SYSCR on newer parts that mirror the bits of SWRST in it Blackfin arch: fixup board init function name Blackfin arch: drop CONFIG_I2C_BOARDINFO ifdefs Blackfin arch: bfin_reset->_bfin_reset redirection no longer needed Blackfin arch: sync reboot handler with version in u-boot Blackfin arch: Faster Implementation of csum_tcpudp_nofold() ...
2009-02-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6: sparc64: Kill bogus TPC/address truncation during 32-bit faults. sparc: fixup for sparseirq changes sparc64: Validate kernel generated fault addresses on sparc64. sparc64: On non-Niagara, need to touch NMI watchdog in NOHZ mode. sparc64: Implement NMI watchdog on capable cpus. sparc: Probe PMU type and record in sparc_pmu_type. sparc64: Move generic PCR support code to seperate file.
2009-02-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: sunrpc: fix rdma dependencies e1000: Fix PCI enable to honor the need_ioport flag sgi-xp: link XPNET's net_device_ops to its net_device structure pcnet_cs: Fix misuse of the equality operator. hso: add new device id's dca: redesign locks to fix deadlocks cassini/sungem: limit reaches -1, but 0 tested net: variables reach -1, but 0 tested qlge: bugfix: Add missing netif_napi_del call. qlge: bugfix: Add flash offset for second port. qlge: bugfix: Fix endian issue when reading flash. udp: increments sk_drops in __udp_queue_rcv_skb() net: Fix userland breakage wrt. linux/if_tunnel.h net: packet socket packet_lookup_frame fix
2009-02-04Merge branch 'for-linus' of git://git.o-hand.com/linux-mfdLinus Torvalds
* 'for-linus' of git://git.o-hand.com/linux-mfd: mfd: Remove non exported references from pcf50633
2009-02-04Btrfs: don't return congestion in write_cache_pages as oftenChris Mason
On fast devices that go from congested to uncongested very quickly, pdflush is waiting too often in congestion_wait, and the FS is backing off to easily in write_cache_pages. For now, fix this on the btrfs side by only checking congestion after some bios have already gone down. Longer term a real fix is needed for pdflush, but that is a larger project. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-02-04Btrfs: Only prep for btree deletion balances when nodes are mostly emptyChris Mason
Whenever an item deletion is done, we need to balance all the nodes in the tree to make sure we don't end up with an empty node if a pointer is deleted. This balance prep happens from the root of the tree down so we can drop our locks as we go. reada_for_balance was triggering read-ahead on neighboring nodes even when no balancing was required. This adds an extra check to avoid calling balance_level() and avoid reada_for_balance() when a balance won't be required. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-02-04Btrfs: fix btrfs_unlock_up_safe to walk the entire pathChris Mason
btrfs_unlock_up_safe would break out at the first NULL node entry or unlocked node it found in the path. Some of the callers have missing nodes at the lower levels of the path, so this commit fixes things to check all the nodes in the path before returning. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-02-04Btrfs: change btrfs_del_leaf to drop locks earlierChris Mason
btrfs_del_leaf does two things. First it removes the pointer in the parent, and then it frees the block that has the leaf. It has the parent node locked for both operations. But, it only needs the parent locked while it is deleting the pointer. After that it can safely free the block without the parent locked. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-02-04Btrfs: Change btrfs_truncate_inode_items to stop when it hits the inodeChris Mason
btrfs_truncate_inode_items is setup to stop doing btree searches when it has finished removing the items for the inode. It used to detect the end of the inode by looking for an objectid that didn't match the one we were searching for. But, this would result in an extra search through the btree, which adds extra balancing and cow costs to the operation. This commit adds a check to see if we found the inode item, which means we can stop searching early. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-02-04Btrfs: Don't try to compress pages past i_sizeChris Mason
The compression code had some checks to make sure we were only compressing bytes inside of i_size, but it wasn't catching every case. To make things worse, some incorrect math about the number of bytes remaining would make it try to compress more pages than the file really had. The fix used here is to fall back to the non-compression code in this case, which does all the proper cleanup of delalloc and other accounting. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-02-04Btrfs: join the transaction in __btrfs_setxattrJosef Bacik
With selinux on we end up calling __btrfs_setxattr when we create an inode, which calls btrfs_start_transaction(). The problem is we've already called that in btrfs_new_inode, and in btrfs_start_transaction we end up doing a wait_current_trans(). If btrfs-transaction has started committing it will wait for all handles to finish, while the other process is waiting for the transaction to commit. This is fixed by using btrfs_join_transaction, which won't wait for the transaction to commit. Thanks, Signed-off-by: Josef Bacik <jbacik@redhat.com>
2009-02-04Btrfs: Handle SGID bit when creating inodesChris Ball
Before this patch, new files/dirs would ignore the SGID bit on their parent directory and always be owned by the creating user's uid/gid. Signed-off-by: Chris Ball <cjb@laptop.org> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-02-04Btrfs: Make btrfs_drop_snapshot work in larger and more efficient chunksChris Mason
Every transaction in btrfs creates a new snapshot, and then schedules the snapshot from the last transaction for deletion. Snapshot deletion works by walking down the btree and dropping the reference counts on each btree block during the walk. If if a given leaf or node has a reference count greater than one, the reference count is decremented and the subtree pointed to by that node is ignored. If the reference count is one, walking continues down into that node or leaf, and the references of everything it points to are decremented. The old code would try to work in small pieces, walking down the tree until it found the lowest leaf or node to free and then returning. This was very friendly to the rest of the FS because it didn't have a huge impact on other operations. But it wouldn't always keep up with the rate that new commits added new snapshots for deletion, and it wasn't very optimal for the extent allocation tree because it wasn't finding leaves that were close together on disk and processing them at the same time. This changes things to walk down to a level 1 node and then process it in bulk. All the leaf pointers are sorted and the leaves are dropped in order based on their extent number. The extent allocation tree and commit code are now fast enough for this kind of bulk processing to work without slowing the rest of the FS down. Overall it does less IO and is better able to keep up with snapshot deletions under high load. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-02-04Btrfs: Change btree locking to use explicit blocking pointsChris Mason
Most of the btrfs metadata operations can be protected by a spinlock, but some operations still need to schedule. So far, btrfs has been using a mutex along with a trylock loop, most of the time it is able to avoid going for the full mutex, so the trylock loop is a big performance gain. This commit is step one for getting rid of the blocking locks entirely. btrfs_tree_lock takes a spinlock, and the code explicitly switches to a blocking lock when it starts an operation that can schedule. We'll be able get rid of the blocking locks in smaller pieces over time. Tracing allows us to find the most common cause of blocking, so we can start with the hot spots first. The basic idea is: btrfs_tree_lock() returns with the spin lock held btrfs_set_lock_blocking() sets the EXTENT_BUFFER_BLOCKING bit in the extent buffer flags, and then drops the spin lock. The buffer is still considered locked by all of the btrfs code. If btrfs_tree_lock gets the spinlock but finds the blocking bit set, it drops the spin lock and waits on a wait queue for the blocking bit to go away. Much of the code that needs to set the blocking bit finishes without actually blocking a good percentage of the time. So, an adaptive spin is still used against the blocking bit to avoid very high context switch rates. btrfs_clear_lock_blocking() clears the blocking bit and returns with the spinlock held again. btrfs_tree_unlock() can be called on either blocking or spinning locks, it does the right thing based on the blocking bit. ctree.c has a helper function to set/clear all the locked buffers in a path as blocking. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-02-04Btrfs: hash_lock is no longer neededChris Mason
Before metadata is written to disk, it is updated to reflect that writeout has begun. Once this update is done, the block must be cow'd before it can be modified again. This update was originally synchronized by using a per-fs spinlock. Today the buffers for the metadata blocks are locked before writeout begins, and everyone that tests the flag has the buffer locked as well. So, the per-fs spinlock (called hash_lock for no good reason) is no longer required. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-02-04Btrfs: disable leak debugging checks in extent_io.cChris Mason
extent_io.c has debugging code to report and free leaked extent_state and extent_buffer objects at rmmod time. This helps track down leaks and it saves you from rebooting just to properly remove the kmem_cache object. But, the code runs under a fairly expensive spinlock and the checks to see if it is currently enabled are not entirely consistent. Some use #ifdef and some #if. This changes everything to #if and disables the leak checking. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-02-04Btrfs: sort references by byte number during btrfs_inc_refChris Mason
When a block goes through cow, we update the reference counts of everything that block points to. The internal pointers of the block can be in just about any order, and it is likely to have clusters of things that are close together and clusters of things that are not. To help reduce the seeks that come with updating all of these reference counts, sort them by byte number before actual updates are done. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-02-04Btrfs: async threads should try harder to find workChris Mason
Tracing shows the delay between when an async thread goes to sleep and when more work is added is often very short. This commit adds a little bit of delay and extra checking to the code right before we schedule out. It allows more work to be added to the worker without requiring notifications from other procs. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-02-04Btrfs: selinux supportJim Owens
Add call to LSM security initialization and save resulting security xattr for new inodes. Add xattr support to symlink inode ops. Set inode->i_op for existing special files. Signed-off-by: jim owens <jowens@hp.com>
2009-02-04Btrfs: make btrfs acls selectableChristian Hesse
This patch adds a menu entry to kconfig to enable acls for btrfs. This allows you to enable FS_POSIX_ACL at kernel compile time. (updated by Jeff Mahoney to make the changes in fs/btrfs/Kconfig instead) Signed-off-by: Christian Hesse <mail@earthworm.de> Signed-off-by: Jeff Mahoney <jeffm@suse.com>
2009-02-04Btrfs: Catch missed bios in the async bio submission threadChris Mason
The async bio submission thread was missing some bios that were added after it had decided there was no work left to do. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2009-02-04Merge branch 'core/xen' into x86/urgentIngo Molnar
2009-02-04Blackfin arch: Remove outdated codeMichael Hennerich
The removed version with the loop registers saved on the stack was originally intended to workaround the missing toolchain support for LoopReg Clobbers. Since our toolchain now supports these there is no point in keeping this workaround. And since we don't touch LoopRegs anymore we're no longer subject for ANOMALY_05000312. Signed-off-by: Michael Hennerich <michael.hennerich@analog.com> Signed-off-by: Bryan Wu <cooloney@kernel.org>
2009-02-04Blackfin arch: Fix udelay implementationMichael Hennerich
Avoid possible overflow during 32*32->32 multiplies. Reported-by: Marco Reppenhagen <marco.reppenhagen@auerswald.de> Signed-off-by: Michael Hennerich <michael.hennerich@analog.com> Signed-off-by: Bryan Wu <cooloney@kernel.org>
2009-02-04Blackfin arch: Update Copyright informationMichael Hennerich
Signed-off-by: Michael Hennerich <michael.hennerich@analog.com> Signed-off-by: Bryan Wu <cooloney@kernel.org>
2009-02-04Blackfin arch: Add BF561 PPI POLS, POLC MasksMichael Hennerich
Signed-off-by: Michael Hennerich <michael.hennerich@analog.com> Signed-off-by: Bryan Wu <cooloney@kernel.org>