aboutsummaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2007-10-17unicode diacritics supportSamuel Thibault
There have been issues with non-latin1 diacritics and unicode. http://bugzilla.kernel.org/show_bug.cgi?id=7746 Git 759448f459234bfcf34b82471f0dba77a9aca498 `Kernel utf-8 handling' partly resolved it by adding conversion between diacritics and unicode. The patch below goes further by just turning diacritics into unicode, hence providing better future support. The kbd support can be fetched from http://bugzilla.kernel.org/attachment.cgi?id=12313 This was tested in all of latin1, latin9, latin2 and unicode with french and czech dead keys. Turn the kernel accent_table into unicode, and extend ioctls KDGKBDIACR and KDSKBDIACR into their equivalents KDGKBDIACRUC and KDSKBDIACR. New function int conv_uni_to_8bit(u32 uni) for converting unicode into 8bit _input_. No, we don't want to store the translation, as it is potentially sparse and large. Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org> Cc: Jan Engelhardt <jengelh@gmx.de> Cc: "Antonino A. Daplas" <adaplas@pol.net> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17Add MMF_DUMP_ELF_HEADERSRoland McGrath
This adds the MMF_DUMP_ELF_HEADERS option to /proc/pid/coredump_filter. This dumps the first page (only) of a private file mapping if it appears to be a mapping of an ELF file. Including these pages in the core dump may give sufficient identifying information to associate the original DSO and executable file images and their debugging information with a core file in a generic way just from its contents (e.g. when those binaries were built with ld --build-id). I expect this to become the default behavior eventually. Existing versions of gdb can be confused by the core dumps it creates, so it won't enabled by default for some time to come. Soon many people will have systems with a gdb that handle these dumps, so they can arrange to set the bit at boot and have it inherited system-wide. This also cleans up the checking of the MMF_DUMP_* flag bits, which did not need to be using atomic macros. Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17Add linux/elfcore-compat.hRoland McGrath
This adds the linux/elfcore-compat.h header file, which is the CONFIG_COMPAT analog of the linux/elfcore.h header. Each arch that needs to fake out fs/binfmt_elf.c for its compat code can use this header to replace the hand-copied definitions of the compat variants of struct elf_prstatus et al. Only the pr_reg field varies by arch, so asm/{compat,elf}.h must define compat_elf_gregset_t before linux/elfcore-compat.h can be used. It's a clean-up that every arch with compat core dumping code can benefit from. I only touched the ones I have handy to test at home. Doing the same for each other arch should be straightforward, and I'm happy to offer tips. Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Andi Kleen <ak@suse.de> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17ufs: move non-layout parts of ufs_fs.h to fs/ufs/Christoph Hellwig
Move prototypes and in-core structures to fs/ufs/ similar to what most other filesystems already do. I made little modifications: move also ufs debug macros and mount options constants into fs/ufs/ufs.h, this stuff also private for ufs. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17printk: add interfaces for external access to the log bufferMike Frysinger
Add two new functions for reading the kernel log buffer. The intention is for them to be used by recovery/dump/debug code so the kernel log can be easily retrieved/parsed in a crash scenario, but they are generic enough for other people to dream up other fun uses. [akpm@linux-foundation.org: buncha fixes] Signed-off-by: Mike Frysinger <vapier@gentoo.org> Cc: Robin Getz <rgetz@blackfin.uclinux.org> Cc: Greg Ungerer <gerg@snapgear.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Paul Mundt <lethal@linux-sh.org> Acked-by: Tim Bird <tim.bird@am.sony.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17Add /sys/module/name/notesRoland McGrath
This patch adds the /sys/module/<name>/notes/ magic directory, which has a file for each allocated SHT_NOTE section that appears in <name>.ko. This is the counterpart for each module of /sys/kernel/notes for vmlinux. Reading this delivers the contents of the module's SHT_NOTE sections. This lets userland easily glean any detailed information about that module's build that was stored there at compile time (e.g. by ld --build-id). Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17core_pattern: ignore RLIMIT_CORE if core_pattern is a pipeNeil Horman
For some time /proc/sys/kernel/core_pattern has been able to set its output destination as a pipe, allowing a user space helper to receive and intellegently process a core. This infrastructure however has some shortcommings which can be enhanced. Specifically: 1) The coredump code in the kernel should ignore RLIMIT_CORE limitation when core_pattern is a pipe, since file system resources are not being consumed in this case, unless the user application wishes to save the core, at which point the app is restricted by usual file system limits and restrictions. 2) The core_pattern code should be able to parse and pass options to the user space helper as an argv array. The real core limit of the uid of the crashing proces should also be passable to the user space helper (since it is overridden to zero when called). 3) Some miscellaneous bugs need to be cleaned up (specifically the recognition of a recursive core dump, should the user mode helper itself crash. Also, the core dump code in the kernel should not wait for the user mode helper to exit, since the same context is responsible for writing to the pipe, and a read of the pipe by the user mode helper will result in a deadlock. This patch: Remove the check of RLIMIT_CORE if core_pattern is a pipe. In the event that core_pattern is a pipe, the entire core will be fed to the user mode helper. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Cc: <martin.pitt@ubuntu.com> Cc: <wwoods@redhat.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17Add in SunOS 4.1.x compatible mode for UFSMark Fortescue
Add in support for SunOS 4.1.x flavor of BSD 4.2 UFS filing system Macros have been put in to alow suport for the old static table Cylinder Groups but this implementation does not use them yet. This also fixes Solaris UFS filing system access by disabling fast symbolic links as Sun's version of UFS does not support on-disk fast symbolic links. Tested by: Ppartitioning a new disk using SunOS 4.1.1, creating a UFS filing system on one of the partitions and writing some files to the filing system. Using Linux-2.6.22 (patched) to read the files and then write a shed load of files to the UFS partition. Using SunOS 4.1.1 to verify the filing system is OK and to check the files. The test host is a sun4c SS1 Clone. [akpm@linux-foundation.org: coding style fixes] [adobriyan@gmail.com: fix oops] Signed-off-by: Mark Fortescue <mark@mtfhpc.demon.co.uk> Cc: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17fs: remove the unused mempages parameterDenis Cheng
Since the mempages parameter is actually not used, they should be removed. Now there is only files_init use the mempages parameter, files_init(mempages); but I don't think the adaptation to mempages in files_init is really useful; and if files_init also changed to the prototype void (*func)(void), the wrapper vfs_caches_init would also not need the mempages parameter. Signed-off-by: Denis Cheng <crquan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17Remove "unsafe" from module structRusty Russell
Adrian Bunk points out that "unsafe" was used to mark modules touched by the deprecated MOD_INC_USE_COUNT interface, which has long gone. It's time to remove the member from the module structure, as well. If you want a module which can't unload, don't register an exit function. (Vlad Yasevich says SCTP is now safe to unload, so just remove the __unsafe there). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Shannon Nelson <shannon.nelson@intel.com> Acked-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Cc: Sridhar Samudrala <sri@us.ibm.com> Cc: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17ext4: show all mount optionsMiklos Szeredi
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17ext3: show all mount optionsMiklos Szeredi
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17ext2: show all mount optionsMiklos Szeredi
Using mtab is problematic for various reasons, one of them is that unprivileged mounts won't turn up in there. So we want to get rid of it, and use /proc/mounts instead. But most filesystems are lazy, and are not showing all mount options. Which means, that without mtab, the user won't be able to see some or all of the options. It would be nice if the generic code could remember the mount options, and show them without the need to add extra code to filesystems. But this is not easy, because different filesystems handle mount options given options, and not tough the rest. This is not taken into account by mount(8) either, so /etc/mtab will be broken in this case. This series fixes up ->show_options() in ext[234]. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17Remove sysctl.h from fs.hAlexey Dobriyan
Rrrr, addition of sysctl.h to fs.h was't very smart, because simple editing of the former will buy you big recompile, where it shouldn't have to. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17unexport asm/shmparam.hOlaf Hering
SHMLBA cant possible be used in userspace, see sparc versions of that header. Do not export asm/shmparam.h during make headers_install_all This removes another uservisible place of PAGE_SIZE Signed-off-by: Olaf Hering <olaf@aepfle.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17Driver for the Atmel on-chip SSC on AT32AP and AT91Hans-Christian Egtvedt
The Synchronous Serial Controller (SSC) on Atmel microprocessors are capable of tranceiving many frame based protocols, like I2S. Tested on the AT32AP7000/ATSTK1000. This driver is used in the ALSA sound driver for the AT73C213 external DAC on the ATSTK1000 development board for AVR32. This sound driver will be submitted soon. Hardware documentation can be found in the AT32AP7000 data sheet, which can be downloaded from http://www.atmel.com/dyn/products/datasheets.asp?family_id=682 [akpm@linux-foundation.org: init spinlock at compile time] Signed-off-by: Hans-Christian Egtvedt <hcegtvedt@atmel.com> Acked-by: Haavard Skinnemoen <hskinnemoen@atmel.com> Cc: David Brownell <david-b@pacbell.net> Cc: Andrew Victor <andrew@sanpeople.com> Cc: Patrice Vilchez <patrice.vilchez@rfo.atmel.com> Cc: Nicolas Ferre <nicolas.ferre@rfo.atmel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17Force erroneous inclusions of compiler-*.h files to be errorsRobert P. J. Day
Replace worthless comments with actual preprocessor errors when including the wrong versions of the compiler.h files. [akpm@linux-foundation.org: make it work] Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17softlockup: add a /proc tuning parameterRavikiran G Thirumalai
Control the trigger limit for softlockup warnings. This is useful for debugging softlockups, by lowering the softlockup_thresh to identify possible softlockups earlier. This patch: 1. Adds a sysctl softlockup_thresh with valid values of 1-60s (Higher value to disable false positives) 2. Changes the softlockup printk to print the cpu softlockup time [akpm@linux-foundation.org: Fix various warnings and add definition of "two"] Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org> Signed-off-by: Shai Fultheim <shai@scalex86.org> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17Immunize rcu_dereference() against crazy compiler writersPaul E. McKenney
Turns out that compiler writers are a bit more aggressive about optimizing than one might expect. This patch prevents a number of such optimizations from messing up rcu_deference(). This is not merely a theoretical problem, as evidenced by the rmb() in mce_log(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Acked-by: Josh Triplett <josh@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17Make unregister_binfmt() return voidAlexey Dobriyan
list_del() hardly can fail, so checking for return value is pointless (and current code always return 0). Nobody really cared that return value anyway. Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17Use list_head in binfmt handlingAlexey Dobriyan
Switch single-linked binfmt formats list to usual list_head's. This leads to one-liners in register_binfmt() and unregister_binfmt(). The downside is one pointer more in struct linux_binfmt. This is not a problem, since the set of registered binfmts on typical box is very small -- (ELF + something distro enabled for you). Test-booted, played with executable .txt files, modprobe/rmmod binfmt_misc. Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17fs/reiserfs/: cleanupsAdrian Bunk
- remove the following no longer used functions: - bitmap.c: reiserfs_claim_blocks_to_be_allocated() - bitmap.c: reiserfs_release_claimed_blocks() - bitmap.c: reiserfs_can_fit_pages() - make the following functions static: - inode.c: restart_transaction() - journal.c: reiserfs_async_progress_wait() Signed-off-by: Adrian Bunk <bunk@stusta.de> Acked-by: Vladimir V. Saveliev <vs@namesys.com> Cc: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17mm: test and set zone reclaim lock before starting reclaimDavid Rientjes
Introduces new zone flag interface for testing and setting flags: int zone_test_and_set_flag(struct zone *zone, zone_flags_t flag) Instead of setting and clearing ZONE_RECLAIM_LOCKED each time shrink_zone() is called, this flag is test and set before starting zone reclaim. Zone reclaim starts in __alloc_pages() when a zone's watermark fails and the system is in zone_reclaim_mode. If it's already in reclaim, there's no need to start again so it is simply considered full for that allocation attempt. There is a change of behavior with regard to concurrent zone shrinking. It is now possible for try_to_free_pages() or kswapd to already be shrinking a particular zone when __alloc_pages() starts zone reclaim. In this case, it is possible for two concurrent threads to invoke shrink_zone() for a single zone. This change forbids a zone to be in zone reclaim twice, which was always the behavior, but allows for concurrent try_to_free_pages() or kswapd shrinking when starting zone reclaim. Cc: Andrea Arcangeli <andrea@suse.de> Cc: Christoph Lameter <clameter@sgi.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17oom: add header file to Kbuild as unifdefDavid Rientjes
Preprocess include/linux/oom.h before exporting it to userspace. Cc: Andrea Arcangeli <andrea@suse.de> Cc: Christoph Lameter <clameter@sgi.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17oom: prevent including sched.h in header fileDavid Rientjes
It's not necessary to include all of linux/sched.h in linux/oom.h. Instead, simply include prototypes for the relevant structs and include linux/types.h for gfp_t. Cc: Andrea Arcangeli <andrea@suse.de> Cc: Christoph Lameter <clameter@sgi.com> Acked-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17oom: compare cpuset mems_allowed instead of exclusive ancestorsDavid Rientjes
Instead of testing for overlap in the memory nodes of the the nearest exclusive ancestor of both current and the candidate task, it is better to simply test for intersection between the task's mems_allowed in their task descriptors. This does not require taking callback_mutex since it is only used as a hint in the badness scoring. Tasks that do not have an intersection in their mems_allowed with the current task are not explicitly restricted from being OOM killed because it is quite possible that the candidate task has allocated memory there before and has since changed its mems_allowed. Cc: Andrea Arcangeli <andrea@suse.de> Acked-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17oom: add per-zone lockingDavid Rientjes
OOM killer synchronization should be done with zone granularity so that memory policy and cpuset allocations may have their corresponding zones locked and allow parallel kills for other OOM conditions that may exist elsewhere in the system. DMA allocations can be targeted at the zone level, which would not be possible if locking was done in nodes or globally. Synchronization shall be done with a variation of "trylocks." The goal is to put the current task to sleep and restart the failed allocation attempt later if the trylock fails. Otherwise, the OOM killer is invoked. Each zone in the zonelist that __alloc_pages() was called with is checked for the newly-introduced ZONE_OOM_LOCKED flag. If any zone has this flag present, the "trylock" to serialize the OOM killer fails and returns zero. Otherwise, all the zones have ZONE_OOM_LOCKED set and the try_set_zone_oom() function returns non-zero. Cc: Andrea Arcangeli <andrea@suse.de> Cc: Christoph Lameter <clameter@sgi.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17oom: change all_unreclaimable zone member to flagsDavid Rientjes
Convert the int all_unreclaimable member of struct zone to unsigned long flags. This can now be used to specify several different zone flags such as all_unreclaimable and reclaim_in_progress, which can now be removed and converted to a per-zone flag. Flags are set and cleared as follows: zone_set_flag(struct zone *zone, zone_flags_t flag) zone_clear_flag(struct zone *zone, zone_flags_t flag) Defines the first zone flags, ZONE_ALL_UNRECLAIMABLE and ZONE_RECLAIM_LOCKED, which have the same semantics as the old zone->all_unreclaimable and zone->reclaim_in_progress, respectively. Also converts all current users that set or clear either flag to use the new interface. Helper functions are defined to test the flags: int zone_is_all_unreclaimable(const struct zone *zone) int zone_is_reclaim_locked(const struct zone *zone) All flag operators are of the atomic variety because there are currently readers that are implemented that do not take zone->lock. [akpm@linux-foundation.org: add needed include] Cc: Andrea Arcangeli <andrea@suse.de> Acked-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17oom: move constraints to enumDavid Rientjes
The OOM killer's CONSTRAINT definitions are really more appropriate in an enum, so define them in include/linux/oom.h. Cc: Andrea Arcangeli <andrea@suse.de> Acked-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17oom: move prototypes to appropriate header fileDavid Rientjes
Move the OOM killer's extern function prototypes to include/linux/oom.h and include it where necessary. [clg@fr.ibm.com: build fix] Cc: Andrea Arcangeli <andrea@suse.de> Acked-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17Slab API: remove useless ctor parameter and reorder parametersChristoph Lameter
Slab constructors currently have a flags parameter that is never used. And the order of the arguments is opposite to other slab functions. The object pointer is placed before the kmem_cache pointer. Convert ctor(void *object, struct kmem_cache *s, unsigned long flags) to ctor(struct kmem_cache *s, void *object) throughout the kernel [akpm@linux-foundation.org: coupla fixes] Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17mm: dirty balancing for tasksPeter Zijlstra
Based on ideas of Andrew: http://marc.info/?l=linux-kernel&m=102912915020543&w=2 Scale the bdi dirty limit inversly with the tasks dirty rate. This makes heavy writers have a lower dirty limit than the occasional writer. Andrea proposed something similar: http://lwn.net/Articles/152277/ The main disadvantage to his patch is that he uses an unrelated quantity to measure time, which leaves him with a workload dependant tunable. Other than that the two approaches appear quite similar. [akpm@linux-foundation.org: fix warning] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17mm: per device dirty thresholdPeter Zijlstra
Scale writeback cache per backing device, proportional to its writeout speed. By decoupling the BDI dirty thresholds a number of problems we currently have will go away, namely: - mutual interference starvation (for any number of BDIs); - deadlocks with stacked BDIs (loop, FUSE and local NFS mounts). It might be that all dirty pages are for a single BDI while other BDIs are idling. By giving each BDI a 'fair' share of the dirty limit, each one can have dirty pages outstanding and make progress. A global threshold also creates a deadlock for stacked BDIs; when A writes to B, and A generates enough dirty pages to get throttled, B will never start writeback until the dirty pages go away. Again, by giving each BDI its own 'independent' dirty limit, this problem is avoided. So the problem is to determine how to distribute the total dirty limit across the BDIs fairly and efficiently. A DBI that has a large dirty limit but does not have any dirty pages outstanding is a waste. What is done is to keep a floating proportion between the DBIs based on writeback completions. This way faster/more active devices get a larger share than slower/idle devices. [akpm@linux-foundation.org: fix warnings] [hugh@veritas.com: Fix occasional hang when a task couldn't get out of balance_dirty_pages] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17lib: floating proportionsPeter Zijlstra
Given a set of objects, floating proportions aims to efficiently give the proportional 'activity' of a single item as compared to the whole set. Where 'activity' is a measure of a temporal property of the items. It is efficient in that it need not inspect any other items of the set in order to provide the answer. It is not even needed to know how many other items there are. It has one parameter, and that is the period of 'time' over which the 'activity' is measured. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17mm: count writeback pages per BDIPeter Zijlstra
Count per BDI writeback pages. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17mm: count reclaimable pages per BDIPeter Zijlstra
Count per BDI reclaimable pages; nr_reclaimable = nr_dirty + nr_unstable. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17mm: scalable bdi statistics countersPeter Zijlstra
Provide scalable per backing_dev_info statistics counters. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17mm: bdi init hooksPeter Zijlstra
provide BDI constructor/destructor hooks [akpm@linux-foundation.org: compile fix] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17lib: percpu_counter_init_irqPeter Zijlstra
provide a way to tell lockdep about percpu_counters that are supposed to be used from irq safe contexts. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17lib: percpu_counter_init error handlingPeter Zijlstra
alloc_percpu can fail, propagate that error. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17lib: percpu_count_sum()Peter Zijlstra
Provide an accurate version of percpu_counter_read. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17lib: percpu_counter_sum_positivePeter Zijlstra
s/percpu_counter_sum/&_positive/ Because its consitent with percpu_counter_read* Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17lib: percpu_counter_setPeter Zijlstra
Provide a method to set a percpu counter to a specified value. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17lib: make percpu_counter_add take s64Peter Zijlstra
percpu_counter is a s64 counter, make _add consitent. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17lib: percpu_counter variable batchPeter Zijlstra
Because the current batch setup has an quadric error bound on the counter, allow for an alternative setup. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17lib: percpu_counter_subPeter Zijlstra
Hugh spotted that some code does: percpu_counter_add(&counter, -unsignedlong) which, when the amount argument is of type s32, sort-of works thanks to two's-complement. However when we'd change the type to s64 this breaks on 32bit machines, because the promotion rules zero extend the unsigned number. Provide percpu_counter_sub() to hide the s64 cast. That is: percpu_counter_sub(&counter, foo) is equal to: percpu_counter_add(&counter, -(s64)foo); Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17lib: percpu_counter_addPeter Zijlstra
s/percpu_counter_mod/percpu_counter_add/ Because its a better name, _mod implies modulo. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17nfs: remove congestion_end()Peter Zijlstra
These patches aim to improve balance_dirty_pages() and directly address three issues: 1) inter device starvation 2) stacked device deadlocks 3) inter process starvation 1 and 2 are a direct result from removing the global dirty limit and using per device dirty limits. By giving each device its own dirty limit is will no longer starve another device, and the cyclic dependancy on the dirty limit is broken. In order to efficiently distribute the dirty limit across the independant devices a floating proportion is used, this will allocate a share of the total limit proportional to the device's recent activity. 3 is done by also scaling the dirty limit proportional to the current task's recent dirty rate. This patch: nfs: remove congestion_end(). It's redundant, clear_bdi_congested() already wakes the waiters. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J. Bruce Fields" <bfields@fieldses.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17powerpc: add Altivec/VMX state to coredumpsMark Nelson
Update dump_task_altivec() (which has so far never been put to use) so that it dumps the Altivec/VMX registers (VR[0] - VR[31], VSCR and VRSAVE) in the same format as the ptrace get_vrregs(), and add the appropriate glue typedef and #defines to make it work. A new note type of NT_PPC_VMX was chosen to be 0x100 (arbitrarily) because it allows the low range values to be used for more generic purposes and 0x100 seems an adequate starting point for PowerPC extensions. Signed-off-by: Mark Nelson <markn@au1.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Andi Kleen <ak@suse.de> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17partially fix up the lookup_one_noperm messChristoph Hellwig
Try to fix the mess created by sysfs braindamage. - refactor code internal to fs/namei.c a little to avoid too much duplication: o __lookup_hash_kern is renamed back to __lookup_hash o the old __lookup_hash goes away, permission checks moves to the two callers o useless inline qualifiers on above functions go away - lookup_one_len_kern loses it's last argument and is renamed to lookup_one_noperm to make it's useage a little more clear - added kerneldoc comments to describe lookup_one_len aswell as lookup_one_noperm and make it very clear that no one should use the latter ever. Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Josef 'Jeff' Sipek <jsipek@cs.sunysb.edu> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>