Kernel - My Linux kernel repository

Age	Commit message (Collapse)	Author
2006-12-12	[PATCH] lockdep: fix seqlock_init()	Ingo Molnar
	seqlock_init() needs to use spin_lock_init() for dynamic locks, so that lockdep is notified about the presence of a new lock. (this is a fallout of the recent networking merge, which started using the so-far unused seqlock_init() API.) This fix solves the following lockdep-internal warning on current -git: INFO: trying to register non-static key. the code is fine but needs lockdep annotation. turning off the locking correctness validator. __lock_acquire+0x10c/0x9f9 lock_acquire+0x56/0x72 _spin_lock+0x35/0x42 neigh_destroy+0x9d/0x12e neigh_periodic_timer+0x10a/0x15c run_timer_softirq+0x126/0x18e __do_softirq+0x6b/0xe6 do_softirq+0x64/0xd2 ksoftirqd+0x82/0x138 Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-12	[PATCH] remove blk_queue_activity_fn	Boaz Harrosh
	While working on bidi support at struct request level I have found that blk_queue_activity_fn is actually never used. The only user is in ide-probe.c with this code: /* enable led activity for disk drives only */ if (drive->media == ide_disk && hwif->led_act) blk_queue_activity_fn(q, hwif->led_act, drive); And led_act is never initialized anywhere. (Looking back at older kernels it was used in the PPC arch, but was removed around 2.6.18) Unless it is all for future use off course. (this patch is against linux-2.6-block.git as off 2006/12/4) Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2006-12-11	Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6	Linus Torvalds
	* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (32 commits) [NETPOLL]: Fix local_bh_enable() warning. [IPVS]: Make ip_vs_sync.c <= 80col wide. [IPVS]: Use msleep_interruptable() instead of ssleep() aka msleep() [HAMRADIO]: Fix baycom_epp.c compile failure. [DCCP]: Whitespace cleanups [DCCP] ccid3: Fixup some type conversions related to rtts [DCCP] ccid3: BUG-FIX - conversion errors [DCCP] ccid3: Reorder packet history source file [DCCP] ccid3: Reorder packet history header file [DCCP] ccid3: Make debug output consistent [DCCP] ccid3: Perform history operations only after packet has been sent [DCCP] ccid3: TX history - remove unused field [DCCP] ccid3: Shift window counter computation [DCCP] ccid3: Sanity-check RTT samples [DCCP] ccid3: Initialise RTT values [DCCP] ccid: Deprecate ccid_hc_tx_insert_options [DCCP]: Warn when discarding packet due to internal errors [DCCP]: Only deliver to the CCID rx side in charge [DCCP]: Simplify TFRC calculation [DCCP]: Debug timeval operations ...
2006-12-11	Merge git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc	Linus Torvalds
	* git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (36 commits) [POWERPC] Generic BUG for powerpc [PPC] Fix compile failure do to introduction of PHY_POLL [POWERPC] Only export __mtdcr/__mfdcr if CONFIG_PPC_DCR is set [POWERPC] Remove old dcr.S [POWERPC] Fix SPU coredump code for max_fdset removal [POWERPC] Fix irq routing on some 32-bit PowerMacs [POWERPC] ps3: Add vuart support [POWERPC] Support ibm,dynamic-reconfiguration-memory nodes [POWERPC] dont allow pSeries_probe to succeed without initialising MMU [POWERPC] micro optimise pSeries_probe [POWERPC] Add SPURR SPR to sysfs [POWERPC] Add DSCR SPR to sysfs [POWERPC] Fix 440SPe CPU table entry [POWERPC] Add support for FP emulation for the e300c2 core [POWERPC] of_device_register: propagate device_create_file return code [POWERPC] Fix mmap of PCI resource with hack for X [POWERPC] iSeries: head_64.o needs to depend on lparmap.s [POWERPC] cbe_thermal: Fix initialization of sysfs attribute_group [POWERPC] Remove QE header files from lite5200.c [POWERPC] of_platform_make_bus_id(): make `magic' int ...
2006-12-11	[DCCP]: Whitespace cleanups	Arnaldo Carvalho de Melo
	That accumulated over the last months hackaton, shame on me for not using git-apply whitespace helping hand, will do that from now on. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-11	[DCCP] ccid3: Finer-grained resolution of sending rates	Gerrit Renker
	This patch * resolves a bug where packets smaller than 32/64 bytes resulted in sending rates of 0 * supports all sending rates from 1/64 bytes/second up to 4Gbyte/second * simplifies the present overflow problems in calculations Current sending rate X and the cached value X_recv of the receiver-estimated sending rate are both scaled by 64 (2^6) in order to * cope with low sending rates (minimally 1 byte/second) * allow upgrading to use a packets-per-second implementation of CCID 3 * avoid calculation errors due to integer arithmetic cut-off The patch implements a revised strategy from http://www.mail-archive.com/dccp@vger.kernel.org/msg01040.html The only difference with regard to that strategy is that t_ipi is already used in the calculation of the nofeedback timeout, which saves one division. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
2006-12-11	Make sure we populate the initroot filesystem late enough	Linus Torvalds
	We should not initialize rootfs before all the core initializers have run. So do it as a separate stage just before starting the regular driver initializers. Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-11	Make SLES9 "get_kernel_version" work on the kernel binary again	Linus Torvalds
	As reported by Andy Whitcroft, at least the SLES9 initrd build process depends on getting the kernel version from the kernel binary. It does that by simply trawling the binary and looking for the signature of the "linux_banner" string (the string "Linux version " to be exact. Which is really broken in itself, but whatever..) That got broken when the string was changed to allow /proc/version to change the UTS release information dynamically, and "get_kernel_version" thus returned "%s" (see commit a2ee8649ba6d71416712e798276bf7c40b64e6e5: "[PATCH] Fix linux banner utsname information"). This just restores "linux_banner" as a static string, which should fix the version finding. And /proc/version simply uses a different string. To avoid wasting even that miniscule amount of memory, the early boot string should really be marked __initdata, but that just causes the same bug in SLES9 to re-appear, since it will then find other occurrences of "Linux version " first. Cc: Andy Whitcroft <apw@shadowen.org> Acked-by: Herbert Poetzl <herbert@13thfloor.at> Cc: Andi Kleen <ak@suse.de> Cc: Andrew Morton <akpm@osdl.org> Cc: Steve Fox <drfickle@us.ibm.com> Acked-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10	[PPC] Fix compile failure do to introduction of PHY_POLL	Kumar Gala
	PHY_POLL is defined in <linux/phy.h> include it in <linux/fsl_devices.h> so board code will have it defined. Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2006-12-10	Merge branch 'master' of ↵	Linus Torvalds
	master.kernel.org:/pub/scm/linux/kernel/git/mchehab/v4l-dvb * 'master' of master.kernel.org:/pub/scm/linux/kernel/git/mchehab/v4l-dvb: (132 commits) V4L/DVB 4949b: Fix container_of pointer retreival V4L/DVB (4949a): Fix INIT_WORK V4L/DVB (4949): Cxusb: codingstyle cleanups V4L/DVB (4948): Cxusb: Convert tuner functions to use dvb_pll_attach V4L/DVB (4947): Cx88: trivial cleanups V4L/DVB (4946): Cx88: Move cx88_dvb_bus_ctrl out of the card-specific area V4L/DVB (4945): Cx88: consolidate cx22702_config structs V4L/DVB (4944): Cx88: Convert DViCO FusionHDTV Hybrid to use dvb_pll_attach V4L/DVB (4943): Cx88: cleanup dvb_pll_attach for lgdt3302 tuners V4L/DVB (4953): Usbvision minor fixes V4L/DVB (4951): Add version.h, since it is required for VIDIOC_QUERYCAP V4L/DVB (4940): Or51211: Changed SNR and signal strength calculations V4L/DVB (4939): Or51132: Changed SNR and signal strength reporting V4L/DVB (4938): Cx88: Convert lgdt3302 tuning function to use dvb_pll_attach V4L/DVB (4941): Remove LINUX_VERSION_CODE and fix identations V4L/DVB (4942): Whitespace cleanups V4L/DVB (4937): Usbvision cleanup and code reorganization V4L/DVB (4936): Make MT4049FM5 tuner to set FM Gain to Normal V4L/DVB (4935): Added the capability of selecting fm gain by tuner V4L/DVB (4934): Usbvision radio requires GainNormal at e register ...
2006-12-10	[PATCH] kvm: userspace interface	Avi Kivity
	web site: http://kvm.sourceforge.net mailing list: kvm-devel@lists.sourceforge.net (http://lists.sourceforge.net/lists/listinfo/kvm-devel) The following patchset adds a driver for Intel's hardware virtualization extensions to the x86 architecture. The driver adds a character device (/dev/kvm) that exposes the virtualization capabilities to userspace. Using this driver, a process can run a virtual machine (a "guest") in a fully virtualized PC containing its own virtual hard disks, network adapters, and display. Using this driver, one can start multiple virtual machines on a host. Each virtual machine is a process on the host; a virtual cpu is a thread in that process. kill(1), nice(1), top(1) work as expected. In effect, the driver adds a third execution mode to the existing two: we now have kernel mode, user mode, and guest mode. Guest mode has its own address space mapping guest physical memory (which is accessible to user mode by mmap()ing /dev/kvm). Guest mode has no access to any I/O devices; any such access is intercepted and directed to user mode for emulation. The driver supports i386 and x86_64 hosts and guests. All combinations are allowed except x86_64 guest on i386 host. For i386 guests and hosts, both pae and non-pae paging modes are supported. SMP hosts and UP guests are supported. At the moment only Intel hardware is supported, but AMD virtualization support is being worked on. Performance currently is non-stellar due to the naive implementation of the mmu virtualization, which throws away most of the shadow page table entries every context switch. We plan to address this in two ways: - cache shadow page tables across tlb flushes - wait until AMD and Intel release processors with nested page tables Currently a virtual desktop is responsive but consumes a lot of CPU. Under Windows I tried playing pinball and watching a few flash movies; with a recent CPU one can hardly feel the virtualization. Linux/X is slower, probably due to X being in a separate process. In addition to the driver, you need a slightly modified qemu to provide I/O device emulation and the BIOS. Caveats (akpm: might no longer be true): - The Windows install currently bluescreens due to a problem with the virtual APIC. We are working on a fix. A temporary workaround is to use an existing image or install through qemu - Windows 64-bit does not work. That's also true for qemu, so it's probably a problem with the device model. [bero@arklinux.org: build fix] [simon.kagstrom@bth.se: build fix, other fixes] [uril@qumranet.com: KVM: Expose interrupt bitmap] [akpm@osdl.org: i386 build fix] [mingo@elte.hu: i386 fixes] [rdreier@cisco.com: add log levels to all printks] [randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings] [anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support] Signed-off-by: Yaniv Kamay <yaniv@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com> Cc: Simon Kagstrom <simon.kagstrom@bth.se> Cc: Bernhard Rosenkraenzer <bero@arklinux.org> Signed-off-by: Uri Lublin <uril@qumranet.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Roland Dreier <rolandd@cisco.com> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Anthony Liguori <anthony@codemonkey.ws> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10	[PATCH] clocksource: small cleanup	Daniel Walker
	Mostly changing alignment. Just some general cleanup. [akpm@osdl.org: build fix] Signed-off-by: Daniel Walker <dwalker@mvista.com> Acked-by: John Stultz <johnstul@us.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10	[PATCH] round_jiffies infrastructure	Arjan van de Ven
	Introduce a round_jiffies() function as well as a round_jiffies_relative() function. These functions round a jiffies value to the next whole second. The primary purpose of this rounding is to cause all "we don't care exactly when" timers to happen at the same jiffy. This avoids multiple timers firing within the second for no real reason; with dynamic ticks these extra timers cause wakeups from deep sleep CPU sleep states and thus waste power. The exact wakeup moment is skewed by the cpu number, to avoid all cpus from waking up at the exact same time (and hitting the same lock/cachelines there) [akpm@osdl.org: fix variable type] Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10	[PATCH] fdtable: Implement new pagesize-based fdtable allocator	Vadim Lobanov
	This patch provides an improved fdtable allocation scheme, useful for expanding fdtable file descriptor entries. The main focus is on the fdarray, as its memory usage grows 128 times faster than that of an fdset. The allocation algorithm sizes the fdarray in such a way that its memory usage increases in easy page-sized chunks. The overall algorithm expands the allowed size in powers of two, in order to amortize the cost of invoking vmalloc() for larger allocation sizes. Namely, the following sizes for the fdarray are considered, and the smallest that accommodates the requested fd count is chosen: pagesize / 4 pagesize / 2 pagesize <- memory allocator switch point pagesize * 2 pagesize * 4 ...etc... Unlike the current implementation, this allocation scheme does not require a loop to compute the optimal fdarray size, and can be done in efficient straightline code. Furthermore, since the fdarray overflows the pagesize boundary long before any of the fdsets do, it makes sense to optimize run-time by allocating both fdsets in a single swoop. Even together, they will still be, by far, smaller than the fdarray. The fdtable->open_fds is now used as the anchor for the fdset memory allocation. Signed-off-by: Vadim Lobanov <vlobanov@speakeasy.net> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Dipankar Sarma <dipankar@in.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10	[PATCH] fdtable: Remove the free_files field	Vadim Lobanov
	An fdtable can either be embedded inside a files_struct or standalone (after being expanded). When an fdtable is being discarded after all RCU references to it have expired, we must either free it directly, in the standalone case, or free the files_struct it is contained within, in the embedded case. Currently the free_files field controls this behavior, but we can get rid of it entirely, as all the necessary information is already recorded. We can distinguish embedded and standalone fdtables using max_fds, and if it is embedded we can divine the relevant files_struct using container_of(). Signed-off-by: Vadim Lobanov <vlobanov@speakeasy.net> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Dipankar Sarma <dipankar@in.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10	[PATCH] fdtable: Make fdarray and fdsets equal in size	Vadim Lobanov
	Currently, each fdtable supports three dynamically-sized arrays of data: the fdarray and two fdsets. The code allows the number of fds supported by the fdarray (fdtable->max_fds) to differ from the number of fds supported by each of the fdsets (fdtable->max_fdset). In practice, it is wasteful for these two sizes to differ: whenever we hit a limit on the smaller-capacity structure, we will reallocate the entire fdtable and all the dynamic arrays within it, so any delta in the memory used by the larger-capacity structure will never be touched at all. Rather than hogging this excess, we shouldn't even allocate it in the first place, and keep the capacities of the fdarray and the fdsets equal. This patch removes fdtable->max_fdset. As an added bonus, most of the supporting code becomes simpler. Signed-off-by: Vadim Lobanov <vlobanov@speakeasy.net> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Dipankar Sarma <dipankar@in.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10	[PATCH] md: allow reads that have bypassed the cache to be retried on failure	Raz Ben-Jehuda(caro)
	If a bypass-the-cache read fails, we simply try again through the cache. If it fails again it will trigger normal recovery precedures. update 1: From: NeilBrown <neilb@suse.de> 1/ chunk_aligned_read and retry_aligned_read assume that data_disks == raid_disks - 1 which is not true for raid6. So when an aligned read request bypasses the cache, we can get the wrong data. 2/ The cloned bio is being used-after-free in raid5_align_endio (to test BIO_UPTODATE). 3/ We forgot to add rdev->data_offset when submitting a bio for aligned-read 4/ clone_bio calls blk_recount_segments and then we change bi_bdev, so we need to invalidate the segment counts. 5/ We don't de-reference the rdev when the read completes. This means we need to record the rdev to so it is still available in the end_io routine. Fortunately bi_next in the original bio is unused at this point so we can stuff it in there. 6/ We leak a cloned bio if the target rdev is not usable. From: NeilBrown <neilb@suse.de> update 2: 1/ When aligned requests fail (read error) they need to be retried via the normal method (stripe cache). As we cannot be sure that we can process a single read in one go (we may not be able to allocate all the stripes needed) we store a bio-being-retried and a list of bioes-that-still-need-to-be-retried. When find a bio that needs to be retried, we should add it to the list, not to single-bio... 2/ We were never incrementing 'scnt' when resubmitting failed aligned requests. [akpm@osdl.org: build fix] Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10	[PATCH] ide-cd: Handle strange interrupt on the Intel ESB2	Alan Cox
	The ESB2 appears to emit spurious DMA interrupts when configured for native mode and handling ATAPI devices. Stratus were able to pin this bug down and produce a patch. This is a rework which applies the fixup only to the ESB2 (for now). We can apply it to other chips later if the same problem is found. This code has been tested and confirmed to fix the problem on the tested systems. Signed-off-by: Alan Cox <alan@redhat.com> (Most of the hard work done by Stratus however) Cc: Jens Axboe <axboe@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10	[PATCH] sched: remove lb_stopbalance counter	Chen, Kenneth W
	Remove scheduler stats lb_stopbalance counter. This counter can be calculated by: lb_balanced - lb_nobusyg - lb_nobusyq. There is no need to create gazillion counters while we can derive the value. Signed-off-by: Ken Chen <kenneth.w.chen@intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10	[PATCH] sched: decrease number of load balances	Siddha, Suresh B
	Currently at a particular domain, each cpu in the sched group will do a load balance at the frequency of balance_interval. More the cores and threads, more the cpus will be in each sched group at SMP and NUMA domain. And we endup spending quite a bit of time doing load balancing in those domains. Fix this by making only one cpu(first idle cpu or first cpu in the group if all the cpus are busy) in the sched group do the load balance at that particular sched domain and this load will slowly percolate down to the other cpus with in that group(when they do load balancing at lower domains). Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Christoph Lameter <clameter@engr.sgi.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10	[PATCH] sched: add option to serialize load balancing	Christoph Lameter
	Large sched domains can be very expensive to scan. Add an option SD_SERIALIZE to the sched domain flags. If that flag is set then we make sure that no other such domain is being balanced. [akpm@osdl.org: build fix] Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Peter Williams <pwil3058@bigpond.net.au> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Christoph Lameter <clameter@sgi.com> Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com> Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10	[PATCH] sched: use softirq for load balancing	Christoph Lameter
	Call rebalance_tick (renamed to run_rebalance_domains) from a newly introduced softirq. We calculate the earliest time for each layer of sched domains to be rescanned (this is the rescan time for idle) and use the earliest of those to schedule the softirq via a new field "next_balance" added to struct rq. Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Peter Williams <pwil3058@bigpond.net.au> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Christoph Lameter <clameter@sgi.com> Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com> Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10	[PATCH] sched domain: increase the SMT busy rebalance interval	Siddha, Suresh B
	With SMT, if the logical processor is busy, load balance happens for every 8msec(min)-16msec(max). There is no need to do this often, as this is just for fairness(to maintain uniform runqueue lengths) and default time slice anyhow is 100msec. Appended patch increases this interval to 64msec(min)-128msec(max) when the logical processor is busy. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10	[PATCH] io-accounting: via taskstats	Andrew Morton
	Deliver IO accounting via taskstats. Cc: Jay Lan <jlan@sgi.com> Cc: Shailabh Nagar <nagar@watson.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Chris Sturtivant <csturtiv@sgi.com> Cc: Tony Ernst <tee@sgi.com> Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net> Cc: David Wright <daw@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10	[PATCH] cleanup taskstats.h	Andrew Morton
	Fix weird whitespace mangling in taskstats.h Cc: Jay Lan <jlan@sgi.com> Cc: Shailabh Nagar <nagar@watson.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Chris Sturtivant <csturtiv@sgi.com> Cc: Tony Ernst <tee@sgi.com> Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net> Cc: David Wright <daw@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10	[PATCH] io-accounting: core statistics	Andrew Morton
	The present per-task IO accounting isn't very useful. It simply counts the number of bytes passed into read() and write(). So if a process reads 1MB from an already-cached file, it is accused of having performed 1MB of I/O, which is wrong. (David Wright had some comments on the applicability of the present logical IO accounting: For billing purposes it is useless but for workload analysis it is very useful read_bytes/read_calls average read request size write_bytes/write_calls average write request size read_bytes/read_blocks ie logical/physical can indicate hit rate or thrashing write_bytes/write_blocks ie logical/physical guess since pdflush writes can be missed I often look for logical larger than physical to see filesystem cache problems. And the bytes/cpusec can help find applications that are dominating the cache and causing slow interactive response from page cache contention. I want to find the IO intensive applications and make sure they are doing efficient IO. Thus the acctcms(sysV) or csacms command would give the high IO commands). This patchset adds new accounting which tries to be more accurate. We account for three things: reads: attempt to count the number of bytes which this process really did cause to be fetched from the storage layer. Done at the submit_bio() level, so it is accurate for block-backed filesystems. I also attempt to wire up NFS and CIFS. writes: attempt to count the number of bytes which this process caused to be sent to the storage layer. This is done at page-dirtying time. The big inaccuracy here is truncate. If a process writes 1MB to a file and then deletes the file, it will in fact perform no writeout. But it will have been accounted as having caused 1MB of write. So... cancelled_writes: account the number of bytes which this process caused to not happen, by truncating pagecache. We _could_ just subtract this from the process's `write' accounting. But that means that some processes would be reported to have done negative amounts of write IO, which is silly. So we just report the raw number and punt this decision up to userspace. Now, we _could_ account for writes at the physical I/O level. But - This would require that we track memory-dirtying tasks at the per-page level (would require a new pointer in struct page). - It would mean that IO statistics for a process are usually only available long after that process has exitted. Which means that we probably cannot communicate this info via taskstats. This patch: Wire up the kernel-private data structures and the accessor functions to manipulate them. Cc: Jay Lan <jlan@sgi.com> Cc: Shailabh Nagar <nagar@watson.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Chris Sturtivant <csturtiv@sgi.com> Cc: Tony Ernst <tee@sgi.com> Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net> Cc: David Wright <daw@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10	[PATCH] Fix noise in futex.h	David Woodhouse
	There are some kernel-only bits in the middle of <linux/futex.h> which should be removed in what we export to userspace. Signed-off-by: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10	[PATCH] sysctl: remove unused "context" param	Alexey Dobriyan
	Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andi Kleen <ak@suse.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: David Howells <dhowells@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10	[PATCH] rtc: Add rtc_merge_alarm()	Scott Wood
	Add rtc_merge_alarm(), which can be used by rtc drivers to turn a partially specified alarm expiry (i.e. most significant fields set to -1, as with the RTC_ALM_SET ioctl()) into a fully specified expiry. If the most significant specified field is earlier than the current time, the least significant unspecified field is incremented. Signed-off-by: Scott Wood <scottwood@freescale.com> Acked-by: Alessandro Zummo <a.zummo@towertech.it> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10	[PATCH] freezer.h uses task_struct fields	Randy Dunlap
	freezer.h uses task_struct fields so it should include sched.h. CC [M] fs/jfs/jfs_txnmgr.o In file included from fs/jfs/jfs_txnmgr.c:49: include/linux/freezer.h: In function 'frozen': include/linux/freezer.h:9: error: dereferencing pointer to incomplete type include/linux/freezer.h:9: error: 'PF_FROZEN' undeclared (first use in this function) include/linux/freezer.h:9: error: (Each undeclared identifier is reported only once include/linux/freezer.h:9: error: for each function it appears in.) include/linux/freezer.h: In function 'freezing': include/linux/freezer.h:17: error: dereferencing pointer to incomplete type include/linux/freezer.h:17: error: 'PF_FREEZE' undeclared (first use in this function) include/linux/freezer.h: In function 'freeze': include/linux/freezer.h:26: error: dereferencing pointer to incomplete type include/linux/freezer.h:26: error: 'PF_FREEZE' undeclared (first use in this function) include/linux/freezer.h: In function 'do_not_freeze': include/linux/freezer.h:34: error: dereferencing pointer to incomplete type include/linux/freezer.h:34: error: 'PF_FREEZE' undeclared (first use in this function) include/linux/freezer.h: In function 'thaw_process': include/linux/freezer.h:43: error: dereferencing pointer to incomplete type include/linux/freezer.h:43: error: 'PF_FROZEN' undeclared (first use in this function) include/linux/freezer.h:44: warning: implicit declaration of function 'wake_up_process' include/linux/freezer.h: In function 'frozen_process': include/linux/freezer.h:55: error: dereferencing pointer to incomplete type include/linux/freezer.h:55: error: dereferencing pointer to incomplete type include/linux/freezer.h:55: error: 'PF_FREEZE' undeclared (first use in this function) include/linux/freezer.h:55: error: 'PF_FROZEN' undeclared (first use in this function) fs/jfs/jfs_txnmgr.c: In function 'freezing': include/linux/freezer.h:18: warning: control reaches end of non-void function make[2]: *** [fs/jfs/jfs_txnmgr.o] Error 1 Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Acked-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10	V4L/DVB (4798): OmniVision OV7670 driver	Jonathan Corbet
	This patch adds a V4L2 driver for the OmniVision OV7670 camera. Signed-off-by: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
2006-12-10	V4L/DVB (4797): Marvell 88ALP01 "cafe" driver	Jonathan Corbet
	A driver for the Marvell M88ALP01 "CAFE" CMOS integrated camera controller. This driver has been renamed "cafe_ccic" since my previous patch set. Signed-off-by: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
2006-12-10	V4L/DVB (4796): A couple of V4L2 defines needed by Cafe Camara driver	Jonathan Corbet
	Two defines for V4L2, needed by the Cafe camera driver: 1) Add the RGB444 image format 2) Add the "init" internal command which is separate from "reset". Signed-off-by: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
2006-12-08	[NETLINK]: Put {IFA,IFLA}_{RTA,PAYLOAD} macros back for userspace.	David S. Miller
	GLIBC uses them etc. They are guarded by ifndef __KERNEL__ so nobody will start accidently using them in the kernel again, it's just for userspace. Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-08	[XFRM]: Fix XFRMGRP_REPORT to use correct multicast group.	J Hadi Salim
	XFRMGRP_REPORT uses 0x10 which is a group that belongs to events. The correct value is 0x20. We should really be using xfrm_nlgroups going forward; it was tempting to delete the definition of XFRMGRP_REPORT but it would break at least iproute2. Signed-off-by: J Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-08	[NET]: Force a cache line split in hh_cache in SMP.	Eric Dumazet
	hh_lock was converted from rwlock to seqlock by Stephen. To have a 100% benefit of this change, I suggest to place read mostly fields of hh_cache in a separate cache line, because hh_refcnt may be changed quite frequently on some busy machines. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-08	[NETLINK]: Restore API compatibility of address and neighbour bits	Thomas Graf
	Restore API compatibility due to bits moved from rtnetlink.h to separate headers. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-08	[NET]: Convert hh_lock to seqlock.	Stephen Hemminger
	The hard header cache is in the main output path, so using seqlock instead of reader/writer lock should reduce overhead. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-08	[PATCH] Generic HID layer - pb_fnmode	Jiri Kosina
	pb_fnmode parameter has to be passed to usbhid, both for compatibility reasons and also because it logically belongs there. Also removes empty hid-input.c file in drivers/usb/input. Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-12-08	[PATCH] Generic HID layer - input and event reporting	Jiri Kosina
	hid_input_report() was needlessly USB-specific in USB HID. This patch makes the function independent of HID implementation and fixes all the current users. Bluetooth patches comply with this prototype. Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-12-08	[PATCH] Generic HID layer - hiddev	Jiri Kosina
	- hiddev is USB-only (agreed with Marcel Holtmann that Bluetooth currently doesn't need it, and future planned interface (rawhid) will be more flexible and usable) - both HID and USB-hid can be now compiled as modules (wasn't possible before hiddev was fully separated from generic HID layer) Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-12-08	[PATCH] Generic HID layer - USB API	Jiri Kosina
	- 'dev' in struct hid_device changed from struct usb_device to struct device and fixed all the users - renamed functions which are part of USB HID API from 'hid_' to 'usbhid_' - force feedback initialization moved from common part into USB-specific driver - added usbhid.h header for USB HID API users - removed USB-specific fields from struct hid_device and moved them to new usbhid_device, which is pointed to by hid_device->driver_data - fixed all USB users to use this new structure Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-12-08	[PATCH] Generic HID layer - API	Jiri Kosina
	- fixed generic API (added neccessary EXPORT_SYMBOL, fixed hid.h to provide correct prototypes) - extended hid_device with open/close/event function pointers to driver-specific functions - added driver specific driver_data to hid_device Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-12-08	[PATCH] Generic HID layer - code split	Jiri Kosina
	The "big main" split of USB HID code into generic HID code and USB-transport specific HID handling. Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2006-12-08	[PATCH] dm: suspend: add noflush pushback	Kiyoshi Ueda
	In device-mapper I/O is sometimes queued within targets for later processing. For example the multipath target can be configured to store I/O when no paths are available instead of returning it -EIO. This patch allows the device-mapper core to instruct a target to transfer the contents of any such in-target queue back into the core. This frees up the resources used by the target so the core can replace that target with an alternative one and then resend the I/O to it. Without this patch the only way to change the target in such circumstances involves returning the I/O with an error back to the filesystem/application. In the multipath case, this patch will let us add new paths for existing I/O to try after all the existing paths have failed. DMF_NOFLUSH_SUSPENDING ---------------------- If the DM_NOFLUSH_FLAG ioctl option is specified at suspend time, the DMF_NOFLUSH_SUSPENDING flag is set in md->flags during dm_suspend(). It is always cleared before dm_suspend() returns. The flag must be visible while the target is flushing pending I/Os so it is set before presuspend where the flush starts and unset after the wait for md->pending where the flush ends. Target drivers can check this flag by calling dm_noflush_suspending(). DM_MAPIO_REQUEUE / DM_ENDIO_REQUEUE ----------------------------------- A target's map() function can now return DM_MAPIO_REQUEUE to request the device mapper core queue the bio. Similarly, a target's end_io() function can return DM_ENDIO_REQUEUE to request the same. This has been labelled 'pushback'. The __map_bio() and clone_endio() functions in the core treat these return values as errors and call dec_pending() to end the I/O. dec_pending ----------- dec_pending() saves the pushback request in struct dm_io->error. Once all the split clones have ended, dec_pending() will put the original bio on the md->pushback list. Note that this supercedes any I/O errors. It is possible for the suspend with DM_NOFLUSH_FLAG to be aborted while in progress (e.g. by user interrupt). dec_pending() checks for this and returns -EIO if it happened. pushdback list and pushback_lock -------------------------------- The bio is queued on md->pushback temporarily in dec_pending(), and after all pending I/Os return, md->pushback is merged into md->deferred in dm_suspend() for re-issuing at resume time. md->pushback_lock protects md->pushback. The lock should be held with irq disabled because dec_pending() can be called from interrupt context. Queueing bios to md->pushback in dec_pending() must be done atomically with the check for DMF_NOFLUSH_SUSPENDING flag. So md->pushback_lock is held when checking the flag. Otherwise dec_pending() may queue a bio to md->pushback after the interrupted dm_suspend() flushes md->pushback. Then the bio would be left in md->pushback. Flag setting in dm_suspend() can be done without md->pushback_lock because the flag is checked only after presuspend and the set value is already made visible via the target's presuspend function. The flag can be checked without md->pushback_lock (e.g. the first part of the dec_pending() or target drivers), because the flag is checked again with md->pushback_lock held when the bio is really queued to md->pushback as described above. So even if the flag is cleared after the lockless checkings, the bio isn't left in md->pushback but returned to applications with -EIO. Other notes on the current patch -------------------------------- - md->pushback is added to the struct mapped_device instead of using md->deferred directly because md->io_lock which protects md->deferred is rw_semaphore and can't be used in interrupt context like dec_pending(), and md->io_lock protects the DMF_BLOCK_IO flag of md->flags too. - Don't issue lock_fs() in dm_suspend() if the DM_NOFLUSH_FLAG ioctl option is specified, because I/Os generated by lock_fs() would be pushed back and never return if there were no valid devices. - If an error occurs in dm_suspend() after the DMF_NOFLUSH_SUSPENDING flag is set, md->pushback must be flushed because I/Os may be queued to the list already. (flush_and_out label in dm_suspend()) Test results ------------ I have tested using multipath target with the next patch. The following tests are for regression/compatibility: - I/Os succeed when valid paths exist; - I/Os fail when there are no valid paths and queue_if_no_path is not set; - I/Os are queued in the multipath target when there are no valid paths and queue_if_no_path is set; - The queued I/Os above fail when suspend is issued without the DM_NOFLUSH_FLAG ioctl option. I/Os spanning 2 multipath targets also fail. The following tests are for the normal code path of new pushback feature: - Queued I/Os in the multipath target are flushed from the target but don't return when suspend is issued with the DM_NOFLUSH_FLAG ioctl option; - The I/Os above are queued in the multipath target again when resume is issued without path recovery; - The I/Os above succeed when resume is issued after path recovery or table load; - Queued I/Os in the multipath target succeed when resume is issued with the DM_NOFLUSH_FLAG ioctl option after table load. I/Os spanning 2 multipath targets also succeed. The following tests are for the error paths of the new pushback feature: - When the bdget_disk() fails in dm_suspend(), the DMF_NOFLUSH_SUSPENDING flag is cleared and I/Os already queued to the pushback list are flushed properly. - When suspend with the DM_NOFLUSH_FLAG ioctl option is interrupted, o I/Os which had already been queued to the pushback list at the time don't return, and are re-issued at resume time; o I/Os which hadn't been returned at the time return with EIO. Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Cc: dm-devel@redhat.com Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08	[PATCH] dm: ioctl: add noflush suspend	Kiyoshi Ueda
	Provide a dm ioctl option to request noflush suspending. (See next patch for what this is for.) As the interface is extended, the version number is incremented. Other than accepting the new option through the interface, There is no change to existing behaviour. Test results: Confirmed the option is given from user-space correctly. Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Cc: dm-devel@redhat.com Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08	[PATCH] dm: map and endio return code clarification	Kiyoshi Ueda
	Tighten the use of return values from the target map and end_io functions. Values of 2 and above are now explictly reserved for future use. There are no existing targets using such values. The patch has no effect on existing behaviour. o Reserve return values of 2 and above from target map functions. Any positive value currently indicates "mapping complete", but all existing drivers use the value 1. We now make that a requirement so we can assign new meaning to higher values in future. The new definition of return values from target map functions is: < 0 : error = 0 : The target will handle the io (DM_MAPIO_SUBMITTED). = 1 : Mapping completed (DM_MAPIO_REMAPPED). > 1 : Reserved (undefined). Previously this was the same as '= 1'. o Reserve return values of 2 and above from target end_io functions for similar reasons. DM_ENDIO_INCOMPLETE is introduced for a return value of 1. Test results: I have tested by using the multipath target. I/Os succeed when valid paths exist. I/Os are queued in the multipath target when there are no valid paths and queue_if_no_path is set. I/Os fail when there are no valid paths and queue_if_no_path is not set. Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Cc: dm-devel@redhat.com Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08	[PATCH] dm: suspend: parameter change	Kiyoshi Ueda
	Change the interface of dm_suspend() so that we can pass several options without increasing the number of parameters. The existing 'do_lockfs' integer parameter is replaced by a flag DM_SUSPEND_LOCKFS_FLAG. There is no functional change to the code. Test results: I have tested 'dmsetup suspend' command with/without the '--nolockfs' option and confirmed the do_lockfs value is correctly set. Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Cc: dm-devel@redhat.com Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08	[PATCH] fbcmap.c: mark structs const or __read_mostly	Helge Deller
	- Mark the default colormaps read-only, as nobody should be allowed to modify them - Additionally mark color values as __read_mostly since they will only be modified (very seldom) by fb_invert_cmaps() - Add named C99-initializers in fb_cmap structs and use the ARRAY_SIZE() macro Signed-off-by: Helge Deller <deller@gmx.de> Acked-by: James Simmons <jsimmons@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08	[PATCH] fault-injection: defaults likely to please a new user	Don Mullis
	Assign defaults most likely to please a new user: 1) generate some logging output (verbose=2) 2) avoid injecting failures likely to lock up UI (ignore_gfp_wait=1, ignore_gfp_highmem=1) Signed-off-by: Don Mullis <dwm@meer.net> Cc: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>