Kernel - My Linux kernel repository

Age	Commit message (Collapse)	Author
2005-06-21	[PATCH] msync: check pte dirty earlier	Abhijit Karmarkar
	It's common practice to msync a large address range regularly, in which often only a few ptes have actually been dirtied since the previous pass. sync_pte_range then goes much faster if it tests whether pte is dirty before locating and accessing each struct page cacheline; and it is hardly slowed by ptep_clear_flush_dirty repeating that test in the opposite case, when every pte actually is dirty. But beware, s390's pte_dirty always says false, since its dirty bit is kept in the storage key, located via the struct page address. So skip this optimization in its case: use a pte_maybe_dirty macro which just says true if page_test_and_clear_dirty is implemented. Signed-off-by: Abhijit Karmarkar <abhijitk@veritas.com> Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21	[PATCH] ia64: pfn_to_nid() implementation	Bob Picco
	pfn_to_nid is undefined. We haven't had this interface on ia64. The sys_mbind patches need it. Oh, the paddr_to_nid call could fail when DISCONTIG+NUMA is configured because there isn't any ACPI SRAT NUMA information. Signed-off-by: Bob Picco <bob.picco@hp.com> Acked-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21	[PATCH] SN2 XPC build patches	Jes Sorensen
	This patch contains the bits to make the XPC code use the uncached allocator rather than calling into the mspec driver. It also includes the mspec.h header which is required to build the XPC modules. Signed-off-by: Jes Sorensen <jes@wildopensource.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21	[PATCH] ia64 uncached alloc	Jes Sorensen
	This patch contains the ia64 uncached page allocator and the generic allocator (genalloc). The uncached allocator was formerly part of the SN2 mspec driver but there are several other users of it so it has been split off from the driver. The generic allocator can be used by device driver to manage special memory etc. The generic allocator is based on the allocator from the sym53c8xx_2 driver. Various users on ia64 needs uncached memory. The SGI SN architecture requires it for inter-partition communication between partitions within a large NUMA cluster. The specific user for this is the XPC code. Another application is large MPI style applications which use it for synchronization, on SN this can be done using special 'fetchop' operations but it also benefits non SN hardware which may use regular uncached memory for this purpose. Performance of doing this through uncached vs cached memory is pretty substantial. This is handled by the mspec driver which I will push out in a seperate patch. Rather than creating a specific allocator for just uncached memory I came up with genalloc which is a generic purpose allocator that can be used by device drivers and other subsystems as they please. For instance to handle onboard device memory. It was derived from the sym53c7xx_2 driver's allocator which is also an example of a potential user (I am refraining from modifying sym2 right now as it seems to have been under fairly heavy development recently). On ia64 memory has various properties within a granule, ie. it isn't safe to access memory as uncached within the same granule as currently has memory accessed in cached mode. The regular system therefore doesn't utilize memory in the lower granules which is mixed in with device PAL code etc. The uncached driver walks the EFI memmap and pulls out the spill uncached pages and sticks them into the uncached pool. Only after these chunks have been utilized, will it start converting regular cached memory into uncached memory. Hence the reason for the EFI related code additions. Signed-off-by: Jes Sorensen <jes@wildopensource.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21	[PATCH] Periodically drain non local pagesets	Christoph Lameter
	The pageset array can potentially acquire a huge amount of memory on large NUMA systems. F.e. on a system with 512 processors and 256 nodes there will be 256*512 pagesets. If each pageset only holds 5 pages then we are talking about 655360 pages.With a 16K page size on IA64 this results in potentially 10 Gigabytes of memory being trapped in pagesets. The typical cases are much less for smaller systems but there is still the potential of memory being trapped in off node pagesets. Off node memory may be rarely used if local memory is available and so we may potentially have memory in seldom used pagesets without this patch. The slab allocator flushes its per cpu caches every 2 seconds. The following patch flushes the off node pageset caches in the same way by tying into the slab flush. The patch also changes /proc/zoneinfo to include the number of pages currently in each pageset. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21	[PATCH] __read_page_state(): pass unsigned long instead of unsigned	Benjamin LaHaise
	By making the offset argument of __read_page_state an unsigned long instead of unsigned, we can avoid forcing the compiler to sign extend a usually constant argument. This saves 1 instruction on x86-64. Signed-off-by: Benjamin LaHaise <benjamin.c.lahaise@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21	[PATCH] __mod_page_state(): pass unsigned long instead of unsigned	Benjamin LaHaise
	By making the offset argument of __mod_page_state an unsigned long instead of unsigned, we can avoid forcing the compiler to sign extend a usually constant argument. This saves 1 instruction on x86-64. Signed-off-by: Benjamin LaHaise <benjamin.c.lahaise@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21	[PATCH] vm: try_to_free_pages unused argument	Darren Hart
	try_to_free_pages accepts a third argument, order, but hasn't used it since before 2.6.0. The following patch removes the argument and updates all the calls to try_to_free_pages. Signed-off-by: Darren Hart <dvhltc@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21	[PATCH] mm: remove PG_highmem	Badari Pulavarty
	Remove PG_highmem, to save a page flag. Use is_highmem() instead. It'll generate a little more code, but we don't use PageHigheMem() in many places. Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21	[PATCH] Avoiding mmap fragmentation	Wolfgang Wander
	Ingo recently introduced a great speedup for allocating new mmaps using the free_area_cache pointer which boosts the specweb SSL benchmark by 4-5% and causes huge performance increases in thread creation. The downside of this patch is that it does lead to fragmentation in the mmap-ed areas (visible via /proc/self/maps), such that some applications that work fine under 2.4 kernels quickly run out of memory on any 2.6 kernel. The problem is twofold: 1) the free_area_cache is used to continue a search for memory where the last search ended. Before the change new areas were always searched from the base address on. So now new small areas are cluttering holes of all sizes throughout the whole mmap-able region whereas before small holes tended to close holes near the base leaving holes far from the base large and available for larger requests. 2) the free_area_cache also is set to the location of the last munmap-ed area so in scenarios where we allocate e.g. five regions of 1K each, then free regions 4 2 3 in this order the next request for 1K will be placed in the position of the old region 3, whereas before we appended it to the still active region 1, placing it at the location of the old region 2. Before we had 1 free region of 2K, now we only get two free regions of 1K -> fragmentation. The patch addresses thes issues by introducing yet another cache descriptor cached_hole_size that contains the largest known hole size below the current free_area_cache. If a new request comes in the size is compared against the cached_hole_size and if the request can be filled with a hole below free_area_cache the search is started from the base instead. The results look promising: Whereas 2.6.12-rc4 fragments quickly and my (earlier posted) leakme.c test program terminates after 50000+ iterations with 96 distinct and fragmented maps in /proc/self/maps it performs nicely (as expected) with thread creation, Ingo's test_str02 with 20000 threads requires 0.7s system time. Taking out Ingo's patch (un-patch available per request) by basically deleting all mentions of free_area_cache from the kernel and starting the search for new memory always at the respective bases we observe: leakme terminates successfully with 11 distinctive hardly fragmented areas in /proc/self/maps but thread creating is gringdingly slow: 30+s(!) system time for Ingo's test_str02 with 20000 threads. Now - drumroll ;-) the appended patch works fine with leakme: it ends with only 7 distinct areas in /proc/self/maps and also thread creation seems sufficiently fast with 0.71s for 20000 threads. Signed-off-by: Wolfgang Wander <wwc@rentec.com> Credit-to: "Richard Purdie" <rpurdie@rpsys.net> Signed-off-by: Ken Chen <kenneth.w.chen@intel.com> Acked-by: Ingo Molnar <mingo@elte.hu> (partly) Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21	[PATCH] node local per-cpu-pages	Christoph Lameter
	This patch modifies the way pagesets in struct zone are managed. Each zone has a per-cpu array of pagesets. So any particular CPU has some memory in each zone structure which belongs to itself. Even if that CPU is not local to that zone. So the patch relocates the pagesets for each cpu to the node that is nearest to the cpu instead of allocating the pagesets in the (possibly remote) target zone. This means that the operations to manage pages on remote zone can be done with information available locally. We play a macro trick so that non-NUMA pmachines avoid the additional pointer chase on the page allocator fastpath. AIM7 benchmark on a 32 CPU SGI Altix w/o patches: Tasks jobs/min jti jobs/min/task real cpu 1 484.68 100 484.6769 12.01 1.97 Fri Mar 25 11:01:42 2005 100 27140.46 89 271.4046 21.44 148.71 Fri Mar 25 11:02:04 2005 200 30792.02 82 153.9601 37.80 296.72 Fri Mar 25 11:02:42 2005 300 32209.27 81 107.3642 54.21 451.34 Fri Mar 25 11:03:37 2005 400 34962.83 78 87.4071 66.59 588.97 Fri Mar 25 11:04:44 2005 500 31676.92 75 63.3538 91.87 742.71 Fri Mar 25 11:06:16 2005 600 36032.69 73 60.0545 96.91 885.44 Fri Mar 25 11:07:54 2005 700 35540.43 77 50.7720 114.63 1024.28 Fri Mar 25 11:09:49 2005 800 33906.70 74 42.3834 137.32 1181.65 Fri Mar 25 11:12:06 2005 900 34120.67 73 37.9119 153.51 1325.26 Fri Mar 25 11:14:41 2005 1000 34802.37 74 34.8024 167.23 1465.26 Fri Mar 25 11:17:28 2005 with slab API changes and pageset patch: Tasks jobs/min jti jobs/min/task real cpu 1 485.00 100 485.0000 12.00 1.96 Fri Mar 25 11:46:18 2005 100 28000.96 89 280.0096 20.79 150.45 Fri Mar 25 11:46:39 2005 200 32285.80 79 161.4290 36.05 293.37 Fri Mar 25 11:47:16 2005 300 40424.15 84 134.7472 43.19 438.42 Fri Mar 25 11:47:59 2005 400 39155.01 79 97.8875 59.46 590.05 Fri Mar 25 11:48:59 2005 500 37881.25 82 75.7625 76.82 730.19 Fri Mar 25 11:50:16 2005 600 39083.14 78 65.1386 89.35 872.79 Fri Mar 25 11:51:46 2005 700 38627.83 77 55.1826 105.47 1022.46 Fri Mar 25 11:53:32 2005 800 39631.94 78 49.5399 117.48 1169.94 Fri Mar 25 11:55:30 2005 900 36903.70 79 41.0041 141.94 1310.78 Fri Mar 25 11:57:53 2005 1000 36201.23 77 36.2012 160.77 1458.31 Fri Mar 25 12:00:34 2005 Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Shobhit Dayal <shobhit@calsoftinc.com> Signed-off-by: Shai Fultheim <Shai@Scalex86.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21	[PATCH] Hugepage consolidation	David Gibson
	A lot of the code in arch/*/mm/hugetlbpage.c is quite similar. This patch attempts to consolidate a lot of the code across the arch's, putting the combined version in mm/hugetlb.c. There are a couple of uglyish hacks in order to covert all the hugepage archs, but the result is a very large reduction in the total amount of code. It also means things like hugepage lazy allocation could be implemented in one place, instead of six. Tested, at least a little, on ppc64, i386 and x86_64. Notes: - this patch changes the meaning of set_huge_pte() to be more analagous to set_pte() - does SH4 need s special huge_ptep_get_and_clear()?? Acked-by: William Lee Irwin <wli@holomorphy.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21	[PATCH] VM: rate limit early reclaim	Martin Hicks
	When early zone reclaim is turned on the LRU is scanned more frequently when a zone is low on memory. This limits when the zone reclaim can be called by skipping the scan if another thread (either via kswapd or sync reclaim) is already reclaiming from the zone. Signed-off-by: Martin Hicks <mort@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21	[PATCH] VM: add __GFP_NORECLAIM	Martin Hicks
	When using the early zone reclaim, it was noticed that allocating new pages that should be spread across the whole system caused eviction of local pages. This adds a new GFP flag to prevent early reclaim from happening during certain allocation attempts. The example that is implemented here is for page cache pages. We want page cache pages to be spread across the whole system, and we don't want page cache pages to evict other pages to get local memory. Signed-off-by: Martin Hicks <mort@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21	[PATCH] VM: early zone reclaim	Martin Hicks
	This is the core of the (much simplified) early reclaim. The goal of this patch is to reclaim some easily-freed pages from a zone before falling back onto another zone. One of the major uses of this is NUMA machines. With the default allocator behavior the allocator would look for memory in another zone, which might be off-node, before trying to reclaim from the current zone. This adds a zone tuneable to enable early zone reclaim. It is selected on a per-zone basis and is turned on/off via syscall. Adding some extra throttling on the reclaim was also required (patch 4/4). Without the machine would grind to a crawl when doing a "make -j" kernel build. Even with this patch the System Time is higher on average, but it seems tolerable. Here are some numbers for kernbench runs on a 2-node, 4cpu, 8Gig RAM Altix in the "make -j" run: wall user sys %cpu ctx sw. sleeps ---- ---- --- ---- ------ ------ No patch 1009 1384 847 258 298170 504402 w/patch, no reclaim 880 1376 667 288 254064 396745 w/patch & reclaim 1079 1385 926 252 291625 548873 These numbers are the average of 2 runs of 3 "make -j" runs done right after system boot. Run-to-run variability for "make -j" is huge, so these numbers aren't terribly useful except to seee that with reclaim the benchmark still finishes in a reasonable amount of time. I also looked at the NUMA hit/miss stats for the "make -j" runs and the reclaim doesn't make any difference when the machine is thrashing away. Doing a "make -j8" on a single node that is filled with page cache pages takes 700 seconds with reclaim turned on and 735 seconds without reclaim (due to remote memory accesses). The simple zone_reclaim syscall program is at http://www.bork.org/~mort/sgi/zone_reclaim.c Signed-off-by: Martin Hicks <mort@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21	[PATCH] smp_processor_id() cleanup	Ingo Molnar
	This patch implements a number of smp_processor_id() cleanup ideas that Arjan van de Ven and I came up with. The previous __smp_processor_id/_smp_processor_id/smp_processor_id API spaghetti was hard to follow both on the implementational and on the usage side. Some of the complexity arose from picking wrong names, some of the complexity comes from the fact that not all architectures defined __smp_processor_id. In the new code, there are two externally visible symbols: - smp_processor_id(): debug variant. - raw_smp_processor_id(): nondebug variant. Replaces all existing uses of _smp_processor_id() and __smp_processor_id(). Defined by every SMP architecture in include/asm-*/smp.h. There is one new internal symbol, dependent on DEBUG_PREEMPT: - debug_smp_processor_id(): internal debug variant, mapped to smp_processor_id(). Also, i moved debug_smp_processor_id() from lib/kernel_lock.c into a new lib/smp_processor_id.c file. All related comments got updated and/or clarified. I have build/boot tested the following 8 .config combinations on x86: {SMP,UP} x {PREEMPT,!PREEMPT} x {DEBUG_PREEMPT,!DEBUG_PREEMPT} I have also build/boot tested x64 on UP/PREEMPT/DEBUG_PREEMPT. (Other architectures are untested, but should work just fine.) Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21	[PATCH] x86_64: TASK_SIZE fixes for compatibility mode processes	Suresh Siddha
	Appended patch will setup compatibility mode TASK_SIZE properly. This will fix atleast three known bugs that can be encountered while running compatibility mode apps. a) A malicious 32bit app can have an elf section at 0xffffe000. During exec of this app, we will have a memory leak as insert_vm_struct() is not checking for return value in syscall32_setup_pages() and thus not freeing the vma allocated for the vsyscall page. And instead of exec failing (as it has addresses > TASK_SIZE), we were allowing it to succeed previously. b) With a 32bit app, hugetlb_get_unmapped_area/arch_get_unmapped_area may return addresses beyond 32bits, ultimately causing corruption because of wrap-around and resulting in SEGFAULT, instead of returning ENOMEM. c) 32bit app doing this below mmap will now fail. mmap((void *)(0xFFFFE000UL), 0x10000UL, PROT_READ\|PROT_WRITE, MAP_FIXED\|MAP_PRIVATE\|MAP_ANON, 0, 0); Signed-off-by: Zou Nan hai <nanhai.zou@intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21	[PATCH] arm: irqs_disabled() type fix	Andrew Morton
	kernel/sched.c: In function `__might_sleep': kernel/sched.c:5461: warning: int format, long unsigned int arg (arg 3) We expect irqs_disabled() to return an int (poor man's bool). Acked-by: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-21	Merge rsync://rsync.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6	Linus Torvalds

2005-06-21	[SPARC64]: Add prefetch support.	David S. Miller
	The implementation is optimal for UltraSPARC-III and later. It will work, however suboptimally, on UltraSPARC-II and be treated as a NOP on UltraSPARC-I. It is not worth code patching this thing as the highest cost is the code space, and code patching cannot eliminate that. Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-21	[NETFILTER]: Kill nf_debug	Patrick McHardy
	Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-21	[NETFILTER]: Kill lockhelp.h	Patrick McHardy
	Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-21	[IPV6]: V6 route events reported with wrong netlink PID and seq number	Jamal Hadi Salim
	Essentially netlink at the moment always reports a pid and sequence of 0 always for v6 route activities. To understand the repurcassions of this look at: http://lists.quagga.net/pipermail/quagga-dev/2005-June/003507.html While fixing this, i took the liberty to resolve the outstanding issue of IPV6 routes inserted via ioctls to have the correct pids as well. This patch tries to behave as close as possible to the v4 routes i.e maintains whatever PID the socket issuing the command owns as opposed to the process. That made the patch a little bulky. I have tested against both netlink derived utility to add/del routes as well as ioctl derived one. The Quagga folks have tested against quagga. This fixes the problem and so far hasnt been detected to introduce any new issues. Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-21	[NETLINK]: netlink_callback structure needs 5 args not 4	Alexey Kuznetsov
	net/ipv4/tcp_diag.c uses up to ->args[4] Signed-off-by: David S. Miller <davem@davemloft.net>
2005-06-20	Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6	Linus Torvalds

2005-06-20	Merge rsync://rsync.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6	Linus Torvalds

2005-06-20	[PATCH] sysfs-iattr: add sysfs_setattr	Maneesh Soni
	o This adds ->i_op->setattr VFS method for sysfs inodes. The changed attribues are saved in the persistent sysfs_dirent structure as a pointer to struct iattr. The struct iattr is allocated only for those sysfs_dirent's for which default attributes are getting changed. Thanks to Jon Smirl for this suggestion. Signed-off-by: Maneesh Soni <maneesh@in.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-06-20	[PATCH] I2C: add i2c sensor_device_attribute and macros	Yani Ioannou
	This patch creates a new header with a potential standard i2c sensor attribute type (which simply includes an int representing the sensor number/index) and the associated macros, SENSOR_DEVICE_ATTR to define a static attribute and to_sensor_dev_attr to get a sensor_device_attribute reference from an embedded device_attribute reference. Signed-off-by: Yani Ioannou <yani.ioannou@gmail.com>
2005-06-20	[PATCH] Driver Core: include: update device attribute callbacks	Yani Ioannou
	Signed-off-by: Yani Ioannou <yani.ioannou@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-06-20	[PATCH] Driver core: change device_attribute callbacks	Yani Ioannou
	This patch adds the device_attribute paramerter to the device_attribute store and show sysfs callback functions, and passes a reference to the attribute when the callbacks are called. Signed-off-by: Yani Ioannou <yani.ioannou@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-06-20	[PATCH] libfs: add simple attribute files	Arnd Bergmann
	Based on the discussion about spufs attributes, this is my suggestion for a more generic attribute file support that can be used by both debugfs and spufs. Simple attribute files behave similarly to sequential files from a kernel programmers perspective in that a standard set of file operations is provided and only an open operation needs to be written that registers file specific get() and set() functions. These operations are defined as void foo_set(void data, u64 val); and u64 foo_get(void data); where data is the inode->u.generic_ip pointer of the file and the operations just need to make send of that pointer. The infrastructure makes sure this works correctly with concurrent access and partial read calls. A macro named DEFINE_SIMPLE_ATTRIBUTE is provided to further simplify using the attributes. This patch already contains the changes for debugfs to use attributes for its internal file operations. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-06-20	[PATCH] Driver core: unregister_node() for hotplug use	Keiichiro Tokunaga
	This adds a generic function 'unregister_node()'. It is used to remove objects of a node going away for hotplug. All the devices on the node must be unregistered before calling this function. Signed-off-by: Keiichiro Tokunaga <tokunaga.keiich@jp.fujitsu.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> diff -puN drivers/base/node.c~numa_hp_base drivers/base/node.c
2005-06-20	[PATCH] Driver Core: fix bk-driver-core kills ppc64	Patrick Mochel
	There's no check to see if the device is already bound to a driver, which could do bad things. The first thing to go wrong is that it will try to match a driver with a device already bound to one. In some cases (it appears with USB with drivers/usb/core/usb.c::usb_match_id()), some drivers will match a device based on the class type, so it would be common (especially for HID devices) to match a device that is already bound. The fun comes when ->probe() is called, it fails, then driver_probe_device() does this: dev->driver = NULL; Later on, that pointer could be be dereferenced without checking and cause hell to break loose. This problem could be nasty. It's very hardware dependent, since some devices could have a different set of matching qualifiers than others. Now, I don't quite see exactly where/how you were getting that crash. You're dereferencing bad memory, but I'm not sure which pointer was bad and where it came from, but it could have come from a couple of different places. The patch below will hopefully fix it all up for you. It's against 2.6.12-rc2-mm1, and does the following: - Move logic to driver_probe_device() and comments uncommon returns: 1 - If device is bound 0 - If device not bound, and no error error - If there was an error. - Move locking to caller of that function, since we want to lock a device for the entire time we're trying to bind it to a driver (to prevent against a driver being loaded at the same time). - Update __device_attach() and __driver_attach() to do that locking. - Check if device is already bound in __driver_attach() - Update the converse device_release_driver() so it locks the device around all of the operations. - Mark driver_probe_device() as static and remove export. It's an internal function, it should stay that way, and there are no other callers. If there is ever a need to export it, we can audit it as necessary. Signed-off-by: Andrew Morton <akpm@osdl.org>
2005-06-20	[PATCH] Use a klist for device child lists.	mochel@digitalimplant.org
	- Use klist iterator in device_for_each_child(), making it safe to use for removing devices. - Remove unused list_to_dev() function. - Kills all usage of devices_subsys.rwsem. Signed-off-by: Patrick Mochel <mochel@digitalimplant.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-06-20	[PATCH] Remove struct device::driver_list.	mochel@digitalimplant.org
	Signed-off-by: Patrick Mochel <mochel@digitalimplant.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-06-20	[PATCH] Remove struct device::bus_list.	mochel@digitalimplant.org
	Signed-off-by: Patrick Mochel <mochel@digitalimplant.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-06-20	[PATCH] add klist_node_attached() to determine if a node is on a list or not.	mochel@digitalimplant.org
	Signed-off-by: Patrick Mochel <mochel@digitalimplant.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> diff -Nru a/include/linux/klist.h b/include/linux/klist.h
2005-06-20	[PATCH] Remove the unused device_find().	mochel@digitalimplant.org
	Signed-off-by: Patrick Mochel <mochel@digitalimplant.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-06-20	[PATCH] Add a klist to struct device_driver for the devices bound to it.	mochel@digitalimplant.org
	- Use it in driver_for_each_device() instead of the regular list_head and stop using the bus's rwsem for protection. - Use driver_for_each_device() in driver_detach() so we don't deadlock on the bus's rwsem. - Remove ->devices. - Move klist access and sysfs link access out from under device's semaphore, since they're synchronized through other means. Signed-off-by: Patrick Mochel <mochel@digitalimplant.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-06-20	[PATCH] Add a klist to struct bus_type for its drivers.	mochel@digitalimplant.org
	- Use it in bus_for_each_drv(). - Use the klist spinlock instead of the bus rwsem. Signed-off-by: Patrick Mochel <mochel@digitalimplant.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-06-20	[PATCH] Add a klist to struct bus_type for its devices.	mochel@digitalimplant.org
	- Use it for bus_for_each_dev(). - Use the klist spinlock instead of the bus rwsem. Signed-off-by: Patrick Mochel <mochel@digitalimplant.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-06-20	[PATCH] Add initial implementation of klist helpers.	mochel@digitalimplant.org
	This klist interface provides a couple of structures that wrap around struct list_head to provide explicit list "head" (struct klist) and list "node" (struct klist_node) objects. For struct klist, a spinlock is included that protects access to the actual list itself. struct klist_node provides a pointer to the klist that owns it and a kref reference count that indicates the number of current users of that node in the list. The entire point is to provide an interface for iterating over a list that is safe and allows for modification of the list during the iteration (e.g. insertion and removal), including modification of the current node on the list. It works using a 3rd object type - struct klist_iter - that is declared and initialized before an iteration. klist_next() is used to acquire the next element in the list. It returns NULL if there are no more items. This klist interface provides a couple of structures that wrap around struct list_head to provide explicit list "head" (struct klist) and list "node" (struct klist_node) objects. For struct klist, a spinlock is included that protects access to the actual list itself. struct klist_node provides a pointer to the klist that owns it and a kref reference count that indicates the number of current users of that node in the list. The entire point is to provide an interface for iterating over a list that is safe and allows for modification of the list during the iteration (e.g. insertion and removal), including modification of the current node on the list. It works using a 3rd object type - struct klist_iter - that is declared and initialized before an iteration. klist_next() is used to acquire the next element in the list. It returns NULL if there are no more items. Internally, that routine takes the klist's lock, decrements the reference count of the previous klist_node and increments the count of the next klist_node. It then drops the lock and returns. There are primitives for adding and removing nodes to/from a klist. When deleting, klist_del() will simply decrement the reference count. Only when the count goes to 0 is the node removed from the list. klist_remove() will try to delete the node from the list and block until it is actually removed. This is useful for objects (like devices) that have been removed from the system and must be freed (but must wait until all accessors have finished). Internally, that routine takes the klist's lock, decrements the reference count of the previous klist_node and increments the count of the next klist_node. It then drops the lock and returns. There are primitives for adding and removing nodes to/from a klist. When deleting, klist_del() will simply decrement the reference count. Only when the count goes to 0 is the node removed from the list. klist_remove() will try to delete the node from the list and block until it is actually removed. This is useful for objects (like devices) that have been removed from the system and must be freed (but must wait until all accessors have finished). Signed-off-by: Patrick Mochel <mochel@digitalimplant.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> diff -Nru a/include/linux/klist.h b/include/linux/klist.h
2005-06-20	[PATCH] Add driver_for_each_device().	mochel@digitalimplant.org
	Now there's an iterator for accessing each device bound to a driver. Signed-off-by: Patrick Mochel <mochel@digitalimplant.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Index: linux-2.6.12-rc2/drivers/base/driver.c ===================================================================
2005-06-20	[PATCH] Add a semaphore to struct device to synchronize calls to its driver.	mochel@digitalimplant.org
	This adds a per-device semaphore that is taken before every call from the core to a driver method. This prevents e.g. simultaneous calls to the ->suspend() or ->resume() and ->probe() or ->release(), potentially saving a whole lot of headaches. It also moves us a step closer to removing the bus rwsem, since it protects the fields in struct device that are modified by the core. Signed-off-by: Patrick Mochel <mochel@digitalimplant.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-06-20	[PATCH] class: remove class_simple code, as no one in the tree is using it ↵	gregkh@suse.de
	anymore. Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-06-20	[PATCH] USB: move the usb hcd code to use the new class code.	gregkh@suse.de
	This moves a kref into the main hcd structure, which detaches it from the class device structure. Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-06-20	[PATCH] INPUT: move to use the new class code, instead of class_simple	gregkh@suse.de
	Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-06-20	[PATCH] CLASS: move a "simple" class logic into the class core.	gregkh@suse.de
	One step on improving the class api so that it can not be used incorrectly. This also fixes the module owner issue with the dev files that happened when the devt logic moved to the class core. Based on a patch originally written by Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-06-20	[PATCH] Make attributes names const char *	Dmitry Torokhov
	sysfs: make attributes and attribute_group's names const char * Signed-off-by: Dmitry Torokhov <dtor@mail.ru> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-06-20	[PATCH] make driver's name be const char *	Dmitry Torokhov
	Driver core: change driver's, bus's, class's and platform device's names to be const char * so one can use const char *drv_name = "asdfg"; when initializing structures. Also kill couple of whitespaces. Signed-off-by: Dmitry Torokhov <dtor@mail.ru> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>