aboutsummaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2007-05-09Revert "md: improve partition detection in md array"Linus Torvalds
This reverts commit 5b479c91da90eef605f851508744bfe8269591a0. Quoth Neil Brown: "It causes an oops when auto-detecting raid arrays, and it doesn't seem easy to fix. The array may not be 'open' when do_md_run is called, so bdev->bd_disk might be NULL, so bd_set_size can oops. This whole approach of opening an md device before it has been assembled just seems to get more and more painful. I think I'm going to have to come up with something clever to provide both backward comparability with usage expectation, and sane integration into the rest of the kernel." Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09Merge master.kernel.org:/pub/scm/linux/kernel/git/bart/ide-2.6Linus Torvalds
* master.kernel.org:/pub/scm/linux/kernel/git/bart/ide-2.6: ide: fix PIO setup on resume for ATAPI devices ide: legacy PCI bus order probing fixes ide: add ide_proc_register_port() ide: add "initializing" argument to ide_register_hw() ide: cable detection fixes (take 2) ide: move IDE settings handling to ide-proc.c ide: split off ioctl handling from IDE settings (v2) ide: make /proc/ide/ optional ide: add ide_tune_dma() helper ide: rework the code for selecting the best DMA transfer mode (v3) ide: fix UDMA/MWDMA/SWDMA masks (v3)
2007-05-10ide: legacy PCI bus order probing fixesBartlomiej Zolnierkiewicz
IDE PCI host drivers should register themselves with IDE core only when IDE driver is built-in, otherwise (IDE driver is modular and thus IDE PCI host drivers are also modular) the code has no effect and just complicates the probing. Fix it by adding new config option CONFIG_IDEPCI_PCIBUS (defined only when needed and invisible to the user) and covering by #ifdef/#endif the code in question. It turned out that "ide=reverse" was silently accepted but did nothing in case when IDE driver was modular, this is fixed now. Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2007-05-10ide: add ide_proc_register_port()Bartlomiej Zolnierkiewicz
* create_proc_ide_interfaces() tries to add /proc entries for every probed and initialized IDE port, replace it by ide_proc_register_port() which does it only for the given port (also rename destroy_proc_ide_interface() to ide_proc_unregister_port() for consistency) * convert {create,destroy}_proc_ide_interface[s]() users to use new functions * pmac driver depended on proc_ide_create() to add /proc port entries, fix it * au1xxx-ide, swarm and cs5520 drivers depended indirectly on ide-generic driver (CONFIG_IDE_GENERIC=y) to add port /proc entries, fix them * there is now no need to add /proc entries for IDE ports in proc_ide_create() so don't do it * proc_ide_create() needs now to be called before drivers are probed - fix it, while at it make proc_ide_create() create /proc "ide" directory Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2007-05-10ide: add "initializing" argument to ide_register_hw()Bartlomiej Zolnierkiewicz
Add "initializing" argument to ide_register_hw() and use it instead of ide.c wide variable of the same name. Update all users of ide_register_hw() accordingly. Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2007-05-10ide: cable detection fixes (take 2)Bartlomiej Zolnierkiewicz
Tejun's recent eighty_ninty_three() fix has inspired me to do more thorough review of the cable detection code... * print user-friendly warning about limiting the maximum transfer speed to UDMA33 (and the reason behind it) when 80-wire cable is not detected, also while at it cleanup eighty_ninty_three() a bit * use eighty_ninty_three() in ide_ata66_check(), this actually fixes 3 bugs: - bit 14 (word 93 validity check) == 1 && bit 13 (80-wire cable test) == 1 were used as 80-wire cable present test for CONFIG_IDEDMA_IVB=n case (please see FIXME comment in eighty_ninty_three() for more details) - CONFIG_IDEDMA_IVB=y/n cases were interchanged - check for SATA devices was missing * remove private cable warnings from pdc_202xx{old,new} drivers now that core code provides this functionality (plus, in pdc202xx_new case the test could give false warnings for ATAPI devices because pdc202xx_new driver doesn't even support ATAPI DMA) Cc: Tejun Heo <htejun@gmail.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2007-05-10ide: move IDE settings handling to ide-proc.cBartlomiej Zolnierkiewicz
* move __ide_add_setting() ide_add_setting() __ide_remove_setting() auto_remove_settings() ide_find_setting_by_name() ide_read_setting() ide_write_setting() set_xfer_rate() ide_add_generic_settings() ide_register_subdriver() ide_unregister_subdriver() from ide.c to ide-proc.c * set_{io_32bit,pio_mode,using_dma}() cannot be marked static now, fix it * rename ide_[un]register_subdriver() to ide_proc_[un]register_driver(), update device drivers to use new names * add CONFIG_IDE_PROC_FS=n versions of ide_proc_[un]register_driver() and ide_add_generic_settings() * make ide_find_setting_by_name(), ide_{read,write}_setting() and ide_{add,remove}_proc_entries() static * cover IDE settings code in device drivers with CONFIG_IDE_PROC_FS #ifdef, also while at it cover with CONFIG_IDE_PROC_FS #ifdef ide_driver_t.proc * remove bogus comment from ide.h * cover with CONFIG_IDE_PROC_FS #ifdef .proc and .settings in ide_drive_t Besides saner code this patch results in the IDE core smaller by ~2 kB (on x86-32) and IDE disk driver by ~1 kB (ditto) when CONFIG_IDE_PROC_FS=n. Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2007-05-10ide: split off ioctl handling from IDE settings (v2)Bartlomiej Zolnierkiewicz
* do write permission and min/max checks in ide_procset_t functions * ide-disk.c: drive->id is always available so cleanup "multcount" setting accordingly * ide-disk.c: "address" setting was incorrectly defined as type TYPE_INTA, fix it by using type TYPE_BYTE and updating ide_drive_t->adressing field, the bug didn't trigger because this IDE setting uses custom ->set function * ide.c: add set_ksettings() for handling HDIO_SET_KEEPSETTINGS ioctl * ide.c: add set_unmaskirq() for handling HDIO_SET_UNMASKINTR ioctl * handle ioctls directly in generic_ide_ioclt() and idedisk_ioctl() instead of using IDE settings to deal with them * remove no longer needed ide_find_setting_by_ioctl() and {read,write}_ioctl fields from ide_settings_t, also remove now unused TYPE_INTA handling v2: * add missing EXPORT_SYMBOL_GPL(ide_setting_sem) needed now for ide-disk Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2007-05-10ide: make /proc/ide/ optionalBartlomiej Zolnierkiewicz
All important information/features should be already available through sysfs and ioctl interfaces. Add CONFIG_IDE_PROC_FS (CONFIG_SCSI_PROC_FS rip-off) config option, disabling it makes IDE driver ~5 kB smaller (on x86-32). While at it add CONFIG_PROC_FS=n versions of proc_ide_{create,destroy}() and remove no longer needed #ifdefs. Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2007-05-10ide: add ide_tune_dma() helperBartlomiej Zolnierkiewicz
After reworking the code responsible for selecting the best DMA transfer mode it is now possible to add generic ide_tune_dma() helper. Convert some IDE PCI host drivers to use it (the ones left need more work). Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2007-05-10ide: rework the code for selecting the best DMA transfer mode (v3)Bartlomiej Zolnierkiewicz
Depends on the "ide: fix UDMA/MWDMA/SWDMA masks" patch. * add ide_hwif_t.udma_filter hook for filtering UDMA mask (use it in alim15x3, hpt366, siimage and serverworks drivers) * add ide_max_dma_mode() for finding best DMA mode for the device (loosely based on some older libata-core.c code) * convert ide_dma_speed() users to use ide_max_dma_mode() * make ide_rate_filter() take "ide_drive_t *drive" as an argument instead of "u8 mode" and teach it to how to use UDMA mask to do filtering * use ide_rate_filter() in hpt366 driver * remove no longer needed ide_dma_speed() and *_ratemask() * unexport eighty_ninty_three() v2: * rename ->filter_udma_mask to ->udma_filter [ Suggested by Sergei Shtylyov <sshtylyov@ru.mvista.com>. ] v3: * updated for scc_pata driver (fixes XFER_UDMA_6 filtering for user-space originated transfer mode change requests when 100MHz clock is used) Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2007-05-10ide: fix UDMA/MWDMA/SWDMA masks (v3)Bartlomiej Zolnierkiewicz
* use 0x00 instead of 0x80 to disable ->{ultra,mwdma,swdma}_mask * add udma_mask field to ide_pci_device_t and use it to initialize ->ultra_mask in aec62xx, cmd64x, pdc202xx_{new,old} and piix drivers * fix UDMA masks to match with chipset specific *_ratemask() (alim15x3, hpt366, serverworks and siimage drivers need UDMA mask filtering method - done in the next patch) v2: * piix: fix cable detection for 82801AA_1 and 82372FB_1 [ Noticed by Sergei Shtylyov <sshtylyov@ru.mvista.com>. ] * cmd64x: use hwif->cds->udma_mask [ Suggested by Sergei Shtylyov <sshtylyov@ru.mvista.com>. ] * aec62xx: fix newly introduced bug - check DMA status not command register [ Noticed by Sergei Shtylyov <sshtylyov@ru.mvista.com>. ] v3: * piix: use hwif->cds->udma_mask [ Suggested by Sergei Shtylyov <sshtylyov@ru.mvista.com>. ] Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2007-05-09Merge git://git.infradead.org/mtd-2.6Linus Torvalds
* git://git.infradead.org/mtd-2.6: (21 commits) [MTD] [CHIPS] Remove MTD_OBSOLETE_CHIPS (jedec, amd_flash, sharp) [MTD] Delete allegedly obsolete "bank_size" field of mtd_info. [MTD] Remove unnecessary user space check from mtd.h. [MTD] [MAPS] Remove flash maps for no longer supported 405LP boards [MTD] [MAPS] Fix missing printk() parameter in physmap_of.c MTD driver [MTD] [NAND] platform NAND driver: add driver [MTD] [NAND] platform NAND driver: update header [JFFS2] Simplify and clean up jffs2_add_tn_to_tree() some more. [JFFS2] Remove another bogus optimisation in jffs2_add_tn_to_tree() [JFFS2] Remove broken insert_point optimisation in jffs2_add_tn_to_tree() [JFFS2] Remember to calculate overlap on nodes which replace older nodes [JFFS2] Don't advance c->wbuf_ofs to next eraseblock after wbuf flush [MTD] [NAND] at91_nand.c: CMDLINE_PARTS support [MTD] [NAND] Tidy up handling of page number in nand_block_bad() [MTD] block2mtd_paramline[] mustn't be __initdata [MTD] [NAND] Support multiple chips in CAFÉ driver [MTD] [NAND] Rename cafe.c to cafe_nand.c and remove the multi-obj magic [MTD] [NAND] Use rslib for CAFÉ ECC [RSLIB] Support non-canonical GF representations [JFFS2] Remove dead file histo_mips.h ...
2007-05-09Merge branch 'master' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc * 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: [POWERPC] Further fixes for the removal of 4level-fixup hack from ppc32 [POWERPC] EEH: log all PCI-X and PCI-E AER registers [POWERPC] EEH: capture and log pci state on error [POWERPC] EEH: Split up long error msg [POWERPC] EEH: log error only after driver notification. [POWERPC] fsl_soc: Make mac_addr const in fs_enet_of_init(). [POWERPC] Don't use SLAB/SLUB for PTE pages [POWERPC] Spufs support for 64K LS mappings on 4K kernels [POWERPC] Add ability to 4K kernel to hash in 64K pages [POWERPC] Introduce address space "slices" [POWERPC] Small fixes & cleanups in segment page size demotion [POWERPC] iSeries: Make HVC_ISERIES the default [POWERPC] iSeries: suppress build warning in lparmap.c [POWERPC] Mark pages that don't exist as nosave [POWERPC] swsusp: Introduce register_nosave_region_late
2007-05-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivialLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (25 commits) sound: convert "sound" subdirectory to UTF-8 MAINTAINERS: Add cxacru website/mailing list include files: convert "include" subdirectory to UTF-8 general: convert "kernel" subdirectory to UTF-8 documentation: convert the Documentation directory to UTF-8 Convert the toplevel files CREDITS and MAINTAINERS to UTF-8. remove broken URLs from net drivers' output Magic number prefix consistency change to Documentation/magic-number.txt trivial: s/i_sem /i_mutex/ fix file specification in comments drivers/base/platform.c: fix small typo in doc misc doc and kconfig typos Remove obsolete fat_cvf help text Fix occurrences of "the the " Fix minor typoes in kernel/module.c Kconfig: Remove reference to external mqueue library Kconfig: A couple of grammatical fixes in arch/i386/Kconfig Correct comments in genrtc.c to refer to correct /proc file. Fix more "deprecated" spellos. Fix "deprecated" typoes. ... Fix trivial comment conflict in kernel/relay.c.
2007-05-09md: improve partition detection in md arrayNeilBrown
md currently uses ->media_changed to make sure rescan_partitions is call on md array after they are assembled. However that doesn't happen until the array is opened, which is later than some people would like. So use blkdev_ioctl to do the rescan immediately that the array has been assembled. This means we can remove all the ->change infrastructure as it was only used to trigger a partition rescan. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09fbdev: add support for AVR32Haavard Skinnemoen
Provide framebuffer page protection flags and definitions of fb_readl/fb_writel for AVR32. Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com> Cc: "Antonino A. Daplas" <adaplas@pol.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09svgalib: move fb_get_caps to svgalibAntonino A. Daplas
Move fb_get_caps() method to svgalib.c as svga_get_caps() so it can be used by s3fb, arkfb and vt8623fb. Signed-off-by: Antonino Daplas <adaplas@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09compiler: introduce __used and __maybe_unusedDavid Rientjes
__used is defined to be __attribute__((unused)) for all pre-3.3 gcc compilers to suppress warnings for unused functions because perhaps they are referenced only in inline assembly. It is defined to be __attribute__((used)) for gcc 3.3 and later so that the code is still emitted for such functions. __maybe_unused is defined to be __attribute__((unused)) for both function and variable use if it could possibly be unreferenced due to the evaluation of preprocessor macros. Function prototypes shall be marked with __maybe_unused if the actual definition of the function is dependant on preprocessor macros. No update to compiler-intel.h is necessary because ICC supports both __attribute__((used)) and __attribute__((unused)) as specified by the gcc manual. __attribute_used__ is deprecated and will be removed once all current code is converted to using __used. Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Adrian Bunk <bunk@stusta.de> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09rename thread_info to stackRoman Zippel
This finally renames the thread_info field in task structure to stack, so that the assumptions about this field are gone and archs have more freedom about placing the thread_info structure. Nonbroken archs which have a proper thread pointer can do the access to both current thread and task structure via a single pointer. It'll allow for a few more cleanups of the fork code, from which e.g. ia64 could benefit. Signed-off-by: Roman Zippel <zippel@linux-m68k.org> [akpm@linux-foundation.org: build fix] Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Ian Molton <spyro@f2s.com> Cc: Haavard Skinnemoen <hskinnemoen@atmel.com> Cc: Mikael Starvik <starvik@axis.com> Cc: David Howells <dhowells@redhat.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Hirokazu Takata <takata@linux-m32r.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Greg Ungerer <gerg@uclinux.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp> Cc: Richard Curnow <rc@rc0.org.uk> Cc: William Lee Irwin III <wli@holomorphy.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jeff Dike <jdike@addtoit.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp> Cc: Andi Kleen <ak@muc.de> Cc: Chris Zankel <chris@zankel.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09Allow arch to initialize arch field of the module structureRoman Zippel
This will later allow an arch to add module specific information via linker generated tables instead of poking directly in the module object structure. Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09clocksource: fix resume logicThomas Gleixner
We need to make sure that the clocksources are resumed, when timekeeping is resumed. The current resume logic does not guarantee this. Add a resume function pointer to the clocksource struct, so clocksource drivers which need to reinitialize the clocksource can provide a resume function. Add a resume function, which calls the maybe available clocksource resume functions and resets the watchdog function, so a stable TSC can be used accross suspend/resume. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: john stultz <johnstul@us.ibm.com> Cc: Andi Kleen <ak@suse.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09Move remote node draining out of slab allocatorsChristoph Lameter
Currently the slab allocators contain callbacks into the page allocator to perform the draining of pagesets on remote nodes. This requires SLUB to have a whole subsystem in order to be compatible with SLAB. Moving node draining out of the slab allocators avoids a section of code in SLUB. Move the node draining so that is is done when the vm statistics are updated. At that point we are already touching all the cachelines with the pagesets of a processor. Add a expire counter there. If we have to update per zone or global vm statistics then assume that the pageset will require subsequent draining. The expire counter will be decremented on each vm stats update pass until it reaches zero. Then we will drain one batch from the pageset. The draining will cause vm counter updates which will then cause another expiration until the pcp is empty. So we will drain a batch every 3 seconds. Note that remote node draining is a somewhat esoteric feature that is required on large NUMA systems because otherwise significant portions of system memory can become trapped in pcp queues. The number of pcp is determined by the number of processors and nodes in a system. A system with 4 processors and 2 nodes has 8 pcps which is okay. But a system with 1024 processors and 512 nodes has 512k pcps with a high potential for large amount of memory being caught in them. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09vmstat: use our own timer eventsChristoph Lameter
vmstat is currently using the cache reaper to periodically bring the statistics up to date. The cache reaper does only exists in SLUB as a way to provide compatibility with SLAB. This patch removes the vmstat calls from the slab allocators and provides its own handling. The advantage is also that we can use a different frequency for the updates. Refreshing vm stats is a pretty fast job so we can run this every second and stagger this by only one tick. This will lead to some overlap in large systems. F.e a system running at 250 HZ with 1024 processors will have 4 vm updates occurring at once. However, the vm stats update only accesses per node information. It is only necessary to stagger the vm statistics updates per processor in each node. Vm counter updates occurring on distant nodes will not cause cacheline contention. We could implement an alternate approach that runs the first processor on each node at the second and then each of the other processor on a node on a subsequent tick. That may be useful to keep a large amount of the second free of timer activity. Maybe the timer folks will have some feedback on this one? [jirislaby@gmail.com: add missing break] Cc: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09Add suspend-related notifications for CPU hotplugRafael J. Wysocki
Since nonboot CPUs are now disabled after tasks and devices have been frozen and the CPU hotplug infrastructure is used for this purpose, we need special CPU hotplug notifications that will help the CPU-hotplug-aware subsystems distinguish normal CPU hotplug events from CPU hotplug events related to a system-wide suspend or resume operation in progress. This patch introduces such notifications and causes them to be used during suspend and resume transitions. It also changes all of the CPU-hotplug-aware subsystems to take these notifications into consideration (for now they are handled in the same way as the corresponding "normal" ones). [oleg@tv-sign.ru: cleanups] Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Gautham R Shenoy <ego@in.ibm.com> Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09fs: deprecate memclear_highpage_flushNate Diller
Now that all the in-tree users are converted over to zero_user_page(), deprecate the old memclear_highpage_flush() call. Signed-off-by: Nate Diller <nate.diller@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09fs: convert core functions to zero_user_pageNate Diller
It's very common for file systems to need to zero part or all of a page, the simplist way is just to use kmap_atomic() and memset(). There's actually a library function in include/linux/highmem.h that does exactly that, but it's confusingly named memclear_highpage_flush(), which is descriptive of *how* it does the work rather than what the *purpose* is. So this patchset renames the function to zero_user_page(), and calls it from the various places that currently open code it. This first patch introduces the new function call, and converts all the core kernel callsites, both the open-coded ones and the old memclear_highpage_flush() ones. Following this patch is a series of conversions for each file system individually, per AKPM, and finally a patch deprecating the old call. The diffstat below shows the entire patchset. [akpm@linux-foundation.org: fix a few things] Signed-off-by: Nate Diller <nate.diller@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09FUTEX: new PRIVATE futexesEric Dumazet
Analysis of current linux futex code : -------------------------------------- A central hash table futex_queues[] holds all contexts (futex_q) of waiting threads. Each futex_wait()/futex_wait() has to obtain a spinlock on a hash slot to perform lookups or insert/deletion of a futex_q. When a futex_wait() is done, calling thread has to : 1) - Obtain a read lock on mmap_sem to be able to validate the user pointer (calling find_vma()). This validation tells us if the futex uses an inode based store (mapped file), or mm based store (anonymous mem) 2) - compute a hash key 3) - Atomic increment of reference counter on an inode or a mm_struct 4) - lock part of futex_queues[] hash table 5) - perform the test on value of futex. (rollback is value != expected_value, returns EWOULDBLOCK) (various loops if test triggers mm faults) 6) queue the context into hash table, release the lock got in 4) 7) - release the read_lock on mmap_sem <block> 8) Eventually unqueue the context (but rarely, as this part  may be done by the futex_wake()) Futexes were designed to improve scalability but current implementation has various problems : - Central hashtable : This means scalability problems if many processes/threads want to use futexes at the same time. This means NUMA unbalance because this hashtable is located on one node. - Using mmap_sem on every futex() syscall : Even if mmap_sem is a rw_semaphore, up_read()/down_read() are doing atomic ops on mmap_sem, dirtying cache line : - lot of cache line ping pongs on SMP configurations. mmap_sem is also extensively used by mm code (page faults, mmap()/munmap()) Highly threaded processes might suffer from mmap_sem contention. mmap_sem is also used by oprofile code. Enabling oprofile hurts threaded programs because of contention on the mmap_sem cache line. - Using an atomic_inc()/atomic_dec() on inode ref counter or mm ref counter: It's also a cache line ping pong on SMP. It also increases mmap_sem hold time because of cache misses. Most of these scalability problems come from the fact that futexes are in one global namespace. As we use a central hash table, we must make sure they are all using the same reference (given by the mm subsystem). We chose to force all futexes be 'shared'. This has a cost. But fact is POSIX defined PRIVATE and SHARED, allowing clear separation, and optimal performance if carefuly implemented. Time has come for linux to have better threading performance. The goal is to permit new futex commands to avoid : - Taking the mmap_sem semaphore, conflicting with other subsystems. - Modifying a ref_count on mm or an inode, still conflicting with mm or fs. This is possible because, for one process using PTHREAD_PROCESS_PRIVATE futexes, we only need to distinguish futexes by their virtual address, no matter the underlying mm storage is. If glibc wants to exploit this new infrastructure, it should use new _PRIVATE futex subcommands for PTHREAD_PROCESS_PRIVATE futexes. And be prepared to fallback on old subcommands for old kernels. Using one global variable with the FUTEX_PRIVATE_FLAG or 0 value should be OK. PTHREAD_PROCESS_SHARED futexes should still use the old subcommands. Compatibility with old applications is preserved, they still hit the scalability problems, but new applications can fly :) Note : the same SHARED futex (mapped on a file) can be used by old binaries *and* new binaries, because both binaries will use the old subcommands. Note : Vast majority of futexes should be using PROCESS_PRIVATE semantic, as this is the default semantic. Almost all applications should benefit of this changes (new kernel and updated libc) Some bench results on a Pentium M 1.6 GHz (SMP kernel on a UP machine) /* calling futex_wait(addr, value) with value != *addr */ 433 cycles per futex(FUTEX_WAIT) call (mixing 2 futexes) 424 cycles per futex(FUTEX_WAIT) call (using one futex) 334 cycles per futex(FUTEX_WAIT_PRIVATE) call (mixing 2 futexes) 334 cycles per futex(FUTEX_WAIT_PRIVATE) call (using one futex) For reference : 187 cycles per getppid() call 188 cycles per umask() call 181 cycles per ni_syscall() call Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Pierre Peiffer <pierre.peiffer@bull.net> Cc: "Ulrich Drepper" <drepper@gmail.com> Cc: "Nick Piggin" <nickpiggin@yahoo.com.au> Cc: "Ingo Molnar" <mingo@elte.hu> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09futex_requeue_pi optimizationPierre Peiffer
This patch provides the futex_requeue_pi functionality, which allows some threads waiting on a normal futex to be requeued on the wait-queue of a PI-futex. This provides an optimization, already used for (normal) futexes, to be used with the PI-futexes. This optimization is currently used by the glibc in pthread_broadcast, when using "normal" mutexes. With futex_requeue_pi, it can be used with PRIO_INHERIT mutexes too. Signed-off-by: Pierre Peiffer <pierre.peiffer@bull.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: Ulrich Drepper <drepper@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09Make futex_wait() use an hrtimer for timeoutPierre Peiffer
This patch modifies futex_wait() to use an hrtimer + schedule() in place of schedule_timeout(). schedule_timeout() is tick based, therefore the timeout granularity is the tick (1 ms, 4 ms or 10 ms depending on HZ). By using a high resolution timer for timeout wakeup, we can attain a much finer timeout granularity (in the microsecond range). This parallels what is already done for futex_lock_pi(). The timeout passed to the syscall is no longer converted to jiffies and is therefore passed to do_futex() and futex_wait() as an absolute ktime_t therefore keeping nanosecond resolution. Also this removes the need to pass the nanoseconds timeout part to futex_lock_pi() in val2. In futex_wait(), if there is no timeout then a regular schedule() is performed. Otherwise, an hrtimer is fired before schedule() is called. [akpm@linux-foundation.org: fix `make headers_check'] Signed-off-by: Sebastien Dugue <sebastien.dugue@bull.net> Signed-off-by: Pierre Peiffer <pierre.peiffer@bull.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: Ulrich Drepper <drepper@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09declare struct ktimeAndrew Morton
Some smarty went and inflicted ktime_t as a typedef upon us, so we cannot forward declare it. Create a new `union ktime', map ktime_t onto that. Now we need to kill off this ktime_t thing. Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: john stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09aio is unlikelyAndrew Morton
Stick an unlikely() around is_aio(): I assert that most IO is synchronous. Cc: Suparna Bhattacharya <suparna@in.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Zach Brown <zach.brown@oracle.com> Cc: Ulrich Drepper <drepper@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09RPC: add wrapper for svc_reserve to account for checksumJeff Layton
When the kernel calls svc_reserve to downsize the expected size of an RPC reply, it fails to account for the possibility of a checksum at the end of the packet. If a client mounts a NFSv2/3 with sec=krb5i/p, and does I/O then you'll generally see messages similar to this in the server's ring buffer: RPC request reserved 164 but used 208 While I was never able to verify it, I suspect that this problem is also the root cause of some oopses I've seen under these conditions: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=227726 This is probably also a problem for other sec= types and for NFSv4. The large reserved size for NFSv4 compound packets seems to generally paper over the problem, however. This patch adds a wrapper for svc_reserve that accounts for the possibility of a checksum. It also fixes up the appropriate callers of svc_reserve to call the wrapper. For now, it just uses a hardcoded value that I determined via testing. That value may need to be revised upward as things change, or we may want to eventually add a new auth_op that attempts to calculate this somehow. Unfortunately, there doesn't seem to be a good way to reliably determine the expected checksum length prior to actually calculating it, particularly with schemes like spkm3. Signed-off-by: Jeff Layton <jlayton@redhat.com> Acked-by: Neil Brown <neilb@suse.de> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Acked-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09knfsd: rename sk_defer_lock to sk_lockNeilBrown
Now that sk_defer_lock protects two different things, make the name more generic. Also don't bother with disabling _bh as the lock is only ever taken from process context. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09remove nfs4_acl_add_ace()Adrian Bunk
nfs4_acl_add_ace() can now be removed. Signed-off-by: Adrian Bunk <bunk@stusta.de> Acked-by: Neil Brown <neilb@cse.unsw.edu.au> Acked-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09change kernel threads to ignore signals instead of blocking themOleg Nesterov
Currently kernel threads use sigprocmask(SIG_BLOCK) to protect against signals. This doesn't prevent the signal delivery, this only blocks signal_wake_up(). Every "killall -33 kthreadd" means a "struct siginfo" leak. Change kthreadd_setup() to set all handlers to SIG_IGN instead of blocking them (make a new helper ignore_signals() for that). If the kernel thread needs some signal, it should use allow_signal() anyway, and in that case it should not use CLONE_SIGHAND. Note that we can't change daemonize() (should die!) in the same way, because it can be used along with CLONE_SIGHAND. This means that allow_signal() still should unblock the signal to work correctly with daemonize()ed threads. However, disallow_signal() doesn't block the signal any longer but ignores it. NOTE: with or without this patch the kernel threads are not protected from handle_stop_signal(), this seems harmless, but not good. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09kthread: don't depend on work queuesEric W. Biederman
Currently there is a circular reference between work queue initialization and kthread initialization. This prevents the kthread infrastructure from initializing until after work queues have been initialized. We want the properties of tasks created with kthread_create to be as close as possible to the init_task and to not be contaminated by user processes. The later we start our kthreadd that creates these tasks the harder it is to avoid contamination from user processes and the more of a mess we have to clean up because the defaults have changed on us. So this patch modifies the kthread support to not use work queues but to instead use a simple list of structures, and to have kthreadd start from init_task immediately after our kernel thread that execs /sbin/init. By being a true child of init_task we only have to change those process settings that we want to have different from init_task, such as our process name, the cpus that are allowed, blocking all signals and setting SIGCHLD to SIG_IGN so that all of our children are reaped automatically. By being a true child of init_task we also naturally get our ppid set to 0 and do not wind up as a child of PID == 1. Ensuring that tasks generated by kthread_create will not slow down the functioning of the wait family of functions. [akpm@linux-foundation.org: use interruptible sleeps] Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09unify flush_work/flush_work_keventd and rename it to cancel_work_syncOleg Nesterov
flush_work(wq, work) doesn't need the first parameter, we can use cwq->wq (this was possible from the very beginnig, I missed this). So we can unify flush_work_keventd and flush_work. Also, rename flush_work() to cancel_work_sync() and fix all callers. Perhaps this is not the best name, but "flush_work" is really bad. (akpm: this is why the earlier patches bypassed maintainers) Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Jeff Garzik <jeff@garzik.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Tejun Heo <htejun@gmail.com> Cc: Auke Kok <auke-jan.h.kok@intel.com>, Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09workqueue: kill NOAUTOREL worksOleg Nesterov
We don't have any users, and it is not so trivial to use NOAUTOREL works correctly. It is better to simplify API. Delete NOAUTOREL support and rename work_release to work_clear_pending to avoid a confusion. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09make cancel_rearming_delayed_work() work on any workqueue, not just keventd_wqOleg Nesterov
cancel_rearming_delayed_workqueue(wq, dwork) doesn't need the first parameter. We don't hang on un-queued dwork any longer, and work->data doesn't change its type. This means we can always figure out "wq" from dwork when it is needed. Remove this parameter, and rename the function to cancel_rearming_delayed_work(). Re-create an inline "obsolete" cancel_rearming_delayed_workqueue(wq) which just calls cancel_rearming_delayed_work(). Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09workqueue: kill run_scheduled_work()Oleg Nesterov
Because it has no callers. Actually, I think the whole idea of run_scheduled_work() was not right, not good to mix "unqueue this work and execute its ->func()" in one function. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09Define and use new events,CPU_LOCK_ACQUIRE and CPU_LOCK_RELEASEGautham R Shenoy
This is an attempt to provide an alternate mechanism for postponing a hotplug event instead of using a global mechanism like lock_cpu_hotplug. The proposal is to add two new events namely CPU_LOCK_ACQUIRE and CPU_LOCK_RELEASE. The notification for these two events would be sent out before and after a cpu_hotplug event respectively. During the CPU_LOCK_ACQUIRE event, a cpu-hotplug-aware subsystem is supposed to acquire any per-subsystem hotcpu mutex ( Eg. workqueue_mutex in kernel/workqueue.c ). During the CPU_LOCK_RELEASE release event the cpu-hotplug-aware subsystem is supposed to release the per-subsystem hotcpu mutex. The reasons for defining new events as opposed to reusing the existing events like CPU_UP_PREPARE/CPU_UP_FAILED/CPU_ONLINE for locking/unlocking of per-subsystem hotcpu mutexes are as follow: - CPU_LOCK_ACQUIRE: All hotcpu mutexes are taken before subsystems start handling pre-hotplug events like CPU_UP_PREPARE/CPU_DOWN_PREPARE etc, thus ensuring a clean handling of these events. - CPU_LOCK_RELEASE: The hotcpu mutexes will be released only after all subsystems have handled post-hotplug events like CPU_DOWN_FAILED, CPU_DEAD,CPU_ONLINE etc thereby ensuring that there are no subsequent clashes amongst the interdependent subsystems after a cpu hotplugs. This patch also uses __raw_notifier_call chain in _cpu_up to take care of the dependency between the two consequetive calls to raw_notifier_call_chain. [akpm@linux-foundation.org: fix a bug] Signed-off-by: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09Extend notifier_call_chain to count nr_calls madeGautham R Shenoy
Since 2.6.18-something, the community has been bugged by the problem to provide a clean and a stable mechanism to postpone a cpu-hotplug event as lock_cpu_hotplug was badly broken. This is another proposal towards solving that problem. This one is along the lines of the solution provided in kernel/workqueue.c Instead of having a global mechanism like lock_cpu_hotplug, we allow the subsytems to define their own per-subsystem hot cpu mutexes. These would be taken(released) where ever we are currently calling lock_cpu_hotplug(unlock_cpu_hotplug). Also, in the per-subsystem hotcpu callback function,we take this mutex before we handle any pre-cpu-hotplug events and release it once we finish handling the post-cpu-hotplug events. A standard means for doing this has been provided in [PATCH 2/4] and demonstrated in [PATCH 3/4]. The ordering of these per-subsystem mutexes might still prove to be a problem, but hopefully lockdep should help us get out of that muddle. The patch set to be applied against linux-2.6.19-rc5 is as follows: [PATCH 1/4] : Extend notifier_call_chain with an option to specify the number of notifications to be sent and also count the number of notifications actually sent. [PATCH 2/4] : Define events CPU_LOCK_ACQUIRE and CPU_LOCK_RELEASE and send out notifications for these in _cpu_up and _cpu_down. This would help us standardise the acquire and release of the subsystem locks in the hotcpu callback functions of these subsystems. [PATCH 3/4] : Eliminate lock_cpu_hotplug from kernel/sched.c. [PATCH 4/4] : In workqueue_cpu_callback function, acquire(release) the workqueue_mutex while handling CPU_LOCK_ACQUIRE(CPU_LOCK_RELEASE). If the per-subsystem-locking approach survives the test of time, we can expect a slow phasing out of lock_cpu_hotplug, which has not yet been eliminated in these patches :) This patch: Provide notifier_call_chain with an option to call only a specified number of notifiers and also record the number of call to notifiers made. The need for this enhancement was identified in the post entitled "Slab - Eliminate lock_cpu_hotplug from slab" (http://lkml.org/lkml/2006/10/28/92) by Ravikiran G Thirumalai and Andrew Morton. This patch adds two additional parameters to notifier_call_chain API namely - int nr_to_calls : Number of notifier_functions to be called. The don't care value is -1. - unsigned int *nr_calls : Records the total number of notifier_funtions called by notifier_call_chain. The don't care value is NULL. [michal.k.k.piotrowski@gmail.com: build fix] Credit: Andrew Morton <akpm@osdl.org> Signed-off-by: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Michal Piotrowski <michal.k.k.piotrowski@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09relay: use plain timer instead of delayed workTom Zanussi
relay doesn't need to use schedule_delayed_work() for waking readers when a simple timer will do. Signed-off-by: Tom Zanussi <zanussi@comcast.net> Cc: Satyam Sharma <satyam.sharma@gmail.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09kblockd: use flush_workAndrew Morton
Switch the kblockd flushing from a global flush to a more specific flush_work(). (akpm: bypassed maintainers, sorry. There are other patches which depend on this) Cc: "Maciej W. Rozycki" <macro@linux-mips.org> Cc: David Howells <dhowells@redhat.com> Cc: Jens Axboe <axboe@suse.de> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09implement flush_work()Oleg Nesterov
A basic problem with flush_scheduled_work() is that it blocks behind _all_ presently-queued works, rather than just the work whcih the caller wants to flush. If the caller holds some lock, and if one of the queued work happens to want that lock as well then accidental deadlocks can occur. One example of this is the phy layer: it wants to flush work while holding rtnl_lock(). But if a linkwatch event happens to be queued, the phy code will deadlock because the linkwatch callback function takes rtnl_lock. So we implement a new function which will flush a *single* work - just the one which the caller wants to free up. Thus we avoid the accidental deadlocks which can arise from unrelated subsystems' callbacks taking shared locks. flush_work() non-blockingly dequeues the work_struct which we want to kill, then it waits for its handler to complete on all CPUs. Add ->current_work to the "struct cpu_workqueue_struct", it points to currently running "struct work_struct". When flush_work(work) detects ->current_work == work, it inserts a barrier at the _head_ of ->worklist (and thus right _after_ that work) and waits for completition. This means that the next work fired on that CPU will be this barrier, or another barrier queued by concurrent flush_work(), so the caller of flush_work() will be woken before any "regular" work has a chance to run. When wait_on_work() unlocks workqueue_mutex (or whatever we choose to protect against CPU hotplug), CPU may go away. But in that case take_over_work() will move a barrier we queued to another CPU, it will be fired sometime, and wait_on_work() will be woken. Actually, we are doing cleanup_workqueue_thread()->kthread_stop() before take_over_work(), so cwq->thread should complete its ->worklist (and thus the barrier), because currently we don't check kthread_should_stop() in run_workqueue(). But even if we did, everything should be ok. [akpm@osdl.org: cleanup] [akpm@osdl.org: add flush_work_keventd() wrapper] Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09mutex_lock_interruptible(): add __must_checkAndrew Morton
It's not sane to use mutex_lock_interruptible() and to then ignore the result. Ditto down_interruptible(), but I'm lazy. Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09Move sig_kernel_* et al macros to linux/signal.hRoland McGrath
This patch moves the sig_kernel_* and related macros from kernel/signal.c to linux/signal.h, and cleans them up slightly. I need the sig_kernel_* macros for default signal behavior in the utrace code, and want to avoid duplication or overhead to share the knowledge. Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09mca: add integrated device bus matchingJames Bottomley
The MCA bus has a few "integrated" functions, which are effectively virtual slots on the bus. The problem is that these special functions don't have dedicated pos IDs, so we have to manufacture ids for them outside the pos space ... and these ids can't be matched by the standard matching function, so add a special registration that requests a list of pos ids or a particular integrated function. Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09Remove hardcoding of hard_smp_processor_id on UP systemsFernando Luis Vazquez Cao
With the advent of kdump, the assumption that the boot CPU when booting an UP kernel is always the CPU with a particular hardware ID (often 0) (usually referred to as BSP on some architectures) is not valid anymore. The reason being that the dump capture kernel boots on the crashed CPU (the CPU that invoked crash_kexec), which may be or may not be that particular CPU. Move definition of hard_smp_processor_id for the UP case to architecture-specific code ("asm/smp.h") where it belongs, so that each architecture can provide its own implementation. Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp> Cc: "Luck, Tony" <tony.luck@intel.com> Acked-by: Andi Kleen <ak@suse.de> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Vivek Goyal <vgoyal@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>