diff options
Diffstat (limited to 'Documentation')
59 files changed, 2472 insertions, 1302 deletions
diff --git a/Documentation/ABI/removed/raw1394_legacy_isochronous b/Documentation/ABI/removed/raw1394_legacy_isochronous new file mode 100644 index 00000000000..1b629622d88 --- /dev/null +++ b/Documentation/ABI/removed/raw1394_legacy_isochronous @@ -0,0 +1,16 @@ +What: legacy isochronous ABI of raw1394 (1st generation iso ABI) +Date: June 2007 (scheduled), removed in kernel v2.6.23 +Contact: linux1394-devel@lists.sourceforge.net +Description: + The two request types RAW1394_REQ_ISO_SEND, RAW1394_REQ_ISO_LISTEN have + been deprecated for quite some time. They are very inefficient as they + come with high interrupt load and several layers of callbacks for each + packet. Because of these deficiencies, the video1394 and dv1394 drivers + and the 3rd-generation isochronous ABI in raw1394 (rawiso) were created. + +Users: + libraw1394 users via the long deprecated API raw1394_iso_write, + raw1394_start_iso_write, raw1394_start_iso_rcv, raw1394_stop_iso_rcv + + libdc1394, which optionally uses these old libraw1394 calls + alternatively to the more efficient video1394 ABI diff --git a/Documentation/ABI/testing/sysfs-bus-usb b/Documentation/ABI/testing/sysfs-bus-usb index f9937add033..9734577d171 100644 --- a/Documentation/ABI/testing/sysfs-bus-usb +++ b/Documentation/ABI/testing/sysfs-bus-usb @@ -39,3 +39,16 @@ Description: If you want to suspend a device immediately but leave it free to wake up in response to I/O requests, you should write "0" to power/autosuspend. + +What: /sys/bus/usb/devices/.../power/persist +Date: May 2007 +KernelVersion: 2.6.23 +Contact: Alan Stern <stern@rowland.harvard.edu> +Description: + If CONFIG_USB_PERSIST is set, then each USB device directory + will contain a file named power/persist. The file holds a + boolean value (0 or 1) indicating whether or not the + "USB-Persist" facility is enabled for the device. Since the + facility is inherently dangerous, it is disabled by default + for all devices except hubs. For more information, see + Documentation/usb/persist.txt. diff --git a/Documentation/BUG-HUNTING b/Documentation/BUG-HUNTING index 65b97e1dbf7..35f5bd24333 100644 --- a/Documentation/BUG-HUNTING +++ b/Documentation/BUG-HUNTING @@ -191,6 +191,30 @@ e.g. crash dump output as shown by Dave Miller. > mov 0x8(%ebp), %ebx ! %ebx = skb->sk > mov 0x13c(%ebx), %eax ! %eax = inet_sk(sk)->opt +In addition, you can use GDB to figure out the exact file and line +number of the OOPS from the vmlinux file. If you have +CONFIG_DEBUG_INFO enabled, you can simply copy the EIP value from the +OOPS: + + EIP: 0060:[<c021e50e>] Not tainted VLI + +And use GDB to translate that to human-readable form: + + gdb vmlinux + (gdb) l *0xc021e50e + +If you don't have CONFIG_DEBUG_INFO enabled, you use the function +offset from the OOPS: + + EIP is at vt_ioctl+0xda8/0x1482 + +And recompile the kernel with CONFIG_DEBUG_INFO enabled: + + make vmlinux + gdb vmlinux + (gdb) p vt_ioctl + (gdb) l *(0x<address of vt_ioctl> + 0xda8) + Another very useful option of the Kernel Hacking section in menuconfig is Debug memory allocations. This will help you see whether data has been initialised and not set before use etc. To see the values that get assigned diff --git a/Documentation/DMA-mapping.txt b/Documentation/DMA-mapping.txt index 028614cdd06..e07f2530326 100644 --- a/Documentation/DMA-mapping.txt +++ b/Documentation/DMA-mapping.txt @@ -664,109 +664,6 @@ It is that simple. Well, not for some odd devices. See the next section for information about that. - DAC Addressing for Address Space Hungry Devices - -There exists a class of devices which do not mesh well with the PCI -DMA mapping API. By definition these "mappings" are a finite -resource. The number of total available mappings per bus is platform -specific, but there will always be a reasonable amount. - -What is "reasonable"? Reasonable means that networking and block I/O -devices need not worry about using too many mappings. - -As an example of a problematic device, consider compute cluster cards. -They can potentially need to access gigabytes of memory at once via -DMA. Dynamic mappings are unsuitable for this kind of access pattern. - -To this end we've provided a small API by which a device driver -may use DAC cycles to directly address all of physical memory. -Not all platforms support this, but most do. It is easy to determine -whether the platform will work properly at probe time. - -First, understand that there may be a SEVERE performance penalty for -using these interfaces on some platforms. Therefore, you MUST only -use these interfaces if it is absolutely required. %99 of devices can -use the normal APIs without any problems. - -Note that for streaming type mappings you must either use these -interfaces, or the dynamic mapping interfaces above. You may not mix -usage of both for the same device. Such an act is illegal and is -guaranteed to put a banana in your tailpipe. - -However, consistent mappings may in fact be used in conjunction with -these interfaces. Remember that, as defined, consistent mappings are -always going to be SAC addressable. - -The first thing your driver needs to do is query the PCI platform -layer if it is capable of handling your devices DAC addressing -capabilities: - - int pci_dac_dma_supported(struct pci_dev *hwdev, u64 mask); - -You may not use the following interfaces if this routine fails. - -Next, DMA addresses using this API are kept track of using the -dma64_addr_t type. It is guaranteed to be big enough to hold any -DAC address the platform layer will give to you from the following -routines. If you have consistent mappings as well, you still -use plain dma_addr_t to keep track of those. - -All mappings obtained here will be direct. The mappings are not -translated, and this is the purpose of this dialect of the DMA API. - -All routines work with page/offset pairs. This is the _ONLY_ way to -portably refer to any piece of memory. If you have a cpu pointer -(which may be validly DMA'd too) you may easily obtain the page -and offset using something like this: - - struct page *page = virt_to_page(ptr); - unsigned long offset = offset_in_page(ptr); - -Here are the interfaces: - - dma64_addr_t pci_dac_page_to_dma(struct pci_dev *pdev, - struct page *page, - unsigned long offset, - int direction); - -The DAC address for the tuple PAGE/OFFSET are returned. The direction -argument is the same as for pci_{map,unmap}_single(). The same rules -for cpu/device access apply here as for the streaming mapping -interfaces. To reiterate: - - The cpu may touch the buffer before pci_dac_page_to_dma. - The device may touch the buffer after pci_dac_page_to_dma - is made, but the cpu may NOT. - -When the DMA transfer is complete, invoke: - - void pci_dac_dma_sync_single_for_cpu(struct pci_dev *pdev, - dma64_addr_t dma_addr, - size_t len, int direction); - -This must be done before the CPU looks at the buffer again. -This interface behaves identically to pci_dma_sync_{single,sg}_for_cpu(). - -And likewise, if you wish to let the device get back at the buffer after -the cpu has read/written it, invoke: - - void pci_dac_dma_sync_single_for_device(struct pci_dev *pdev, - dma64_addr_t dma_addr, - size_t len, int direction); - -before letting the device access the DMA area again. - -If you need to get back to the PAGE/OFFSET tuple from a dma64_addr_t -the following interfaces are provided: - - struct page *pci_dac_dma_to_page(struct pci_dev *pdev, - dma64_addr_t dma_addr); - unsigned long pci_dac_dma_to_offset(struct pci_dev *pdev, - dma64_addr_t dma_addr); - -This is possible with the DAC interfaces purely because they are -not translated in any way. - Optimizing Unmap State Space Consumption On many platforms, pci_unmap_{single,page}() is simply a nop. diff --git a/Documentation/DocBook/kernel-api.tmpl b/Documentation/DocBook/kernel-api.tmpl index 38f88b6ae40..46bcff2849b 100644 --- a/Documentation/DocBook/kernel-api.tmpl +++ b/Documentation/DocBook/kernel-api.tmpl @@ -643,4 +643,70 @@ X!Idrivers/video/console/fonts.c !Edrivers/spi/spi.c </chapter> + <chapter id="i2c"> + <title>I<superscript>2</superscript>C and SMBus Subsystem</title> + + <para> + I<superscript>2</superscript>C (or without fancy typography, "I2C") + is an acronym for the "Inter-IC" bus, a simple bus protocol which is + widely used where low data rate communications suffice. + Since it's also a licensed trademark, some vendors use another + name (such as "Two-Wire Interface", TWI) for the same bus. + I2C only needs two signals (SCL for clock, SDA for data), conserving + board real estate and minimizing signal quality issues. + Most I2C devices use seven bit addresses, and bus speeds of up + to 400 kHz; there's a high speed extension (3.4 MHz) that's not yet + found wide use. + I2C is a multi-master bus; open drain signaling is used to + arbitrate between masters, as well as to handshake and to + synchronize clocks from slower clients. + </para> + + <para> + The Linux I2C programming interfaces support only the master + side of bus interactions, not the slave side. + The programming interface is structured around two kinds of driver, + and two kinds of device. + An I2C "Adapter Driver" abstracts the controller hardware; it binds + to a physical device (perhaps a PCI device or platform_device) and + exposes a <structname>struct i2c_adapter</structname> representing + each I2C bus segment it manages. + On each I2C bus segment will be I2C devices represented by a + <structname>struct i2c_client</structname>. Those devices will + be bound to a <structname>struct i2c_driver</structname>, + which should follow the standard Linux driver model. + (At this writing, a legacy model is more widely used.) + There are functions to perform various I2C protocol operations; at + this writing all such functions are usable only from task context. + </para> + + <para> + The System Management Bus (SMBus) is a sibling protocol. Most SMBus + systems are also I2C conformant. The electrical constraints are + tighter for SMBus, and it standardizes particular protocol messages + and idioms. Controllers that support I2C can also support most + SMBus operations, but SMBus controllers don't support all the protocol + options that an I2C controller will. + There are functions to perform various SMBus protocol operations, + either using I2C primitives or by issuing SMBus commands to + i2c_adapter devices which don't support those I2C operations. + </para> + +!Iinclude/linux/i2c.h +!Fdrivers/i2c/i2c-boardinfo.c i2c_register_board_info +!Edrivers/i2c/i2c-core.c + </chapter> + + <chapter id="splice"> + <title>splice API</title> + <para>) + splice is a method for moving blocks of data around inside the + kernel, without continually transferring it between the kernel + and user space. + </para> +!Iinclude/linux/splice.h +!Ffs/splice.c + </chapter> + + </book> diff --git a/Documentation/HOWTO b/Documentation/HOWTO index ced9207bedc..98e2701c746 100644 --- a/Documentation/HOWTO +++ b/Documentation/HOWTO @@ -322,39 +322,34 @@ kernel releases as described above. Here is a list of some of the different kernel trees available: git trees: - Kbuild development tree, Sam Ravnborg <sam@ravnborg.org> - kernel.org:/pub/scm/linux/kernel/git/sam/kbuild.git + git.kernel.org:/pub/scm/linux/kernel/git/sam/kbuild.git - ACPI development tree, Len Brown <len.brown@intel.com> - kernel.org:/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6.git + git.kernel.org:/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6.git - Block development tree, Jens Axboe <axboe@suse.de> - kernel.org:/pub/scm/linux/kernel/git/axboe/linux-2.6-block.git + git.kernel.org:/pub/scm/linux/kernel/git/axboe/linux-2.6-block.git - DRM development tree, Dave Airlie <airlied@linux.ie> - kernel.org:/pub/scm/linux/kernel/git/airlied/drm-2.6.git + git.kernel.org:/pub/scm/linux/kernel/git/airlied/drm-2.6.git - ia64 development tree, Tony Luck <tony.luck@intel.com> - kernel.org:/pub/scm/linux/kernel/git/aegl/linux-2.6.git - - - ieee1394 development tree, Jody McIntyre <scjody@modernduck.com> - kernel.org:/pub/scm/linux/kernel/git/scjody/ieee1394.git + git.kernel.org:/pub/scm/linux/kernel/git/aegl/linux-2.6.git - infiniband, Roland Dreier <rolandd@cisco.com> - kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git + git.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git - libata, Jeff Garzik <jgarzik@pobox.com> - kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev.git + git.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev.git - network drivers, Jeff Garzik <jgarzik@pobox.com> - kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git + git.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git - pcmcia, Dominik Brodowski <linux@dominikbrodowski.net> - kernel.org:/pub/scm/linux/kernel/git/brodo/pcmcia-2.6.git + git.kernel.org:/pub/scm/linux/kernel/git/brodo/pcmcia-2.6.git - SCSI, James Bottomley <James.Bottomley@SteelEye.com> - kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6.git - - Other git kernel trees can be found listed at http://kernel.org/git + git.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6.git quilt trees: - USB, PCI, Driver Core, and I2C, Greg Kroah-Hartman <gregkh@suse.de> @@ -362,6 +357,9 @@ Here is a list of some of the different kernel trees available: - x86-64, partly i386, Andi Kleen <ak@suse.de> ftp.firstfloor.org:/pub/ak/x86_64/quilt/ + Other kernel trees can be found listed at http://git.kernel.org/ and in + the MAINTAINERS file. + Bug Reporting ------------- diff --git a/Documentation/SM501.txt b/Documentation/SM501.txt new file mode 100644 index 00000000000..3a1bd95d376 --- /dev/null +++ b/Documentation/SM501.txt @@ -0,0 +1,66 @@ + SM501 Driver + ============ + +Copyright 2006, 2007 Simtec Electronics + +Core +---- + +The core driver in drivers/mfd provides common services for the +drivers which manage the specific hardware blocks. These services +include locking for common registers, clock control and resource +management. + +The core registers drivers for both PCI and generic bus based +chips via the platform device and driver system. + +On detection of a device, the core initialises the chip (which may +be specified by the platform data) and then exports the selected +peripheral set as platform devices for the specific drivers. + +The core re-uses the platform device system as the platform device +system provides enough features to support the drivers without the +need to create a new bus-type and the associated code to go with it. + + +Resources +--------- + +Each peripheral has a view of the device which is implicitly narrowed to +the specific set of resources that peripheral requires in order to +function correctly. + +The centralised memory allocation allows the driver to ensure that the +maximum possible resource allocation can be made to the video subsystem +as this is by-far the most resource-sensitive of the on-chip functions. + +The primary issue with memory allocation is that of moving the video +buffers once a display mode is chosen. Indeed when a video mode change +occurs the memory footprint of the video subsystem changes. + +Since video memory is difficult to move without changing the display +(unless sufficient contiguous memory can be provided for the old and new +modes simultaneously) the video driver fully utilises the memory area +given to it by aligning fb0 to the start of the area and fb1 to the end +of it. Any memory left over in the middle is used for the acceleration +functions, which are transient and thus their location is less critical +as it can be moved. + + +Configuration +------------- + +The platform device driver uses a set of platform data to pass +configurations through to the core and the subsidiary drivers +so that there can be support for more than one system carrying +an SM501 built into a single kernel image. + +The PCI driver assumes that the PCI card behaves as per the Silicon +Motion reference design. + +There is an errata (AB-5) affecting the selection of the +of the M1XCLK and M1CLK frequencies. These two clocks +must be sourced from the same PLL, although they can then +be divided down individually. If this is not set, then SM501 may +lock and hang the whole system. The driver will refuse to +attach if the PLL selection is different. diff --git a/Documentation/SubmitChecklist b/Documentation/SubmitChecklist index 3af3e65cf43..6ebffb57e3d 100644 --- a/Documentation/SubmitChecklist +++ b/Documentation/SubmitChecklist @@ -84,3 +84,9 @@ kernel patches. 24: Avoid whitespace damage such as indenting with spaces or whitespace at the end of lines. You can test this by feeding the patch to "git apply --check --whitespace=error-all" + +25: Check your patch for general style as detailed in + Documentation/CodingStyle. Check for trivial violations with the + patch style checker prior to submission (scripts/checkpatch.pl). + You should be able to justify all violations that remain in + your patch. diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches index a417b25fb1a..0958e97d4bf 100644 --- a/Documentation/SubmittingPatches +++ b/Documentation/SubmittingPatches @@ -118,7 +118,20 @@ then only post say 15 or so at a time and wait for review and integration. -4) Select e-mail destination. +4) Style check your changes. + +Check your patch for basic style violations, details of which can be +found in Documentation/CodingStyle. Failure to do so simply wastes +the reviewers time and will get your patch rejected, probabally +without even being read. + +At a minimum you should check your patches with the patch style +checker prior to submission (scripts/patchcheck.pl). You should +be able to justify all violations that remain in your patch. + + + +5) Select e-mail destination. Look through the MAINTAINERS file and the source code, and determine if your change applies to a specific subsystem of the kernel, with @@ -146,7 +159,7 @@ discussed should the patch then be submitted to Linus. -5) Select your CC (e-mail carbon copy) list. +6) Select your CC (e-mail carbon copy) list. Unless you have a reason NOT to do so, CC linux-kernel@vger.kernel.org. @@ -187,8 +200,7 @@ URL: <http://www.kernel.org/pub/linux/kernel/people/bunk/trivial/> - -6) No MIME, no links, no compression, no attachments. Just plain text. +7) No MIME, no links, no compression, no attachments. Just plain text. Linus and other kernel developers need to be able to read and comment on the changes you are submitting. It is important for a kernel @@ -223,9 +235,9 @@ pref("mailnews.display.disable_format_flowed_support", true); -7) E-mail size. +8) E-mail size. -When sending patches to Linus, always follow step #6. +When sending patches to Linus, always follow step #7. Large changes are not appropriate for mailing lists, and some maintainers. If your patch, uncompressed, exceeds 40 kB in size, @@ -234,7 +246,7 @@ server, and provide instead a URL (link) pointing to your patch. -8) Name your kernel version. +9) Name your kernel version. It is important to note, either in the subject line or in the patch description, the kernel version to which this patch applies. @@ -244,7 +256,7 @@ Linus will not apply it. -9) Don't get discouraged. Re-submit. +10) Don't get discouraged. Re-submit. After you have submitted your change, be patient and wait. If Linus likes your change and applies it, it will appear in the next version @@ -270,7 +282,7 @@ When in doubt, solicit comments on linux-kernel mailing list. -10) Include PATCH in the subject +11) Include PATCH in the subject Due to high e-mail traffic to Linus, and to linux-kernel, it is common convention to prefix your subject line with [PATCH]. This lets Linus @@ -279,7 +291,7 @@ e-mail discussions. -11) Sign your work +12) Sign your work To improve tracking of who did what, especially with patches that can percolate to their final resting place in the kernel through several @@ -328,7 +340,32 @@ now, but you can do this to mark internal company procedures or just point out some special detail about the sign-off. -12) The canonical patch format +13) When to use Acked-by: + +The Signed-off-by: tag indicates that the signer was involved in the +development of the patch, or that he/she was in the patch's delivery path. + +If a person was not directly involved in the preparation or handling of a +patch but wishes to signify and record their approval of it then they can +arrange to have an Acked-by: line added to the patch's changelog. + +Acked-by: is often used by the maintainer of the affected code when that +maintainer neither contributed to nor forwarded the patch. + +Acked-by: is not as formal as Signed-off-by:. It is a record that the acker +has at least reviewed the patch and has indicated acceptance. Hence patch +mergers will sometimes manually convert an acker's "yep, looks good to me" +into an Acked-by:. + +Acked-by: does not necessarily indicate acknowledgement of the entire patch. +For example, if a patch affects multiple subsystems and has an Acked-by: from +one subsystem maintainer then this usually indicates acknowledgement of just +the part which affects that maintainer's code. Judgement should be used here. + When in doubt people should refer to the original discussion in the mailing +list archives. + + +14) The canonical patch format The canonical patch subject line is: @@ -427,6 +464,10 @@ section Linus Computer Science 101. Nuff said. If your code deviates too much from this, it is likely to be rejected without further review, and without comment. +Check your patches with the patch style checker prior to submission +(scripts/checkpatch.pl). You should be able to justify all +violations that remain in your patch. + 2) #ifdefs are ugly diff --git a/Documentation/atomic_ops.txt b/Documentation/atomic_ops.txt index 2a63d5662a9..05851e9982e 100644 --- a/Documentation/atomic_ops.txt +++ b/Documentation/atomic_ops.txt @@ -149,7 +149,7 @@ defined which accomplish this: void smp_mb__before_atomic_dec(void); void smp_mb__after_atomic_dec(void); void smp_mb__before_atomic_inc(void); - void smp_mb__after_atomic_dec(void); + void smp_mb__after_atomic_inc(void); For example, smp_mb__before_atomic_dec() can be used like so: diff --git a/Documentation/blackfin/kgdb.txt b/Documentation/blackfin/kgdb.txt new file mode 100644 index 00000000000..84f6a484ae9 --- /dev/null +++ b/Documentation/blackfin/kgdb.txt @@ -0,0 +1,155 @@ + A Simple Guide to Configure KGDB + + Sonic Zhang <sonic.zhang@analog.com> + Aug. 24th 2006 + + +This KGDB patch enables the kernel developer to do source level debugging on +the kernel for the Blackfin architecture. The debugging works over either the +ethernet interface or one of the uarts. Both software breakpoints and +hardware breakpoints are supported in this version. +http://docs.blackfin.uclinux.org/doku.php?id=kgdb + + +2 known issues: +1. This bug: + http://blackfin.uclinux.org/tracker/index.php?func=detail&aid=544&group_id=18&atid=145 + The GDB client for Blackfin uClinux causes incorrect values of local + variables to be displayed when the user breaks the running of kernel in GDB. +2. Because of a hardware bug in Blackfin 533 v1.0.3: + 05000067 - Watchpoints (Hardware Breakpoints) are not supported + Hardware breakpoints cannot be set properly. + + +Debug over Ethernet: + +1. Compile and install the cross platform version of gdb for blackfin, which + can be found at $(BINROOT)/bfin-elf-gdb. + +2. Apply this patch to the 2.6.x kernel. Select the menuconfig option under + "Kernel hacking" -> "Kernel debugging" -> "KGDB: kernel debug with remote gdb". + With this selected, option "Full Symbolic/Source Debugging support" and + "Compile the kernel with frame pointers" are also selected. + +3. Select option "KGDB: connect over (Ethernet)". Add "kgdboe=@target-IP/,@host-IP/" to + the option "Compiled-in Kernel Boot Parameter" under "Kernel hacking". + +4. Connect minicom to the serial port and boot the kernel image. + +5. Configure the IP "/> ifconfig eth0 target-IP" + +6. Start GDB client "bfin-elf-gdb vmlinux". + +7. Connect to the target "(gdb) target remote udp:target-IP:6443". + +8. Set software breakpoint "(gdb) break sys_open". + +9. Continue "(gdb) c". + +10. Run ls in the target console "/> ls". + +11. Breakpoint hits. "Breakpoint 1: sys_open(..." + +12. Display local variables and function paramters. + (*) This operation gives wrong results, see known issue 1. + +13. Single stepping "(gdb) si". + +14. Remove breakpoint 1. "(gdb) del 1" + +15. Set hardware breakpoint "(gdb) hbreak sys_open". + +16. Continue "(gdb) c". + +17. Run ls in the target console "/> ls". + +18. Hardware breakpoint hits. "Breakpoint 1: sys_open(...". + (*) This hardware breakpoint will not be hit, see known issue 2. + +19. Continue "(gdb) c". + +20. Interrupt the target in GDB "Ctrl+C". + +21. Detach from the target "(gdb) detach". + +22. Exit GDB "(gdb) quit". + + +Debug over the UART: + +1. Compile and install the cross platform version of gdb for blackfin, which + can be found at $(BINROOT)/bfin-elf-gdb. + +2. Apply this patch to the 2.6.x kernel. Select the menuconfig option under + "Kernel hacking" -> "Kernel debugging" -> "KGDB: kernel debug with remote gdb". + With this selected, option "Full Symbolic/Source Debugging support" and + "Compile the kernel with frame pointers" are also selected. + +3. Select option "KGDB: connect over (UART)". Set "KGDB: UART port number" to be + a different one from the console. Don't forget to change the mode of + blackfin serial driver to PIO. Otherwise kgdb works incorrectly on UART. + +4. If you want connect to kgdb when the kernel boots, enable + "KGDB: Wait for gdb connection early" + +5. Compile kernel. + +6. Connect minicom to the serial port of the console and boot the kernel image. + +7. Start GDB client "bfin-elf-gdb vmlinux". + +8. Set the baud rate in GDB "(gdb) set remotebaud 57600". + +9. Connect to the target on the second serial port "(gdb) target remote /dev/ttyS1". + +10. Set software breakpoint "(gdb) break sys_open". + +11. Continue "(gdb) c". + +12. Run ls in the target console "/> ls". + +13. A breakpoint is hit. "Breakpoint 1: sys_open(..." + +14. All other operations are the same as that in KGDB over Ethernet. + + +Debug over the same UART as console: + +1. Compile and install the cross platform version of gdb for blackfin, which + can be found at $(BINROOT)/bfin-elf-gdb. + +2. Apply this patch to the 2.6.x kernel. Select the menuconfig option under + "Kernel hacking" -> "Kernel debugging" -> "KGDB: kernel debug with remote gdb". + With this selected, option "Full Symbolic/Source Debugging support" and + "Compile the kernel with frame pointers" are also selected. + +3. Select option "KGDB: connect over UART". Set "KGDB: UART port number" to console. + Don't forget to change the mode of blackfin serial driver to PIO. + Otherwise kgdb works incorrectly on UART. + +4. If you want connect to kgdb when the kernel boots, enable + "KGDB: Wait for gdb connection early" + +5. Connect minicom to the serial port and boot the kernel image. + +6. (Optional) Ask target to wait for gdb connection by entering Ctrl+A. In minicom, you should enter Ctrl+A+A. + +7. Start GDB client "bfin-elf-gdb vmlinux". + +8. Set the baud rate in GDB "(gdb) set remotebaud 57600". + +9. Connect to the target "(gdb) target remote /dev/ttyS0". + +10. Set software breakpoint "(gdb) break sys_open". + +11. Continue "(gdb) c". Then enter Ctrl+C twice to stop GDB connection. + +12. Run ls in the target console "/> ls". Dummy string can be seen on the console. + +13. Then connect the gdb to target again. "(gdb) target remote /dev/ttyS0". + Now you will find a breakpoint is hit. "Breakpoint 1: sys_open(..." + +14. All other operations are the same as that in KGDB over Ethernet. The only + difference is that after continue command in GDB, please stop GDB + connection by 2 "Ctrl+C"s and connect again after breakpoints are hit or + Ctrl+A is entered. diff --git a/Documentation/block/barrier.txt b/Documentation/block/barrier.txt index a272c3db809..7d279f2f5bb 100644 --- a/Documentation/block/barrier.txt +++ b/Documentation/block/barrier.txt @@ -82,23 +82,12 @@ including draining and flushing. typedef void (prepare_flush_fn)(request_queue_t *q, struct request *rq); int blk_queue_ordered(request_queue_t *q, unsigned ordered, - prepare_flush_fn *prepare_flush_fn, - unsigned gfp_mask); - -int blk_queue_ordered_locked(request_queue_t *q, unsigned ordered, - prepare_flush_fn *prepare_flush_fn, - unsigned gfp_mask); - -The only difference between the two functions is whether or not the -caller is holding q->queue_lock on entry. The latter expects the -caller is holding the lock. + prepare_flush_fn *prepare_flush_fn); @q : the queue in question @ordered : the ordered mode the driver/device supports @prepare_flush_fn : this function should prepare @rq such that it flushes cache to physical medium when executed -@gfp_mask : gfp_mask used when allocating data structures - for ordered processing For example, SCSI disk driver's prepare_flush_fn looks like the following. @@ -106,9 +95,10 @@ following. static void sd_prepare_flush(request_queue_t *q, struct request *rq) { memset(rq->cmd, 0, sizeof(rq->cmd)); - rq->flags |= REQ_BLOCK_PC; + rq->cmd_type = REQ_TYPE_BLOCK_PC; rq->timeout = SD_TIMEOUT; rq->cmd[0] = SYNCHRONIZE_CACHE; + rq->cmd_len = 10; } The following seven ordered modes are supported. The following table diff --git a/Documentation/driver-model/platform.txt b/Documentation/driver-model/platform.txt index 19c4a6e1367..2a97320ee17 100644 --- a/Documentation/driver-model/platform.txt +++ b/Documentation/driver-model/platform.txt @@ -96,6 +96,46 @@ System setup also associates those clocks with the device, so that that calls to clk_get(&pdev->dev, clock_name) return them as needed. +Legacy Drivers: Device Probing +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Some drivers are not fully converted to the driver model, because they take +on a non-driver role: the driver registers its platform device, rather than +leaving that for system infrastructure. Such drivers can't be hotplugged +or coldplugged, since those mechanisms require device creation to be in a +different system component than the driver. + +The only "good" reason for this is to handle older system designs which, like +original IBM PCs, rely on error-prone "probe-the-hardware" models for hardware +configuration. Newer systems have largely abandoned that model, in favor of +bus-level support for dynamic configuration (PCI, USB), or device tables +provided by the boot firmware (e.g. PNPACPI on x86). There are too many +conflicting options about what might be where, and even educated guesses by +an operating system will be wrong often enough to make trouble. + +This style of driver is discouraged. If you're updating such a driver, +please try to move the device enumeration to a more appropriate location, +outside the driver. This will usually be cleanup, since such drivers +tend to already have "normal" modes, such as ones using device nodes that +were created by PNP or by platform device setup. + +None the less, there are some APIs to support such legacy drivers. Avoid +using these calls except with such hotplug-deficient drivers. + + struct platform_device *platform_device_alloc( + char *name, unsigned id); + +You can use platform_device_alloc() to dynamically allocate a device, which +you will then initialize with resources and platform_device_register(). +A better solution is usually: + + struct platform_device *platform_device_register_simple( + char *name, unsigned id, + struct resource *res, unsigned nres); + +You can use platform_device_register_simple() as a one-step call to allocate +and register a device. + + Device Naming and Driver Binding ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The platform_device.dev.bus_id is the canonical name for the devices. diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt index 2d7ea85075b..092c65dd35c 100644 --- a/Documentation/feature-removal-schedule.txt +++ b/Documentation/feature-removal-schedule.txt @@ -49,16 +49,6 @@ Who: Adrian Bunk <bunk@stusta.de> --------------------------- -What: raw1394: requests of type RAW1394_REQ_ISO_SEND, RAW1394_REQ_ISO_LISTEN -When: June 2007 -Why: Deprecated in favour of the more efficient and robust rawiso interface. - Affected are applications which use the deprecated part of libraw1394 - (raw1394_iso_write, raw1394_start_iso_write, raw1394_start_iso_rcv, - raw1394_stop_iso_rcv) or bypass libraw1394. -Who: Dan Dennedy <dan@dennedy.org>, Stefan Richter <stefanr@s5r6.in-berlin.de> - ---------------------------- - What: old NCR53C9x driver When: October 2007 Why: Replaced by the much better esp_scsi driver. Actual low-level @@ -70,6 +60,7 @@ Who: David Miller <davem@davemloft.net> What: Video4Linux API 1 ioctls and video_decoder.h from Video devices. When: December 2006 +Files: include/linux/video_decoder.h Why: V4L1 AP1 was replaced by V4L2 API. during migration from 2.4 to 2.6 series. The old API have lots of drawbacks and don't provide enough means to work with all video and audio standards. The newer API is @@ -103,6 +94,7 @@ Who: Dominik Brodowski <linux@brodo.de> What: remove EXPORT_SYMBOL(kernel_thread) When: August 2006 Files: arch/*/kernel/*_ksyms.c +Funcs: kernel_thread Why: kernel_thread is a low-level implementation detail. Drivers should use the <linux/kthread.h> API instead which shields them from implementation details and provides a higherlevel interface that @@ -204,28 +196,6 @@ Who: Adrian Bunk <bunk@stusta.de> --------------------------- -What: ACPI hooks (X86_SPEEDSTEP_CENTRINO_ACPI) in speedstep-centrino driver -When: December 2006 -Why: Speedstep-centrino driver with ACPI hooks and acpi-cpufreq driver are - functionally very much similar. They talk to ACPI in same way. Only - difference between them is the way they do frequency transitions. - One uses MSRs and the other one uses IO ports. Functionaliy of - speedstep_centrino with ACPI hooks is now merged into acpi-cpufreq. - That means one common driver will support all Intel Enhanced Speedstep - capable CPUs. That means less confusion over name of - speedstep-centrino driver (with that driver supposed to be used on - non-centrino platforms). That means less duplication of code and - less maintenance effort and no possibility of these two drivers - going out of sync. - Current users of speedstep_centrino with ACPI hooks are requested to - switch over to acpi-cpufreq driver. speedstep-centrino will continue - to work using older non-ACPI static table based scheme even after this - date. - -Who: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> - ---------------------------- - What: /sys/firmware/acpi/namespace When: 2.6.21 Why: The ACPI namespace is effectively the symbol list for @@ -256,14 +226,6 @@ Who: Len Brown <len.brown@intel.com> --------------------------- -What: sk98lin network driver -When: July 2007 -Why: In kernel tree version of driver is unmaintained. Sk98lin driver - replaced by the skge driver. -Who: Stephen Hemminger <shemminger@osdl.org> - ---------------------------- - What: Compaq touchscreen device emulation When: Oct 2007 Files: drivers/input/tsdev.c @@ -278,25 +240,6 @@ Who: Richard Purdie <rpurdie@rpsys.net> --------------------------- -What: Multipath cached routing support in ipv4 -When: in 2.6.23 -Why: Code was merged, then submitter immediately disappeared leaving - us with no maintainer and lots of bugs. The code should not have - been merged in the first place, and many aspects of it's - implementation are blocking more critical core networking - development. It's marked EXPERIMENTAL and no distribution - enables it because it cause obscure crashes due to unfixable bugs - (interfaces don't return errors so memory allocation can't be - handled, calling contexts of these interfaces make handling - errors impossible too because they get called after we've - totally commited to creating a route object, for example). - This problem has existed for years and no forward progress - has ever been made, and nobody steps up to try and salvage - this code, so we're going to finally just get rid of it. -Who: David S. Miller <davem@davemloft.net> - ---------------------------- - What: read_dev_chars(), read_conf_data{,_lpm}() (s390 common I/O layer) When: December 2007 Why: These functions are a leftover from 2.4 times. They have several @@ -346,3 +289,18 @@ Who: Tejun Heo <htejun@gmail.com> --------------------------- +What: Legacy RTC drivers (under drivers/i2c/chips) +When: November 2007 +Why: Obsolete. We have a RTC subsystem with better drivers. +Who: Jean Delvare <khali@linux-fr.org> + +--------------------------- + +What: iptables SAME target +When: 1.1. 2008 +Files: net/ipv4/netfilter/ipt_SAME.c, include/linux/netfilter_ipv4/ipt_SAME.h +Why: Obsolete for multiple years now, NAT core provides the same behaviour. + Unfixable broken wrt. 32/64 bit cleanness. +Who: Patrick McHardy <kaber@trash.net> + +--------------------------- diff --git a/Documentation/filesystems/tmpfs.txt b/Documentation/filesystems/tmpfs.txt index 6dd050878a2..145e4408635 100644 --- a/Documentation/filesystems/tmpfs.txt +++ b/Documentation/filesystems/tmpfs.txt @@ -94,10 +94,10 @@ largest node numbers in the range. For example, mpol=bind:0-3,5,7,9-15 Note that trying to mount a tmpfs with an mpol option will fail if the running kernel does not support NUMA; and will fail if its nodelist -specifies a node >= MAX_NUMNODES. If your system relies on that tmpfs -being mounted, but from time to time runs a kernel built without NUMA -capability (perhaps a safe recovery kernel), or configured to support -fewer nodes, then it is advisable to omit the mpol option from automatic +specifies a node which is not online. If your system relies on that +tmpfs being mounted, but from time to time runs a kernel built without +NUMA capability (perhaps a safe recovery kernel), or with fewer nodes +online, then it is advisable to omit the mpol option from automatic mount options. It can be added later, when the tmpfs is already mounted on MountPoint, by 'mount -o remount,mpol=Policy:NodeList MountPoint'. @@ -121,4 +121,4 @@ RAM/SWAP in 10240 inodes and it is only accessible by root. Author: Christoph Rohland <cr@sap.com>, 1.12.01 Updated: - Hugh Dickins <hugh@veritas.com>, 19 February 2006 + Hugh Dickins <hugh@veritas.com>, 4 June 2007 diff --git a/Documentation/firmware_class/README b/Documentation/firmware_class/README index e9cc8bb26f7..c3480aa66ba 100644 --- a/Documentation/firmware_class/README +++ b/Documentation/firmware_class/README @@ -1,7 +1,7 @@ request_firmware() hotplug interface: ------------------------------------ - Copyright (C) 2003 Manuel Estrada Sainz <ranty@debian.org> + Copyright (C) 2003 Manuel Estrada Sainz Why: --- diff --git a/Documentation/firmware_class/firmware_sample_driver.c b/Documentation/firmware_class/firmware_sample_driver.c index 87feccdb5c9..6865cbe075e 100644 --- a/Documentation/firmware_class/firmware_sample_driver.c +++ b/Documentation/firmware_class/firmware_sample_driver.c @@ -1,7 +1,7 @@ /* * firmware_sample_driver.c - * - * Copyright (c) 2003 Manuel Estrada Sainz <ranty@debian.org> + * Copyright (c) 2003 Manuel Estrada Sainz * * Sample code on how to use request_firmware() from drivers. * diff --git a/Documentation/firmware_class/firmware_sample_firmware_class.c b/Documentation/firmware_class/firmware_sample_firmware_class.c index 9e1b0e4051c..fba943aacf9 100644 --- a/Documentation/firmware_class/firmware_sample_firmware_class.c +++ b/Documentation/firmware_class/firmware_sample_firmware_class.c @@ -1,7 +1,7 @@ /* * firmware_sample_firmware_class.c - * - * Copyright (c) 2003 Manuel Estrada Sainz <ranty@debian.org> + * Copyright (c) 2003 Manuel Estrada Sainz * * NOTE: This is just a probe of concept, if you think that your driver would * be well served by this mechanism please contact me first. @@ -19,7 +19,7 @@ #include <linux/firmware.h> -MODULE_AUTHOR("Manuel Estrada Sainz <ranty@debian.org>"); +MODULE_AUTHOR("Manuel Estrada Sainz"); MODULE_DESCRIPTION("Hackish sample for using firmware class directly"); MODULE_LICENSE("GPL"); @@ -78,6 +78,7 @@ static CLASS_DEVICE_ATTR(loading, 0644, firmware_loading_show, firmware_loading_store); static ssize_t firmware_data_read(struct kobject *kobj, + struct bin_attribute *bin_attr, char *buffer, loff_t offset, size_t count) { struct class_device *class_dev = to_class_dev(kobj); @@ -88,6 +89,7 @@ static ssize_t firmware_data_read(struct kobject *kobj, return count; } static ssize_t firmware_data_write(struct kobject *kobj, + struct bin_attribute *bin_attr, char *buffer, loff_t offset, size_t count) { struct class_device *class_dev = to_class_dev(kobj); diff --git a/Documentation/hrtimer/timer_stats.txt b/Documentation/hrtimer/timer_stats.txt index 27f782e3593..22b0814d0ad 100644 --- a/Documentation/hrtimer/timer_stats.txt +++ b/Documentation/hrtimer/timer_stats.txt @@ -2,9 +2,10 @@ timer_stats - timer usage statistics ------------------------------------ timer_stats is a debugging facility to make the timer (ab)usage in a Linux -system visible to kernel and userspace developers. It is not intended for -production usage as it adds significant overhead to the (hr)timer code and the -(hr)timer data structures. +system visible to kernel and userspace developers. If enabled in the config +but not used it has almost zero runtime overhead, and a relatively small +data structure overhead. Even if collection is enabled runtime all the +locking is per-CPU and lookup is hashed. timer_stats should be used by kernel and userspace developers to verify that their code does not make unduly use of timers. This helps to avoid unnecessary diff --git a/Documentation/i2c/busses/i2c-i801 b/Documentation/i2c/busses/i2c-i801 index c34f0db78a3..fe6406f2f9a 100644 --- a/Documentation/i2c/busses/i2c-i801 +++ b/Documentation/i2c/busses/i2c-i801 @@ -5,8 +5,8 @@ Supported adapters: '810' and '810E' chipsets) * Intel 82801BA (ICH2 - part of the '815E' chipset) * Intel 82801CA/CAM (ICH3) - * Intel 82801DB (ICH4) (HW PEC supported, 32 byte buffer not supported) - * Intel 82801EB/ER (ICH5) (HW PEC supported, 32 byte buffer not supported) + * Intel 82801DB (ICH4) (HW PEC supported) + * Intel 82801EB/ER (ICH5) (HW PEC supported) * Intel 6300ESB * Intel 82801FB/FR/FW/FRW (ICH6) * Intel 82801G (ICH7) diff --git a/Documentation/i2c/busses/i2c-piix4 b/Documentation/i2c/busses/i2c-piix4 index 7cbe43fa270..fa0c786a8bf 100644 --- a/Documentation/i2c/busses/i2c-piix4 +++ b/Documentation/i2c/busses/i2c-piix4 @@ -6,7 +6,7 @@ Supported adapters: Datasheet: Publicly available at the Intel website * ServerWorks OSB4, CSB5, CSB6 and HT-1000 southbridges Datasheet: Only available via NDA from ServerWorks - * ATI IXP200, IXP300, IXP400 and SB600 southbridges + * ATI IXP200, IXP300, IXP400, SB600 and SB700 southbridges Datasheet: Not publicly available * Standard Microsystems (SMSC) SLC90E66 (Victory66) southbridge Datasheet: Publicly available at the SMSC website http://www.smsc.com diff --git a/Documentation/i2c/busses/i2c-taos-evm b/Documentation/i2c/busses/i2c-taos-evm new file mode 100644 index 00000000000..9146e33be6d --- /dev/null +++ b/Documentation/i2c/busses/i2c-taos-evm @@ -0,0 +1,46 @@ +Kernel driver i2c-taos-evm + +Author: Jean Delvare <khali@linux-fr.org> + +This is a driver for the evaluation modules for TAOS I2C/SMBus chips. +The modules include an SMBus master with limited capabilities, which can +be controlled over the serial port. Virtually all evaluation modules +are supported, but a few lines of code need to be added for each new +module to instantiate the right I2C chip on the bus. Obviously, a driver +for the chip in question is also needed. + +Currently supported devices are: + +* TAOS TSL2550 EVM + +For addtional information on TAOS products, please see + http://www.taosinc.com/ + + +Using this driver +----------------- + +In order to use this driver, you'll need the serport driver, and the +inputattach tool, which is part of the input-utils package. The following +commands will tell the kernel that you have a TAOS EVM on the first +serial port: + +# modprobe serport +# inputattach --taos-evm /dev/ttyS0 + + +Technical details +----------------- + +Only 4 SMBus transaction types are supported by the TAOS evaluation +modules: +* Receive Byte +* Send Byte +* Read Byte +* Write Byte + +The communication protocol is text-based and pretty simple. It is +described in a PDF document on the CD which comes with the evaluation +module. The communication is rather slow, because the serial port has +to operate at 1200 bps. However, I don't think this is a big concern in +practice, as these modules are meant for evaluation and testing only. diff --git a/Documentation/i2c/chips/max6875 b/Documentation/i2c/chips/max6875 index 96fec562a8e..a0cd8af2f40 100644 --- a/Documentation/i2c/chips/max6875 +++ b/Documentation/i2c/chips/max6875 @@ -99,7 +99,7 @@ And then read the data or - count = i2c_smbus_read_i2c_block_data(fd, 0x84, buffer); + count = i2c_smbus_read_i2c_block_data(fd, 0x84, 16, buffer); The block read should read 16 bytes. 0x84 is the block read command. diff --git a/Documentation/i2c/chips/x1205 b/Documentation/i2c/chips/x1205 deleted file mode 100644 index 09407c991fe..00000000000 --- a/Documentation/i2c/chips/x1205 +++ /dev/null @@ -1,38 +0,0 @@ -Kernel driver x1205 -=================== - -Supported chips: - * Xicor X1205 RTC - Prefix: 'x1205' - Addresses scanned: none - Datasheet: http://www.intersil.com/cda/deviceinfo/0,1477,X1205,00.html - -Authors: - Karen Spearel <kas11@tampabay.rr.com>, - Alessandro Zummo <a.zummo@towertech.it> - -Description ------------ - -This module aims to provide complete access to the Xicor X1205 RTC. -Recently Xicor has merged with Intersil, but the chip is -still sold under the Xicor brand. - -This chip is located at address 0x6f and uses a 2-byte register addressing. -Two bytes need to be written to read a single register, while most -other chips just require one and take the second one as the data -to be written. To prevent corrupting unknown chips, the user must -explicitely set the probe parameter. - -example: - -modprobe x1205 probe=0,0x6f - -The module supports one more option, hctosys, which is used to set the -software clock from the x1205. On systems where the x1205 is the -only hardware rtc, this parameter could be used to achieve a correct -date/time earlier in the system boot sequence. - -example: - -modprobe x1205 probe=0,0x6f hctosys=1 diff --git a/Documentation/i2c/summary b/Documentation/i2c/summary index aea60bf7e8f..003c7319b8c 100644 --- a/Documentation/i2c/summary +++ b/Documentation/i2c/summary @@ -67,7 +67,6 @@ i2c-proc: The /proc/sys/dev/sensors interface for device (client) drivers Algorithm drivers ----------------- -i2c-algo-8xx: An algorithm for CPM's I2C device in Motorola 8xx processors (NOT BUILT BY DEFAULT) i2c-algo-bit: A bit-banging algorithm i2c-algo-pcf: A PCF 8584 style algorithm i2c-algo-ibm_ocp: An algorithm for the I2C device in IBM 4xx processors (NOT BUILT BY DEFAULT) @@ -81,6 +80,5 @@ i2c-pcf-epp: PCF8584 on a EPP parallel port (uses i2c-algo-pcf) (NOT mkpatch i2c-philips-par: Philips style parallel port adapter (uses i2c-algo-bit) i2c-adap-ibm_ocp: IBM 4xx processor I2C device (uses i2c-algo-ibm_ocp) (NOT BUILT BY DEFAULT) i2c-pport: Primitive parallel port adapter (uses i2c-algo-bit) -i2c-rpx: RPX board Motorola 8xx I2C device (uses i2c-algo-8xx) (NOT BUILT BY DEFAULT) i2c-velleman: Velleman K8000 parallel port adapter (uses i2c-algo-bit) diff --git a/Documentation/i2c/writing-clients b/Documentation/i2c/writing-clients index 3d8d36b0ad1..2c170032bf3 100644 --- a/Documentation/i2c/writing-clients +++ b/Documentation/i2c/writing-clients @@ -571,7 +571,7 @@ SMBus communication u8 command, u8 length, u8 *values); extern s32 i2c_smbus_read_i2c_block_data(struct i2c_client * client, - u8 command, u8 *values); + u8 command, u8 length, u8 *values); These ones were removed in Linux 2.6.10 because they had no users, but could be added back later if needed: diff --git a/Documentation/i386/zero-page.txt b/Documentation/i386/zero-page.txt index c04a421f4a7..75b3680c41e 100644 --- a/Documentation/i386/zero-page.txt +++ b/Documentation/i386/zero-page.txt @@ -37,6 +37,7 @@ Offset Type Description 0x1d0 unsigned long EFI memory descriptor map pointer 0x1d4 unsigned long EFI memory descriptor map size 0x1e0 unsigned long ALT_MEM_K, alternative mem check, in Kb +0x1e4 unsigned long Scratch field for the kernel setup code 0x1e8 char number of entries in E820MAP (below) 0x1e9 unsigned char number of entries in EDDBUF (below) 0x1ea unsigned char number of entries in EDD_MBR_SIG_BUFFER (below) diff --git a/Documentation/ia64/aliasing-test.c b/Documentation/ia64/aliasing-test.c index 3153167b41c..773a814d409 100644 --- a/Documentation/ia64/aliasing-test.c +++ b/Documentation/ia64/aliasing-test.c @@ -19,6 +19,7 @@ #include <sys/mman.h> #include <sys/stat.h> #include <unistd.h> +#include <linux/pci.h> int sum; @@ -34,13 +35,19 @@ int map_mem(char *path, off_t offset, size_t length, int touch) return -1; } + if (fnmatch("/proc/bus/pci/*", path, 0) == 0) { + rc = ioctl(fd, PCIIOC_MMAP_IS_MEM); + if (rc == -1) + perror("PCIIOC_MMAP_IS_MEM ioctl"); + } + addr = mmap(NULL, length, PROT_READ|PROT_WRITE, MAP_SHARED, fd, offset); if (addr == MAP_FAILED) return 1; if (touch) { c = (int *) addr; - while (c < (int *) (offset + length)) + while (c < (int *) (addr + length)) sum += *c++; } @@ -54,7 +61,7 @@ int map_mem(char *path, off_t offset, size_t length, int touch) return 0; } -int scan_sysfs(char *path, char *file, off_t offset, size_t length, int touch) +int scan_tree(char *path, char *file, off_t offset, size_t length, int touch) { struct dirent **namelist; char *name, *path2; @@ -93,7 +100,7 @@ int scan_sysfs(char *path, char *file, off_t offset, size_t length, int touch) } else { r = lstat(path2, &buf); if (r == 0 && S_ISDIR(buf.st_mode)) { - rc = scan_sysfs(path2, file, offset, length, touch); + rc = scan_tree(path2, file, offset, length, touch); if (rc < 0) return rc; } @@ -197,7 +204,7 @@ skip: return rc; } -main() +int main() { int rc; @@ -238,10 +245,15 @@ main() else fprintf(stderr, "FAIL: /dev/mem 0x0-0x100000 not accessible\n"); - scan_sysfs("/sys/class/pci_bus", "legacy_mem", 0, 0xA0000, 1); - scan_sysfs("/sys/class/pci_bus", "legacy_mem", 0xA0000, 0x20000, 0); - scan_sysfs("/sys/class/pci_bus", "legacy_mem", 0xC0000, 0x40000, 1); - scan_sysfs("/sys/class/pci_bus", "legacy_mem", 0, 1024*1024, 0); + scan_tree("/sys/class/pci_bus", "legacy_mem", 0, 0xA0000, 1); + scan_tree("/sys/class/pci_bus", "legacy_mem", 0xA0000, 0x20000, 0); + scan_tree("/sys/class/pci_bus", "legacy_mem", 0xC0000, 0x40000, 1); + scan_tree("/sys/class/pci_bus", "legacy_mem", 0, 1024*1024, 0); scan_rom("/sys/devices", "rom"); + + scan_tree("/proc/bus/pci", "??.?", 0, 0xA0000, 1); + scan_tree("/proc/bus/pci", "??.?", 0xA0000, 0x20000, 0); + scan_tree("/proc/bus/pci", "??.?", 0xC0000, 0x40000, 1); + scan_tree("/proc/bus/pci", "??.?", 0, 1024*1024, 0); } diff --git a/Documentation/ia64/aliasing.txt b/Documentation/ia64/aliasing.txt index 9a431a7d0f5..aa3e953f0f7 100644 --- a/Documentation/ia64/aliasing.txt +++ b/Documentation/ia64/aliasing.txt @@ -112,6 +112,18 @@ POTENTIAL ATTRIBUTE ALIASING CASES The /dev/mem mmap constraints apply. + mmap of /proc/bus/pci/.../??.? + + This is an MMIO mmap of PCI functions, which additionally may or + may not be requested as using the WC attribute. + + If WC is requested, and the region in kern_memmap is either WC + or UC, and the EFI memory map designates the region as WC, then + the WC mapping is allowed. + + Otherwise, the user mapping must use the same attribute as the + kernel mapping. + read/write of /dev/mem This uses copy_from_user(), which implicitly uses a kernel diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index aae2282600c..4d880b3d1f3 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -170,7 +170,10 @@ and is between 256 and 4096 characters. It is defined in the file acpi_os_name= [HW,ACPI] Tell ACPI BIOS the name of the OS Format: To spoof as Windows 98: ="Microsoft Windows" - acpi_osi= [HW,ACPI] empty param disables _OSI + acpi_osi= [HW,ACPI] Modify list of supported OS interface strings + acpi_osi="string1" # add string1 -- only one string + acpi_osi="!string2" # remove built-in string2 + acpi_osi= # disable all strings acpi_serialize [HW,ACPI] force serialization of AML methods @@ -220,11 +223,6 @@ and is between 256 and 4096 characters. It is defined in the file acpi_fake_ecdt [HW,ACPI] Workaround failure due to BIOS lacking ECDT - acpi_generic_hotkey [HW,ACPI] - Allow consolidated generic hotkey driver to - override platform specific driver. - See also Documentation/acpi-hotkey.txt. - acpi_pm_good [IA-32,X86-64] Override the pmtimer bug detection: force the kernel to assume that this machine's pmtimer latches its value @@ -1016,49 +1014,6 @@ and is between 256 and 4096 characters. It is defined in the file mga= [HW,DRM] - migration_cost= - [KNL,SMP] debug: override scheduler migration costs - Format: <level-1-usecs>,<level-2-usecs>,... - This debugging option can be used to override the - default scheduler migration cost matrix. The numbers - are indexed by 'CPU domain distance'. - E.g. migration_cost=1000,2000,3000 on an SMT NUMA - box will set up an intra-core migration cost of - 1 msec, an inter-core migration cost of 2 msecs, - and an inter-node migration cost of 3 msecs. - - WARNING: using the wrong values here can break - scheduler performance, so it's only for scheduler - development purposes, not production environments. - - migration_debug= - [KNL,SMP] migration cost auto-detect verbosity - Format=<0|1|2> - If a system's migration matrix reported at bootup - seems erroneous then this option can be used to - increase verbosity of the detection process. - We default to 0 (no extra messages), 1 will print - some more information, and 2 will be really - verbose (probably only useful if you also have a - serial console attached to the system). - - migration_factor= - [KNL,SMP] multiply/divide migration costs by a factor - Format=<percent> - This debug option can be used to proportionally - increase or decrease the auto-detected migration - costs for all entries of the migration matrix. - E.g. migration_factor=150 will increase migration - costs by 50%. (and thus the scheduler will be less - eager migrating cache-hot tasks) - migration_factor=80 will decrease migration costs - by 20%. (thus the scheduler will be more eager to - migrate tasks) - - WARNING: using the wrong values here can break - scheduler performance, so it's only for scheduler - development purposes, not production environments. - mousedev.tap_time= [MOUSE] Maximum time between finger touching and leaving touchpad surface for touch to be considered @@ -1132,9 +1087,9 @@ and is between 256 and 4096 characters. It is defined in the file when set. Format: <int> - noaliencache [MM, NUMA] Disables the allcoation of alien caches in - the slab allocator. Saves per-node memory, but will - impact performance on real NUMA hardware. + noaliencache [MM, NUMA, SLAB] Disables the allocation of alien + caches in the slab allocator. Saves per-node memory, + but will impact performance. noalign [KNL,ARM] @@ -1613,6 +1568,37 @@ and is between 256 and 4096 characters. It is defined in the file slram= [HW,MTD] + slub_debug [MM, SLUB] + Enabling slub_debug allows one to determine the culprit + if slab objects become corrupted. Enabling slub_debug + creates guard zones around objects and poisons objects + when not in use. Also tracks the last alloc / free. + For more information see Documentation/vm/slub.txt. + + slub_max_order= [MM, SLUB] + Determines the maximum allowed order for slabs. Setting + this too high may cause fragmentation. + For more information see Documentation/vm/slub.txt. + + slub_min_objects= [MM, SLUB] + The minimum objects per slab. SLUB will increase the + slab order up to slub_max_order to generate a + sufficiently big slab to satisfy the number of objects. + The higher the number of objects the smaller the overhead + of tracking slabs. + For more information see Documentation/vm/slub.txt. + + slub_min_order= [MM, SLUB] + Determines the mininum page order for slabs. Must be + lower than slub_max_order + For more information see Documentation/vm/slub.txt. + + slub_nomerge [MM, SLUB] + Disable merging of slabs of similar size. May be + necessary if there is some reason to distinguish + allocs to different slabs. + For more information see Documentation/vm/slub.txt. + smart2= [HW] Format: <io1>[,<io2>[,...,<io8>]] diff --git a/Documentation/networking/00-INDEX b/Documentation/networking/00-INDEX index e06b6e3c1db..d63f480afb7 100644 --- a/Documentation/networking/00-INDEX +++ b/Documentation/networking/00-INDEX @@ -32,6 +32,8 @@ cops.txt - info on the COPS LocalTalk Linux driver cs89x0.txt - the Crystal LAN (CS8900/20-based) Ethernet ISA adapter driver +cxacru.txt + - Conexant AccessRunner USB ADSL Modem de4x5.txt - the Digital EtherWORKS DE4?? and DE5?? PCI Ethernet driver decnet.txt @@ -94,9 +96,6 @@ routing.txt - the new routing mechanism shaper.txt - info on the module that can shape/limit transmitted traffic. -sk98lin.txt - - Marvell Yukon Chipset / SysKonnect SK-98xx compliant Gigabit - Ethernet Adapter family driver info skfp.txt - SysKonnect FDDI (SK-5xxx, Compaq Netelligent) driver info. smc9.txt diff --git a/Documentation/networking/cxacru.txt b/Documentation/networking/cxacru.txt new file mode 100644 index 00000000000..b074681a963 --- /dev/null +++ b/Documentation/networking/cxacru.txt @@ -0,0 +1,84 @@ +Firmware is required for this device: http://accessrunner.sourceforge.net/ + +While it is capable of managing/maintaining the ADSL connection without the +module loaded, the device will sometimes stop responding after unloading the +driver and it is necessary to unplug/remove power to the device to fix this. + +Detected devices will appear as ATM devices named "cxacru". In /sys/class/atm/ +these are directories named cxacruN where N is the device number. A symlink +named device points to the USB interface device's directory which contains +several sysfs attribute files for retrieving device statistics: + +* adsl_controller_version + +* adsl_headend +* adsl_headend_environment + Information about the remote headend. + +* downstream_attenuation (dB) +* downstream_bits_per_frame +* downstream_rate (kbps) +* downstream_snr_margin (dB) + Downstream stats. + +* upstream_attenuation (dB) +* upstream_bits_per_frame +* upstream_rate (kbps) +* upstream_snr_margin (dB) +* transmitter_power (dBm/Hz) + Upstream stats. + +* downstream_crc_errors +* downstream_fec_errors +* downstream_hec_errors +* upstream_crc_errors +* upstream_fec_errors +* upstream_hec_errors + Error counts. + +* line_startable + Indicates that ADSL support on the device + is/can be enabled, see adsl_start. + +* line_status + "initialising" + "down" + "attempting to activate" + "training" + "channel analysis" + "exchange" + "waiting" + "up" + + Changes between "down" and "attempting to activate" + if there is no signal. + +* link_status + "not connected" + "connected" + "lost" + +* mac_address + +* modulation + "ANSI T1.413" + "ITU-T G.992.1 (G.DMT)" + "ITU-T G.992.2 (G.LITE)" + +* startup_attempts + Count of total attempts to initialise ADSL. + +To enable/disable ADSL, the following can be written to the adsl_state file: + "start" + "stop + "restart" (stops, waits 1.5s, then starts) + "poll" (used to resume status polling if it was disabled due to failure) + +Changes in adsl/line state are reported via kernel log messages: + [4942145.150704] ATM dev 0: ADSL state: running + [4942243.663766] ATM dev 0: ADSL line: down + [4942249.665075] ATM dev 0: ADSL line: attempting to activate + [4942253.654954] ATM dev 0: ADSL line: training + [4942255.666387] ATM dev 0: ADSL line: channel analysis + [4942259.656262] ATM dev 0: ADSL line: exchange + [2635357.696901] ATM dev 0: ADSL line: up (8128 kb/s down | 832 kb/s up) diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index af6a63ab902..32c2e9da5f3 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -433,6 +433,12 @@ tcp_workaround_signed_windows - BOOLEAN not receive a window scaling option from them. Default: 0 +tcp_dma_copybreak - INTEGER + Lower limit, in bytes, of the size of socket reads that will be + offloaded to a DMA copy engine, if one is present in the system + and CONFIG_NET_DMA is enabled. + Default: 4096 + CIPSOv4 Variables: cipso_cache_enable - BOOLEAN @@ -874,8 +880,7 @@ accept_redirects - BOOLEAN accept_source_route - INTEGER Accept source routing (routing extension header). - > 0: Accept routing header. - = 0: Accept only routing header type 2. + >= 0: Accept only routing header type 2. < 0: Do not accept routing header. Default: 0 diff --git a/Documentation/networking/l2tp.txt b/Documentation/networking/l2tp.txt new file mode 100644 index 00000000000..2451f551c50 --- /dev/null +++ b/Documentation/networking/l2tp.txt @@ -0,0 +1,169 @@ +This brief document describes how to use the kernel's PPPoL2TP driver +to provide L2TP functionality. L2TP is a protocol that tunnels one or +more PPP sessions over a UDP tunnel. It is commonly used for VPNs +(L2TP/IPSec) and by ISPs to tunnel subscriber PPP sessions over an IP +network infrastructure. + +Design +====== + +The PPPoL2TP driver, drivers/net/pppol2tp.c, provides a mechanism by +which PPP frames carried through an L2TP session are passed through +the kernel's PPP subsystem. The standard PPP daemon, pppd, handles all +PPP interaction with the peer. PPP network interfaces are created for +each local PPP endpoint. + +The L2TP protocol http://www.faqs.org/rfcs/rfc2661.html defines L2TP +control and data frames. L2TP control frames carry messages between +L2TP clients/servers and are used to setup / teardown tunnels and +sessions. An L2TP client or server is implemented in userspace and +will use a regular UDP socket per tunnel. L2TP data frames carry PPP +frames, which may be PPP control or PPP data. The kernel's PPP +subsystem arranges for PPP control frames to be delivered to pppd, +while data frames are forwarded as usual. + +Each tunnel and session within a tunnel is assigned a unique tunnel_id +and session_id. These ids are carried in the L2TP header of every +control and data packet. The pppol2tp driver uses them to lookup +internal tunnel and/or session contexts. Zero tunnel / session ids are +treated specially - zero ids are never assigned to tunnels or sessions +in the network. In the driver, the tunnel context keeps a pointer to +the tunnel UDP socket. The session context keeps a pointer to the +PPPoL2TP socket, as well as other data that lets the driver interface +to the kernel PPP subsystem. + +Note that the pppol2tp kernel driver handles only L2TP data frames; +L2TP control frames are simply passed up to userspace in the UDP +tunnel socket. The kernel handles all datapath aspects of the +protocol, including data packet resequencing (if enabled). + +There are a number of requirements on the userspace L2TP daemon in +order to use the pppol2tp driver. + +1. Use a UDP socket per tunnel. + +2. Create a single PPPoL2TP socket per tunnel bound to a special null + session id. This is used only for communicating with the driver but + must remain open while the tunnel is active. Opening this tunnel + management socket causes the driver to mark the tunnel socket as an + L2TP UDP encapsulation socket and flags it for use by the + referenced tunnel id. This hooks up the UDP receive path via + udp_encap_rcv() in net/ipv4/udp.c. PPP data frames are never passed + in this special PPPoX socket. + +3. Create a PPPoL2TP socket per L2TP session. This is typically done + by starting pppd with the pppol2tp plugin and appropriate + arguments. A PPPoL2TP tunnel management socket (Step 2) must be + created before the first PPPoL2TP session socket is created. + +When creating PPPoL2TP sockets, the application provides information +to the driver about the socket in a socket connect() call. Source and +destination tunnel and session ids are provided, as well as the file +descriptor of a UDP socket. See struct pppol2tp_addr in +include/linux/if_ppp.h. Note that zero tunnel / session ids are +treated specially. When creating the per-tunnel PPPoL2TP management +socket in Step 2 above, zero source and destination session ids are +specified, which tells the driver to prepare the supplied UDP file +descriptor for use as an L2TP tunnel socket. + +Userspace may control behavior of the tunnel or session using +setsockopt and ioctl on the PPPoX socket. The following socket +options are supported:- + +DEBUG - bitmask of debug message categories. See below. +SENDSEQ - 0 => don't send packets with sequence numbers + 1 => send packets with sequence numbers +RECVSEQ - 0 => receive packet sequence numbers are optional + 1 => drop receive packets without sequence numbers +LNSMODE - 0 => act as LAC. + 1 => act as LNS. +REORDERTO - reorder timeout (in millisecs). If 0, don't try to reorder. + +Only the DEBUG option is supported by the special tunnel management +PPPoX socket. + +In addition to the standard PPP ioctls, a PPPIOCGL2TPSTATS is provided +to retrieve tunnel and session statistics from the kernel using the +PPPoX socket of the appropriate tunnel or session. + +Debugging +========= + +The driver supports a flexible debug scheme where kernel trace +messages may be optionally enabled per tunnel and per session. Care is +needed when debugging a live system since the messages are not +rate-limited and a busy system could be swamped. Userspace uses +setsockopt on the PPPoX socket to set a debug mask. + +The following debug mask bits are available: + +PPPOL2TP_MSG_DEBUG verbose debug (if compiled in) +PPPOL2TP_MSG_CONTROL userspace - kernel interface +PPPOL2TP_MSG_SEQ sequence numbers handling +PPPOL2TP_MSG_DATA data packets + +Sample Userspace Code +===================== + +1. Create tunnel management PPPoX socket + + kernel_fd = socket(AF_PPPOX, SOCK_DGRAM, PX_PROTO_OL2TP); + if (kernel_fd >= 0) { + struct sockaddr_pppol2tp sax; + struct sockaddr_in const *peer_addr; + + peer_addr = l2tp_tunnel_get_peer_addr(tunnel); + memset(&sax, 0, sizeof(sax)); + sax.sa_family = AF_PPPOX; + sax.sa_protocol = PX_PROTO_OL2TP; + sax.pppol2tp.fd = udp_fd; /* fd of tunnel UDP socket */ + sax.pppol2tp.addr.sin_addr.s_addr = peer_addr->sin_addr.s_addr; + sax.pppol2tp.addr.sin_port = peer_addr->sin_port; + sax.pppol2tp.addr.sin_family = AF_INET; + sax.pppol2tp.s_tunnel = tunnel_id; + sax.pppol2tp.s_session = 0; /* special case: mgmt socket */ + sax.pppol2tp.d_tunnel = 0; + sax.pppol2tp.d_session = 0; /* special case: mgmt socket */ + + if(connect(kernel_fd, (struct sockaddr *)&sax, sizeof(sax) ) < 0 ) { + perror("connect failed"); + result = -errno; + goto err; + } + } + +2. Create session PPPoX data socket + + struct sockaddr_pppol2tp sax; + int fd; + + /* Note, the target socket must be bound already, else it will not be ready */ + sax.sa_family = AF_PPPOX; + sax.sa_protocol = PX_PROTO_OL2TP; + sax.pppol2tp.fd = tunnel_fd; + sax.pppol2tp.addr.sin_addr.s_addr = addr->sin_addr.s_addr; + sax.pppol2tp.addr.sin_port = addr->sin_port; + sax.pppol2tp.addr.sin_family = AF_INET; + sax.pppol2tp.s_tunnel = tunnel_id; + sax.pppol2tp.s_session = session_id; + sax.pppol2tp.d_tunnel = peer_tunnel_id; + sax.pppol2tp.d_session = peer_session_id; + + /* session_fd is the fd of the session's PPPoL2TP socket. + * tunnel_fd is the fd of the tunnel UDP socket. + */ + fd = connect(session_fd, (struct sockaddr *)&sax, sizeof(sax)); + if (fd < 0 ) { + return -errno; + } + return 0; + +Miscellanous +============ + +The PPPoL2TP driver was developed as part of the OpenL2TP project by +Katalix Systems Ltd. OpenL2TP is a full-featured L2TP client / server, +designed from the ground up to have the L2TP datapath in the +kernel. The project also implemented the pppol2tp plugin for pppd +which allows pppd to use the kernel driver. Details can be found at +http://openl2tp.sourceforge.net. diff --git a/Documentation/networking/mac80211-injection.txt b/Documentation/networking/mac80211-injection.txt new file mode 100644 index 00000000000..53ef7a06f49 --- /dev/null +++ b/Documentation/networking/mac80211-injection.txt @@ -0,0 +1,59 @@ +How to use packet injection with mac80211 +========================================= + +mac80211 now allows arbitrary packets to be injected down any Monitor Mode +interface from userland. The packet you inject needs to be composed in the +following format: + + [ radiotap header ] + [ ieee80211 header ] + [ payload ] + +The radiotap format is discussed in +./Documentation/networking/radiotap-headers.txt. + +Despite 13 radiotap argument types are currently defined, most only make sense +to appear on received packets. Currently three kinds of argument are used by +the injection code, although it knows to skip any other arguments that are +present (facilitating replay of captured radiotap headers directly): + + - IEEE80211_RADIOTAP_RATE - u8 arg in 500kbps units (0x02 --> 1Mbps) + + - IEEE80211_RADIOTAP_ANTENNA - u8 arg, 0x00 = ant1, 0x01 = ant2 + + - IEEE80211_RADIOTAP_DBM_TX_POWER - u8 arg, dBm + +Here is an example valid radiotap header defining these three parameters + + 0x00, 0x00, // <-- radiotap version + 0x0b, 0x00, // <- radiotap header length + 0x04, 0x0c, 0x00, 0x00, // <-- bitmap + 0x6c, // <-- rate + 0x0c, //<-- tx power + 0x01 //<-- antenna + +The ieee80211 header follows immediately afterwards, looking for example like +this: + + 0x08, 0x01, 0x00, 0x00, + 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, + 0x13, 0x22, 0x33, 0x44, 0x55, 0x66, + 0x13, 0x22, 0x33, 0x44, 0x55, 0x66, + 0x10, 0x86 + +Then lastly there is the payload. + +After composing the packet contents, it is sent by send()-ing it to a logical +mac80211 interface that is in Monitor mode. Libpcap can also be used, +(which is easier than doing the work to bind the socket to the right +interface), along the following lines: + + ppcap = pcap_open_live(szInterfaceName, 800, 1, 20, szErrbuf); +... + r = pcap_inject(ppcap, u8aSendBuffer, nLength); + +You can also find sources for a complete inject test applet here: + +http://penumbra.warmcat.com/_twk/tiki-index.php?page=packetspammer + +Andy Green <andy@warmcat.com> diff --git a/Documentation/networking/multiqueue.txt b/Documentation/networking/multiqueue.txt new file mode 100644 index 00000000000..00b60cce222 --- /dev/null +++ b/Documentation/networking/multiqueue.txt @@ -0,0 +1,111 @@ + + HOWTO for multiqueue network device support + =========================================== + +Section 1: Base driver requirements for implementing multiqueue support +Section 2: Qdisc support for multiqueue devices +Section 3: Brief howto using PRIO or RR for multiqueue devices + + +Intro: Kernel support for multiqueue devices +--------------------------------------------------------- + +Kernel support for multiqueue devices is only an API that is presented to the +netdevice layer for base drivers to implement. This feature is part of the +core networking stack, and all network devices will be running on the +multiqueue-aware stack. If a base driver only has one queue, then these +changes are transparent to that driver. + + +Section 1: Base driver requirements for implementing multiqueue support +----------------------------------------------------------------------- + +Base drivers are required to use the new alloc_etherdev_mq() or +alloc_netdev_mq() functions to allocate the subqueues for the device. The +underlying kernel API will take care of the allocation and deallocation of +the subqueue memory, as well as netdev configuration of where the queues +exist in memory. + +The base driver will also need to manage the queues as it does the global +netdev->queue_lock today. Therefore base drivers should use the +netif_{start|stop|wake}_subqueue() functions to manage each queue while the +device is still operational. netdev->queue_lock is still used when the device +comes online or when it's completely shut down (unregister_netdev(), etc.). + +Finally, the base driver should indicate that it is a multiqueue device. The +feature flag NETIF_F_MULTI_QUEUE should be added to the netdev->features +bitmap on device initialization. Below is an example from e1000: + +#ifdef CONFIG_E1000_MQ + if ( (adapter->hw.mac.type == e1000_82571) || + (adapter->hw.mac.type == e1000_82572) || + (adapter->hw.mac.type == e1000_80003es2lan)) + netdev->features |= NETIF_F_MULTI_QUEUE; +#endif + + +Section 2: Qdisc support for multiqueue devices +----------------------------------------------- + +Currently two qdiscs support multiqueue devices. A new round-robin qdisc, +sch_rr, and sch_prio. The qdisc is responsible for classifying the skb's to +bands and queues, and will store the queue mapping into skb->queue_mapping. +Use this field in the base driver to determine which queue to send the skb +to. + +sch_rr has been added for hardware that doesn't want scheduling policies from +software, so it's a straight round-robin qdisc. It uses the same syntax and +classification priomap that sch_prio uses, so it should be intuitive to +configure for people who've used sch_prio. + +The PRIO qdisc naturally plugs into a multiqueue device. If PRIO has been +built with NET_SCH_PRIO_MQ, then upon load, it will make sure the number of +bands requested is equal to the number of queues on the hardware. If they +are equal, it sets a one-to-one mapping up between the queues and bands. If +they're not equal, it will not load the qdisc. This is the same behavior +for RR. Once the association is made, any skb that is classified will have +skb->queue_mapping set, which will allow the driver to properly queue skb's +to multiple queues. + + +Section 3: Brief howto using PRIO and RR for multiqueue devices +--------------------------------------------------------------- + +The userspace command 'tc,' part of the iproute2 package, is used to configure +qdiscs. To add the PRIO qdisc to your network device, assuming the device is +called eth0, run the following command: + +# tc qdisc add dev eth0 root handle 1: prio bands 4 multiqueue + +This will create 4 bands, 0 being highest priority, and associate those bands +to the queues on your NIC. Assuming eth0 has 4 Tx queues, the band mapping +would look like: + +band 0 => queue 0 +band 1 => queue 1 +band 2 => queue 2 +band 3 => queue 3 + +Traffic will begin flowing through each queue if your TOS values are assigning +traffic across the various bands. For example, ssh traffic will always try to +go out band 0 based on TOS -> Linux priority conversion (realtime traffic), +so it will be sent out queue 0. ICMP traffic (pings) fall into the "normal" +traffic classification, which is band 1. Therefore pings will be send out +queue 1 on the NIC. + +Note the use of the multiqueue keyword. This is only in versions of iproute2 +that support multiqueue networking devices; if this is omitted when loading +a qdisc onto a multiqueue device, the qdisc will load and operate the same +if it were loaded onto a single-queue device (i.e. - sends all traffic to +queue 0). + +Another alternative to multiqueue band allocation can be done by using the +multiqueue option and specify 0 bands. If this is the case, the qdisc will +allocate the number of bands to equal the number of queues that the device +reports, and bring the qdisc online. + +The behavior of tc filters remains the same, where it will override TOS priority +classification. + + +Author: Peter P. Waskiewicz Jr. <peter.p.waskiewicz.jr@intel.com> diff --git a/Documentation/networking/netdevices.txt b/Documentation/networking/netdevices.txt index ce1361f9524..37869295fc7 100644 --- a/Documentation/networking/netdevices.txt +++ b/Documentation/networking/netdevices.txt @@ -20,6 +20,30 @@ private data which gets freed when the network device is freed. If separately allocated data is attached to the network device (dev->priv) then it is up to the module exit handler to free that. +MTU +=== +Each network device has a Maximum Transfer Unit. The MTU does not +include any link layer protocol overhead. Upper layer protocols must +not pass a socket buffer (skb) to a device to transmit with more data +than the mtu. The MTU does not include link layer header overhead, so +for example on Ethernet if the standard MTU is 1500 bytes used, the +actual skb will contain up to 1514 bytes because of the Ethernet +header. Devices should allow for the 4 byte VLAN header as well. + +Segmentation Offload (GSO, TSO) is an exception to this rule. The +upper layer protocol may pass a large socket buffer to the device +transmit routine, and the device will break that up into separate +packets based on the current MTU. + +MTU is symmetrical and applies both to receive and transmit. A device +must be able to receive at least the maximum size packet allowed by +the MTU. A network device may use the MTU as mechanism to size receive +buffers, but the device should allow packets with VLAN header. With +standard Ethernet mtu of 1500 bytes, the device should allow up to +1518 byte packets (1500 + 14 header + 4 tag). The device may either: +drop, truncate, or pass up oversize packets, but dropping oversize +packets is preferred. + struct net_device synchronization rules ======================================= @@ -43,16 +67,17 @@ dev->get_stats: dev->hard_start_xmit: Synchronization: netif_tx_lock spinlock. + When the driver sets NETIF_F_LLTX in dev->features this will be called without holding netif_tx_lock. In this case the driver has to lock by itself when needed. It is recommended to use a try lock - for this and return -1 when the spin lock fails. + for this and return NETDEV_TX_LOCKED when the spin lock fails. The locking there should also properly protect against - set_multicast_list - Context: Process with BHs disabled or BH (timer). - Notes: netif_queue_stopped() is guaranteed false - Interrupts must be enabled when calling hard_start_xmit. - (Interrupts must also be enabled when enabling the BH handler.) + set_multicast_list. + + Context: Process with BHs disabled or BH (timer), + will be called with interrupts disabled by netconsole. + Return codes: o NETDEV_TX_OK everything ok. o NETDEV_TX_BUSY Cannot transmit packet, try later @@ -74,4 +99,5 @@ dev->poll: Synchronization: __LINK_STATE_RX_SCHED bit in dev->state. See dev_close code and comments in net/core/dev.c for more info. Context: softirq + will be called with interrupts disabled by netconsole. diff --git a/Documentation/networking/radiotap-headers.txt b/Documentation/networking/radiotap-headers.txt new file mode 100644 index 00000000000..953331c7984 --- /dev/null +++ b/Documentation/networking/radiotap-headers.txt @@ -0,0 +1,152 @@ +How to use radiotap headers +=========================== + +Pointer to the radiotap include file +------------------------------------ + +Radiotap headers are variable-length and extensible, you can get most of the +information you need to know on them from: + +./include/net/ieee80211_radiotap.h + +This document gives an overview and warns on some corner cases. + + +Structure of the header +----------------------- + +There is a fixed portion at the start which contains a u32 bitmap that defines +if the possible argument associated with that bit is present or not. So if b0 +of the it_present member of ieee80211_radiotap_header is set, it means that +the header for argument index 0 (IEEE80211_RADIOTAP_TSFT) is present in the +argument area. + + < 8-byte ieee80211_radiotap_header > + [ <possible argument bitmap extensions ... > ] + [ <argument> ... ] + +At the moment there are only 13 possible argument indexes defined, but in case +we run out of space in the u32 it_present member, it is defined that b31 set +indicates that there is another u32 bitmap following (shown as "possible +argument bitmap extensions..." above), and the start of the arguments is moved +forward 4 bytes each time. + +Note also that the it_len member __le16 is set to the total number of bytes +covered by the ieee80211_radiotap_header and any arguments following. + + +Requirements for arguments +-------------------------- + +After the fixed part of the header, the arguments follow for each argument +index whose matching bit is set in the it_present member of +ieee80211_radiotap_header. + + - the arguments are all stored little-endian! + + - the argument payload for a given argument index has a fixed size. So + IEEE80211_RADIOTAP_TSFT being present always indicates an 8-byte argument is + present. See the comments in ./include/net/ieee80211_radiotap.h for a nice + breakdown of all the argument sizes + + - the arguments must be aligned to a boundary of the argument size using + padding. So a u16 argument must start on the next u16 boundary if it isn't + already on one, a u32 must start on the next u32 boundary and so on. + + - "alignment" is relative to the start of the ieee80211_radiotap_header, ie, + the first byte of the radiotap header. The absolute alignment of that first + byte isn't defined. So even if the whole radiotap header is starting at, eg, + address 0x00000003, still the first byte of the radiotap header is treated as + 0 for alignment purposes. + + - the above point that there may be no absolute alignment for multibyte + entities in the fixed radiotap header or the argument region means that you + have to take special evasive action when trying to access these multibyte + entities. Some arches like Blackfin cannot deal with an attempt to + dereference, eg, a u16 pointer that is pointing to an odd address. Instead + you have to use a kernel API get_unaligned() to dereference the pointer, + which will do it bytewise on the arches that require that. + + - The arguments for a given argument index can be a compound of multiple types + together. For example IEEE80211_RADIOTAP_CHANNEL has an argument payload + consisting of two u16s of total length 4. When this happens, the padding + rule is applied dealing with a u16, NOT dealing with a 4-byte single entity. + + +Example valid radiotap header +----------------------------- + + 0x00, 0x00, // <-- radiotap version + pad byte + 0x0b, 0x00, // <- radiotap header length + 0x04, 0x0c, 0x00, 0x00, // <-- bitmap + 0x6c, // <-- rate (in 500kHz units) + 0x0c, //<-- tx power + 0x01 //<-- antenna + + +Using the Radiotap Parser +------------------------- + +If you are having to parse a radiotap struct, you can radically simplify the +job by using the radiotap parser that lives in net/wireless/radiotap.c and has +its prototypes available in include/net/cfg80211.h. You use it like this: + +#include <net/cfg80211.h> + +/* buf points to the start of the radiotap header part */ + +int MyFunction(u8 * buf, int buflen) +{ + int pkt_rate_100kHz = 0, antenna = 0, pwr = 0; + struct ieee80211_radiotap_iterator iterator; + int ret = ieee80211_radiotap_iterator_init(&iterator, buf, buflen); + + while (!ret) { + + ret = ieee80211_radiotap_iterator_next(&iterator); + + if (ret) + continue; + + /* see if this argument is something we can use */ + + switch (iterator.this_arg_index) { + /* + * You must take care when dereferencing iterator.this_arg + * for multibyte types... the pointer is not aligned. Use + * get_unaligned((type *)iterator.this_arg) to dereference + * iterator.this_arg for type "type" safely on all arches. + */ + case IEEE80211_RADIOTAP_RATE: + /* radiotap "rate" u8 is in + * 500kbps units, eg, 0x02=1Mbps + */ + pkt_rate_100kHz = (*iterator.this_arg) * 5; + break; + + case IEEE80211_RADIOTAP_ANTENNA: + /* radiotap uses 0 for 1st ant */ + antenna = *iterator.this_arg); + break; + + case IEEE80211_RADIOTAP_DBM_TX_POWER: + pwr = *iterator.this_arg; + break; + + default: + break; + } + } /* while more rt headers */ + + if (ret != -ENOENT) + return TXRX_DROP; + + /* discard the radiotap header part */ + buf += iterator.max_length; + buflen -= iterator.max_length; + + ... + +} + +Andy Green <andy@warmcat.com> diff --git a/Documentation/networking/sk98lin.txt b/Documentation/networking/sk98lin.txt deleted file mode 100644 index 8590a954df1..00000000000 --- a/Documentation/networking/sk98lin.txt +++ /dev/null @@ -1,568 +0,0 @@ -(C)Copyright 1999-2004 Marvell(R). -All rights reserved -=========================================================================== - -sk98lin.txt created 13-Feb-2004 - -Readme File for sk98lin v6.23 -Marvell Yukon/SysKonnect SK-98xx Gigabit Ethernet Adapter family driver for LINUX - -This file contains - 1 Overview - 2 Required Files - 3 Installation - 3.1 Driver Installation - 3.2 Inclusion of adapter at system start - 4 Driver Parameters - 4.1 Per-Port Parameters - 4.2 Adapter Parameters - 5 Large Frame Support - 6 VLAN and Link Aggregation Support (IEEE 802.1, 802.1q, 802.3ad) - 7 Troubleshooting - -=========================================================================== - - -1 Overview -=========== - -The sk98lin driver supports the Marvell Yukon and SysKonnect -SK-98xx/SK-95xx compliant Gigabit Ethernet Adapter on Linux. It has -been tested with Linux on Intel/x86 machines. -*** - - -2 Required Files -================= - -The linux kernel source. -No additional files required. -*** - - -3 Installation -=============== - -It is recommended to download the latest version of the driver from the -SysKonnect web site www.syskonnect.com. If you have downloaded the latest -driver, the Linux kernel has to be patched before the driver can be -installed. For details on how to patch a Linux kernel, refer to the -patch.txt file. - -3.1 Driver Installation ------------------------- - -The following steps describe the actions that are required to install -the driver and to start it manually. These steps should be carried -out for the initial driver setup. Once confirmed to be ok, they can -be included in the system start. - -NOTE 1: To perform the following tasks you need 'root' access. - -NOTE 2: In case of problems, please read the section "Troubleshooting" - below. - -The driver can either be integrated into the kernel or it can be compiled -as a module. Select the appropriate option during the kernel -configuration. - -Compile/use the driver as a module ----------------------------------- -To compile the driver, go to the directory /usr/src/linux and -execute the command "make menuconfig" or "make xconfig" and proceed as -follows: - -To integrate the driver permanently into the kernel, proceed as follows: - -1. Select the menu "Network device support" and then "Ethernet(1000Mbit)" -2. Mark "Marvell Yukon Chipset / SysKonnect SK-98xx family support" - with (*) -3. Build a new kernel when the configuration of the above options is - finished. -4. Install the new kernel. -5. Reboot your system. - -To use the driver as a module, proceed as follows: - -1. Enable 'loadable module support' in the kernel. -2. For automatic driver start, enable the 'Kernel module loader'. -3. Select the menu "Network device support" and then "Ethernet(1000Mbit)" -4. Mark "Marvell Yukon Chipset / SysKonnect SK-98xx family support" - with (M) -5. Execute the command "make modules". -6. Execute the command "make modules_install". - The appropriate modules will be installed. -7. Reboot your system. - - -Load the module manually ------------------------- -To load the module manually, proceed as follows: - -1. Enter "modprobe sk98lin". -2. If a Marvell Yukon or SysKonnect SK-98xx adapter is installed in - your computer and you have a /proc file system, execute the command: - "ls /proc/net/sk98lin/" - This should produce an output containing a line with the following - format: - eth0 eth1 ... - which indicates that your adapter has been found and initialized. - - NOTE 1: If you have more than one Marvell Yukon or SysKonnect SK-98xx - adapter installed, the adapters will be listed as 'eth0', - 'eth1', 'eth2', etc. - For each adapter, repeat steps 3 and 4 below. - - NOTE 2: If you have other Ethernet adapters installed, your Marvell - Yukon or SysKonnect SK-98xx adapter will be mapped to the - next available number, e.g. 'eth1'. The mapping is executed - automatically. - The module installation message (displayed either in a system - log file or on the console) prints a line for each adapter - found containing the corresponding 'ethX'. - -3. Select an IP address and assign it to the respective adapter by - entering: - ifconfig eth0 <ip-address> - With this command, the adapter is connected to the Ethernet. - - SK-98xx Gigabit Ethernet Server Adapters: The yellow LED on the adapter - is now active, the link status LED of the primary port is active and - the link status LED of the secondary port (on dual port adapters) is - blinking (if the ports are connected to a switch or hub). - SK-98xx V2.0 Gigabit Ethernet Adapters: The link status LED is active. - In addition, you will receive a status message on the console stating - "ethX: network connection up using port Y" and showing the selected - connection parameters (x stands for the ethernet device number - (0,1,2, etc), y stands for the port name (A or B)). - - NOTE: If you are in doubt about IP addresses, ask your network - administrator for assistance. - -4. Your adapter should now be fully operational. - Use 'ping <otherstation>' to verify the connection to other computers - on your network. -5. To check the adapter configuration view /proc/net/sk98lin/[devicename]. - For example by executing: - "cat /proc/net/sk98lin/eth0" - -Unload the module ------------------ -To stop and unload the driver modules, proceed as follows: - -1. Execute the command "ifconfig eth0 down". -2. Execute the command "rmmod sk98lin". - -3.2 Inclusion of adapter at system start ------------------------------------------ - -Since a large number of different Linux distributions are -available, we are unable to describe a general installation procedure -for the driver module. -Because the driver is now integrated in the kernel, installation should -be easy, using the standard mechanism of your distribution. -Refer to the distribution's manual for installation of ethernet adapters. - -*** - -4 Driver Parameters -==================== - -Parameters can be set at the command line after the module has been -loaded with the command 'modprobe'. -In some distributions, the configuration tools are able to pass parameters -to the driver module. - -If you use the kernel module loader, you can set driver parameters -in the file /etc/modprobe.conf (or /etc/modules.conf in 2.4 or earlier). -To set the driver parameters in this file, proceed as follows: - -1. Insert a line of the form : - options sk98lin ... - For "...", the same syntax is required as described for the command - line parameters of modprobe below. -2. To activate the new parameters, either reboot your computer - or - unload and reload the driver. - The syntax of the driver parameters is: - - modprobe sk98lin parameter=value1[,value2[,value3...]] - - where value1 refers to the first adapter, value2 to the second etc. - -NOTE: All parameters are case sensitive. Write them exactly as shown - below. - -Example: -Suppose you have two adapters. You want to set auto-negotiation -on the first adapter to ON and on the second adapter to OFF. -You also want to set DuplexCapabilities on the first adapter -to FULL, and on the second adapter to HALF. -Then, you must enter: - - modprobe sk98lin AutoNeg_A=On,Off DupCap_A=Full,Half - -NOTE: The number of adapters that can be configured this way is - limited in the driver (file skge.c, constant SK_MAX_CARD_PARAM). - The current limit is 16. If you happen to install - more adapters, adjust this and recompile. - - -4.1 Per-Port Parameters ------------------------- - -These settings are available for each port on the adapter. -In the following description, '?' stands for the port for -which you set the parameter (A or B). - -Speed ------ -Parameter: Speed_? -Values: 10, 100, 1000, Auto -Default: Auto - -This parameter is used to set the speed capabilities. It is only valid -for the SK-98xx V2.0 copper adapters. -Usually, the speed is negotiated between the two ports during link -establishment. If this fails, a port can be forced to a specific setting -with this parameter. - -Auto-Negotiation ----------------- -Parameter: AutoNeg_? -Values: On, Off, Sense -Default: On - -The "Sense"-mode automatically detects whether the link partner supports -auto-negotiation or not. - -Duplex Capabilities -------------------- -Parameter: DupCap_? -Values: Half, Full, Both -Default: Both - -This parameters is only relevant if auto-negotiation for this port is -not set to "Sense". If auto-negotiation is set to "On", all three values -are possible. If it is set to "Off", only "Full" and "Half" are allowed. -This parameter is useful if your link partner does not support all -possible combinations. - -Flow Control ------------- -Parameter: FlowCtrl_? -Values: Sym, SymOrRem, LocSend, None -Default: SymOrRem - -This parameter can be used to set the flow control capabilities the -port reports during auto-negotiation. It can be set for each port -individually. -Possible modes: - -- Sym = Symmetric: both link partners are allowed to send - PAUSE frames - -- SymOrRem = SymmetricOrRemote: both or only remote partner - are allowed to send PAUSE frames - -- LocSend = LocalSend: only local link partner is allowed - to send PAUSE frames - -- None = no link partner is allowed to send PAUSE frames - -NOTE: This parameter is ignored if auto-negotiation is set to "Off". - -Role in Master-Slave-Negotiation (1000Base-T only) --------------------------------------------------- -Parameter: Role_? -Values: Auto, Master, Slave -Default: Auto - -This parameter is only valid for the SK-9821 and SK-9822 adapters. -For two 1000Base-T ports to communicate, one must take the role of the -master (providing timing information), while the other must be the -slave. Usually, this is negotiated between the two ports during link -establishment. If this fails, a port can be forced to a specific setting -with this parameter. - - -4.2 Adapter Parameters ------------------------ - -Connection Type (SK-98xx V2.0 copper adapters only) ---------------- -Parameter: ConType -Values: Auto, 100FD, 100HD, 10FD, 10HD -Default: Auto - -The parameter 'ConType' is a combination of all five per-port parameters -within one single parameter. This simplifies the configuration of both ports -of an adapter card! The different values of this variable reflect the most -meaningful combinations of port parameters. - -The following table shows the values of 'ConType' and the corresponding -combinations of the per-port parameters: - - ConType | DupCap AutoNeg FlowCtrl Role Speed - ----------+------------------------------------------------------ - Auto | Both On SymOrRem Auto Auto - 100FD | Full Off None Auto (ignored) 100 - 100HD | Half Off None Auto (ignored) 100 - 10FD | Full Off None Auto (ignored) 10 - 10HD | Half Off None Auto (ignored) 10 - -Stating any other port parameter together with this 'ConType' variable -will result in a merged configuration of those settings. This due to -the fact, that the per-port parameters (e.g. Speed_? ) have a higher -priority than the combined variable 'ConType'. - -NOTE: This parameter is always used on both ports of the adapter card. - -Interrupt Moderation --------------------- -Parameter: Moderation -Values: None, Static, Dynamic -Default: None - -Interrupt moderation is employed to limit the maximum number of interrupts -the driver has to serve. That is, one or more interrupts (which indicate any -transmit or receive packet to be processed) are queued until the driver -processes them. When queued interrupts are to be served, is determined by the -'IntsPerSec' parameter, which is explained later below. - -Possible modes: - - -- None - No interrupt moderation is applied on the adapter card. - Therefore, each transmit or receive interrupt is served immediately - as soon as it appears on the interrupt line of the adapter card. - - -- Static - Interrupt moderation is applied on the adapter card. - All transmit and receive interrupts are queued until a complete - moderation interval ends. If such a moderation interval ends, all - queued interrupts are processed in one big bunch without any delay. - The term 'static' reflects the fact, that interrupt moderation is - always enabled, regardless how much network load is currently - passing via a particular interface. In addition, the duration of - the moderation interval has a fixed length that never changes while - the driver is operational. - - -- Dynamic - Interrupt moderation might be applied on the adapter card, - depending on the load of the system. If the driver detects that the - system load is too high, the driver tries to shield the system against - too much network load by enabling interrupt moderation. If - at a later - time - the CPU utilization decreases again (or if the network load is - negligible) the interrupt moderation will automatically be disabled. - -Interrupt moderation should be used when the driver has to handle one or more -interfaces with a high network load, which - as a consequence - leads also to a -high CPU utilization. When moderation is applied in such high network load -situations, CPU load might be reduced by 20-30%. - -NOTE: The drawback of using interrupt moderation is an increase of the round- -trip-time (RTT), due to the queueing and serving of interrupts at dedicated -moderation times. - -Interrupts per second ---------------------- -Parameter: IntsPerSec -Values: 30...40000 (interrupts per second) -Default: 2000 - -This parameter is only used if either static or dynamic interrupt moderation -is used on a network adapter card. Using this parameter if no moderation is -applied will lead to no action performed. - -This parameter determines the length of any interrupt moderation interval. -Assuming that static interrupt moderation is to be used, an 'IntsPerSec' -parameter value of 2000 will lead to an interrupt moderation interval of -500 microseconds. - -NOTE: The duration of the moderation interval is to be chosen with care. -At first glance, selecting a very long duration (e.g. only 100 interrupts per -second) seems to be meaningful, but the increase of packet-processing delay -is tremendous. On the other hand, selecting a very short moderation time might -compensate the use of any moderation being applied. - - -Preferred Port --------------- -Parameter: PrefPort -Values: A, B -Default: A - -This is used to force the preferred port to A or B (on dual-port network -adapters). The preferred port is the one that is used if both are detected -as fully functional. - -RLMT Mode (Redundant Link Management Technology) ------------------------------------------------- -Parameter: RlmtMode -Values: CheckLinkState,CheckLocalPort, CheckSeg, DualNet -Default: CheckLinkState - -RLMT monitors the status of the port. If the link of the active port -fails, RLMT switches immediately to the standby link. The virtual link is -maintained as long as at least one 'physical' link is up. - -Possible modes: - - -- CheckLinkState - Check link state only: RLMT uses the link state - reported by the adapter hardware for each individual port to - determine whether a port can be used for all network traffic or - not. - - -- CheckLocalPort - In this mode, RLMT monitors the network path - between the two ports of an adapter by regularly exchanging packets - between them. This mode requires a network configuration in which - the two ports are able to "see" each other (i.e. there must not be - any router between the ports). - - -- CheckSeg - Check local port and segmentation: This mode supports the - same functions as the CheckLocalPort mode and additionally checks - network segmentation between the ports. Therefore, this mode is only - to be used if Gigabit Ethernet switches are installed on the network - that have been configured to use the Spanning Tree protocol. - - -- DualNet - In this mode, ports A and B are used as separate devices. - If you have a dual port adapter, port A will be configured as eth0 - and port B as eth1. Both ports can be used independently with - distinct IP addresses. The preferred port setting is not used. - RLMT is turned off. - -NOTE: RLMT modes CLP and CLPSS are designed to operate in configurations - where a network path between the ports on one adapter exists. - Moreover, they are not designed to work where adapters are connected - back-to-back. -*** - - -5 Large Frame Support -====================== - -The driver supports large frames (also called jumbo frames). Using large -frames can result in an improved throughput if transferring large amounts -of data. -To enable large frames, set the MTU (maximum transfer unit) of the -interface to the desired value (up to 9000), execute the following -command: - ifconfig eth0 mtu 9000 -This will only work if you have two adapters connected back-to-back -or if you use a switch that supports large frames. When using a switch, -it should be configured to allow large frames and auto-negotiation should -be set to OFF. The setting must be configured on all adapters that can be -reached by the large frames. If one adapter is not set to receive large -frames, it will simply drop them. - -You can switch back to the standard ethernet frame size by executing the -following command: - ifconfig eth0 mtu 1500 - -To permanently configure this setting, add a script with the 'ifconfig' -line to the system startup sequence (named something like "S99sk98lin" -in /etc/rc.d/rc2.d). -*** - - -6 VLAN and Link Aggregation Support (IEEE 802.1, 802.1q, 802.3ad) -================================================================== - -The Marvell Yukon/SysKonnect Linux drivers are able to support VLAN and -Link Aggregation according to IEEE standards 802.1, 802.1q, and 802.3ad. -These features are only available after installation of open source -modules available on the Internet: -For VLAN go to: http://www.candelatech.com/~greear/vlan.html -For Link Aggregation go to: http://www.st.rim.or.jp/~yumo - -NOTE: SysKonnect GmbH does not offer any support for these open source - modules and does not take the responsibility for any kind of - failures or problems arising in connection with these modules. - -NOTE: Configuring Link Aggregation on a SysKonnect dual link adapter may - cause problems when unloading the driver. - - -7 Troubleshooting -================== - -If any problems occur during the installation process, check the -following list: - - -Problem: The SK-98xx adapter cannot be found by the driver. -Solution: In /proc/pci search for the following entry: - 'Ethernet controller: SysKonnect SK-98xx ...' - If this entry exists, the SK-98xx or SK-98xx V2.0 adapter has - been found by the system and should be operational. - If this entry does not exist or if the file '/proc/pci' is not - found, there may be a hardware problem or the PCI support may - not be enabled in your kernel. - The adapter can be checked using the diagnostics program which - is available on the SysKonnect web site: - www.syskonnect.com - - Some COMPAQ machines have problems dealing with PCI under Linux. - This problem is described in the 'PCI howto' document - (included in some distributions or available from the - web, e.g. at 'www.linux.org'). - - -Problem: Programs such as 'ifconfig' or 'route' cannot be found or the - error message 'Operation not permitted' is displayed. -Reason: You are not logged in as user 'root'. -Solution: Logout and login as 'root' or change to 'root' via 'su'. - - -Problem: Upon use of the command 'ping <address>' the message - "ping: sendto: Network is unreachable" is displayed. -Reason: Your route is not set correctly. -Solution: If you are using RedHat, you probably forgot to set up the - route in the 'network configuration'. - Check the existing routes with the 'route' command and check - if an entry for 'eth0' exists, and if so, if it is set correctly. - - -Problem: The driver can be started, the adapter is connected to the - network, but you cannot receive or transmit any packets; - e.g. 'ping' does not work. -Reason: There is an incorrect route in your routing table. -Solution: Check the routing table with the command 'route' and read the - manual help pages dealing with routes (enter 'man route'). - -NOTE: Although the 2.2.x kernel versions generate the routing entry - automatically, problems of this kind may occur here as well. We've - come across a situation in which the driver started correctly at - system start, but after the driver has been removed and reloaded, - the route of the adapter's network pointed to the 'dummy0'device - and had to be corrected manually. - - -Problem: Your computer should act as a router between multiple - IP subnetworks (using multiple adapters), but computers in - other subnetworks cannot be reached. -Reason: Either the router's kernel is not configured for IP forwarding - or the routing table and gateway configuration of at least one - computer is not working. - -Problem: Upon driver start, the following error message is displayed: - "eth0: -- ERROR -- - Class: internal Software error - Nr: 0xcc - Msg: SkGeInitPort() cannot init running ports" -Reason: You are using a driver compiled for single processor machines - on a multiprocessor machine with SMP (Symmetric MultiProcessor) - kernel. -Solution: Configure your kernel appropriately and recompile the kernel or - the modules. - - - -If your problem is not listed here, please contact SysKonnect's technical -support for help (linux@syskonnect.de). -When contacting our technical support, please ensure that the following -information is available: -- System Manufacturer and HW Informations (CPU, Memory... ) -- PCI-Boards in your system -- Distribution -- Kernel version -- Driver version -*** - - - -***End of Readme File*** diff --git a/Documentation/networking/spider_net.txt b/Documentation/networking/spider_net.txt new file mode 100644 index 00000000000..4b4adb8eb14 --- /dev/null +++ b/Documentation/networking/spider_net.txt @@ -0,0 +1,204 @@ + + The Spidernet Device Driver + =========================== + +Written by Linas Vepstas <linas@austin.ibm.com> + +Version of 7 June 2007 + +Abstract +======== +This document sketches the structure of portions of the spidernet +device driver in the Linux kernel tree. The spidernet is a gigabit +ethernet device built into the Toshiba southbridge commonly used +in the SONY Playstation 3 and the IBM QS20 Cell blade. + +The Structure of the RX Ring. +============================= +The receive (RX) ring is a circular linked list of RX descriptors, +together with three pointers into the ring that are used to manage its +contents. + +The elements of the ring are called "descriptors" or "descrs"; they +describe the received data. This includes a pointer to a buffer +containing the received data, the buffer size, and various status bits. + +There are three primary states that a descriptor can be in: "empty", +"full" and "not-in-use". An "empty" or "ready" descriptor is ready +to receive data from the hardware. A "full" descriptor has data in it, +and is waiting to be emptied and processed by the OS. A "not-in-use" +descriptor is neither empty or full; it is simply not ready. It may +not even have a data buffer in it, or is otherwise unusable. + +During normal operation, on device startup, the OS (specifically, the +spidernet device driver) allocates a set of RX descriptors and RX +buffers. These are all marked "empty", ready to receive data. This +ring is handed off to the hardware, which sequentially fills in the +buffers, and marks them "full". The OS follows up, taking the full +buffers, processing them, and re-marking them empty. + +This filling and emptying is managed by three pointers, the "head" +and "tail" pointers, managed by the OS, and a hardware current +descriptor pointer (GDACTDPA). The GDACTDPA points at the descr +currently being filled. When this descr is filled, the hardware +marks it full, and advances the GDACTDPA by one. Thus, when there is +flowing RX traffic, every descr behind it should be marked "full", +and everything in front of it should be "empty". If the hardware +discovers that the current descr is not empty, it will signal an +interrupt, and halt processing. + +The tail pointer tails or trails the hardware pointer. When the +hardware is ahead, the tail pointer will be pointing at a "full" +descr. The OS will process this descr, and then mark it "not-in-use", +and advance the tail pointer. Thus, when there is flowing RX traffic, +all of the descrs in front of the tail pointer should be "full", and +all of those behind it should be "not-in-use". When RX traffic is not +flowing, then the tail pointer can catch up to the hardware pointer. +The OS will then note that the current tail is "empty", and halt +processing. + +The head pointer (somewhat mis-named) follows after the tail pointer. +When traffic is flowing, then the head pointer will be pointing at +a "not-in-use" descr. The OS will perform various housekeeping duties +on this descr. This includes allocating a new data buffer and +dma-mapping it so as to make it visible to the hardware. The OS will +then mark the descr as "empty", ready to receive data. Thus, when there +is flowing RX traffic, everything in front of the head pointer should +be "not-in-use", and everything behind it should be "empty". If no +RX traffic is flowing, then the head pointer can catch up to the tail +pointer, at which point the OS will notice that the head descr is +"empty", and it will halt processing. + +Thus, in an idle system, the GDACTDPA, tail and head pointers will +all be pointing at the same descr, which should be "empty". All of the +other descrs in the ring should be "empty" as well. + +The show_rx_chain() routine will print out the the locations of the +GDACTDPA, tail and head pointers. It will also summarize the contents +of the ring, starting at the tail pointer, and listing the status +of the descrs that follow. + +A typical example of the output, for a nearly idle system, might be + +net eth1: Total number of descrs=256 +net eth1: Chain tail located at descr=20 +net eth1: Chain head is at 20 +net eth1: HW curr desc (GDACTDPA) is at 21 +net eth1: Have 1 descrs with stat=x40800101 +net eth1: HW next desc (GDACNEXTDA) is at 22 +net eth1: Last 255 descrs with stat=xa0800000 + +In the above, the hardware has filled in one descr, number 20. Both +head and tail are pointing at 20, because it has not yet been emptied. +Meanwhile, hw is pointing at 21, which is free. + +The "Have nnn decrs" refers to the descr starting at the tail: in this +case, nnn=1 descr, starting at descr 20. The "Last nnn descrs" refers +to all of the rest of the descrs, from the last status change. The "nnn" +is a count of how many descrs have exactly the same status. + +The status x4... corresponds to "full" and status xa... corresponds +to "empty". The actual value printed is RXCOMST_A. + +In the device driver source code, a different set of names are +used for these same concepts, so that + +"empty" == SPIDER_NET_DESCR_CARDOWNED == 0xa +"full" == SPIDER_NET_DESCR_FRAME_END == 0x4 +"not in use" == SPIDER_NET_DESCR_NOT_IN_USE == 0xf + + +The RX RAM full bug/feature +=========================== + +As long as the OS can empty out the RX buffers at a rate faster than +the hardware can fill them, there is no problem. If, for some reason, +the OS fails to empty the RX ring fast enough, the hardware GDACTDPA +pointer will catch up to the head, notice the not-empty condition, +ad stop. However, RX packets may still continue arriving on the wire. +The spidernet chip can save some limited number of these in local RAM. +When this local ram fills up, the spider chip will issue an interrupt +indicating this (GHIINT0STS will show ERRINT, and the GRMFLLINT bit +will be set in GHIINT1STS). When the RX ram full condition occurs, +a certain bug/feature is triggered that has to be specially handled. +This section describes the special handling for this condition. + +When the OS finally has a chance to run, it will empty out the RX ring. +In particular, it will clear the descriptor on which the hardware had +stopped. However, once the hardware has decided that a certain +descriptor is invalid, it will not restart at that descriptor; instead +it will restart at the next descr. This potentially will lead to a +deadlock condition, as the tail pointer will be pointing at this descr, +which, from the OS point of view, is empty; the OS will be waiting for +this descr to be filled. However, the hardware has skipped this descr, +and is filling the next descrs. Since the OS doesn't see this, there +is a potential deadlock, with the OS waiting for one descr to fill, +while the hardware is waiting for a different set of descrs to become +empty. + +A call to show_rx_chain() at this point indicates the nature of the +problem. A typical print when the network is hung shows the following: + +net eth1: Spider RX RAM full, incoming packets might be discarded! +net eth1: Total number of descrs=256 +net eth1: Chain tail located at descr=255 +net eth1: Chain head is at 255 +net eth1: HW curr desc (GDACTDPA) is at 0 +net eth1: Have 1 descrs with stat=xa0800000 +net eth1: HW next desc (GDACNEXTDA) is at 1 +net eth1: Have 127 descrs with stat=x40800101 +net eth1: Have 1 descrs with stat=x40800001 +net eth1: Have 126 descrs with stat=x40800101 +net eth1: Last 1 descrs with stat=xa0800000 + +Both the tail and head pointers are pointing at descr 255, which is +marked xa... which is "empty". Thus, from the OS point of view, there +is nothing to be done. In particular, there is the implicit assumption +that everything in front of the "empty" descr must surely also be empty, +as explained in the last section. The OS is waiting for descr 255 to +become non-empty, which, in this case, will never happen. + +The HW pointer is at descr 0. This descr is marked 0x4.. or "full". +Since its already full, the hardware can do nothing more, and thus has +halted processing. Notice that descrs 0 through 254 are all marked +"full", while descr 254 and 255 are empty. (The "Last 1 descrs" is +descr 254, since tail was at 255.) Thus, the system is deadlocked, +and there can be no forward progress; the OS thinks there's nothing +to do, and the hardware has nowhere to put incoming data. + +This bug/feature is worked around with the spider_net_resync_head_ptr() +routine. When the driver receives RX interrupts, but an examination +of the RX chain seems to show it is empty, then it is probable that +the hardware has skipped a descr or two (sometimes dozens under heavy +network conditions). The spider_net_resync_head_ptr() subroutine will +search the ring for the next full descr, and the driver will resume +operations there. Since this will leave "holes" in the ring, there +is also a spider_net_resync_tail_ptr() that will skip over such holes. + +As of this writing, the spider_net_resync() strategy seems to work very +well, even under heavy network loads. + + +The TX ring +=========== +The TX ring uses a low-watermark interrupt scheme to make sure that +the TX queue is appropriately serviced for large packet sizes. + +For packet sizes greater than about 1KBytes, the kernel can fill +the TX ring quicker than the device can drain it. Once the ring +is full, the netdev is stopped. When there is room in the ring, +the netdev needs to be reawakened, so that more TX packets are placed +in the ring. The hardware can empty the ring about four times per jiffy, +so its not appropriate to wait for the poll routine to refill, since +the poll routine runs only once per jiffy. The low-watermark mechanism +marks a descr about 1/4th of the way from the bottom of the queue, so +that an interrupt is generated when the descr is processed. This +interrupt wakes up the netdev, which can then refill the queue. +For large packets, this mechanism generates a relatively small number +of interrupts, about 1K/sec. For smaller packets, this will drop to zero +interrupts, as the hardware can empty the queue faster than the kernel +can fill it. + + + ======= END OF DOCUMENT ======== + diff --git a/Documentation/networking/xfrm_sysctl.txt b/Documentation/networking/xfrm_sysctl.txt new file mode 100644 index 00000000000..5bbd16792fe --- /dev/null +++ b/Documentation/networking/xfrm_sysctl.txt @@ -0,0 +1,4 @@ +/proc/sys/net/core/xfrm_* Variables: + +xfrm_acq_expires - INTEGER + default 30 - hard timeout in seconds for acquire requests diff --git a/Documentation/pci.txt b/Documentation/pci.txt index d38261b6790..7754f5aea4e 100644 --- a/Documentation/pci.txt +++ b/Documentation/pci.txt @@ -113,9 +113,6 @@ initialization with a pointer to a structure describing the driver (Please see Documentation/power/pci.txt for descriptions of PCI Power Management and the related functions.) - enable_wake Enable device to generate wake events from a low power - state. - shutdown Hook into reboot_notifier_list (kernel/sys.c). Intended to stop any idling DMA operations. Useful for enabling wake-on-lan (NIC) or changing @@ -299,7 +296,10 @@ If the PCI device can use the PCI Memory-Write-Invalidate transaction, call pci_set_mwi(). This enables the PCI_COMMAND bit for Mem-Wr-Inval and also ensures that the cache line size register is set correctly. Check the return value of pci_set_mwi() as not all architectures -or chip-sets may support Memory-Write-Invalidate. +or chip-sets may support Memory-Write-Invalidate. Alternatively, +if Mem-Wr-Inval would be nice to have but is not required, call +pci_try_set_mwi() to have the system do its best effort at enabling +Mem-Wr-Inval. 3.2 Request MMIO/IOP resources diff --git a/Documentation/power/pci.txt b/Documentation/power/pci.txt index e00b099a4b8..dd8fe43888d 100644 --- a/Documentation/power/pci.txt +++ b/Documentation/power/pci.txt @@ -164,7 +164,6 @@ struct pci_driver: int (*suspend) (struct pci_dev *dev, pm_message_t state); int (*resume) (struct pci_dev *dev); - int (*enable_wake) (struct pci_dev *dev, pci_power_t state, int enable); suspend @@ -251,42 +250,6 @@ The driver should update the current_state field in its pci_dev structure in this function, except for PM-capable devices when pci_set_power_state is used. -enable_wake ------------ - -Usage: - -if (dev->driver && dev->driver->enable_wake) - dev->driver->enable_wake(dev,state,enable); - -This callback is generally only relevant for devices that support the PCI PM -spec and have the ability to generate a PME# (Power Management Event Signal) -to wake the system up. (However, it is possible that a device may support -some non-standard way of generating a wake event on sleep.) - -Bits 15:11 of the PMC (Power Mgmt Capabilities) Register in a device's -PM Capabilities describe what power states the device supports generating a -wake event from: - -+------------------+ -| Bit | State | -+------------------+ -| 11 | D0 | -| 12 | D1 | -| 13 | D2 | -| 14 | D3hot | -| 15 | D3cold | -+------------------+ - -A device can use this to enable wake events: - - pci_enable_wake(dev,state,enable); - -Note that to enable PME# from D3cold, a value of 4 should be passed to -pci_enable_wake (since it uses an index into a bitmask). If a driver gets -a request to enable wake events from D3, two calls should be made to -pci_enable_wake (one for both D3hot and D3cold). - A reference implementation ------------------------- diff --git a/Documentation/power/swsusp.txt b/Documentation/power/swsusp.txt index 5b8d6953f05..152b510d1bb 100644 --- a/Documentation/power/swsusp.txt +++ b/Documentation/power/swsusp.txt @@ -393,6 +393,9 @@ safest thing is to unmount all filesystems on removable media (such USB, Firewire, CompactFlash, MMC, external SATA, or even IDE hotplug bays) before suspending; then remount them after resuming. +There is a work-around for this problem. For more information, see +Documentation/usb/persist.txt. + Q: I upgraded the kernel from 2.6.15 to 2.6.16. Both kernels were compiled with the similar configuration files. Anyway I found that suspend to disk (and resume) is much slower on 2.6.16 compared to diff --git a/Documentation/power_supply_class.txt b/Documentation/power_supply_class.txt new file mode 100644 index 00000000000..9758cf433c0 --- /dev/null +++ b/Documentation/power_supply_class.txt @@ -0,0 +1,167 @@ +Linux power supply class +======================== + +Synopsis +~~~~~~~~ +Power supply class used to represent battery, UPS, AC or DC power supply +properties to user-space. + +It defines core set of attributes, which should be applicable to (almost) +every power supply out there. Attributes are available via sysfs and uevent +interfaces. + +Each attribute has well defined meaning, up to unit of measure used. While +the attributes provided are believed to be universally applicable to any +power supply, specific monitoring hardware may not be able to provide them +all, so any of them may be skipped. + +Power supply class is extensible, and allows to define drivers own attributes. +The core attribute set is subject to the standard Linux evolution (i.e. +if it will be found that some attribute is applicable to many power supply +types or their drivers, it can be added to the core set). + +It also integrates with LED framework, for the purpose of providing +typically expected feedback of battery charging/fully charged status and +AC/USB power supply online status. (Note that specific details of the +indication (including whether to use it at all) are fully controllable by +user and/or specific machine defaults, per design principles of LED +framework). + + +Attributes/properties +~~~~~~~~~~~~~~~~~~~~~ +Power supply class has predefined set of attributes, this eliminates code +duplication across drivers. Power supply class insist on reusing its +predefined attributes *and* their units. + +So, userspace gets predictable set of attributes and their units for any +kind of power supply, and can process/present them to a user in consistent +manner. Results for different power supplies and machines are also directly +comparable. + +See drivers/power/ds2760_battery.c and drivers/power/pda_power.c for the +example how to declare and handle attributes. + + +Units +~~~~~ +Quoting include/linux/power_supply.h: + + All voltages, currents, charges, energies, time and temperatures in µV, + µA, µAh, µWh, seconds and tenths of degree Celsius unless otherwise + stated. It's driver's job to convert its raw values to units in which + this class operates. + + +Attributes/properties detailed +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +~ ~ ~ ~ ~ ~ ~ Charge/Energy/Capacity - how to not confuse ~ ~ ~ ~ ~ ~ ~ +~ ~ +~ Because both "charge" (µAh) and "energy" (µWh) represents "capacity" ~ +~ of battery, this class distinguish these terms. Don't mix them! ~ +~ ~ +~ CHARGE_* attributes represents capacity in µAh only. ~ +~ ENERGY_* attributes represents capacity in µWh only. ~ +~ CAPACITY attribute represents capacity in *percents*, from 0 to 100. ~ +~ ~ +~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ + +Postfixes: +_AVG - *hardware* averaged value, use it if your hardware is really able to +report averaged values. +_NOW - momentary/instantaneous values. + +STATUS - this attribute represents operating status (charging, full, +discharging (i.e. powering a load), etc.). This corresponds to +BATTERY_STATUS_* values, as defined in battery.h. + +HEALTH - represents health of the battery, values corresponds to +POWER_SUPPLY_HEALTH_*, defined in battery.h. + +VOLTAGE_MAX_DESIGN, VOLTAGE_MIN_DESIGN - design values for maximal and +minimal power supply voltages. Maximal/minimal means values of voltages +when battery considered "full"/"empty" at normal conditions. Yes, there is +no direct relation between voltage and battery capacity, but some dumb +batteries use voltage for very approximated calculation of capacity. +Battery driver also can use this attribute just to inform userspace +about maximal and minimal voltage thresholds of a given battery. + +CHARGE_FULL_DESIGN, CHARGE_EMPTY_DESIGN - design charge values, when +battery considered full/empty. + +ENERGY_FULL_DESIGN, ENERGY_EMPTY_DESIGN - same as above but for energy. + +CHARGE_FULL, CHARGE_EMPTY - These attributes means "last remembered value +of charge when battery became full/empty". It also could mean "value of +charge when battery considered full/empty at given conditions (temperature, +age)". I.e. these attributes represents real thresholds, not design values. + +ENERGY_FULL, ENERGY_EMPTY - same as above but for energy. + +CAPACITY - capacity in percents. +CAPACITY_LEVEL - capacity level. This corresponds to +POWER_SUPPLY_CAPACITY_LEVEL_*. + +TEMP - temperature of the power supply. +TEMP_AMBIENT - ambient temperature. + +TIME_TO_EMPTY - seconds left for battery to be considered empty (i.e. +while battery powers a load) +TIME_TO_FULL - seconds left for battery to be considered full (i.e. +while battery is charging) + + +Battery <-> external power supply interaction +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Often power supplies are acting as supplies and supplicants at the same +time. Batteries are good example. So, batteries usually care if they're +externally powered or not. + +For that case, power supply class implements notification mechanism for +batteries. + +External power supply (AC) lists supplicants (batteries) names in +"supplied_to" struct member, and each power_supply_changed() call +issued by external power supply will notify supplicants via +external_power_changed callback. + + +QA +~~ +Q: Where is POWER_SUPPLY_PROP_XYZ attribute? +A: If you cannot find attribute suitable for your driver needs, feel free + to add it and send patch along with your driver. + + The attributes available currently are the ones currently provided by the + drivers written. + + Good candidates to add in future: model/part#, cycle_time, manufacturer, + etc. + + +Q: I have some very specific attribute (e.g. battery color), should I add + this attribute to standard ones? +A: Most likely, no. Such attribute can be placed in the driver itself, if + it is useful. Of course, if the attribute in question applicable to + large set of batteries, provided by many drivers, and/or comes from + some general battery specification/standard, it may be a candidate to + be added to the core attribute set. + + +Q: Suppose, my battery monitoring chip/firmware does not provides capacity + in percents, but provides charge_{now,full,empty}. Should I calculate + percentage capacity manually, inside the driver, and register CAPACITY + attribute? The same question about time_to_empty/time_to_full. +A: Most likely, no. This class is designed to export properties which are + directly measurable by the specific hardware available. + + Inferring not available properties using some heuristics or mathematical + model is not subject of work for a battery driver. Such functionality + should be factored out, and in fact, apm_power, the driver to serve + legacy APM API on top of power supply class, uses a simple heuristic of + approximating remaining battery capacity based on its charge, current, + voltage and so on. But full-fledged battery model is likely not subject + for kernel at all, as it would require floating point calculation to deal + with things like differential equations and Kalman filters. This is + better be handled by batteryd/libbattery, yet to be written. diff --git a/Documentation/powerpc/booting-without-of.txt b/Documentation/powerpc/booting-without-of.txt index b49ce169a63..d42d98107d4 100644 --- a/Documentation/powerpc/booting-without-of.txt +++ b/Documentation/powerpc/booting-without-of.txt @@ -1,7 +1,6 @@ Booting the Linux/ppc kernel without Open Firmware -------------------------------------------------- - (c) 2005 Benjamin Herrenschmidt <benh at kernel.crashing.org>, IBM Corp. (c) 2005 Becky Bruce <becky.bruce at freescale.com>, @@ -9,6 +8,62 @@ (c) 2006 MontaVista Software, Inc. Flash chip node definition +Table of Contents +================= + + I - Introduction + 1) Entry point for arch/powerpc + 2) Board support + + II - The DT block format + 1) Header + 2) Device tree generalities + 3) Device tree "structure" block + 4) Device tree "strings" block + + III - Required content of the device tree + 1) Note about cells and address representation + 2) Note about "compatible" properties + 3) Note about "name" properties + 4) Note about node and property names and character set + 5) Required nodes and properties + a) The root node + b) The /cpus node + c) The /cpus/* nodes + d) the /memory node(s) + e) The /chosen node + f) the /soc<SOCname> node + + IV - "dtc", the device tree compiler + + V - Recommendations for a bootloader + + VI - System-on-a-chip devices and nodes + 1) Defining child nodes of an SOC + 2) Representing devices without a current OF specification + a) MDIO IO device + c) PHY nodes + b) Gianfar-compatible ethernet nodes + d) Interrupt controllers + e) I2C + f) Freescale SOC USB controllers + g) Freescale SOC SEC Security Engines + h) Board Control and Status (BCSR) + i) Freescale QUICC Engine module (QE) + g) Flash chip nodes + + VII - Specifying interrupt information for devices + 1) interrupts property + 2) interrupt-parent property + 3) OpenPIC Interrupt Controllers + 4) ISA Interrupt Controllers + + Appendix A - Sample SOC node for MPC8540 + + +Revision Information +==================== + May 18, 2005: Rev 0.1 - Initial draft, no chapter III yet. May 19, 2005: Rev 0.2 - Add chapter III and bits & pieces here or @@ -1687,7 +1742,7 @@ platforms are moved over to use the flattened-device-tree model. }; }; - g) Flash chip nodes + j) Flash chip nodes Flash chips (Memory Technology Devices) are often used for solid state file systems on embedded devices. diff --git a/Documentation/sched-design-CFS.txt b/Documentation/sched-design-CFS.txt new file mode 100644 index 00000000000..16feebb7bdc --- /dev/null +++ b/Documentation/sched-design-CFS.txt @@ -0,0 +1,119 @@ + +This is the CFS scheduler. + +80% of CFS's design can be summed up in a single sentence: CFS basically +models an "ideal, precise multi-tasking CPU" on real hardware. + +"Ideal multi-tasking CPU" is a (non-existent :-)) CPU that has 100% +physical power and which can run each task at precise equal speed, in +parallel, each at 1/nr_running speed. For example: if there are 2 tasks +running then it runs each at 50% physical power - totally in parallel. + +On real hardware, we can run only a single task at once, so while that +one task runs, the other tasks that are waiting for the CPU are at a +disadvantage - the current task gets an unfair amount of CPU time. In +CFS this fairness imbalance is expressed and tracked via the per-task +p->wait_runtime (nanosec-unit) value. "wait_runtime" is the amount of +time the task should now run on the CPU for it to become completely fair +and balanced. + +( small detail: on 'ideal' hardware, the p->wait_runtime value would + always be zero - no task would ever get 'out of balance' from the + 'ideal' share of CPU time. ) + +CFS's task picking logic is based on this p->wait_runtime value and it +is thus very simple: it always tries to run the task with the largest +p->wait_runtime value. In other words, CFS tries to run the task with +the 'gravest need' for more CPU time. So CFS always tries to split up +CPU time between runnable tasks as close to 'ideal multitasking +hardware' as possible. + +Most of the rest of CFS's design just falls out of this really simple +concept, with a few add-on embellishments like nice levels, +multiprocessing and various algorithm variants to recognize sleepers. + +In practice it works like this: the system runs a task a bit, and when +the task schedules (or a scheduler tick happens) the task's CPU usage is +'accounted for': the (small) time it just spent using the physical CPU +is deducted from p->wait_runtime. [minus the 'fair share' it would have +gotten anyway]. Once p->wait_runtime gets low enough so that another +task becomes the 'leftmost task' of the time-ordered rbtree it maintains +(plus a small amount of 'granularity' distance relative to the leftmost +task so that we do not over-schedule tasks and trash the cache) then the +new leftmost task is picked and the current task is preempted. + +The rq->fair_clock value tracks the 'CPU time a runnable task would have +fairly gotten, had it been runnable during that time'. So by using +rq->fair_clock values we can accurately timestamp and measure the +'expected CPU time' a task should have gotten. All runnable tasks are +sorted in the rbtree by the "rq->fair_clock - p->wait_runtime" key, and +CFS picks the 'leftmost' task and sticks to it. As the system progresses +forwards, newly woken tasks are put into the tree more and more to the +right - slowly but surely giving a chance for every task to become the +'leftmost task' and thus get on the CPU within a deterministic amount of +time. + +Some implementation details: + + - the introduction of Scheduling Classes: an extensible hierarchy of + scheduler modules. These modules encapsulate scheduling policy + details and are handled by the scheduler core without the core + code assuming about them too much. + + - sched_fair.c implements the 'CFS desktop scheduler': it is a + replacement for the vanilla scheduler's SCHED_OTHER interactivity + code. + + I'd like to give credit to Con Kolivas for the general approach here: + he has proven via RSDL/SD that 'fair scheduling' is possible and that + it results in better desktop scheduling. Kudos Con! + + The CFS patch uses a completely different approach and implementation + from RSDL/SD. My goal was to make CFS's interactivity quality exceed + that of RSDL/SD, which is a high standard to meet :-) Testing + feedback is welcome to decide this one way or another. [ and, in any + case, all of SD's logic could be added via a kernel/sched_sd.c module + as well, if Con is interested in such an approach. ] + + CFS's design is quite radical: it does not use runqueues, it uses a + time-ordered rbtree to build a 'timeline' of future task execution, + and thus has no 'array switch' artifacts (by which both the vanilla + scheduler and RSDL/SD are affected). + + CFS uses nanosecond granularity accounting and does not rely on any + jiffies or other HZ detail. Thus the CFS scheduler has no notion of + 'timeslices' and has no heuristics whatsoever. There is only one + central tunable: + + /proc/sys/kernel/sched_granularity_ns + + which can be used to tune the scheduler from 'desktop' (low + latencies) to 'server' (good batching) workloads. It defaults to a + setting suitable for desktop workloads. SCHED_BATCH is handled by the + CFS scheduler module too. + + Due to its design, the CFS scheduler is not prone to any of the + 'attacks' that exist today against the heuristics of the stock + scheduler: fiftyp.c, thud.c, chew.c, ring-test.c, massive_intr.c all + work fine and do not impact interactivity and produce the expected + behavior. + + the CFS scheduler has a much stronger handling of nice levels and + SCHED_BATCH: both types of workloads should be isolated much more + agressively than under the vanilla scheduler. + + ( another detail: due to nanosec accounting and timeline sorting, + sched_yield() support is very simple under CFS, and in fact under + CFS sched_yield() behaves much better than under any other + scheduler i have tested so far. ) + + - sched_rt.c implements SCHED_FIFO and SCHED_RR semantics, in a simpler + way than the vanilla scheduler does. It uses 100 runqueues (for all + 100 RT priority levels, instead of 140 in the vanilla scheduler) + and it needs no expired array. + + - reworked/sanitized SMP load-balancing: the runqueue-walking + assumptions are gone from the load-balancing code now, and + iterators of the scheduling modules are used. The balancing code got + quite a bit simpler as a result. + diff --git a/Documentation/sound/alsa/ALSA-Configuration.txt b/Documentation/sound/alsa/ALSA-Configuration.txt index 57b878cc393..355ff0a2bb7 100644 --- a/Documentation/sound/alsa/ALSA-Configuration.txt +++ b/Documentation/sound/alsa/ALSA-Configuration.txt @@ -917,6 +917,7 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed. ref Reference board, base config m2-2 Some Gateway MX series laptops m6 Some Gateway NX series laptops + pa6 Gateway NX860 series STAC9227/9228/9229/927x ref Reference board diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt index 1d192565e18..8cfca173d4b 100644 --- a/Documentation/sysctl/vm.txt +++ b/Documentation/sysctl/vm.txt @@ -31,6 +31,7 @@ Currently, these files are in /proc/sys/vm: - min_unmapped_ratio - min_slab_ratio - panic_on_oom +- mmap_min_address ============================================================== @@ -216,3 +217,17 @@ above-mentioned. The default value is 0. 1 and 2 are for failover of clustering. Please select either according to your policy of failover. + +============================================================== + +mmap_min_addr + +This file indicates the amount of address space which a user process will +be restricted from mmaping. Since kernel null dereference bugs could +accidentally operate based on the information in the first couple of pages +of memory userspace processes should not be allowed to write to them. By +default this value is set to 0 and no protections will be enforced by the +security module. Setting this value to something like 64k will allow the +vast majority of applications to work correctly and provide defense in depth +against future potential kernel bugs. + diff --git a/Documentation/sysfs-rules.txt b/Documentation/sysfs-rules.txt new file mode 100644 index 00000000000..42861bb0bc9 --- /dev/null +++ b/Documentation/sysfs-rules.txt @@ -0,0 +1,166 @@ +Rules on how to access information in the Linux kernel sysfs + +The kernel exported sysfs exports internal kernel implementation-details +and depends on internal kernel structures and layout. It is agreed upon +by the kernel developers that the Linux kernel does not provide a stable +internal API. As sysfs is a direct export of kernel internal +structures, the sysfs interface can not provide a stable interface eighter, +it may always change along with internal kernel changes. + +To minimize the risk of breaking users of sysfs, which are in most cases +low-level userspace applications, with a new kernel release, the users +of sysfs must follow some rules to use an as abstract-as-possible way to +access this filesystem. The current udev and HAL programs already +implement this and users are encouraged to plug, if possible, into the +abstractions these programs provide instead of accessing sysfs +directly. + +But if you really do want or need to access sysfs directly, please follow +the following rules and then your programs should work with future +versions of the sysfs interface. + +- Do not use libsysfs + It makes assumptions about sysfs which are not true. Its API does not + offer any abstraction, it exposes all the kernel driver-core + implementation details in its own API. Therefore it is not better than + reading directories and opening the files yourself. + Also, it is not actively maintained, in the sense of reflecting the + current kernel-development. The goal of providing a stable interface + to sysfs has failed, it causes more problems, than it solves. It + violates many of the rules in this document. + +- sysfs is always at /sys + Parsing /proc/mounts is a waste of time. Other mount points are a + system configuration bug you should not try to solve. For test cases, + possibly support a SYSFS_PATH environment variable to overwrite the + applications behavior, but never try to search for sysfs. Never try + to mount it, if you are not an early boot script. + +- devices are only "devices" + There is no such thing like class-, bus-, physical devices, + interfaces, and such that you can rely on in userspace. Everything is + just simply a "device". Class-, bus-, physical, ... types are just + kernel implementation details, which should not be expected by + applications that look for devices in sysfs. + + The properties of a device are: + o devpath (/devices/pci0000:00/0000:00:1d.1/usb2/2-2/2-2:1.0) + - identical to the DEVPATH value in the event sent from the kernel + at device creation and removal + - the unique key to the device at that point in time + - the kernels path to the device-directory without the leading + /sys, and always starting with with a slash + - all elements of a devpath must be real directories. Symlinks + pointing to /sys/devices must always be resolved to their real + target, and the target path must be used to access the device. + That way the devpath to the device matches the devpath of the + kernel used at event time. + - using or exposing symlink values as elements in a devpath string + is a bug in the application + + o kernel name (sda, tty, 0000:00:1f.2, ...) + - a directory name, identical to the last element of the devpath + - applications need to handle spaces and characters like '!' in + the name + + o subsystem (block, tty, pci, ...) + - simple string, never a path or a link + - retrieved by reading the "subsystem"-link and using only the + last element of the target path + + o driver (tg3, ata_piix, uhci_hcd) + - a simple string, which may contain spaces, never a path or a + link + - it is retrieved by reading the "driver"-link and using only the + last element of the target path + - devices which do not have "driver"-link, just do not have a + driver; copying the driver value in a child device context, is a + bug in the application + + o attributes + - the files in the device directory or files below a subdirectories + of the same device directory + - accessing attributes reached by a symlink pointing to another device, + like the "device"-link, is a bug in the application + + Everything else is just a kernel driver-core implementation detail, + that should not be assumed to be stable across kernel releases. + +- Properties of parent devices never belong into a child device. + Always look at the parent devices themselves for determining device + context properties. If the device 'eth0' or 'sda' does not have a + "driver"-link, then this device does not have a driver. Its value is empty. + Never copy any property of the parent-device into a child-device. Parent + device-properties may change dynamically without any notice to the + child device. + +- Hierarchy in a single device-tree + There is only one valid place in sysfs where hierarchy can be examined + and this is below: /sys/devices. + It is planned, that all device directories will end up in the tree + below this directory. + +- Classification by subsystem + There are currently three places for classification of devices: + /sys/block, /sys/class and /sys/bus. It is planned that these will + not contain any device-directories themselves, but only flat lists of + symlinks pointing to the unified /sys/devices tree. + All three places have completely different rules on how to access + device information. It is planned to merge all three + classification-directories into one place at /sys/subsystem, + following the layout of the bus-directories. All buses and + classes, including the converted block-subsystem, will show up + there. + The devices belonging to a subsystem will create a symlink in the + "devices" directory at /sys/subsystem/<name>/devices. + + If /sys/subsystem exists, /sys/bus, /sys/class and /sys/block can be + ignored. If it does not exist, you have always to scan all three + places, as the kernel is free to move a subsystem from one place to + the other, as long as the devices are still reachable by the same + subsystem name. + + Assuming /sys/class/<subsystem> and /sys/bus/<subsystem>, or + /sys/block and /sys/class/block are not interchangeable, is a bug in + the application. + +- Block + The converted block-subsystem at /sys/class/block, or + /sys/subsystem/block will contain the links for disks and partitions + at the same level, never in a hierarchy. Assuming the block-subsytem to + contain only disks and not partition-devices in the same flat list is + a bug in the application. + +- "device"-link and <subsystem>:<kernel name>-links + Never depend on the "device"-link. The "device"-link is a workaround + for the old layout, where class-devices are not created in + /sys/devices/ like the bus-devices. If the link-resolving of a + device-directory does not end in /sys/devices/, you can use the + "device"-link to find the parent devices in /sys/devices/. That is the + single valid use of the "device"-link, it must never appear in any + path as an element. Assuming the existence of the "device"-link for + a device in /sys/devices/ is a bug in the application. + Accessing /sys/class/net/eth0/device is a bug in the application. + + Never depend on the class-specific links back to the /sys/class + directory. These links are also a workaround for the design mistake + that class-devices are not created in /sys/devices. If a device + directory does not contain directories for child devices, these links + may be used to find the child devices in /sys/class. That is the single + valid use of these links, they must never appear in any path as an + element. Assuming the existence of these links for devices which are + real child device directories in the /sys/devices tree, is a bug in + the application. + + It is planned to remove all these links when when all class-device + directories live in /sys/devices. + +- Position of devices along device chain can change. + Never depend on a specific parent device position in the devpath, + or the chain of parent devices. The kernel is free to insert devices into + the chain. You must always request the parent device you are looking for + by its subsystem value. You need to walk up the chain until you find + the device that matches the expected subsystem. Depending on a specific + position of a parent device, or exposing relative paths, using "../" to + access the chain of parents, is a bug in the application. + diff --git a/Documentation/thinkpad-acpi.txt b/Documentation/thinkpad-acpi.txt index 2d4803359a0..9e6b94face4 100644 --- a/Documentation/thinkpad-acpi.txt +++ b/Documentation/thinkpad-acpi.txt @@ -138,7 +138,7 @@ Hot keys -------- procfs: /proc/acpi/ibm/hotkey -sysfs device attribute: hotkey/* +sysfs device attribute: hotkey_* Without this driver, only the Fn-F4 key (sleep button) generates an ACPI event. With the driver loaded, the hotkey feature enabled and the @@ -196,10 +196,7 @@ The following commands can be written to the /proc/acpi/ibm/hotkey file: sysfs notes: - The hot keys attributes are in a hotkey/ subdirectory off the - thinkpad device. - - bios_enabled: + hotkey_bios_enabled: Returns the status of the hot keys feature when thinkpad-acpi was loaded. Upon module unload, the hot key feature status will be restored to this value. @@ -207,19 +204,19 @@ sysfs notes: 0: hot keys were disabled 1: hot keys were enabled - bios_mask: + hotkey_bios_mask: Returns the hot keys mask when thinkpad-acpi was loaded. Upon module unload, the hot keys mask will be restored to this value. - enable: + hotkey_enable: Enables/disables the hot keys feature, and reports current status of the hot keys feature. 0: disables the hot keys feature / feature disabled 1: enables the hot keys feature / feature enabled - mask: + hotkey_mask: bit mask to enable ACPI event generation for each hot key (see above). Returns the current status of the hot keys mask, and allows one to modify it. @@ -229,7 +226,7 @@ Bluetooth --------- procfs: /proc/acpi/ibm/bluetooth -sysfs device attribute: bluetooth/enable +sysfs device attribute: bluetooth_enable This feature shows the presence and current state of a ThinkPad Bluetooth device in the internal ThinkPad CDC slot. @@ -244,7 +241,7 @@ If Bluetooth is installed, the following commands can be used: Sysfs notes: If the Bluetooth CDC card is installed, it can be enabled / - disabled through the "bluetooth/enable" thinkpad-acpi device + disabled through the "bluetooth_enable" thinkpad-acpi device attribute, and its current status can also be queried. enable: @@ -252,7 +249,7 @@ Sysfs notes: 1: enables Bluetooth / Bluetooth is enabled. Note: this interface will be probably be superseeded by the - generic rfkill class. + generic rfkill class, so it is NOT to be considered stable yet. Video output control -- /proc/acpi/ibm/video -------------------------------------------- @@ -898,7 +895,7 @@ EXPERIMENTAL: WAN ----------------- procfs: /proc/acpi/ibm/wan -sysfs device attribute: wwan/enable +sysfs device attribute: wwan_enable This feature is marked EXPERIMENTAL because the implementation directly accesses hardware registers and may not work as expected. USE @@ -921,7 +918,7 @@ If the W-WAN card is installed, the following commands can be used: Sysfs notes: If the W-WAN card is installed, it can be enabled / - disabled through the "wwan/enable" thinkpad-acpi device + disabled through the "wwan_enable" thinkpad-acpi device attribute, and its current status can also be queried. enable: @@ -929,7 +926,7 @@ Sysfs notes: 1: enables WWAN card / WWAN card is enabled. Note: this interface will be probably be superseeded by the - generic rfkill class. + generic rfkill class, so it is NOT to be considered stable yet. Multiple Commands, Module Parameters ------------------------------------ diff --git a/Documentation/usb/dma.txt b/Documentation/usb/dma.txt index 62844aeba69..e8b50b7de9d 100644 --- a/Documentation/usb/dma.txt +++ b/Documentation/usb/dma.txt @@ -32,12 +32,15 @@ ELIMINATING COPIES It's good to avoid making CPUs copy data needlessly. The costs can add up, and effects like cache-trashing can impose subtle penalties. -- When you're allocating a buffer for DMA purposes anyway, use the buffer - primitives. Think of them as kmalloc and kfree that give you the right - kind of addresses to store in urb->transfer_buffer and urb->transfer_dma, - while guaranteeing that no hidden copies through DMA "bounce" buffers will - slow things down. You'd also set URB_NO_TRANSFER_DMA_MAP in - urb->transfer_flags: +- If you're doing lots of small data transfers from the same buffer all + the time, that can really burn up resources on systems which use an + IOMMU to manage the DMA mappings. It can cost MUCH more to set up and + tear down the IOMMU mappings with each request than perform the I/O! + + For those specific cases, USB has primitives to allocate less expensive + memory. They work like kmalloc and kfree versions that give you the right + kind of addresses to store in urb->transfer_buffer and urb->transfer_dma. + You'd also set URB_NO_TRANSFER_DMA_MAP in urb->transfer_flags: void *usb_buffer_alloc (struct usb_device *dev, size_t size, int mem_flags, dma_addr_t *dma); @@ -45,6 +48,10 @@ and effects like cache-trashing can impose subtle penalties. void usb_buffer_free (struct usb_device *dev, size_t size, void *addr, dma_addr_t dma); + Most drivers should *NOT* be using these primitives; they don't need + to use this type of memory ("dma-coherent"), and memory returned from + kmalloc() will work just fine. + For control transfers you can use the buffer primitives or not for each of the transfer buffer and setup buffer independently. Set the flag bits URB_NO_TRANSFER_DMA_MAP and URB_NO_SETUP_DMA_MAP to indicate which @@ -54,29 +61,39 @@ and effects like cache-trashing can impose subtle penalties. The memory buffer returned is "dma-coherent"; sometimes you might need to force a consistent memory access ordering by using memory barriers. It's not using a streaming DMA mapping, so it's good for small transfers on - systems where the I/O would otherwise tie up an IOMMU mapping. (See + systems where the I/O would otherwise thrash an IOMMU mapping. (See Documentation/DMA-mapping.txt for definitions of "coherent" and "streaming" DMA mappings.) Asking for 1/Nth of a page (as well as asking for N pages) is reasonably space-efficient. + On most systems the memory returned will be uncached, because the + semantics of dma-coherent memory require either bypassing CPU caches + or using cache hardware with bus-snooping support. While x86 hardware + has such bus-snooping, many other systems use software to flush cache + lines to prevent DMA conflicts. + - Devices on some EHCI controllers could handle DMA to/from high memory. - Driver probe() routines can notice this using a generic DMA call, then - tell higher level code (network, scsi, etc) about it like this: - if (dma_supported (&intf->dev, 0xffffffffffffffffULL)) - net->features |= NETIF_F_HIGHDMA; + Unfortunately, the current Linux DMA infrastructure doesn't have a sane + way to expose these capabilities ... and in any case, HIGHMEM is mostly a + design wart specific to x86_32. So your best bet is to ensure you never + pass a highmem buffer into a USB driver. That's easy; it's the default + behavior. Just don't override it; e.g. with NETIF_F_HIGHDMA. - That can eliminate dma bounce buffering of requests that originate (or - terminate) in high memory, in cases where the buffers aren't allocated - with usb_buffer_alloc() but instead are dma-mapped. + This may force your callers to do some bounce buffering, copying from + high memory to "normal" DMA memory. If you can come up with a good way + to fix this issue (for x86_32 machines with over 1 GByte of memory), + feel free to submit patches. WORKING WITH EXISTING BUFFERS Existing buffers aren't usable for DMA without first being mapped into the -DMA address space of the device. +DMA address space of the device. However, most buffers passed to your +driver can safely be used with such DMA mapping. (See the first section +of DMA-mapping.txt, titled "What memory is DMA-able?") - When you're using scatterlists, you can map everything at once. On some systems, this kicks in an IOMMU and turns the scatterlists into single @@ -114,3 +131,8 @@ DMA address space of the device. The calls manage urb->transfer_dma for you, and set URB_NO_TRANSFER_DMA_MAP so that usbcore won't map or unmap the buffer. The same goes for urb->setup_dma and URB_NO_SETUP_DMA_MAP for control requests. + +Note that several of those interfaces are currently commented out, since +they don't have current users. See the source code. Other than the dmasync +calls (where the underlying DMA primitives have changed), most of them can +easily be commented back in if you want to use them. diff --git a/Documentation/usb/persist.txt b/Documentation/usb/persist.txt new file mode 100644 index 00000000000..df54d645cbb --- /dev/null +++ b/Documentation/usb/persist.txt @@ -0,0 +1,156 @@ + USB device persistence during system suspend + + Alan Stern <stern@rowland.harvard.edu> + + September 2, 2006 (Updated May 29, 2007) + + + What is the problem? + +According to the USB specification, when a USB bus is suspended the +bus must continue to supply suspend current (around 1-5 mA). This +is so that devices can maintain their internal state and hubs can +detect connect-change events (devices being plugged in or unplugged). +The technical term is "power session". + +If a USB device's power session is interrupted then the system is +required to behave as though the device has been unplugged. It's a +conservative approach; in the absence of suspend current the computer +has no way to know what has actually happened. Perhaps the same +device is still attached or perhaps it was removed and a different +device plugged into the port. The system must assume the worst. + +By default, Linux behaves according to the spec. If a USB host +controller loses power during a system suspend, then when the system +wakes up all the devices attached to that controller are treated as +though they had disconnected. This is always safe and it is the +"officially correct" thing to do. + +For many sorts of devices this behavior doesn't matter in the least. +If the kernel wants to believe that your USB keyboard was unplugged +while the system was asleep and a new keyboard was plugged in when the +system woke up, who cares? It'll still work the same when you type on +it. + +Unfortunately problems _can_ arise, particularly with mass-storage +devices. The effect is exactly the same as if the device really had +been unplugged while the system was suspended. If you had a mounted +filesystem on the device, you're out of luck -- everything in that +filesystem is now inaccessible. This is especially annoying if your +root filesystem was located on the device, since your system will +instantly crash. + +Loss of power isn't the only mechanism to worry about. Anything that +interrupts a power session will have the same effect. For example, +even though suspend current may have been maintained while the system +was asleep, on many systems during the initial stages of wakeup the +firmware (i.e., the BIOS) resets the motherboard's USB host +controllers. Result: all the power sessions are destroyed and again +it's as though you had unplugged all the USB devices. Yes, it's +entirely the BIOS's fault, but that doesn't do _you_ any good unless +you can convince the BIOS supplier to fix the problem (lots of luck!). + +On many systems the USB host controllers will get reset after a +suspend-to-RAM. On almost all systems, no suspend current is +available during hibernation (also known as swsusp or suspend-to-disk). +You can check the kernel log after resuming to see if either of these +has happened; look for lines saying "root hub lost power or was reset". + +In practice, people are forced to unmount any filesystems on a USB +device before suspending. If the root filesystem is on a USB device, +the system can't be suspended at all. (All right, it _can_ be +suspended -- but it will crash as soon as it wakes up, which isn't +much better.) + + + What is the solution? + +Setting CONFIG_USB_PERSIST will cause the kernel to work around these +issues. It enables a mode in which the core USB device data +structures are allowed to persist across a power-session disruption. +It works like this. If the kernel sees that a USB host controller is +not in the expected state during resume (i.e., if the controller was +reset or otherwise had lost power) then it applies a persistence check +to each of the USB devices below that controller for which the +"persist" attribute is set. It doesn't try to resume the device; that +can't work once the power session is gone. Instead it issues a USB +port reset and then re-enumerates the device. (This is exactly the +same thing that happens whenever a USB device is reset.) If the +re-enumeration shows that the device now attached to that port has the +same descriptors as before, including the Vendor and Product IDs, then +the kernel continues to use the same device structure. In effect, the +kernel treats the device as though it had merely been reset instead of +unplugged. + +If no device is now attached to the port, or if the descriptors are +different from what the kernel remembers, then the treatment is what +you would expect. The kernel destroys the old device structure and +behaves as though the old device had been unplugged and a new device +plugged in, just as it would without the CONFIG_USB_PERSIST option. + +The end result is that the USB device remains available and usable. +Filesystem mounts and memory mappings are unaffected, and the world is +now a good and happy place. + +Note that even when CONFIG_USB_PERSIST is set, the "persist" feature +will be applied only to those devices for which it is enabled. You +can enable the feature by doing (as root): + + echo 1 >/sys/bus/usb/devices/.../power/persist + +where the "..." should be filled in the with the device's ID. Disable +the feature by writing 0 instead of 1. For hubs the feature is +automatically and permanently enabled, so you only have to worry about +setting it for devices where it really matters. + + + Is this the best solution? + +Perhaps not. Arguably, keeping track of mounted filesystems and +memory mappings across device disconnects should be handled by a +centralized Logical Volume Manager. Such a solution would allow you +to plug in a USB flash device, create a persistent volume associated +with it, unplug the flash device, plug it back in later, and still +have the same persistent volume associated with the device. As such +it would be more far-reaching than CONFIG_USB_PERSIST. + +On the other hand, writing a persistent volume manager would be a big +job and using it would require significant input from the user. This +solution is much quicker and easier -- and it exists now, a giant +point in its favor! + +Furthermore, the USB_PERSIST option applies to _all_ USB devices, not +just mass-storage devices. It might turn out to be equally useful for +other device types, such as network interfaces. + + + WARNING: Using CONFIG_USB_PERSIST can be dangerous!! + +When recovering an interrupted power session the kernel does its best +to make sure the USB device hasn't been changed; that is, the same +device is still plugged into the port as before. But the checks +aren't guaranteed to be 100% accurate. + +If you replace one USB device with another of the same type (same +manufacturer, same IDs, and so on) there's an excellent chance the +kernel won't detect the change. Serial numbers and other strings are +not compared. In many cases it wouldn't help if they were, because +manufacturers frequently omit serial numbers entirely in their +devices. + +Furthermore it's quite possible to leave a USB device exactly the same +while changing its media. If you replace the flash memory card in a +USB card reader while the system is asleep, the kernel will have no +way to know you did it. The kernel will assume that nothing has +happened and will continue to use the partition tables, inodes, and +memory mappings for the old card. + +If the kernel gets fooled in this way, it's almost certain to cause +data corruption and to crash your system. You'll have no one to blame +but yourself. + +YOU HAVE BEEN WARNED! USE AT YOUR OWN RISK! + +That having been said, most of the time there shouldn't be any trouble +at all. The "persist" feature can be extremely useful. Make the most +of it. diff --git a/Documentation/vm/slub.txt b/Documentation/vm/slub.txt index 727c8d81aea..1523320abd8 100644 --- a/Documentation/vm/slub.txt +++ b/Documentation/vm/slub.txt @@ -1,13 +1,9 @@ Short users guide for SLUB -------------------------- -First of all slub should transparently replace SLAB. If you enable -SLUB then everything should work the same (Note the word "should". -There is likely not much value in that word at this point). - The basic philosophy of SLUB is very different from SLAB. SLAB requires rebuilding the kernel to activate debug options for all -SLABS. SLUB always includes full debugging but its off by default. +slab caches. SLUB always includes full debugging but it is off by default. SLUB can enable debugging only for selected slabs in order to avoid an impact on overall system performance which may make a bug more difficult to find. @@ -76,13 +72,28 @@ of objects. Careful with tracing: It may spew out lots of information and never stop if used on the wrong slab. -SLAB Merging +Slab merging ------------ -If no debugging is specified then SLUB may merge similar slabs together +If no debug options are specified then SLUB may merge similar slabs together in order to reduce overhead and increase cache hotness of objects. slabinfo -a displays which slabs were merged together. +Slab validation +--------------- + +SLUB can validate all object if the kernel was booted with slub_debug. In +order to do so you must have the slabinfo tool. Then you can do + +slabinfo -v + +which will test all objects. Output will be generated to the syslog. + +This also works in a more limited way if boot was without slab debug. +In that case slabinfo -v simply tests all reachable objects. Usually +these are in the cpu slabs and the partial slabs. Full slabs are not +tracked by SLUB in a non debug situation. + Getting more performance ------------------------ @@ -91,9 +102,9 @@ list_lock once in a while to deal with partial slabs. That overhead is governed by the order of the allocation for each slab. The allocations can be influenced by kernel parameters: -slub_min_objects=x (default 8) +slub_min_objects=x (default 4) slub_min_order=x (default 0) -slub_max_order=x (default 4) +slub_max_order=x (default 1) slub_min_objects allows to specify how many objects must at least fit into one slab in order for the allocation order to be acceptable. @@ -109,5 +120,107 @@ longer be checked. This is useful to avoid SLUB trying to generate super large order pages to fit slub_min_objects of a slab cache with large object sizes into one high order page. - -Christoph Lameter, <clameter@sgi.com>, April 10, 2007 +SLUB Debug output +----------------- + +Here is a sample of slub debug output: + +*** SLUB kmalloc-8: Redzone Active@0xc90f6d20 slab 0xc528c530 offset=3360 flags=0x400000c3 inuse=61 freelist=0xc90f6d58 + Bytes b4 0xc90f6d10: 00 00 00 00 00 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a ........ZZZZZZZZ + Object 0xc90f6d20: 31 30 31 39 2e 30 30 35 1019.005 + Redzone 0xc90f6d28: 00 cc cc cc . +FreePointer 0xc90f6d2c -> 0xc90f6d58 +Last alloc: get_modalias+0x61/0xf5 jiffies_ago=53 cpu=1 pid=554 +Filler 0xc90f6d50: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ + [<c010523d>] dump_trace+0x63/0x1eb + [<c01053df>] show_trace_log_lvl+0x1a/0x2f + [<c010601d>] show_trace+0x12/0x14 + [<c0106035>] dump_stack+0x16/0x18 + [<c017e0fa>] object_err+0x143/0x14b + [<c017e2cc>] check_object+0x66/0x234 + [<c017eb43>] __slab_free+0x239/0x384 + [<c017f446>] kfree+0xa6/0xc6 + [<c02e2335>] get_modalias+0xb9/0xf5 + [<c02e23b7>] dmi_dev_uevent+0x27/0x3c + [<c027866a>] dev_uevent+0x1ad/0x1da + [<c0205024>] kobject_uevent_env+0x20a/0x45b + [<c020527f>] kobject_uevent+0xa/0xf + [<c02779f1>] store_uevent+0x4f/0x58 + [<c027758e>] dev_attr_store+0x29/0x2f + [<c01bec4f>] sysfs_write_file+0x16e/0x19c + [<c0183ba7>] vfs_write+0xd1/0x15a + [<c01841d7>] sys_write+0x3d/0x72 + [<c0104112>] sysenter_past_esp+0x5f/0x99 + [<b7f7b410>] 0xb7f7b410 + ======================= +@@@ SLUB kmalloc-8: Restoring redzone (0xcc) from 0xc90f6d28-0xc90f6d2b + + + +If SLUB encounters a corrupted object then it will perform the following +actions: + +1. Isolation and report of the issue + +This will be a message in the system log starting with + +*** SLUB <slab cache affected>: <What went wrong>@<object address> +offset=<offset of object into slab> flags=<slabflags> +inuse=<objects in use in this slab> freelist=<first free object in slab> + +2. Report on how the problem was dealt with in order to ensure the continued +operation of the system. + +These are messages in the system log beginning with + +@@@ SLUB <slab cache affected>: <corrective action taken> + + +In the above sample SLUB found that the Redzone of an active object has +been overwritten. Here a string of 8 characters was written into a slab that +has the length of 8 characters. However, a 8 character string needs a +terminating 0. That zero has overwritten the first byte of the Redzone field. +After reporting the details of the issue encountered the @@@ SLUB message +tell us that SLUB has restored the redzone to its proper value and then +system operations continue. + +Various types of lines can follow the @@@ SLUB line: + +Bytes b4 <address> : <bytes> + Show a few bytes before the object where the problem was detected. + Can be useful if the corruption does not stop with the start of the + object. + +Object <address> : <bytes> + The bytes of the object. If the object is inactive then the bytes + typically contain poisoning values. Any non-poison value shows a + corruption by a write after free. + +Redzone <address> : <bytes> + The redzone following the object. The redzone is used to detect + writes after the object. All bytes should always have the same + value. If there is any deviation then it is due to a write after + the object boundary. + +Freepointer + The pointer to the next free object in the slab. May become + corrupted if overwriting continues after the red zone. + +Last alloc: +Last free: + Shows the address from which the object was allocated/freed last. + We note the pid, the time and the CPU that did so. This is usually + the most useful information to figure out where things went wrong. + Here get_modalias() did an kmalloc(8) instead of a kmalloc(9). + +Filler <address> : <bytes> + Unused data to fill up the space in order to get the next object + properly aligned. In the debug case we make sure that there are + at least 4 bytes of filler. This allow for the detection of writes + before the object. + +Following the filler will be a stackdump. That stackdump describes the +location where the error was detected. The cause of the corruption is more +likely to be found by looking at the information about the last alloc / free. + +Christoph Lameter, <clameter@sgi.com>, May 23, 2007 diff --git a/Documentation/volatile-considered-harmful.txt b/Documentation/volatile-considered-harmful.txt new file mode 100644 index 00000000000..10c2e411cca --- /dev/null +++ b/Documentation/volatile-considered-harmful.txt @@ -0,0 +1,119 @@ +Why the "volatile" type class should not be used +------------------------------------------------ + +C programmers have often taken volatile to mean that the variable could be +changed outside of the current thread of execution; as a result, they are +sometimes tempted to use it in kernel code when shared data structures are +being used. In other words, they have been known to treat volatile types +as a sort of easy atomic variable, which they are not. The use of volatile in +kernel code is almost never correct; this document describes why. + +The key point to understand with regard to volatile is that its purpose is +to suppress optimization, which is almost never what one really wants to +do. In the kernel, one must protect shared data structures against +unwanted concurrent access, which is very much a different task. The +process of protecting against unwanted concurrency will also avoid almost +all optimization-related problems in a more efficient way. + +Like volatile, the kernel primitives which make concurrent access to data +safe (spinlocks, mutexes, memory barriers, etc.) are designed to prevent +unwanted optimization. If they are being used properly, there will be no +need to use volatile as well. If volatile is still necessary, there is +almost certainly a bug in the code somewhere. In properly-written kernel +code, volatile can only serve to slow things down. + +Consider a typical block of kernel code: + + spin_lock(&the_lock); + do_something_on(&shared_data); + do_something_else_with(&shared_data); + spin_unlock(&the_lock); + +If all the code follows the locking rules, the value of shared_data cannot +change unexpectedly while the_lock is held. Any other code which might +want to play with that data will be waiting on the lock. The spinlock +primitives act as memory barriers - they are explicitly written to do so - +meaning that data accesses will not be optimized across them. So the +compiler might think it knows what will be in shared_data, but the +spin_lock() call, since it acts as a memory barrier, will force it to +forget anything it knows. There will be no optimization problems with +accesses to that data. + +If shared_data were declared volatile, the locking would still be +necessary. But the compiler would also be prevented from optimizing access +to shared_data _within_ the critical section, when we know that nobody else +can be working with it. While the lock is held, shared_data is not +volatile. When dealing with shared data, proper locking makes volatile +unnecessary - and potentially harmful. + +The volatile storage class was originally meant for memory-mapped I/O +registers. Within the kernel, register accesses, too, should be protected +by locks, but one also does not want the compiler "optimizing" register +accesses within a critical section. But, within the kernel, I/O memory +accesses are always done through accessor functions; accessing I/O memory +directly through pointers is frowned upon and does not work on all +architectures. Those accessors are written to prevent unwanted +optimization, so, once again, volatile is unnecessary. + +Another situation where one might be tempted to use volatile is +when the processor is busy-waiting on the value of a variable. The right +way to perform a busy wait is: + + while (my_variable != what_i_want) + cpu_relax(); + +The cpu_relax() call can lower CPU power consumption or yield to a +hyperthreaded twin processor; it also happens to serve as a memory barrier, +so, once again, volatile is unnecessary. Of course, busy-waiting is +generally an anti-social act to begin with. + +There are still a few rare situations where volatile makes sense in the +kernel: + + - The above-mentioned accessor functions might use volatile on + architectures where direct I/O memory access does work. Essentially, + each accessor call becomes a little critical section on its own and + ensures that the access happens as expected by the programmer. + + - Inline assembly code which changes memory, but which has no other + visible side effects, risks being deleted by GCC. Adding the volatile + keyword to asm statements will prevent this removal. + + - The jiffies variable is special in that it can have a different value + every time it is referenced, but it can be read without any special + locking. So jiffies can be volatile, but the addition of other + variables of this type is strongly frowned upon. Jiffies is considered + to be a "stupid legacy" issue (Linus's words) in this regard; fixing it + would be more trouble than it is worth. + + - Pointers to data structures in coherent memory which might be modified + by I/O devices can, sometimes, legitimately be volatile. A ring buffer + used by a network adapter, where that adapter changes pointers to + indicate which descriptors have been processed, is an example of this + type of situation. + +For most code, none of the above justifications for volatile apply. As a +result, the use of volatile is likely to be seen as a bug and will bring +additional scrutiny to the code. Developers who are tempted to use +volatile should take a step back and think about what they are truly trying +to accomplish. + +Patches to remove volatile variables are generally welcome - as long as +they come with a justification which shows that the concurrency issues have +been properly thought through. + + +NOTES +----- + +[1] http://lwn.net/Articles/233481/ +[2] http://lwn.net/Articles/233482/ + +CREDITS +------- + +Original impetus and research by Randy Dunlap +Written by Jonathan Corbet +Improvements via coments from Satyam Sharma, Johannes Stezenbach, Jesper + Juhl, Heikki Orsila, H. Peter Anvin, Philipp Hahn, and Stefan + Richter. diff --git a/Documentation/watchdog/pcwd-watchdog.txt b/Documentation/watchdog/pcwd-watchdog.txt index d9ee6336c1d..4f68052395c 100644 --- a/Documentation/watchdog/pcwd-watchdog.txt +++ b/Documentation/watchdog/pcwd-watchdog.txt @@ -1,3 +1,5 @@ +Last reviewed: 10/05/2007 + Berkshire Products PC Watchdog Card Support for ISA Cards Revision A and C Documentation and Driver by Ken Hollis <kenji@bitgate.com> @@ -14,8 +16,8 @@ The Watchdog Driver will automatically find your watchdog card, and will attach a running driver for use with that card. After the watchdog - drivers have initialized, you can then talk to the card using the PC - Watchdog program, available from http://ftp.bitgate.com/pcwd/. + drivers have initialized, you can then talk to the card using a PC + Watchdog program. I suggest putting a "watchdog -d" before the beginning of an fsck, and a "watchdog -e -t 1" immediately after the end of an fsck. (Remember @@ -62,5 +64,3 @@ -- Ken Hollis (kenji@bitgate.com) -(This documentation may be out of date. Check - http://ftp.bitgate.com/pcwd/ for the absolute latest additions.) diff --git a/Documentation/watchdog/watchdog-api.txt b/Documentation/watchdog/watchdog-api.txt index 8d16f6f3c4e..bb7cb1d31ec 100644 --- a/Documentation/watchdog/watchdog-api.txt +++ b/Documentation/watchdog/watchdog-api.txt @@ -1,3 +1,6 @@ +Last reviewed: 10/05/2007 + + The Linux Watchdog driver API. Copyright 2002 Christer Weingel <wingel@nano-system.com> @@ -22,7 +25,7 @@ the system. If userspace fails (RAM error, kernel bug, whatever), the notifications cease to occur, and the hardware watchdog will reset the system (causing a reboot) after the timeout occurs. -The Linux watchdog API is a rather AD hoc construction and different +The Linux watchdog API is a rather ad-hoc construction and different drivers implement different, and sometimes incompatible, parts of it. This file is an attempt to document the existing usage and allow future driver writers to use it as a reference. @@ -46,14 +49,16 @@ some of the drivers support the configuration option "Disable watchdog shutdown on close", CONFIG_WATCHDOG_NOWAYOUT. If it is set to Y when compiling the kernel, there is no way of disabling the watchdog once it has been started. So, if the watchdog daemon crashes, the system -will reboot after the timeout has passed. +will reboot after the timeout has passed. Watchdog devices also usually +support the nowayout module parameter so that this option can be controlled +at runtime. -Some other drivers will not disable the watchdog, unless a specific -magic character 'V' has been sent /dev/watchdog just before closing -the file. If the userspace daemon closes the file without sending -this special character, the driver will assume that the daemon (and -userspace in general) died, and will stop pinging the watchdog without -disabling it first. This will then cause a reboot. +Drivers will not disable the watchdog, unless a specific magic character 'V' +has been sent /dev/watchdog just before closing the file. If the userspace +daemon closes the file without sending this special character, the driver +will assume that the daemon (and userspace in general) died, and will stop +pinging the watchdog without disabling it first. This will then cause a +reboot if the watchdog is not re-opened in sufficient time. The ioctl API: @@ -227,218 +232,3 @@ The following options are available: [FIXME -- better explanations] -Implementations in the current drivers in the kernel tree: - -Here I have tried to summarize what the different drivers support and -where they do strange things compared to the other drivers. - -acquirewdt.c -- Acquire Single Board Computer - - This driver has a hardcoded timeout of 1 minute - - Supports CONFIG_WATCHDOG_NOWAYOUT - - GETSUPPORT returns KEEPALIVEPING. GETSTATUS will return 1 if - the device is open, 0 if not. [FIXME -- isn't this rather - silly? To be able to use the ioctl, the device must be open - and so GETSTATUS will always return 1]. - -advantechwdt.c -- Advantech Single Board Computer - - Timeout that defaults to 60 seconds, supports SETTIMEOUT. - - Supports CONFIG_WATCHDOG_NOWAYOUT - - GETSUPPORT returns WDIOF_KEEPALIVEPING and WDIOF_SETTIMEOUT. - The GETSTATUS call returns if the device is open or not. - [FIXME -- silliness again?] - -booke_wdt.c -- PowerPC BookE Watchdog Timer - - Timeout default varies according to frequency, supports - SETTIMEOUT - - Watchdog cannot be turned off, CONFIG_WATCHDOG_NOWAYOUT - does not make sense - - GETSUPPORT returns the watchdog_info struct, and - GETSTATUS returns the supported options. GETBOOTSTATUS - returns a 1 if the last reset was caused by the - watchdog and a 0 otherwise. This watchdog cannot be - disabled once it has been started. The wdt_period kernel - parameter selects which bit of the time base changing - from 0->1 will trigger the watchdog exception. Changing - the timeout from the ioctl calls will change the - wdt_period as defined above. Finally if you would like to - replace the default Watchdog Handler you can implement the - WatchdogHandler() function in your own code. - -eurotechwdt.c -- Eurotech CPU-1220/1410 - - The timeout can be set using the SETTIMEOUT ioctl and defaults - to 60 seconds. - - Also has a module parameter "ev", event type which controls - what should happen on a timeout, the string "int" or anything - else that causes a reboot. [FIXME -- better description] - - Supports CONFIG_WATCHDOG_NOWAYOUT - - GETSUPPORT returns CARDRESET and WDIOF_SETTIMEOUT but - GETSTATUS is not supported and GETBOOTSTATUS just returns 0. - -i810-tco.c -- Intel 810 chipset - - Also has support for a lot of other i8x0 stuff, but the - watchdog is one of the things. - - The timeout is set using the module parameter "i810_margin", - which is in steps of 0.6 seconds where 2<i810_margin<64. The - driver supports the SETTIMEOUT ioctl. - - Supports CONFIG_WATCHDOG_NOWAYOUT. - - GETSUPPORT returns WDIOF_SETTIMEOUT. The GETSTATUS call - returns some kind of timer value which ist not compatible with - the other drivers. GETBOOT status returns some kind of - hardware specific boot status. [FIXME -- describe this] - -ib700wdt.c -- IB700 Single Board Computer - - Default timeout of 30 seconds and the timeout is settable - using the SETTIMEOUT ioctl. Note that only a few timeout - values are supported. - - Supports CONFIG_WATCHDOG_NOWAYOUT - - GETSUPPORT returns WDIOF_KEEPALIVEPING and WDIOF_SETTIMEOUT. - The GETSTATUS call returns if the device is open or not. - [FIXME -- silliness again?] - -machzwd.c -- MachZ ZF-Logic - - Hardcoded timeout of 10 seconds - - Has a module parameter "action" that controls what happens - when the timeout runs out which can be 0 = RESET (default), - 1 = SMI, 2 = NMI, 3 = SCI. - - Supports CONFIG_WATCHDOG_NOWAYOUT and the magic character - 'V' close handling. - - GETSUPPORT returns WDIOF_KEEPALIVEPING, and the GETSTATUS call - returns if the device is open or not. [FIXME -- silliness - again?] - -mixcomwd.c -- MixCom Watchdog - - [FIXME -- I'm unable to tell what the timeout is] - - Supports CONFIG_WATCHDOG_NOWAYOUT - - GETSUPPORT returns WDIOF_KEEPALIVEPING, GETSTATUS returns if - the device is opened or not [FIXME -- I'm not really sure how - this works, there seems to be some magic connected to - CONFIG_WATCHDOG_NOWAYOUT] - -pcwd.c -- Berkshire PC Watchdog - - Hardcoded timeout of 1.5 seconds - - Supports CONFIG_WATCHDOG_NOWAYOUT - - GETSUPPORT returns WDIOF_OVERHEAT|WDIOF_CARDRESET and both - GETSTATUS and GETBOOTSTATUS return something useful. - - The SETOPTIONS call can be used to enable and disable the card - and to ask the driver to call panic if the system overheats. - -sbc60xxwdt.c -- 60xx Single Board Computer - - Hardcoded timeout of 10 seconds - - Does not support CONFIG_WATCHDOG_NOWAYOUT, but has the magic - character 'V' close handling. - - No bits set in GETSUPPORT - -scx200.c -- National SCx200 CPUs - - Not in the kernel yet. - - The timeout is set using a module parameter "margin" which - defaults to 60 seconds. The timeout can also be set using - SETTIMEOUT and read using GETTIMEOUT. - - Supports a module parameter "nowayout" that is initialized - with the value of CONFIG_WATCHDOG_NOWAYOUT. Also supports the - magic character 'V' handling. - -shwdt.c -- SuperH 3/4 processors - - [FIXME -- I'm unable to tell what the timeout is] - - Supports CONFIG_WATCHDOG_NOWAYOUT - - GETSUPPORT returns WDIOF_KEEPALIVEPING, and the GETSTATUS call - returns if the device is open or not. [FIXME -- silliness - again?] - -softdog.c -- Software watchdog - - The timeout is set with the module parameter "soft_margin" - which defaults to 60 seconds, the timeout is also settable - using the SETTIMEOUT ioctl. - - Supports CONFIG_WATCHDOG_NOWAYOUT - - WDIOF_SETTIMEOUT bit set in GETSUPPORT - -w83877f_wdt.c -- W83877F Computer - - Hardcoded timeout of 30 seconds - - Does not support CONFIG_WATCHDOG_NOWAYOUT, but has the magic - character 'V' close handling. - - No bits set in GETSUPPORT - -w83627hf_wdt.c -- w83627hf watchdog - - Timeout that defaults to 60 seconds, supports SETTIMEOUT. - - Supports CONFIG_WATCHDOG_NOWAYOUT - - GETSUPPORT returns WDIOF_KEEPALIVEPING and WDIOF_SETTIMEOUT. - The GETSTATUS call returns if the device is open or not. - -wdt.c -- ICS WDT500/501 ISA and -wdt_pci.c -- ICS WDT500/501 PCI - - Default timeout of 60 seconds. The timeout is also settable - using the SETTIMEOUT ioctl. - - Supports CONFIG_WATCHDOG_NOWAYOUT - - GETSUPPORT returns with bits set depending on the actual - card. The WDT501 supports a lot of external monitoring, the - WDT500 much less. - -wdt285.c -- Footbridge watchdog - - The timeout is set with the module parameter "soft_margin" - which defaults to 60 seconds. The timeout is also settable - using the SETTIMEOUT ioctl. - - Does not support CONFIG_WATCHDOG_NOWAYOUT - - WDIOF_SETTIMEOUT bit set in GETSUPPORT - -wdt977.c -- Netwinder W83977AF chip - - Hardcoded timeout of 3 minutes - - Supports CONFIG_WATCHDOG_NOWAYOUT - - Does not support any ioctls at all. - diff --git a/Documentation/watchdog/watchdog.txt b/Documentation/watchdog/watchdog.txt deleted file mode 100644 index 4b1ff69cc19..00000000000 --- a/Documentation/watchdog/watchdog.txt +++ /dev/null @@ -1,94 +0,0 @@ - Watchdog Timer Interfaces For The Linux Operating System - - Alan Cox <alan@lxorguk.ukuu.org.uk> - - Custom Linux Driver And Program Development - - -The following watchdog drivers are currently implemented: - - ICS WDT501-P - ICS WDT501-P (no fan tachometer) - ICS WDT500-P - Software Only - SA1100 Internal Watchdog - Berkshire Products PC Watchdog Revision A & C (by Ken Hollis) - - -All six interfaces provide /dev/watchdog, which when open must be written -to within a timeout or the machine will reboot. Each write delays the reboot -time another timeout. In the case of the software watchdog the ability to -reboot will depend on the state of the machines and interrupts. The hardware -boards physically pull the machine down off their own onboard timers and -will reboot from almost anything. - -A second temperature monitoring interface is available on the WDT501P cards -and some Berkshire cards. This provides /dev/temperature. This is the machine -internal temperature in degrees Fahrenheit. Each read returns a single byte -giving the temperature. - -The third interface logs kernel messages on additional alert events. - -Both software and hardware watchdog drivers are available in the standard -kernel. If you are using the software watchdog, you probably also want -to use "panic=60" as a boot argument as well. - -The wdt card cannot be safely probed for. Instead you need to pass -wdt=ioaddr,irq as a boot parameter - eg "wdt=0x240,11". - -The SA1100 watchdog module can be configured with the "sa1100_margin" -commandline argument which specifies timeout value in seconds. - -The i810 TCO watchdog modules can be configured with the "i810_margin" -commandline argument which specifies the counter initial value. The counter -is decremented every 0.6 seconds and default to 50 (30 seconds). Values can -range between 3 and 63. - -The i810 TCO watchdog driver also implements the WDIOC_GETSTATUS and -WDIOC_GETBOOTSTATUS ioctl()s. WDIOC_GETSTATUS returns the actual counter value -and WDIOC_GETBOOTSTATUS returns the value of TCO2 Status Register (see Intel's -documentation for the 82801AA and 82801AB datasheet). - -Features --------- - WDT501P WDT500P Software Berkshire i810 TCO SA1100WD -Reboot Timer X X X X X X -External Reboot X X o o o X -I/O Port Monitor o o o X o o -Temperature X o o X o o -Fan Speed X o o o o o -Power Under X o o o o o -Power Over X o o o o o -Overheat X o o o o o - -The external event interfaces on the WDT boards are not currently supported. -Minor numbers are however allocated for it. - - -Example Watchdog Driver: see Documentation/watchdog/src/watchdog-simple.c - - -Contact Information - -People keep asking about the WDT watchdog timer hardware: The phone contacts -for Industrial Computer Source are: - -Industrial Computer Source -http://www.indcompsrc.com -ICS Advent, San Diego -6260 Sequence Dr. -San Diego, CA 92121-4371 -Phone (858) 677-0877 -FAX: (858) 677-0895 -> -ICS Advent Europe, UK -Oving Road -Chichester, -West Sussex, -PO19 4ET, UK -Phone: 00.44.1243.533900 - - -and please mention Linux when enquiring. - -For full information about the PCWD cards see the pcwd-watchdog.txt document. diff --git a/Documentation/watchdog/wdt.txt b/Documentation/watchdog/wdt.txt new file mode 100644 index 00000000000..03fd756d976 --- /dev/null +++ b/Documentation/watchdog/wdt.txt @@ -0,0 +1,43 @@ +Last Reviewed: 10/05/2007 + + WDT Watchdog Timer Interfaces For The Linux Operating System + Alan Cox <alan@lxorguk.ukuu.org.uk> + + ICS WDT501-P + ICS WDT501-P (no fan tachometer) + ICS WDT500-P + +All the interfaces provide /dev/watchdog, which when open must be written +to within a timeout or the machine will reboot. Each write delays the reboot +time another timeout. In the case of the software watchdog the ability to +reboot will depend on the state of the machines and interrupts. The hardware +boards physically pull the machine down off their own onboard timers and +will reboot from almost anything. + +A second temperature monitoring interface is available on the WDT501P cards +This provides /dev/temperature. This is the machine internal temperature in +degrees Fahrenheit. Each read returns a single byte giving the temperature. + +The third interface logs kernel messages on additional alert events. + +The wdt card cannot be safely probed for. Instead you need to pass +wdt=ioaddr,irq as a boot parameter - eg "wdt=0x240,11". + +Features +-------- + WDT501P WDT500P +Reboot Timer X X +External Reboot X X +I/O Port Monitor o o +Temperature X o +Fan Speed X o +Power Under X o +Power Over X o +Overheat X o + +The external event interfaces on the WDT boards are not currently supported. +Minor numbers are however allocated for it. + + +Example Watchdog Driver: see Documentation/watchdog/src/watchdog-simple.c + |